I was asked to help create a regular expression that validates whether a string is a Fully Qualified Domain Name (FQDN). A Google search didn’t turn up a direct answer, but it got me close. I found regexlib, a clever website dedicated to sharing regular expressions, where someone had posted a regex for MS FQDNs. Those aren’t quite the same as regular FQDNs, because the rules are a little different. In RFC 1035, section “2.3.1. Preferred name syntax”, we read:

The labels must follow the rules for ARPANET host names.  They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen.  There are also some
restrictions on the length.  Labels must be 63 characters or less.

Section “2.3.4. Size limits” reads:

Various objects and parameters in the DNS have size limits.  They are
listed below.  Some could be easily changed, others are more
fundamental.
labels          63 octets or less
names           255 octets or less
TTL             positive values of a signed 32 bit number.
UDP messages    512 octets or less
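
To make the quoted rules concrete, here is a rough Python sketch that checks them one at a time without any regex. The function names (is_valid_label, is_valid_name) are made up for illustration, and the sketch makes one simplifying assumption: it applies the 255-octet limit to the length of the dotted text, which is not exactly how the limit is counted on the wire.

```python
import string

LETTERS = set(string.ascii_letters)
LETTERS_DIGITS = LETTERS | set(string.digits)
INTERIOR = LETTERS_DIGITS | {"-"}


def is_valid_label(label: str) -> bool:
    """Check one label against the RFC 1035 section 2.3.1 wording."""
    # Labels must be 63 characters or less (and not empty).
    if not 1 <= len(label) <= 63:
        return False
    # Must start with a letter.
    if label[0] not in LETTERS:
        return False
    # Must end with a letter or digit.
    if label[-1] not in LETTERS_DIGITS:
        return False
    # Only letters, digits, and hyphens are allowed anywhere in the label.
    return all(ch in INTERIOR for ch in label)


def is_valid_name(name: str, max_length: int = 255) -> bool:
    """Check a dotted name label by label.

    Assumption: max_length is applied to the text itself, a simplification
    of the "names 255 octets or less" limit.
    """
    if not 1 <= len(name) <= max_length:
        return False
    # An empty label (e.g. from "a..b" or a trailing dot) is rejected.
    return all(is_valid_label(label) for label in name.split("."))


if __name__ == "__main__":
    for sample in ["www.example.com", "1example.com", "example-.com", "ex_ample.com"]:
        print(sample, is_valid_name(sample))
```

This sketch only covers the two quoted sections. It does not include any rule about the top-level domain.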

Given those rules, I’ve modified the regular expression from regexlib to be:

(?=^.{1,254}$)(^(?:(?!\d|-)[a-zA-Z0-9\-]{1,63}(?<!-)\.?)+(?:[a-zA-Z]{2,})$)
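
For anyone who wants to try the expression, here is a minimal sketch using Python’s re module, which supports the lookahead and lookbehind syntax used above. The pattern is copied unchanged. The names FQDN_PATTERN and looks_like_fqdn are just illustrative, and other regex engines may behave differently.

```python
import re

# The pattern from this post, copied as-is.
FQDN_PATTERN = re.compile(
    r"(?=^.{1,254}$)(^(?:(?!\d|-)[a-zA-Z0-9\-]{1,63}(?<!-)\.?)+(?:[a-zA-Z]{2,})$)"
)


def looks_like_fqdn(name: str) -> bool:
    """Return True if the pattern accepts the given string."""
    return FQDN_PATTERN.match(name) is not None


if __name__ == "__main__":
    samples = [
        "www.example.com",      # ordinary name
        "my-site.example.org",  # interior hyphen
        "1example.com",         # label starts with a digit
        "-example.com",         # label starts with a hyphen
        "example-.com",         # label ends with a hyphen
        "ex_ample.com",         # underscore is not allowed
        "example.c",            # last label shorter than two letters
        "example.123",          # last label is not letters
    ]
    for sample in samples:
        print(f"{sample!r}: {looks_like_fqdn(sample)}")
```

With these samples, the first two are accepted and the rest are rejected. One thing worth noticing when experimenting: the dot after each label is optional (\.?), so the pattern as written also accepts a dotless name such as example, and a run of more than 63 label characters can match as two adjacent labels. If that matters for your use, the label lengths need a separate check.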

The differences between the one on regexlib and mine are fairly subtle. Theirs excludes any label that is composed entirely of digits, but the RFC only specifies that the first character can’t be a digit (or a hyphen). Theirs also allows an underscore character as part of a label, which is not part of the RFC specification.

The only deviation from the RFC rules that I make is one extra rule: the top-level domain (the part that comes after the last ‘.’) must consist of letters only and must be at least two characters long (.com, .net, .org, .eu, .uk, etc.). I can’t find where that is documented, though.