I was asked to help create a regular expression to validate that a string is a Fully Qualified Domain Name (FQDN). A Google search didn’t turn up a direct answer, but it got me close. I found a clever website dedicated to sharing regular expressions called regexlib, where someone had posted a regex for MS FQDNs. Those aren’t quite the same as regular FQDNs; the rules are a little different. In RFC 1035 section “2.3.1. Preferred name syntax”, we read:
The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.
Section “2.3.4. Size Limits” reads:
Various objects and parameters in the DNS have size limits. They are listed below. Some could be easily changed, others are more fundamental.
labels: 63 octets or less
names: 255 octets or less
TTL: positive values of a signed 32 bit number.
UDP messages: 512 octets or less
Given those rules, I’ve modified the regular expression from regexlib to be:
(?=^.{1,254}$)(^(?:(?!\d|-)[a-zA-Z0-9\-]{1,63}(?<!-)\.?)+(?:[a-zA-Z]{2,})$)
The differences between the one on regexlib and mine are fairly subtle. Theirs excludes any label that is composed entirely of digits, but the RFC only specifies that the first character can’t be a digit (or a hyphen). They also allow an underscore character as part of a label, which is not part of the RFC specification.
The only deviation from the RFC rules that I make is the extra rule that the top level domain (the part that comes after the last ‘.’) must consist of letters only, and must be 2 or more characters long (.com, .net, .org, .eu, .uk, etc.). I can’t find where that is documented though.
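To try the expression out, here is a minimal Python sketch. The expression is copied verbatim from above; the helper name looks_like_fqdn and the sample names are my own, chosen only for illustration. Python's re module supports both the lookahead and the lookbehind used here, but other engines may not.

```python
import re

# The expression from the post, copied verbatim.
FQDN_RE = re.compile(
    r"(?=^.{1,254}$)(^(?:(?!\d|-)[a-zA-Z0-9\-]{1,63}(?<!-)\.?)+(?:[a-zA-Z]{2,})$)"
)


def looks_like_fqdn(name: str) -> bool:
    """Return True when the post's expression accepts `name` (hypothetical helper)."""
    return FQDN_RE.match(name) is not None


if __name__ == "__main__":
    samples = [
        "example.com",      # ordinary two-label name
        "something.co.uk",  # three labels
        "1example.com",     # label starts with a digit
        "-example.com",     # label starts with a hyphen
        "example.c",        # top level domain shorter than 2 letters
        "example.123",      # top level domain is not letters
    ]
    for candidate in samples:
        print(f"{candidate}: {looks_like_fqdn(candidate)}")
```

The first two samples are accepted and the remaining four are rejected, which lines up with the rules described above.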
Does this handle .co.uk or .us.com .org.uk?
If it handles something.co.uk does it think something is a subdomain of co and limit it to 63 characters?
Is this compatible with all regex engines?
The RFC defines each label (which is limited to 63 characters) this way:
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
It starts with a letter, can contain any letter, digit, or hyphen in the middle, and ends with a letter or digit.
So all three labels in ‘something’, ‘co’, and ‘uk’ are limited to 63 characters each, if I’m reading it correctly. The 255-character limit applies to the entire domain name, of which something.co.uk uses 15 characters.
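One way to see the per-label limit in action is to split the name on dots and check each label on its own. The following is only a sketch under my own assumptions: the names LABEL_RE and check_name are made up for illustration, the name is assumed to have no trailing dot, and the 255 limit is applied to the text length, which is a simplification of the RFC's octet count.

```python
import re

# <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# One letter, then optionally up to 61 letters/digits/hyphens followed by
# one letter or digit, which caps a label at 63 characters.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def check_name(name: str) -> bool:
    """Check the overall length and every label (hypothetical helper)."""
    if not 1 <= len(name) <= 255:
        return False
    # An empty label (from "a..b" or a trailing dot) fails fullmatch.
    return all(LABEL_RE.fullmatch(label) is not None for label in name.split("."))


if __name__ == "__main__":
    print(len("something.co.uk"))              # length of the whole name
    print(check_name("something.co.uk"))       # every label is within limits
    print(check_name("a" * 63 + ".co.uk"))     # first label exactly 63 characters
    print(check_name("a" * 64 + ".co.uk"))     # first label one character too long
```

Each of ‘something’, ‘co’, and ‘uk’ is checked separately, so a 64-character first label fails even though the whole name is far below 255 characters.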
I’ve just been reading up on naming restrictions, and two more rules appear to apply:
1. “One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit. Host software MUST support this more liberal syntax.” – RFC1123: Section 2.1 Host Names and Numbers.
2. “Single character names or nicknames are not allowed” – RFC952: Section ASSUMPTIONS 1 (referenced in above RFC).
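A label check that takes these two points into account might look like the sketch below. The names RELAXED_LABEL_RE and is_relaxed_label are hypothetical. Applying the single-character rule to every label is my own assumption about how to read RFC 952, so it is left as a switch rather than hard-coded.

```python
import re

# RFC 1123 relaxation: the first character may be a letter or a digit.
# Interior characters are letters, digits, or hyphens; the last character is
# a letter or digit; the total length stays at 63 characters or less.
RELAXED_LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_relaxed_label(label: str, allow_single_char: bool = False) -> bool:
    """Check one label against the relaxed rules (hypothetical helper).

    Assumption: single-character labels are rejected by default, following
    the RFC 952 remark; pass allow_single_char=True to accept them.
    """
    if len(label) == 1 and not allow_single_char:
        return False
    return RELAXED_LABEL_RE.fullmatch(label) is not None


if __name__ == "__main__":
    for label in ["3com", "a", "-abc", "abc-", "co"]:
        print(f"{label}: {is_relaxed_label(label)}")
```

With the defaults, ‘3com’ and ‘co’ pass, while ‘a’, ‘-abc’, and ‘abc-’ do not.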