Number in the top-level domain?

Can a top-level domain contain a number at the end?

Technically yes, unless the label is purely numeric: under current rules an all-digit label cannot be a TLD, for an easy-to-understand reason (to avoid ambiguity with IP addresses). In practice, however, a TLD cannot contain a digit at the end unless it is an IDN TLD, because of rules enforced by ICANN.

Let us go back to some RFCs to have some clearer definitions of things:

RFC 952: DOD INTERNET HOST TABLE SPECIFICATION (October 1985)

This is the definition of an Internet “hostname” back then:

A “name” (Net, Host, Gateway, or Domain name) is a text string up
to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus
sign (-), and period (.). Note that periods are only allowed when
they serve to delimit components of “domain style names”. (See
RFC-921, “Domain Name System Implementation Schedule”, for
background). No blank or space characters are permitted as part of a
name. No distinction is made between upper and lower case. The first
character must be an alpha character. The last character must not be
a minus sign or period.

Note that it also states the following:

Single character names
or nicknames are not allowed.

Hence at that point:

  • com1 is a valid TLD
  • 3com is not (“The first character must be an alpha character.”)
  • 42 is not (same reason)
  • 1 is not (same reason)
  • a is not (“Single character names or nicknames are not allowed.”)
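The RFC 952 rules quoted above can be sketched as a small check. This is only an illustration for a single dot-free name; the function name is_rfc952_name is a hypothetical helper, not something defined by the RFC.

```python
import re

# One dot-free RFC 952 name: 2 to 24 characters, first character a letter,
# last character a letter or digit, hyphens allowed only in between.
# (Assumption: we only check a single component, not a dotted name.)
RFC952_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,22}[A-Za-z0-9]")


def is_rfc952_name(name: str) -> bool:
    """Return True if `name` satisfies the RFC 952 rules quoted above."""
    return RFC952_NAME.fullmatch(name) is not None


for candidate in ["com1", "3com", "42", "1", "a"]:
    print(candidate, is_rfc952_name(candidate))
```

Only com1 passes: the other four fail either the "first character must be alpha" rule or the ban on single-character names.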

RFC 1034: DOMAIN NAMES – CONCEPTS AND FACILITIES (November 1987)

This is one of the RFCs that created the DNS as we know it today. For compatibility reasons it defined hostnames as a sequence of labels, where a label is defined as follows:

They must start with a letter, end with a letter or digit, and have as
interior characters only letters, digits, and hyphen. There are also
some restrictions on the length. Labels must be 63 characters or
less.

The TLD is simply one label among others: the rightmost one. Per the above rule, com1 is a valid label, and hence a valid TLD, whereas 3com is not. This brings us directly to the following amendment.

RFC 1123: Requirements for Internet Hosts — Application and Support (October 1989)

This amends the previous RFC by changing one rule:

The syntax of a legal Internet host name was specified in RFC-952
[DNS:4]. One aspect of host name syntax is hereby changed: the
restriction on the first character is relaxed to allow either a
letter or a digit. Host software MUST support this more liberal
syntax.

So at that point:

  • com1 is a valid TLD
  • 3com is also valid
  • 42 is valid
  • 1 is valid
  • a is valid
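To make the difference between the two label rules concrete, here is a minimal sketch comparing the RFC 1034 label syntax with the relaxed RFC 1123 one. The regular expressions and function names are my own transcription of the quoted text, not something either RFC provides.

```python
import re

# RFC 1034 label: starts with a letter, ends with a letter or digit,
# interior characters are letters, digits or hyphens, 1 to 63 characters.
RFC1034_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# RFC 1123 relaxation: the first character may also be a digit.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_rfc1034_label(label: str) -> bool:
    return RFC1034_LABEL.fullmatch(label) is not None


def is_rfc1123_label(label: str) -> bool:
    return RFC1123_LABEL.fullmatch(label) is not None


for candidate in ["com1", "3com", "42", "1", "a"]:
    print(candidate, is_rfc1034_label(candidate), is_rfc1123_label(candidate))
```

All five candidates pass the RFC 1123 check, while 3com, 42 and 1 fail the RFC 1034 one because they start with a digit.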

For the case of “numerical” TLDs, the following rule from the same document (RFC 1123, Section 2.1) applies:

Whenever a user inputs the identity of an Internet host, it SHOULD
be possible to enter either (1) a host domain name or (2) an IP
address in dotted-decimal (“#.#.#.#”) form. The host SHOULD check
the string syntactically for a dotted-decimal number before
looking it up in the Domain Name System.

and

If a dotted-decimal number can be entered without such
identifying delimiters, then a full syntactic check must be
made, because a segment of a host domain name is now allowed
to begin with a digit and could legally be entirely numeric
(see Section 6.1.2.4). However, a valid host name can never
have the dotted-decimal form #.#.#.#, since at least the
highest-level component label will be alphabetic.
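The "check the string syntactically for a dotted-decimal number before looking it up" step could look roughly like the sketch below. The helper names are hypothetical, and limiting each field to 0-255 is my assumption; the RFC text only describes the #.#.#.# shape.

```python
import re

# Four groups of one to three ASCII digits separated by dots: "#.#.#.#".
DOTTED_DECIMAL = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")


def looks_like_dotted_decimal(text: str) -> bool:
    """Purely syntactic check; each field is additionally limited to 0-255."""
    if DOTTED_DECIMAL.fullmatch(text) is None:
        return False
    return all(int(part) <= 255 for part in text.split("."))


def classify_host_input(text: str) -> str:
    """Decide, before any DNS lookup, how user input should be treated."""
    return "ip-address" if looks_like_dotted_decimal(text) else "domain-name"


for value in ["74.125.244.123", "example.com", "1.2.3", "3com.example"]:
    print(value, classify_host_input(value))
```

With a check like this in front of the resolver, a name whose labels are all numeric and which has four of them is never sent to the DNS as a name, which is exactly why an all-numeric TLD is problematic.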

RFC 1738: Uniform Resource Locators (URL) (December 1994)

This also touches on the TLD, stating:

The fully qualified domain name of a network host, or its IP
address as a set of four decimal digit groups separated by
“.”. Fully qualified domain names take the form as described
in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
[5]: a sequence of domain labels separated by “.”, each domain
label starting and ending with an alphanumerical character and
possibly also containing “-” characters. The rightmost domain
label will never start with a digit, though, which
syntactically distinguishes all domain names from the IP
addresses.

RFC 3696: Application Techniques for Checking and Transformation of Names (February 2004)

This was needed to introduce IDNs (Internationalized Domain Names) and it has this to say:

Any characters, or combination of bits (as octets), are permitted in
DNS names. However, there is a preferred form that is required by
most applications. This preferred form has been the only one
permitted in the names of top-level domains, or TLDs. In general, it
is also the only form permitted in most second-level names registered
in TLDs, although some names that are normally not seen by users obey
other rules. It derives from the original ARPANET rules for the
naming of hosts (i.e., the “hostname” rule) and is perhaps better
described as the “LDH rule”, after the characters that it permits.
The LDH rule, as updated, provides that the labels (words or strings
separated by periods) that make up a domain name must consist of only
the ASCII [ASCII] alphabetic and numeric characters, plus the hyphen.
No other symbols or punctuation characters are permitted, nor is
blank space. If the hyphen is used, it is not permitted to appear at
either the beginning or end of a label. There is an additional rule
that essentially requires that top-level domain names not be all-
numeric.

In fact, as soon as IDNs are involved (and there are IDN TLDs now, both ccTLDs and gTLDs), the chosen encoding generates an ASCII string of the form xn--something, where the something can contain digits, including at the end, as shown in other answers.
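As an illustration of how digits end up in the ASCII form of an IDN TLD, here is a small sketch using Python's built-in "idna" codec. Note the assumption: that codec implements the older IDNA 2003 rules rather than the IDNA2008 RFCs (5890-5894), which is good enough for this simple label but not a general-purpose converter. The helper name to_a_label is hypothetical.

```python
def to_a_label(u_label: str) -> str:
    """Convert a Unicode label to its ASCII (xn--...) form.

    Assumption: the standard-library "idna" codec (IDNA 2003) is used.
    """
    return u_label.encode("idna").decode("ascii")


a_label = to_a_label("中国")  # the Unicode label of an existing IDN ccTLD
print(a_label)
print(any(ch.isdigit() for ch in a_label))      # does the ASCII form contain a digit?
print(a_label.encode("ascii").decode("idna"))   # round-trip back to Unicode
```

The digits here are an artifact of the Punycode encoding, not something the registrant chose, which is why they do not conflict with the "letters only" intent for ordinary ASCII TLDs.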

However, it is not really clear where the “additional rule” in the last sentence comes from.

RFC 4697: Observed DNS Resolution Misbehavior (October 2006)

This does not define anything, but it provides some interesting facts:

The root name servers receive a significant number of A record
queries where the QNAME looks like an IPv4 address.

and

A possible solution is to delegate these numeric TLDs
from the root zone to a separate set of servers to absorb the
traffic.

This clearly shows that, in the wild, some applications do send queries for names formatted like IPv4 addresses, that is, with a fully numeric “TLD”. It may happen by mistake, but it at least shows that such queries work technically.

There was in fact an experiment to launch a .42 registry, obviously completely outside of the ICANN ecosystem. You can see a summary of it at http://www.dotsauce.com/experimental-numeric-tld-42-domain/ and an archive of their main explanations at https://web.archive.org/web/20101222151118/http://register.42registry.org:80/ (in French).

It did not go far, even though it technically worked.

It showed, for example, that Microsoft operating systems by default did not handle purely numeric TLDs at all, although Microsoft provided a patch for that: https://support.microsoft.com/en-us/help/947228/error-message-when-you-try-to-join-a-windows-vista-based-client-comput “When you try to join a Windows Vista-based client computer to a top level domain (TLD) that has a purely numeric suffix, the Windows Vista-based client computer cannot join the domain. [..] This behavior is by design.”

Internet-Draft draft-liman-tld-names-06: Top Level Domain Name Specification (November 2011)

This finally gives some explanation of why purely numeric TLDs, or even TLDs containing a single digit, are sometimes considered invalid even though that is not a clear consequence of the specifications above:

(section 2.1 below refers to content in RFC 1123, quoted above)

In addition, the DISCUSSION section of Section 2.1 says:

 'However, a valid host name can never have the dotted-decimal form
 #.#.#.#, since at least the highest-level component label will be
 alphabetic.'  [Section 2.1]

Some implementers may have understood the above phrase ‘will be
alphabetic’ to be a protocol restriction.

But it basically just recommends going with the flow and keeping the same restrictions:

Neither [RFC0952] nor [RFC1123] explicitly states the reasons for
these restrictions. It might be supposed that human factors were a
consideration; [RFC1123] appears to suggest that one of the reasons
was to prevent confusion between dotted-decimal IPv4 addresses and
host domain names. In any case, it is reasonable to believe that the
restrictions have been assumed in some deployed software, and that
changes to the rules should be undertaken with caution.

Hence it offered this definition:

traditional-tld-label = 1*63(ALPHA)

This draft never became an RFC because not everyone agreed with it. You can find a thread with dissenting voices at https://www.ietf.org/mail-archive/web/dnsop/current/msg08866.html ; basically, it was not clear whether there had been a restriction in the past that the draft was now trying to relax a little, or whether there never was a restriction to begin with and people simply implemented systems wrongly.

For example, see this Chromium/Chrome bug report: https://bugs.chromium.org/p/chromium/issues/detail?id=31405
Browsing failed when the TLD started with a digit or was purely numeric (it worked if the TLD ended with a digit preceded by letters). This was not considered a bug, and is not fixed, because the browser ships with a list of TLDs so that, besides testing their syntax, it can tell which ones are valid and which are not.

ICANN Application Guidebook for new TLDs (June 2012)

Available at https://newgtlds.icann.org/en/applicants/agb/guidebook-full-04jun12-en.pdf
it says the following starting at page 64:

The ASCII label (i.e., the label as transmitted on the wire) must be valid as specified in technical standards Domain Names: Implementation and Specification (RFC 1035), and Clarifications to the DNS Specification (RFC 2181) and any updates thereto.

The ASCII label must be a valid host name, as specified in the technical standards DOD Internet Host Table Specification (RFC 952), Requirements for Internet Hosts — Application and Support (RFC 1123), and Application Techniques for Checking and Transformation of Names (RFC 3696), Internationalized Domain Names in Applications (IDNA)(RFCs 5890-5894), and any updates thereto. This includes the following:

The ASCII label must consist entirely of letters (alphabetic characters a-z), or

The label must be a valid IDNA A-label (further restricted as described in Part II below).

Note especially: “The ASCII label must consist entirely of letters (alphabetic characters a-z)”.

This immediately forbids any fully numeric label, and in fact any digit at all, including at the end, except for IDN TLDs, the ones of the form xn--something.
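The two alternatives quoted from the guidebook can be approximated as follows. This is a rough sketch only: the function names are hypothetical, the A-label test is a simplified stand-in (xn-- prefix, LDH syntax, and a Punycode payload that decodes to at least one non-ASCII character), and none of the further restrictions from "Part II" of the guidebook are modelled.

```python
import re

# Alternative 1: the ASCII label consists entirely of letters.
LETTERS_ONLY = re.compile(r"[A-Za-z]{1,63}")
# Generic letter-digit-hyphen label syntax, used for the A-label shape.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_a_label_like(label: str) -> bool:
    """Simplified A-label test (assumption: not a full IDNA2008 validator)."""
    lowered = label.lower()
    if not lowered.startswith("xn--") or LDH_LABEL.fullmatch(lowered) is None:
        return False
    try:
        decoded = lowered[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    # A real A-label encodes at least one non-ASCII character.
    return any(ord(ch) > 127 for ch in decoded)


def is_allowed_tld_label(label: str) -> bool:
    """Letters only, or something that looks like an IDNA A-label."""
    return LETTERS_ONLY.fullmatch(label) is not None or is_a_label_like(label)


for candidate in ["com", "com1", "3com", "42", "xn--fiqs8s", "xn--p1ai"]:
    print(candidate, is_allowed_tld_label(candidate))
```

Under this reading, com1, 3com and 42 are all rejected, while the two xn-- labels are accepted even though they contain digits.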

Note that someone asked ICANN directly about this, and got the following reply, shown at https://domaingang.com/domain-news/icann-applicant-handbook-this-is-why-we-cannot-have-numeric-gtlds/ :

Please note Numeric TLD’s were prohibited in the first round of applications.
The prohibition on numeric gTLDs in the applicant guidebook (http://newgtlds.icann.org/en/applicants/agb) derives from a number of technical concerns regarding the ability of such domains to operate properly. Domain names are often used in place where other kinds of identifiers may be used like IP addresses.

The fact that a TLD is all alphabetic is often a key determinant for software in identifying a domain name. If a TLD such as “.123” were allowed, you could have a domain name of “74.125.244.123” which would be difficult to discriminate from an IP address “74.125.244.123.”. There are also other considerations: some technical standards documentation states that TLDs will be alphabetical, which has been codified as an assumption in software also.

The limitation in the AGB to alphabetic characters was designed to limit these scenarios that means such TLDs are not likely to work well in software, as well as limit potential security issues that may result from the same issues.
