As the build-up to the second round of the new-gTLD programme[1] continues towards its launch in April 2026, we take a look at the issue of non-English-language support within the framework.
The programme itself began in 2012, involving the addition of large numbers of new domain-name extensions (generic top-level domains, or gTLDs) to the Internet infrastructure. It incorporated a process whereby individual entities were able to apply to run (i.e. act as registry organisation for) their own extension, thereby maintaining control over features such as whether the TLD would be reserved for their own use (e.g. as a 'dot-brand'[2]) or open for registrations by third parties. A second round of applications is set to begin in Q2 of next year.
As the new phase approaches, ICANN (the Internet Corporation for Assigned Names and Numbers, as the organisation overseeing the initiative) has announced a number of improvements to the way multiple languages are supported within the programme[3]. Key points include:
- The addition of three further non-Latin scripts in which applied-for TLDs can be expressed.
- The support of a greatly increased number of languages within the programme generally (to 380, up from 23 in the first round) - e.g. in areas surrounding technical provisions (such as compatibility with associated portals) and DNS infrastructure.
- Improvements to the process for assessing string similarity and potential for name 'collisions' (i.e. the same name existing in different namespaces), including the incorporation of visual and phonetic similarity evaluations. The application process will also allow applicants to specify a 'second-choice' string (which must contain the first-choice version as a sub-string), for cases where the preferred version is deemed unacceptable, and will offer a more transparent process for resolving contentions.
These changes will give greater flexibility for entities operating in the non-English-speaking world, and are another area to consider for organisations assessing their place in the new-gTLD landscape (e.g. those considering an application for their own brand-specific extension).
How might the implications of these changes manifest themselves as the second phase of the programme comes to fruition? One way of gaining possible insights in this area - e.g. regarding potential use-cases for foreign-language domains - is to consider the current state of the landscape, with an obvious source of relevant 'clean' data being the existing set of internationalised domain names (IDNs)[4] (i.e. those incorporating non-Latin characters). IDNs are a special subset of the full universe of non-English domain names, which does (of course) also include large numbers of examples written just in Latin characters. The non-English Latin-character domain landscape already includes many whole gTLDs, such as (Chinese) .xin and .weibo; (French) .moi and .maison; (German) .jetzt, .kaufen, .reise and .versicherung; (Hindi) .desi; (Italian) .casa, .immo and .moda; (Portuguese) .bom; and (Spanish) .abogado, .futbol, .gratis, .tienda, .uno and .viajes, all of which (as non-IDNs) do not require any 'special' technical infrastructure.
As of the current time there are, however, around 150 internationalised new-gTLDs (i.e. where the domain-name extension itself is written in (or includes characters written in) a non-Latin script) which have been delegated into the Internet infrastructure[5]. Domain names (or domain-name extensions) of this type are sometimes expressed in an 'encoded' format called Punycode (in which they are converted to a string written wholly in Latin characters, denoted by the characters 'xn--' at the start), which is how they are expressed in zone-file raw data, for example.
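The conversion between the native-script and encoded forms can be illustrated with a minimal Python sketch. It assumes the standard library's built-in 'idna' codec is adequate for the purpose (a production tool might prefer a dedicated IDNA library); the function names to_punycode and from_punycode are purely illustrative.

```python
def to_punycode(domain: str) -> str:
    """Encode each label of a domain name to its ASCII ('xn--') form."""
    return domain.encode("idna").decode("ascii")


def from_punycode(domain: str) -> str:
    """Decode an ASCII ('xn--') domain name back to its Unicode form."""
    return domain.encode("ascii").decode("idna")


encoded = to_punycode("在线")
print(encoded)                 # an ASCII string beginning with 'xn--'
print(from_punycode(encoded))  # the original Unicode label
```

Labels that are already plain ASCII pass through unchanged, so the same functions can be applied to a mixed list of domain names.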
Domain-name zone files (containing lists of all registered domains across the extension in question, in addition to other technical configuration information) are published by (and are publicly available from) ICANN for around 80 of these extensions, providing a ready source of data which can easily be analysed to identify trends and patterns in usage. Many of the remaining delegated IDN extensions are country-specific examples (e.g. comprising just a country name written in the local language, or an abbreviation (analogous to familiar non-IDN ccTLDs such as .uk, .fr, .de, etc.)), or are extensions which may no longer be in active use.
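As a rough illustration of how such a file might be analysed, the sketch below counts the distinct registered domains in a zone file. It assumes the common one-record-per-line layout with a fully qualified owner name as the first field; the function name registered_domains and the sample records are hypothetical, and real zone files may need more careful parsing.

```python
from typing import Iterable


def registered_domains(zone_lines: Iterable[str], tld: str) -> set[str]:
    """Collect the distinct second-level names that appear as record owners."""
    suffix = "." + tld.lower().rstrip(".")
    domains = set()
    for line in zone_lines:
        line = line.strip()
        if not line or line.startswith(";"):  # skip blank lines and comments
            continue
        owner = line.split()[0].lower().rstrip(".")
        if not owner.endswith(suffix):
            continue  # zone apex or out-of-zone record
        # Keep only the label directly left of the TLD (drops 'www.', 'ns1.' etc.)
        sld = owner[: -len(suffix)].split(".")[-1]
        if sld:
            domains.add(sld + suffix)
    return domains


# Hypothetical excerpt, using the encoded form of the '在线' extension.
TLD = "在线".encode("idna").decode("ascii")
sample = [
    "; illustrative records only",
    f"{TLD}. 3600 in soa a.example. b.example. 1 2 3 4 5",
    f"example.{TLD}. 3600 in ns ns1.example.net.",
    f"example.{TLD}. 3600 in ns ns2.example.net.",
    f"ns1.shop.{TLD}. 3600 in a 192.0.2.1",
    f"123.{TLD}. 3600 in ns ns1.example.net.",
]
domains = registered_domains(sample, TLD)
print(len(domains))  # number of distinct registered domains in the sample
```

Collecting names into a set means that a domain with several records (multiple name servers, glue records and so on) is only counted once.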
For the approximately 80 IDN gTLDs for which zone-file data is available, it is possible to drill into the datasets to gain an overview of the specific domain names registered. Table 1 shows (for example) the most popular of these extensions by total numbers of registered domain names (for all IDN TLDs associated with more than 250 domains). 34 of the 80 extensions feature only five domains or fewer.
Table 1: The most popular IDN gTLDs currently, by numbers of registered domains (additional information mostly provided by Wikipedia[6])
It is noteworthy that the list of the most popular extensions is dominated by Chinese-language examples, mostly comprising generic terms (but with two brand-specific (China Mobile Communications Corporation, and CITIC Group) and two geographic (Guangdong and Foshan) extensions).
As an illustrative example, it is informative to consider the list of 31,192 individual domains registered on the most popular of these extensions (.在线, a Chinese-language extension meaning 'online'). In the vast majority of these cases (29,811, or 95.6% of the total), the second-level domain (SLD) - i.e. the part of the domain name to the left of the dot - is also written (wholly or partly) in non-Latin script (Chinese in most cases), thereby comprising fully internationalised domain names. Of the remaining 1,381 domains, 107 consist purely of digits as the SLD (i.e. numeric domain names[7], which are often popular in markets such as China, where their use can circumvent language barriers and particular numbers may have specific cultural significance). The other 1,274 domains feature a range of different types of (Latin-character) terms in their SLD, including transliterations of Chinese words, a range of generic terms, and numerous brand references (including, presumably, both official and non-legitimate (potentially infringing) examples).
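A breakdown of this kind can be sketched as follows. The classification rules are an assumption about how such an analysis might be carried out (an 'xn--' prefix or any non-ASCII character marks an internationalised SLD), and the function name classify_sld and the sample domains are hypothetical.

```python
from collections import Counter


def classify_sld(domain: str) -> str:
    """Label a registered domain by the script type of its second-level label."""
    sld = domain.lower().rstrip(".").split(".")[-2]
    if sld.startswith("xn--") or not sld.isascii():
        return "idn"      # encoded or native-script non-Latin label
    if sld.isdigit():
        return "numeric"  # digits only
    return "latin"        # any other Latin-character label


# Hypothetical sample under the encoded form of the '在线' extension.
TLD = "在线".encode("idna").decode("ascii")
sample = [f"中文.{TLD}", f"xn--fiq228c.{TLD}", f"123.{TLD}", f"shop.{TLD}"]
counts = Counter(classify_sld(d) for d in sample)
print(counts)

# The share quoted in the text: 29,811 of 31,192 domains have a non-Latin SLD.
share = round(100 * 29811 / 31192, 1)
print(share)  # 95.6
```

The function expects a name with at least two labels (SLD plus TLD), which is the form produced when registered domains are extracted from a zone file.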
Overall, therefore, the landscape of IDNs is essentially a microcosm of the domain landscape generally, but offers additional options and flexibility for brands and consumers in markets where the local language includes non-Latin characters. In light of the increased support for a wider range of languages in the forthcoming phase of the new-gTLD programme, brand owners will once again want to consider the opportunities and risks within the ever-expanding online landscape.
References
[1] https://www.iamstobbs.com/opinion/the-new-new-gtlds
[2] https://www.iamstobbs.com/opinion/a-review-of-the-current-state-of-the-new-gtld-programme-dot-brands
[4] https://www.iamstobbs.com/opinion/idn-tifying-trends-insights-from-the-set-of-non-latin-domain-names
[5] https://data.iana.org/TLD/tlds-alpha-by-domain.txt
[6] https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
[7] https://www.iamstobbs.com/opinion/the-universe-of-numeric-domain-names
This article was first published on 17 June 2025 at:
https://www.iamstobbs.com/insights/the-new-new-gtlds-a-wider-domain-of-language-support