Friday, 29 December 2023

IDN-tifying trends: Insights from the set of non-Latin domain names

BLOG POST

Internationalized domain names (IDNs) are domain names featuring characters in non-Latin scripts, including examples featuring accented characters (such as münchen.de) and those which are entirely written in alternative character sets (such as яндекс.рф - Yandex Russia). This infrastructure allows brand owners to create domain names in local languages and target content to specific markets, but also provides potential for bad actors to create names which are deceptively similar to the official domain names of trusted brands (e.g. by substituting a character with a non-Latin equivalent appearing visually similar - a so-called 'homoglyph').

In this study, we consider the full set of registered IDNs across all gTLDs (generic top-level domains, or domain extensions) for which zone files are available, covering around 1,000 different extensions, to identify trends and patterns, and indicators of potential abuse.

Overall, there are around 1.3 million gTLD IDNs currently in existence, across 470 distinct domain extensions, with the most popular being .com (853k IDNs), .net (136k), and .在线 (Chinese for 'online') (28k).

267 distinct domain names were found to comprise homoglyph variations of any of the top ten most valuable global brands in 2023, and not to be under the control of the brand owner. A significant proportion of these feature indicators that they have been registered for infringing use, with 79 (30%) found to have active MX (mail exchange) records, indicating that they have been configured to be able to send and receive e-mails and could therefore be associated with phishing activity, and 128 (48%) having privacy-protected whois records. Various examples were identified as explicitly hosting fraudulent or infringing content, including instances of lookalike sites (e.g. ǥoogłe[.]com, googļe[.]com, googłe[.]online (re-directs to googłe[.]co) and ɠoogle[.]com (re-directs to gooqle[.]cm)), misdirection and brand confusion (e.g. gooqłe[.]com, gooġlɵ[.]com, visã[.]com and ɢoogle[.]net).

Across the set of these non-official homoglyph domains, the average number of replaced characters in the SLD name (the part of the domain name to the left of the dot) is 1.62, highlighting the necessity for the use of detection technologies able to analyse strings in full in order to detect visual similarity, rather than just identifying instances which differ from the official string by (say) a single character. Ten examples were identified of domains in which more than half of the characters have been replaced with non-Latin homoglyphs, including (all with 100% non-Latin characters): ᴀᴘᴘʟᴇ[.]com, арріе[.]com, арріе[.]net, арріө[.]com, аррӏе[.]com, ᴍɪᴄʀᴏꜱᴏꜰᴛ[.]com, ᴍᴄᴅᴏɴᴀʟᴅꜱ[.]com and ᴀᴍᴀᴢᴏɴ[.]com. Three of these domains have active MX records.

The following additional points warrant specific consideration by brand owners:

  • The number of homoglyph domains targeting trusted brands - and the significant proportion of these found to be actively infringing or to feature indicators of suspicious intentions - highlights the need for brand owners to monitor activity in this space, combined with tracking examples of concern for content changes and launching enforcement actions when appropriate. 
  • Many top brands incorporate instances of potentially-deceptive IDNs in their defensive domain portfolios; however, this approach in isolation is likely to be of limited effectiveness because of the infinite potential variations available to would-be infringers. Where domains are held for defensive reasons, it may be advisable for them to be configured to re-direct to the official brand website, to maximise traffic and minimise the risk of customer confusion.

Similar trends in potentially fraudulent domain registration activity have also been observed in the landscape of Web3 blockchain domains, which also allow for a wide range of non-Latin characters[1]. This arena is also worthy of careful consideration by brand owners, who may wish to explore brand protection strategies across these emerging technologies. This approach may be particularly valid as the availability of desirable domain names begins to run low across traditionally popular areas of the domain landscape, such as .com[2].

References

[1] https://www.iamstobbs.com/trends-in-web3-ebook

[2] https://www.iamstobbs.com/availability-of-domains-ebook

This article was first published on 29 December 2023 at:

https://www.iamstobbs.com/opinion/idn-tifying-trends-insights-from-the-set-of-non-latin-domain-names

* * * * *

WHITE PAPER

Executive Summary

Internet technology allows users to create domain names with characters in non-Latin scripts, allowing targeting of content to local markets. These so-called internationalized domain names (IDNs) can, however, also be abused by bad actors to create deceptive websites with names which appear visually extremely similar to the official domains of trusted brand websites.

In this study, we consider the set of IDNs currently registered across all (approximately 1,000) gTLDs (generic top-level domains, or domain extensions) which have zone files available from ICANN's Centralized Zone Data Service, to identify trends and patterns, and indicators of potential abuse.

The main findings of the analysis are as follows:

  • Across the set of gTLDs, around 1.3 million IDNs are currently registered, covering 470 distinct domain extensions. The top three TLDs, by numbers of IDNs, are .com (853k IDNs), .net (136k), and .在线 (Chinese for 'online') (28k).
  • The top three languages utilised for IDNs are Chinese (506k IDNs), Korean (136k), and German (113k).
  • Across the dataset, the IDNs range in length from 1 to 57 characters.
  • 388 IDNs were identified with SLD[1] names appearing visually similar to those of the main corporate website of any of the top ten most valuable global brands. In these cases, one or more characters from the brand name have been replaced by a non-Latin character which appears visually similar - these are so-called 'homoglyph domains', and present the potential for deceptive misuse by bad actors.
  • Excluding the domains which appear to be under the ownership of the brand in question (121 instances; presumably defensive registrations, etc.), the following observations can be made about the other homoglyph domains for these top ten brands in the dataset:
    • 73 (27%) return a live website response
    • 79 (30%) have active MX (mail exchange) records, indicating that they have been configured to be able to send and receive emails and could therefore be associated with phishing activity
    • 128 (48%) have registrant information redacted using a privacy-protection service
  • Many of these domains are being actively used to host fraudulent or infringing content, including instances of lookalike sites, misdirection and brand confusion.
  • Across the dataset of non-official homoglyph domains for the top ten brands, the average number of replaced characters in the SLD name is 1.62. There are ten examples of domains in which more than half of the characters have been replaced with non-Latin homoglyphs, including (all with 100% non-Latin characters): ᴀᴘᴘʟᴇ[.]com, арріе[.]com, арріе[.]net, арріө[.]com, аррӏе[.]com, ᴍɪᴄʀᴏꜱᴏꜰᴛ[.]com, ᴍᴄᴅᴏɴᴀʟᴅꜱ[.]com and ᴀᴍᴀᴢᴏɴ[.]com. Three of these domains have active MX records.

Introduction

Modern Internet infrastructure allows for the creation of domain names containing non-Latin characters, such as accented characters and text from wholly distinct character sets (internationalized domain names, or IDNs). Whilst this presents opportunities for brand owners to create domain names in local languages, and engage with target audiences, it does also present opportunities for fraudsters to create names which are deceptively similar to the official domain names of trusted brands, for example by substituting a character with a non-Latin equivalent appearing visually similar (so-called 'homoglyphs').

In this study, we consider the set of IDNs present across the full range of generic top-level domains (gTLDs) or domain extensions, using information from ICANN’s Centralized Zone Data Service[2]. In these zone files (domain configuration data files), IDNs are stored in an encoded form called Punycode, as Latin-character strings beginning 'xn--'. The encoded version displays any Latin characters from the domain name and also represents (as Latin characters) any non-Latin characters and their relative positions within the string (e.g. hermès.com is represented in Punycode as xn--herms-7ra.com). For the analysis, all Punycode domains are translated to their true IDN equivalents, and trends and patterns in the dataset are inspected.
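As an illustration of the decoding step, the following minimal Python sketch converts Punycode labels back to their Unicode form using the standard library's 'punycode' codec. The function name punycode_to_unicode is an assumption introduced here for illustration, not the tooling used in the study, and the sketch performs no IDNA validation.

```python
def punycode_to_unicode(domain: str) -> str:
    """Decode each 'xn--' label of a domain name; leave other labels as-is."""
    decoded = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            # Strip the 'xn--' prefix and decode the remainder as Punycode.
            decoded.append(label[4:].encode("ascii").decode("punycode"))
        else:
            decoded.append(label)
    return ".".join(decoded)


print(punycode_to_unicode("xn--herms-7ra.com"))  # hermès.com
```

Decoding label by label means that both the SLD and an internationalized TLD in the same name are handled in one pass.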

Analysis

1. Top-level statistics for the full dataset

Overall, around 1.3 million IDNs exist across the set of gTLDs for which zone files are available. 470 distinct TLDs have at least one IDN registered. Table 1 and Figure 1 show the top ten most popular TLDs for IDN registrations.

TLD                                           No. of IDNs
com                                           853,308
net                                           135,775
在线 (xn--3ds443g) (Chinese for 'online')     27,956
top                                           24,988
商标 (xn--czr694b) (Chinese for 'trademark')  24,894
公司 (xn--55qx5d) (Chinese for 'company')     23,972
org                                           22,470
info                                          17,365
网 (xn--io0a7i) (Chinese for 'network')       16,377
online                                        15,670

Table 1: Top TLDs by numbers of IDNs

Figure 1: Top TLDs by numbers of IDNs

It is noteworthy that four of the top ten most popular TLDs for non-Latin domain names are themselves internationalized extensions (which can also be alternatively represented in Punycode format). All of the examples in this case are in the Chinese language.

Table 2 and Figure 2 show the top ten most popular languages[3] represented in the second-level domain names (SLDs) (i.e. the part of the domain name to the left of the dot) of the set of IDNs.

SLD language code  SLD language[4,5]  No. of IDNs
zh                 Chinese            505,952
ko                 Korean             136,248
de                 German             112,566
ja                 Japanese           104,861
th                 Thai               42,153
en                 English            38,844
zh-Hant            Chinese (trad.)    35,952
tr                 Turkish            34,859
es                 Spanish            34,576
fr                 French             32,802

Table 2: Top SLD languages by numbers of IDNs

Figure 2: Top SLD languages by numbers of IDNs

Perhaps unsurprisingly, the set of IDNs is dominated by languages using entirely non-Latin alphabets, with four of the top five languages utilising alternative character sets. In total, 125 different languages are represented within the dataset.

Figure 3 shows the distribution of domain-name (SLD) lengths (in characters) across the full dataset, considering the IDN representations of the names (rather than their Punycode equivalents).

Figure 3: Distribution of IDN (SLD) lengths

The longest domain name (SLD) in the dataset is 57 characters (1 instance). A full list of all IDNs of 56 characters in length or greater is shown in Appendix A. The dataset also includes over 14,000 IDNs with an SLD length of one character.

2. Deceptive homoglyph domain names

In this section we consider homoglyph domain names where the SLD is identical to that of the main official corporate domain name of any of the top ten most valuable global brands in 2023[6], apart from the replacement of one or more characters with a (non-Latin) character which appears visually similar, but with no additional keywords or other terms - in other words, these are the IDNs with the greatest potential for customer confusion and fraudulent use relating to these brands. Table 3 shows the number of such variants identified within the dataset, noting that some of these appear likely (on the basis of registrant details and/or the use of an enterprise-class domain registrar) to be under the control of the official brand owner, presumably being held as defensive registrations or for other purposes (e.g. for use in internal phishing tests).

Brand string    .com       Other gTLDs  Total
apple           30 (10)    6 (1)        36 (11)
google          159 (44)   27 (6)       186 (50)
microsoft       34 (0)     7 (0)        41 (0)
amazon          82 (42)    17 (6)       99 (48)
mcdonalds       4 (0)      0            4 (0)
visa            5 (2)      0            5 (2)
tencent         1 (0)      0            1 (0)
louisvuitton    1 (0)      0            1 (0)
mastercard      10 (10)    0            10 (10)
coca(-)cola*    5 (0)      0            5 (0)
Total           331 (108)  57 (13)      388 (121)

* Hyphen optional

Table 3: Total numbers of homoglyph domain names for each of the top ten most valuable global brands. Shown in brackets are the numbers of these domains which appear to be under the control of the official brand owner.

The visual similarity of some of these domains to the names of the official sites in question is striking; for example, the list of homoglyph domains for Google (the top-ten brand most heavily targeted by this type of infringement) is shown in Appendix B.
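The matching criterion used in this section (an SLD identical to the brand string apart from visually similar character replacements) can be sketched in Python as follows. This is a minimal illustration only: the LOOKALIKES map is a small hand-picked assumption rather than a complete confusables table, the function names are hypothetical, and it is not the detection method used in the study.

```python
import unicodedata

# Illustrative lookalike map only (an assumption, far from exhaustive).
LOOKALIKES = {
    "\u0142": "l",  # Latin small letter l with stroke
    "\u0275": "o",  # Latin small letter barred o
    "\u0261": "g",  # Latin small letter script g
    "\u0262": "g",  # Latin letter small capital G
    "\u1d0f": "o",  # Latin letter small capital O
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
    "\u03bf": "o",  # Greek small letter omicron
}


def skeleton(text: str) -> str:
    """Reduce a string to a rough ASCII 'skeleton' for visual comparison."""
    # Step 1: swap known lookalike characters for their Latin counterparts.
    mapped = "".join(LOOKALIKES.get(ch, ch) for ch in text)
    # Step 2: decompose accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", mapped)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_homoglyph_variant(sld: str, brand: str) -> bool:
    """True if the SLD differs from the brand but shares its skeleton."""
    return sld != brand and skeleton(sld) == brand


print(is_homoglyph_variant("g\u00f6\u00f6gle", "google"))  # True
print(is_homoglyph_variant("googles", "google"))           # False
```

Because the whole string is normalised before comparison, variants with several replaced characters are caught in the same way as single-character substitutions.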

Considering only the 267 domains which are apparently not under the control of the official brand owner in question, the following observations are apparent:

  • 73 (27%) return some sort of live website response (i.e. an HTTP status code of 200)
  • 79 (30%) have active MX records, indicating that they have been configured to be able to send and receive e-mails and could therefore be associated with phishing activity. 53 of these have no active website and may be being used for their e-mail functionality only.
  • 128 (48%) explicitly make use of some sort of privacy-protection service in their whois record, as is often the case for domains registered for egregious use.
  • The registrar breakdown is dominated by retail-grade providers, often popular with infringers[7], with the top three within the dataset found to be GoDaddy.com, LLC (102 domains), Squarespace Domains II LLC (38) and NameCheap, Inc. (25).
  • Many of the domains have been long-lived, with the earliest examples found to have creation dates within 2001. Only 46 of the domains were registered during 2023, though activity appears to be ongoing, with the newest example registered on 13-Sep-2023.
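As a quick arithmetic check, the percentages quoted above can be reproduced from the counts in Table 3. The short Python sketch below uses only figures stated in this paper; the variable names are illustrative.

```python
total_homoglyphs = 388   # Table 3: all homoglyph domains identified
brand_controlled = 121   # Table 3: apparently under brand-owner control
non_official = total_homoglyphs - brand_controlled

indicators = {
    "live website": 73,
    "active MX records": 79,
    "privacy-protected whois": 128,
}
# Express each indicator count as a rounded percentage of the non-official set.
shares = {name: round(100 * count / non_official) for name, count in indicators.items()}

print(non_official)  # 267
print(shares)        # {'live website': 27, 'active MX records': 30, 'privacy-protected whois': 48}
```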

Amongst the homoglyph domains resolving to live content, there are a number of examples of particular concern (Figure 4).

Figure 4: Examples of live sites of concern hosted on homoglyph domains targeting any of the top ten most valuable global brands:
  • Lookalike sites: 
    • (a) ǥoogłe[.]com (xn--ooge-21a88g[.]com); 
    • (b) googļe[.]com (xn--googe-m6a[.]com); 
    • (c) googłe[.]online (xn--googe-n7a[.]online) - re-directs to googłe[.]co (xn--googe-n7a[.]co); 
    • (d) ɠoogle[.]com (xn--oogle-kmc[.]com) - re-directs to gooqle[.]cm
  • Misdirection / brand confusion / other brand misuse: 
    • (e) gooqłe[.]com (xn--gooqe-n7a[.]com); 
    • (f) gooġlɵ[.]com (xn--gool-dxa55r[.]com); 
    • (g) visã[.]com (xn--vis-ola[.]com); 
    • (h) ɢoogle[.]net (xn--oogle-wmc[.]net)
  • Possible piracy site: 
    • (i) äpple[.]online (xn--pple-koa[.]online)
  • Other brand issues: 
    • (j) googľe[.]com (xn--googe-y6a[.]com), gᴑȯgle[.]com (xn--ggle-v0b5042b[.]com), gȯoglɵ[.]com (xn--gogl-v0b73b[.]com) and goȯglɵ[.]com (xn--gogl-w0b63b[.]com)
  • Domain name offered for sale:
    • (k) ɢooɢle[.]com (xn--oole-47bc[.]com)
  • Parking page featuring industry-relevant pay-per-click ads: 
    • (l) amazoñ[.]com (xn--amazo-sta[.]com)

We next further analyse the set of (non-officially owned) homoglyph domains with SLD names similar to any of the top ten most valuable global brands, with a view to identifying the proportion of each string consisting of replaced characters (e.g. for 'ᴀpple[.]com' (xn--pple-k13a[.]com), where the only non-Latin character is the 'ᴀ', there is one replaced character out of five (i.e. 20% of the whole string)). For this analysis, we exclude seven examples where the whole SLD is in a consistent non-Latin script (e.g. 'γοογλε' (Greek) for 'google' and 'амазон' (Cyrillic) for 'amazon', as these domains may be intended for targeting towards a non-English market, rather than being explicitly deceptive), leaving a dataset of 260 domains.
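The proportion calculation described above can be sketched in Python as follows. The sketch assumes, for simplicity, that every non-ASCII character in the SLD counts as a replaced character; the function name is hypothetical and this is not necessarily how the study performed the count.

```python
from typing import Tuple


def replaced_characters(sld: str) -> Tuple[int, float]:
    """Return (count, proportion) of non-ASCII characters in an SLD."""
    replaced = sum(1 for ch in sld if not ch.isascii())
    return replaced, replaced / len(sld)


# 'ᴀpple': only the small-capital A (U+1D00) is non-ASCII.
print(replaced_characters("\u1d00pple"))  # (1, 0.2)
```

Averaging the first value over all domains for a brand string gives the mean number of replaced characters, and averaging the second gives the mean proportion.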

Table 4 shows the total number of domains in the dataset for each of the ten brand strings, and the average number of characters (and proportion of the total) in the brand string which are replaced, calculated across all relevant domains in each case.

Brand string    No. domains  Mean no. of replaced characters  Mean % of replaced characters
apple           25           1.92                             38%
google          135          1.67                             28%
microsoft       41           1.39                             15%
amazon          45           1.44                             24%
mcdonalds       4            3.00                             33%
visa            3            1.00                             25%
tencent         1            2.00                             29%
louisvuitton    1            2.00                             17%
mastercard      0            -                                -
coca(-)cola*    5            1.20                             15%

* Hyphen optional

Table 4: Number of (non-official) homoglyph domain names for each of the top ten most valuable global brands, and the average number and proportion of replaced characters across the set in each case
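As a consistency check, the per-brand means in Table 4 can be combined into a domain-weighted mean, which should agree with the overall average of 1.62 replaced characters quoted in this paper. The Python sketch below uses only the values from Table 4; mastercard is omitted because it has no non-official domains.

```python
# (number of domains, mean number of replaced characters) from Table 4.
table4 = {
    "apple": (25, 1.92),
    "google": (135, 1.67),
    "microsoft": (41, 1.39),
    "amazon": (45, 1.44),
    "mcdonalds": (4, 3.00),
    "visa": (3, 1.00),
    "tencent": (1, 2.00),
    "louisvuitton": (1, 2.00),
    "coca(-)cola": (5, 1.20),
}

total_domains = sum(n for n, _ in table4.values())
# Weight each brand's mean by its number of domains.
weighted_mean = sum(n * mean for n, mean in table4.values()) / total_domains

print(total_domains, round(weighted_mean, 2))  # 260 1.62
```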

Across the full dataset, the average number of replaced characters in a homoglyph domain is 1.62 (26% of the whole string), highlighting the necessity for the use of detection technologies able to analyse strings in full in order to detect visual similarity, rather than just identifying instances which differ from the official string by (say) a single character. The dataset includes ten domains in which more than one half of the characters in the string are replaced by non-Latin homoglyphs. These are listed in Table 5.

Domain name        Punycode representation           No. of replaced characters  % of replaced characters
ᴀᴘᴘʟᴇ[.]com        xn--spa916kwa0ea[.]com            5                           100%
арріе[.]com†       xn--80ak6aa4i[.]com               5                           100%
арріе[.]net†       xn--80ak6aa4i[.]net               5                           100%
арріө[.]com†       xn--80a6aa2gv8a[.]com             5                           100%
аррӏе[.]com        xn--80ak6aa92e[.]com              5                           100%
ᴍɪᴄʀᴏꜱᴏꜰᴛ[.]com    xn--9na8b158j8ana8f5252lha[.]com  9                           100%
ᴍᴄᴅᴏɴᴀʟᴅꜱ[.]com    xn--koa0gs43goafd4cs67392a[.]com  9                           100%
ᴀᴍᴀᴢᴏɴ[.]com       xn--koa507ka5cl7i[.]com           6                           100%
ɡᴏᴏɡle[.]com       xn--le-igba3625aa[.]com           4                           67%
gᴑᴑgḷɵ[.]com       xn--gg-9hb063ya97o[.]com          4                           67%

Table 5: Domains in which more than half of the characters are replaced by non-Latin homoglyphs

None of these domains resolves to any significant content as of the time of analysis (one resolves to a page featuring pay-per-click links; one (titled 'IDN Homograph Example') links to an article on IDN-based phishing[8]; and one re-directs to the official Yahoo website). However, three of the other domains (marked with a dagger (†)) have active MX records, which is an obvious source of potential concern.

Outside the set of top ten brands, a number of additional strings were also found to have been particularly highly represented in the set of homoglyph domains (though potentially comprising a mixture of infringements and officially-owned domains). Some examples are shown in Appendix C. By far the most common strings for which variants were observed in the dataset are 'aresmgmt' (presumably in reference to investment management company Ares Management, aresmgmt.com) and 'united' (referring to United Airlines, united.com). In the latter case, the domains appear generally to be owned by United Airlines (though are not resolving to any significant content); in the former case, however, the domains appear generally not to be under official ownership (registered via GoDaddy / Domains By Proxy LLC) and may consequently be of concern. Overall, these types of homoglyph infringements appear to be much more common on .com than across the other gTLDs - presumably a reflection of the frequency of use of .com for official sites, and the corresponding potential for confusion.

Conclusion

The greatest source of concern from this analysis is the large number of homoglyph domains which appear visually extremely similar to the official websites of trusted brands and thereby present significant potential for customer confusion and corresponding fraudulent activity. For the top ten most valuable global brands, the significant numbers definitively found to be actively resolving to infringing content, together with the large number of others featuring indicators of risk (active MX records, privacy-protected whois records and/or use of retail-grade registrars), indicate that these types of domain are indeed frequently used for brand attacks.

These observations highlight the importance of brand owners employing proactive programmes of brand monitoring and enforcement, using technologies which are able to detect these types of brand variants, rather than just exact- or substring brand matches. The importance of monitoring dormant domains for subsequent changes to site content is also clear.

Part of the solution may also be a defensive registration policy, though - as we have seen from the examples in this analysis - the infinite scope for homoglyph-type variations means that this approach in isolation will only take a brand owner so far (and may be costly); in cases where brands have been found to be holding portfolios of homoglyph domains for defensive purposes, there are typically at least as many equally convincing other variant domains available for registration, or currently held by third parties. It may also be generally advisable - where domains are held for defensive reasons - for brand owners to ensure that they are configured to re-direct to the official brand website, to maximise traffic and minimise the risk of customer confusion.

Appendix A: List of all IDNs with an SLD length of 56 characters or greater

Domain name                                                        Language code  SLD length (characters)
ဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪဪ[.]name  my  57
abogadoynotariosalvadoreñoenelvalledesanfernandoenviosde[.]com     es             56
adrianameneghini-psicológapsicoterapeutanaabordagempsica[.]com     pt             56
alloservicetaxiconventionnévsltransenprovencelucarcsmuyc[.]com     en             56
amcouvertureétenchéité-nettoyage-hydrofuge-anti-mousse-t[.]com     fr             56
asociacioncolombianadeconductoresdevehículosparticulares[.]com     es             56
authenticaspécialistedesbrunchsmariageanniversairecockta[.]com     en             56
carineinstitutdrainagelymphatiquerenatafrançahydrofacial[.]com     en             56
christelleprevotarata-agenceimmobilière-parentis-en-born[.]com     fr             56
cottunettoyageetpréparationesthétiqueautoathouars79100et[.]com     fr             56
coupeénergétiquevibratoirecoiffeurenconsciencelavillaauc[.]com     fr             56
dessoyrénovdemoussagetoiturepeinturehydrofugecornichetra[.]com     en             56
énergétique-traditionnelle-chinoise-acupuncture-bordeaux[.]com     fr             56
entreprisedepeintureksn91-peintredintérieuretdextérieurr[.]com     fr             56
fındıklıpvcpencerevekapıtamirotamatikkepenktamirsineklik[.]com     tr             56
gîte-4personnes-jacuzzi-sauna-privatif-stmalo-mtstmichel[.]com     fr             56
nuestra-señora-del-rosario-sanfernando-y-santiago-merced[.]com     es             56
recyclagedemetauxàdomicilecommercialetchantierconstructi[.]com     en             56
secure-hizmet-24932495ı249u2492-sahibinden-param1guvende[.]com     tr             56
slovianhair-uslugifryzjerskieprzedluzanieizageszczaniewł[.]com     pl             56
asociacioncolombianadeconductoresdevehículosparticulares[.]org     es             56
die-neue-sammlung-museum-für-angewandte-kunst-verwaltung[.]bayern  de             56
frenteintercontinentalventanadelaluchapopularsalvadoreña[.]org     es             56
guinchopantaneirocpa-serviçodereboque-autosocorro-maisba[.]dev     pt             56
ministerialbeauftragter-für-die-gymnasien-in-oberfranken[.]bayern  de             56
staatliches-bauamt-augsburg-strassenmeisterei-nördlingen[.]bayern  de             56
staatliches-bauamt-würzburg-strassenmeisterei-ochsenfurt[.]bayern  de             56
wasserwirtschaftsamt-ansbach-seemeisterstelle-altmühlsee[.]bayern  de             56
wasserwirtschaftsamt-kempten-flussmeisterstelle-türkheim[.]bayern  de             56
wasserwirtschaftsamt-münchen-flussmeisterstelle-freising[.]bayern  de             56
wasserwirtschaftsamt-nürnberg-flussmeisterstelle-rothsee[.]bayern  de             56
頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂[.]top  zh-Hant  56
顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶[.]top  zh  56

Appendix B: List of homoglyph domains for Google

N.B. Examples shown in square brackets are likely to be under the control of the official brand owner.

  • googǀe[.]com
  • googǀɵ[.]com
  • googıe[.]com
  • googíe[.]com
  • googīe[.]com
  • [googɩe[.]com]
  • goọgie[.]com
  • gọọgie[.]com
  • ǥooǥɩe[.]com
  • ɡooɡɩe[.]com
  • googıɵ[.]com
  • googɨɵ[.]com
  • gꝏgle[.]com
  • googʟe[.]com
  • [gooɢle[.]com]
  • goᴏgle[.]com
  • gᴏogle[.]com
  • gᴏᴏgle[.]com
  • [ɢoogle[.]com]
  • ɢooɢle[.]com
  • googlé[.]com
  • googlė[.]com
  • googlê[.]com
  • [googlë[.]com]
  • googlě[.]com
  • googlĕ[.]com
  • googlē[.]com
  • googlę[.]com
  • [googlẹ[.]com]
  • googlǝ[.]com
  • googlə[.]com
  • googlɘ[.]com
  • [googĺe[.]com]
  • googľe[.]com
  • googļe[.]com
  • googƚe[.]com
  • googłe[.]com
  • googłė[.]com
  • googłę[.]com
  • googḷe[.]com
  • googḷė[.]com
  • googḷẹ[.]com
  • googɭe[.]com
  • googḻe[.]com
  • gooġle[.]com
  • [gooģle[.]com]
  • gooǥle[.]com
  • gooǥlė[.]com
  • gooǥƚe[.]com
  • gooǥłe[.]com
  • gooɠle[.]com
  • [gooɡle[.]com]
  • goᴑgle[.]com
  • gᴏᴑgle[.]com
  • goógle[.]com
  • goóglé[.]com
  • [goògle[.]com]
  • goòglè[.]com
  • [goȯgle[.]com]
  • [goôgle[.]com]
  • goôglê[.]com
  • [goögle[.]com]
  • goōgle[.]com
  • goõgle[.]com
  • goøgle[.]com
  • [goơgle[.]com]
  • goọgle[.]com
  • goọglẹ[.]com
  • gᴑogle[.]com
  • gᴑᴑgle[.]com
  • gᴑȯgle[.]com
  • [góogle[.]com]
  • góógle[.]com
  • góóglè[.]com
  • góóglę[.]com
  • góògle[.]com
  • [gòogle[.]com]
  • gòógle[.]com
  • gòóglè[.]com
  • gòògle[.]com
  • [gȯogle[.]com]
  • gȯᴑgle[.]com
  • gȯȯgle[.]com
  • gȯȯglė[.]com
  • gôogle[.]com
  • gôógle[.]com
  • gôôgle[.]com
  • [gôôglè[.]com]
  • gôôglê[.]com
  • [gôōgle[.]com]
  • gôõgle[.]com
  • göogle[.]com
  • göógle[.]com
  • göôgle[.]com
  • göögle[.]com
  • [gööglë[.]com]
  • göõgle[.]com
  • gōogle[.]com
  • gõogle[.]com
  • gõôgle[.]com
  • gõõgle[.]com
  • gøogle[.]com
  • gøøglé[.]com
  • [gơogle[.]com]
  • gơoǥle[.]com
  • gơơgle[.]com
  • [gọogle[.]com]
  • [gọọgle[.]com]
  • ġoogle[.]com
  • ġooglė[.]com
  • ġøøgle[.]com
  • ĝoogle[.]com
  • [ğoogle[.]com]
  • ḡoogle[.]com
  • [ḡooḡle[.]com]
  • [ģoogle[.]com]
  • [ǥoogle[.]com]
  • ǥooglė[.]com
  • ǥoogłe[.]com
  • ǥooġle[.]com
  • ǥooǥle[.]com
  • ɠoogle[.]com
  • [ɡoogle[.]com]
  • ɡooɡle[.]com
  • ɡᴏᴏɡle[.]com
  • googlɞ[.]com
  • gꝏglɵ[.]com
  • googĺɵ[.]com
  • googľɵ[.]com
  • googɫɵ[.]com
  • googłɵ[.]com
  • gooġlɵ[.]com
  • goᴑglɵ[.]com
  • goȯglɵ[.]com
  • gᴑᴑglɵ[.]com
  • gᴑᴑgḷɵ[.]com
  • gȯoglɵ[.]com
  • ǥooglɵ[.]com
  • [googlœ[.]com]
  • [googlе[.]com]
  • gooqłe[.]com
  • gȯoqle[.]com
  • gooqlɵ[.]com
  • [goοgle[.]com]
  • [goοglе[.]com]
  • [goоgle[.]com]
  • [goоglе[.]com]
  • [gοogle[.]com]
  • [gοoglе[.]com]
  • [gοοglе[.]com]
  • [gοоgle[.]com]
  • [gοоglе[.]com]
  • [gоogle[.]com]
  • [gоoglе[.]com]
  • [gоοgle[.]com]
  • [gоοglе[.]com]
  • [gооgle[.]com]
  • [gооglе[.]com]
  • γοογλε[.]com
  • göögle[.]biz
  • [goögle[.]info]
  • [göogle[.]info]
  • [göögle[.]info]
  • ɢoogle[.]net
  • googlé[.]net
  • googlè[.]net
  • googlê[.]net
  • googlë[.]net
  • googlə[.]net
  • googłe[.]net
  • [góogle[.]net]
  • góógle[.]net
  • gòògle[.]net
  • gôôgle[.]net
  • göogle[.]net
  • göögle[.]net
  • gõõgle[.]net
  • gøøgle[.]net
  • [ɡoogle[.]net]
  • ɡooɡle[.]net
  • googłe[.]online
  • gøøgle[.]online
  • googłe[.]org
  • göogle[.]org
  • [göögle[.]org]
  • góógle[.]xyz 

Appendix C: Numbers of homoglyph domain names for a series of other heavily featured brand / keyword strings

Brand / keyword string         .com    Other gTLDs  Total
admiral                        70      0            70
alibaba-inc                    97      97           194
alipay                         42      43           85
alipay-inc                     60      60           120
allgau                         1       30           31
allstate                       117     0            117
allstatecorporation            221     0            221
allstateinsurance              256     0            256
allstateinvestments            249     0            249
anthropic                      36      0            36
aresmgmt                       3,186   0            3,186
arrow                          55      0            55
avril                          64      0            64
bankia                         48      0            48
bitcoin                        41      12           53
boursobank                     130     0            130
brainlab                       131     0            131
calvinklein                    172     0            172
canva                          69      0            69
cignahealthcare                202     0            202
coinbase                       166     10           176
csileasing                     183     92           275
divvypay                       79      0            79
facebook                       74      9            83
getdivvy                       69      0            69
gmail                          49      1            50
greentechrenewables            429     0            429
gulfstream                     82      0            82
hackerone                      127     0            127
iledefrance                    73      66           139
instagram                      66      3            69
investwithconfidence           64      0            64
janestreet                     198     0            198
ledger                         38      8            46
mailchimp                      133     1            134
mdrbrand                       73      0            73
mdrcyber                       73      0            73
mdrdiscover                    96      0            96
optelgroup                     390     0            390
paypal                         140     8            148
prologis                       72      0            72
retirewithconfidence           44      0            44
rogers                         67      0            67
rolex                          47      0            47
sailpoint                      40      0            40
snowflake                      62      0            62
sustainabilitywithsubstance    66      0            66
tailwind                       56      56           112
taitcommunications             46      0            46
thecignagroup                  127     0            127
thedebtbox                     44      0            44
trustwallet                    30      1            31
twosigma                       59      0            59
united                         2,342   0            2,342
verical                        34      0            34
wakanime                       83      0            83
wellington                     72      0            72
williams-int                   87      0            87
youtube                        46      1            47
zoom                           35      1            36

References

[1] Second-level domain - the part of the name to the left of the dot

[2] https://czds.icann.org; all data based on the versions of the zone files downloaded on 28-Sep-2023 (1,082 TLDs)

[3] Language recognition is as per the 'DETECTLANGUAGE' function available via Google Sheets: https://support.google.com/docs/answer/3093278?hl=en

[4] https://developers.google.com/admin-sdk/directory/v1/languages

[5] https://www.w3schools.com/tags/ref_language_codes.asp

[6] https://www.kantar.com/inspiration/brands/revealed-the-worlds-most-valuable-brands-of-2023

[7] https://www.iamstobbs.com/opinion/web-dot-coms-but-once-a-year-holiday-shopping-activity-part-1-black-friday-domains

[8] https://www.xudongz.com/blog/2017/idn-phishing/

This article was first published as an e-book on 29 December 2023 at:

https://www.iamstobbs.com/idns-ebook
