BLOG POST
Internationalized domain names (IDNs) are domain names featuring characters in non-Latin scripts, including examples featuring accented characters (such as münchen.de) and those which are entirely written in alternative character sets (such as яндекс.рф - Yandex Russia). This infrastructure allows brand owners to create domain names in local languages and target content to specific markets, but also provides potential for bad actors to create names which are deceptively similar to the official domain names of trusted brands (e.g. by substituting a character with a non-Latin equivalent appearing visually similar - a so-called 'homoglyph').
In this study, we consider the full set of registered IDNs across all gTLDs (generic top-level domains, or domain extensions) for which zone files are available, covering around 1,000 different extensions, to identify trends and patterns, and indicators of potential abuse.
Overall, there are around 1.3 million gTLD IDNs currently in existence, across 470 distinct domain extensions, with the most popular being .com (853k IDNs), .net (136k), and .在线 (Chinese for 'online') (28k).
267 distinct domain names were found to comprise homoglyph variations of any of the top ten most valuable global brands in 2023, and not to be under the control of the brand owner. A significant proportion of these feature indicators that they have been registered for infringing use, with 79 (30%) found to have active MX (mail exchange) records, indicating that they have been configured to be able to send and receive e-mails and could therefore be associated with phishing activity, and 128 (48%) having privacy-protected whois records. Various examples were identified as explicitly hosting fraudulent or infringing content, including instances of lookalike sites (e.g. ǥoogłe[.]com, googļe[.]com, googłe[.]online (re-directs to googłe[.]co) and ɠoogle[.]com (re-directs to gooqle[.]cm)), misdirection and brand confusion (e.g. gooqłe[.]com, gooġlɵ[.]com, visã[.]com and ɢoogle[.]net).
Across the set of these non-official homoglyph domains, the average number of replaced characters in the SLD name (the part of the domain name to the left of the dot) is 1.62, highlighting the necessity for the use of detection technologies able to analyse strings in full in order to detect visual similarity, rather than just identifying instances which differ from the official string by (say) a single character. Ten examples were identified of domains in which more than half of the characters have been replaced with non-Latin homoglyphs, including (all with 100% non-Latin characters): ᴀᴘᴘʟᴇ[.]com, арріе[.]com, арріе[.]net, арріө[.]com, аррӏе[.]com, ᴍɪᴄʀᴏꜱᴏꜰᴛ[.]com, ᴍᴄᴅᴏɴᴀʟᴅꜱ[.]com and ᴀᴍᴀᴢᴏɴ[.]com. Three of these domains have active MX records.
The following additional points warrant specific consideration by brand owners:
- The number of homoglyph domains targeting trusted brands - and the significant proportion of these found to be actively infringing or to feature indicators of suspicious intentions - highlights the need for brand owners to monitor activity in this space, combined with tracking examples of concern for content changes and launching enforcement actions when appropriate.
- Many top brands incorporate instances of potentially-deceptive IDNs in their defensive domain portfolios; however, this approach in isolation is likely to be of limited effectiveness because of the effectively unlimited number of potential variations available to would-be infringers. Where domains are held for defensive reasons, it may be advisable for them to be configured to re-direct to the official brand website, to maximise traffic and minimise the risk of customer confusion.
Similar trends in potentially fraudulent domain registration activity have also been observed in the landscape of Web3 blockchain domains, which also allow for a wide range of non-Latin characters[1]. This arena is also worthy of careful consideration by brand owners, who may wish to explore brand protection strategies across these emerging technologies. This approach may be particularly valid as the availability of desirable domain names begins to run low across traditionally popular areas of the domain landscape, such as .com[2].
References
[1] https://www.iamstobbs.com/trends-in-web3-ebook
[2] https://www.iamstobbs.com/availability-of-domains-ebook
This article was first published on 29 December 2023 at:
https://www.iamstobbs.com/opinion/idn-tifying-trends-insights-from-the-set-of-non-latin-domain-names
* * * * *
WHITE PAPER
Executive Summary
Internet technology allows users to create domain names with characters in non-Latin scripts, allowing targeting of content to local markets. These so-called internationalized domain names (IDNs) can, however, also be abused by bad actors to create deceptive websites with names which appear visually extremely similar to the official domains of trusted brand websites.
In this study, we consider the set of IDNs currently registered across all (approximately 1,000) gTLDs (generic top-level domains, or domain extensions) which have zone files available from ICANN's Centralized Zone Data Service, to identify trends and patterns, and indicators of potential abuse.
The main findings of the analysis are as follows:
- Across the set of gTLDs, around 1.3 million IDNs are currently registered, covering 470 distinct domain extensions. The top three TLDs, by numbers of IDNs, are .com (853k IDNs), .net (136k), and .在线 (Chinese for 'online') (28k).
- The top three languages utilised for IDNs are Chinese (506k IDNs), Korean (136k), and German (113k).
- Across the dataset, the IDNs range in length from 1 to 57 characters.
- 388 IDNs were identified with SLD[1] names appearing visually similar to those of the main corporate website of any of the top ten most valuable global brands. In these cases, one or more characters from the brand name have been replaced by a non-Latin character which appears visually similar - these are so-called 'homoglyph domains', and present the potential for deceptive misuse by bad actors.
- Excluding the domains which appear to be under the ownership of the brand in question (121 instances; presumably defensive registrations, etc.), the following observations can be made about the other homoglyph domains for these top ten brands in the dataset:
- 73 (27%) return a live website response
- 79 (30%) have active MX (mail exchange) records, indicating that they have been configured to be able to send and receive e-mails and could therefore be associated with phishing activity
- 128 (48%) have registrant information redacted using a privacy-protection service
- Many of these domains are being actively used to host fraudulent or infringing content, including instances of lookalike sites, misdirection and brand confusion.
- Across the dataset of non-official homoglyph domains for the top ten brands, the average number of replaced characters in the SLD name is 1.62. There are ten examples of domains in which more than half of the characters have been replaced with non-Latin homoglyphs, including (all with 100% non-Latin characters): ᴀᴘᴘʟᴇ[.]com, арріе[.]com, арріе[.]net, арріө[.]com, аррӏе[.]com, ᴍɪᴄʀᴏꜱᴏꜰᴛ[.]com, ᴍᴄᴅᴏɴᴀʟᴅꜱ[.]com and ᴀᴍᴀᴢᴏɴ[.]com. Three of these domains have active MX records.
Introduction
Modern Internet infrastructure allows for the creation of domain names containing non-Latin characters, such as accented characters and text from wholly distinct character sets (internationalized domain names, or IDNs). Whilst this presents opportunities for brand owners to create domain names in local languages, and engage with target audiences, it does also present opportunities for fraudsters to create names which are deceptively similar to the official domain names of trusted brands, for example by substituting a character with a non-Latin equivalent appearing visually similar (so-called 'homoglyphs').
In this study, we consider the set of IDNs present across the full range of generic top-level domains (gTLDs) or domain extensions, using information from ICANN’s Centralized Zone Data Service[2]. In these zone files (domain configuration data files), IDNs are stored in an encoded form called Punycode, as Latin-character strings beginning 'xn--'. The encoded version retains any Latin characters from the domain name and appends a Latin-character encoding of the non-Latin characters and their positions within the string (e.g. hermès.com is represented in Punycode as xn--herms-7ra.com). For the analysis, all Punycode domains are converted to their true IDN equivalents, and the trends and patterns in the dataset are inspected.
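As a minimal sketch of this decoding step (not necessarily the tooling used in the study), Python's built-in `punycode` codec can convert each 'xn--' label back to its Unicode form. The function names `decode_label` and `decode_domain` are illustrative only.

```python
def decode_label(label: str) -> str:
    """Decode one DNS label; labels without the 'xn--' prefix are returned unchanged."""
    if label.lower().startswith("xn--"):
        # Strip the 'xn--' prefix, then decode the remainder as Punycode.
        return label[4:].encode("ascii").decode("punycode")
    return label


def decode_domain(domain: str) -> str:
    """Decode every Punycode label in a dot-separated domain name."""
    return ".".join(decode_label(part) for part in domain.split("."))


# 'xn--googe-n7a' is one of the Punycode labels quoted later in this paper.
print(decode_domain("xn--googe-n7a.com"))  # googłe.com
```

The basic Latin characters ('googe') appear before the final hyphen of the label, and the suffix ('n7a') encodes which non-Latin character is inserted and where.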
Analysis
1. Top-level statistics for the full dataset
Overall, around 1.3 million IDNs exist across the set of gTLDs for which zone files are available. 470 distinct TLDs have at least one IDN registered. Table 1 and Figure 1 show the top ten most popular TLDs for IDN registrations.
TLD | No. of IDNs |
---|---|
com | 853,308 |
net | 135,775 |
在线 (xn--3ds443g) (Chinese for 'online') | 27,956 |
top | 24,988 |
商标 (xn--czr694b) (Chinese for 'trademark') | 24,894 |
公司 (xn--55qx5d) (Chinese for 'company') | 23,972 |
org | 22,470 |
info | 17,365 |
网络 (xn--io0a7i) (Chinese for 'network') | 16,377 |
online | 15,670 |
Table 1: Top TLDs by numbers of IDNs
It is noteworthy that four of the top ten most popular TLDs for non-Latin domain names are themselves internationalized extensions (which can alternatively be represented in Punycode format). All four are in the Chinese language.
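A per-TLD tally of this kind can be sketched as follows, assuming (as a simplification, not a statement of the study's method) that an IDN is any name whose second-level label begins with 'xn--' and that the input is a flat list of fully qualified names taken from zone files. The function name and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_idns_by_tld(domains: Iterable[str]) -> Counter:
    """Count names whose second-level label is Punycode-encoded, grouped by TLD."""
    counts: Counter = Counter()
    for name in domains:
        labels = name.lower().rstrip(".").split(".")
        if len(labels) < 2:
            continue  # skip bare TLDs and malformed entries
        sld, tld = labels[-2], labels[-1]
        if sld.startswith("xn--"):
            counts[tld] += 1
    return counts


# Hypothetical sample; zone-file names usually carry a trailing dot.
sample = ["xn--googe-n7a.com.", "example.com.", "xn--pple-koa.online.", "xn--googe-n7a.net."]
print(count_idns_by_tld(sample).most_common())
```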
Table 2 and Figure 2 show the top ten most popular languages[3] represented in the second-level domain names (SLDs) (i.e. the part of the domain name to the left of the dot) of the set of IDNs.
SLD language code | SLD language[4,5] | No. of IDNs |
---|---|---|
zh | Chinese | 505,952 |
ko | Korean | 136,248 |
de | German | 112,566 |
ja | Japanese | 104,861 |
th | Thai | 42,153 |
en | English | 38,844 |
zh-Hant | Chinese (trad.) | 35,952 |
tr | Turkish | 34,859 |
es | Spanish | 34,576 |
fr | French | 32,802 |
Table 2: Top SLD languages by numbers of IDNs
Figure 2: Top SLD languages by numbers of IDNs
Perhaps unsurprisingly, the set of IDNs is dominated by languages written in entirely non-Latin scripts, with four of the top five languages utilising alternative character sets. In total, 125 different languages are represented within the dataset.
Figure 3 shows the distribution of domain-name (SLD) lengths (in characters) across the full dataset, considering the IDN representations of the names (rather than their Punycode equivalents).
Figure 3: Distribution of IDN (SLD) lengths
The longest domain name (SLD) in the dataset is 57 characters (1 instance). A full list of all IDNs of 56 characters in length or greater is shown in Appendix A. The dataset also includes over 14,000 IDNs with an SLD length of one character.
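The difference between the two ways of measuring length can be illustrated with a short sketch. It assumes that length is counted in Unicode code points of the decoded SLD (so a combining mark would count as a separate character) and that the input has at least two labels; `sld_length` is an illustrative name.

```python
def sld_length(domain: str) -> int:
    """Length, in Unicode code points, of the decoded second-level label."""
    sld = domain.lower().rstrip(".").split(".")[-2]
    if sld.startswith("xn--"):
        sld = sld[4:].encode("ascii").decode("punycode")
    return len(sld)


print(sld_length("xn--googe-n7a.com"))  # 6, the decoded SLD is 'googłe'
print(len("xn--googe-n7a"))             # 13, the Punycode form is longer
```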
2. Deceptive homoglyph domain names
In this section we consider homoglyph domain names where the SLD is identical to that of the main official corporate domain name of any of the top ten most valuable global brands in 2023[6], apart from the replacement of one or more characters with a (non-Latin) character which appears visually similar, but with no additional keywords or other terms - in other words, these are the IDNs with the greatest potential for customer confusion and fraudulent use relating to these brands. Table 3 shows the number of such variants identified within the dataset, noting that some of these appear likely (on the basis of registrant details and/or the use of an enterprise-class domain registrar) to be under the control of the official brand owner, presumably being held as defensive registrations or for other purposes (e.g. for use in internal phishing tests).
Brand string | .com | Other gTLDs | Total |
---|---|---|---|
apple | 30 (10) | 6 (1) | 36 (11) |
google | 159 (44) | 27 (6) | 186 (50) |
microsoft | 34 (0) | 7 (0) | 41 (0) |
amazon | 82 (42) | 17 (6) | 99 (48) |
mcdonalds | 4 (0) | 0 | 4 (0) |
visa | 5 (2) | 0 | 5 (2) |
tencent | 1 (0) | 0 | 1 (0) |
louisvuitton | 1 (0) | 0 | 1 (0) |
mastercard | 10 (10) | 0 | 10 (10) |
coca(-)cola* | 5 (0) | 0 | 5 (0) |
Total | 331 (108) | 57 (13) | 388 (121) |
* Hyphen optional
Table 3: Total numbers of homoglyph domain names for each of the top ten most valuable global brands. Shown in brackets are the numbers of these domains which appear to be under the control of the official brand owner.
The visual similarity of some of these domains to the names of the official sites in question is striking; for example, the list of homoglyph domains for Google (the top-ten brand most heavily targeted by this type of infringement) is shown in Appendix B.
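One way to detect variants of this kind is to reduce each decoded SLD to an approximate Latin 'skeleton' and compare it with the brand string. The sketch below is a simplified illustration, not the detection method used in this study: the `CONFUSABLES` table is a tiny hand-picked sample, and a practical system would need a far more complete mapping.

```python
import unicodedata

# Illustrative, hand-picked mapping (an assumption, far from complete).
CONFUSABLES = {
    "\u0142": "l",  # LATIN SMALL LETTER L WITH STROKE
    "\u0262": "g",  # LATIN LETTER SMALL CAPITAL G
    "\u0261": "g",  # LATIN SMALL LETTER SCRIPT G
    "\u0275": "e",  # LATIN SMALL LETTER BARRED O, used as an 'e' lookalike
    "\u1d0f": "o",  # LATIN LETTER SMALL CAPITAL O
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def skeleton(label: str) -> str:
    """Reduce a label to an approximate Latin skeleton for comparison."""
    out = []
    for ch in label.lower():
        ch = CONFUSABLES.get(ch, ch)
        # Decompose accented letters and drop the combining marks (e.g. 'ö' -> 'o').
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def is_homoglyph_of(label: str, brand: str) -> bool:
    """True if the label differs from the brand string but shares its skeleton."""
    return label != brand and skeleton(label) == brand


print(is_homoglyph_of("g\u00f6\u00f6gle", "google"))  # True for 'göögle'
```

Because the whole string is normalised before comparison, the check is not limited to variants that differ from the brand by a single character.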
Considering only the 267 domains which are apparently not under the control of the official brand owner in question, the following observations are apparent:
- 73 (27%) return some sort of live website response (i.e. an HTTP status code of 200)
- 79 (30%) have active MX records, indicating that they have been configured to be able to send and receive e-mails and could therefore be associated with phishing activity. 53 of these have no active website and may be being used for their e-mail functionality only.
- 128 (48%) explicitly make use of some sort of privacy-protection service in their whois record, as is often the case for domains registered for egregious use.
- The registrar breakdown is dominated by retail-grade providers, often popular with infringers[7], with the top three within the dataset found to be GoDaddy.com, LLC (102 domains), Squarespace Domains II LLC (38) and NameCheap, Inc. (25).
- Many of the domains have been long-lived, with the earliest examples found to have creation dates within 2001. Only 46 of the domains were registered during 2023, though activity appears to be ongoing, with the newest example registered on 13-Sep-2023.
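The indicator counts above can be tallied from per-domain check results along the following lines. This is a sketch under stated assumptions: the `DomainObservation` record, its field names and the sample data are hypothetical, and the HTTP, MX and whois lookups themselves are assumed to have been collected separately.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DomainObservation:
    """Hypothetical record of checks already collected for one domain."""
    name: str
    http_status: Optional[int]  # None when no web response was received
    has_mx: bool                # True when at least one MX record is configured
    whois_private: bool         # True when a privacy service masks the registrant


def summarise(observations: List[DomainObservation]) -> Dict[str, Tuple[int, int]]:
    """Return (count, rounded percentage) for each risk indicator."""
    total = len(observations)
    if total == 0:
        return {}
    counts = {
        "live_site": sum(o.http_status == 200 for o in observations),
        "active_mx": sum(o.has_mx for o in observations),
        "mx_without_site": sum(o.has_mx and o.http_status != 200 for o in observations),
        "whois_private": sum(o.whois_private for o in observations),
    }
    return {key: (n, round(100 * n / total)) for key, n in counts.items()}


sample = [
    DomainObservation("a.example", 200, True, True),
    DomainObservation("b.example", None, True, False),
    DomainObservation("c.example", None, False, True),
    DomainObservation("d.example", 404, False, False),
]
print(summarise(sample))
```

Applied to the figures reported here, 73, 79 and 128 out of 267 round to 27%, 30% and 48% respectively.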
Amongst the homoglyph domains resolving to live content, there are a number of examples of particular concern (Figure 4).
Figure 4: Examples of homoglyph domains resolving to live content of particular concern, panels (a) to (l)
- Lookalike sites:
- (a) ǥoogłe[.]com (xn--ooge-21a88g[.]com);
- (b) googļe[.]com (xn--googe-m6a[.]com);
- (c) googłe[.]online (xn--googe-n7a[.]online) - re-directs to googłe[.]co (xn--googe-n7a[.]co);
- (d) ɠoogle[.]com (xn--oogle-kmc[.]com) - re-directs to gooqle[.]cm
- Misdirection / brand confusion / other brand misuse:
- (e) gooqłe[.]com (xn--gooqe-n7a[.]com);
- (f) gooġlɵ[.]com (xn--gool-dxa55r[.]com);
- (g) visã[.]com (xn--vis-ola[.]com);
- (h) ɢoogle[.]net (xn--oogle-wmc[.]net)
- Possible piracy site:
- (i) äpple[.]online (xn--pple-koa[.]online)
- Other brand issues:
- (j) googľe[.]com (xn--googe-y6a[.]com), gᴑȯgle[.]com (xn--ggle-v0b5042b[.]com), gȯoglɵ[.]com (xn--gogl-v0b73b[.]com) and goȯglɵ[.]com (xn--gogl-w0b63b[.]com)
- Domain name offered for sale:
- (k) ɢooɢle[.]com (xn--oole-47bc[.]com)
- Parking page featuring industry-relevant pay-per-click ads:
- (l) amazoñ[.]com (xn--amazo-sta[.]com)
We next further analyse the set of (non-officially owned) homoglyph domains with SLD names similar to any of the top ten most valuable global brands, with a view to identifying the proportion of each string consisting of replaced characters (e.g. for 'ᴀpple[.]com' (xn--pple-k13a[.]com), where the only non-Latin character is the 'ᴀ', there is one replaced character out of five (i.e. 20% of the whole string)). For this analysis, we exclude seven examples where the whole SLD is in a consistent non-Latin script (e.g. 'γοογλε' (Greek) for 'google' and 'амазон' (Cyrillic) for 'amazon', as these domains may be intended for targeting towards a non-English market, rather than being explicitly deceptive), leaving a dataset of 260 domains.
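The replaced-character measure can be sketched as below, assuming that a 'replaced' character is any non-ASCII character in the decoded SLD and that the SLD is non-empty. The exclusion of SLDs written wholly in one consistent non-Latin script is not covered, and the function name is illustrative.

```python
from typing import Tuple


def replaced_characters(sld: str) -> Tuple[int, float]:
    """Return (number of non-ASCII characters, fraction of the whole label)."""
    replaced = sum(1 for ch in sld if ord(ch) > 127)
    return replaced, replaced / len(sld)


# U+1D00 is LATIN LETTER SMALL CAPITAL A, as in the 'ᴀpple' example above.
count, share = replaced_characters("\u1d00pple")
print(count, f"{share:.0%}")  # 1 20%
```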
Table 4 shows the total number of domains in the dataset for each of the ten brand strings, and the average number of characters (and proportion of the total) in the brand string which are replaced, calculated across all relevant domains in each case.
Brand string | No. domains | Mean no. of replaced characters | Mean % of replaced characters |
---|---|---|---|
apple | 25 | 1.92 | 38% |
google | 135 | 1.67 | 28% |
microsoft | 41 | 1.39 | 15% |
amazon | 45 | 1.44 | 24% |
mcdonalds | 4 | 3.00 | 33% |
visa | 3 | 1.00 | 25% |
tencent | 1 | 2.00 | 29% |
louisvuitton | 1 | 2.00 | 17% |
mastercard | 0 | - | - |
coca(-)cola* | 5 | 1.20 | 15% |
* Hyphen optional
Table 4: Number of (non-official) homoglyph domain names for each of the top ten most valuable global brands, and the average number and proportion of replaced characters across the set in each case
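As a rough cross-check, the per-brand rows of Table 4 can be combined into domain-weighted averages for the dataset as a whole. Because the per-brand means are themselves rounded, the result is approximate.

```python
# (number of domains, mean replaced characters, mean % replaced) per row of Table 4;
# the row with zero domains is omitted because it contributes nothing.
ROWS = [
    (25, 1.92, 38), (135, 1.67, 28), (41, 1.39, 15), (45, 1.44, 24), (4, 3.00, 33),
    (3, 1.00, 25), (1, 2.00, 29), (1, 2.00, 17), (5, 1.20, 15),
]

total = sum(n for n, _, _ in ROWS)
mean_chars = sum(n * chars for n, chars, _ in ROWS) / total
mean_pct = sum(n * pct for n, _, pct in ROWS) / total

print(total, round(mean_chars, 2), round(mean_pct))  # 260 1.62 26
```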
Across the full dataset, the average number of replaced characters in a homoglyph domain is 1.62 (26% of the whole string), highlighting the necessity for the use of detection technologies able to analyse strings in full in order to detect visual similarity, rather than just identifying instances which differ from the official string by (say) a single character. The dataset includes ten domains in which more than one half of the characters in the string are replaced by non-Latin homoglyphs. These are listed in Table 5.
Domain name | Punycode representation | No. of replaced characters | % of replaced characters |
---|---|---|---|
ᴀᴘᴘʟᴇ[.]com | xn--spa916kwa0ea[.]com | 5 | 100% |
арріе[.]com† | xn--80ak6aa4i[.]com | 5 | 100% |
арріе[.]net† | xn--80ak6aa4i[.]net | 5 | 100% |
арріө[.]com† | xn--80a6aa2gv8a[.]com | 5 | 100% |
аррӏе[.]com | xn--80ak6aa92e[.]com | 5 | 100% |
ᴍɪᴄʀᴏꜱᴏꜰᴛ[.]com | xn--9na8b158j8ana8f5252lha[.]com | 9 | 100% |
ᴍᴄᴅᴏɴᴀʟᴅꜱ[.]com | xn--koa0gs43goafd4cs67392a[.]com | 9 | 100% |
ᴀᴍᴀᴢᴏɴ[.]com | xn--koa507ka5cl7i[.]com | 6 | 100% |
ɡᴏᴏɡle[.]com | xn--le-igba3625aa[.]com | 4 | 67% |
gᴑᴑgḷɵ[.]com | xn--gg-9hb063ya97o[.]com | 4 | 67% |
Table 5: Domains in which more than half of the characters are replaced by non-Latin homoglyphs
None of these domains resolved to any significant content at the time of analysis (one resolves to a page featuring pay-per-click links; one (titled 'IDN Homograph Example') links to an article on IDN-based phishing[8]; and one re-directs to the official Yahoo website). However, three of them (marked with a dagger (†) in Table 5) have active MX records, which is an obvious source of potential concern.
Outside the set of top ten brands, a number of additional strings were also found to have been particularly highly represented in the set of homoglyph domains (though potentially comprising a mixture of infringements and officially-owned domains). Some examples are shown in Appendix C. By far the most common strings for which variants were observed in the dataset are 'aresmgmt' (presumably in reference to investment management company Ares Management, aresmgmt.com) and 'united' (referring to United Airlines, united.com). In the latter case, the domains appear generally to be owned by United Airlines (though are not resolving to any significant content); in the former case, however, the domains appear generally not to be under official ownership (registered via GoDaddy / Domains By Proxy LLC) and may consequently be of concern. Overall, these types of homoglyph infringements appear to be much more common on .com than across the other gTLDs - presumably a reflection of the frequency of use of .com for official sites, and the corresponding potential for confusion.
Conclusion
The greatest source of concern from this analysis is the large number of homoglyph domains which appear visually extremely similar to the official websites of trusted brands and thereby present significant potential for customer confusion and corresponding fraudulent activity. For the top ten most valuable global brands, the significant numbers definitively found to be actively resolving to infringing content, together with the large number of others featuring indicators of risk (active MX records, privacy-protected whois records and/or use of retail-grade registrars), indicate that these types of domain are indeed frequently used for brand attacks.
These observations highlight the importance of brand owners employing proactive programmes of brand monitoring and enforcement, using technologies which are able to detect these types of brand variants, rather than just exact- or substring brand matches. The importance of monitoring dormant domains for subsequent changes to site content is also clear.
Part of the solution may also be a defensive registration policy, though - as we have seen from the examples in this analysis - the effectively unlimited scope for homoglyph-type variations means that this approach in isolation will only take a brand owner so far (and may be costly); in cases where brands have been found to be holding portfolios of homoglyph domains for defensive purposes, there are typically at least as many equally convincing other variant domains available for registration, or currently held by third parties. It may also be generally advisable - where domains are held for defensive reasons - for brand owners to ensure that they are configured to re-direct to the official brand website, to maximise traffic and minimise the risk of customer confusion.
Appendix A: List of all IDNs with an SLD length of 56 characters or greater
Domain name | Language code | SLD length (characters) |
---|---|---|
ဪဪဪဪဪဪဪဪဪဪဪဪဪ ဪဪဪဪဪဪဪဪဪဪဪဪဪ ဪဪဪဪဪဪဪဪဪဪဪဪဪ ဪဪဪဪဪဪဪဪဪဪဪဪဪ ဪဪဪဪဪ[.]name | my | 57 |
abogadoynotariosalvadoreñoenelvalledesanfernandoenviosde[.]com | es | 56 |
adrianameneghini-psicológapsicoterapeutanaabordagempsica[.]com | pt | 56 |
alloservicetaxiconventionnévsltransenprovencelucarcsmuyc[.]com | en | 56 |
amcouvertureétenchéité-nettoyage-hydrofuge-anti-mousse-t[.]com | fr | 56 |
asociacioncolombianadeconductoresdevehículosparticulares[.]com | es | 56 |
authenticaspécialistedesbrunchsmariageanniversairecockta[.]com | en | 56 |
carineinstitutdrainagelymphatiquerenatafrançahydrofacial[.]com | en | 56 |
christelleprevotarata-agenceimmobilière-parentis-en-born[.]com | fr | 56 |
cottunettoyageetpréparationesthétiqueautoathouars79100et[.]com | fr | 56 |
coupeénergétiquevibratoirecoiffeurenconsciencelavillaauc[.]com | fr | 56 |
dessoyrénovdemoussagetoiturepeinturehydrofugecornichetra[.]com | en | 56 |
énergétique-traditionnelle-chinoise-acupuncture-bordeaux[.]com | fr | 56 |
entreprisedepeintureksn91-peintredintérieuretdextérieurr[.]com | fr | 56 |
fındıklıpvcpencerevekapıtamirotamatikkepenktamirsineklik[.]com | tr | 56 |
gîte-4personnes-jacuzzi-sauna-privatif-stmalo-mtstmichel[.]com | fr | 56 |
nuestra-señora-del-rosario-sanfernando-y-santiago-merced[.]com | es | 56 |
recyclagedemetauxàdomicilecommercialetchantierconstructi[.]com | en | 56 |
secure-hizmet-24932495ı249u2492-sahibinden-param1guvende[.]com | tr | 56 |
slovianhair-uslugifryzjerskieprzedluzanieizageszczaniewł[.]com | pl | 56 |
asociacioncolombianadeconductoresdevehículosparticulares[.]org | es | 56 |
die-neue-sammlung-museum-für-angewandte-kunst-verwaltung[.]bayern | de | 56 |
frenteintercontinentalventanadelaluchapopularsalvadoreña[.]org | es | 56 |
guinchopantaneirocpa-serviçodereboque-autosocorro-maisba[.]dev | pt | 56 |
ministerialbeauftragter-für-die-gymnasien-in-oberfranken[.]bayern | de | 56 |
staatliches-bauamt-augsburg-strassenmeisterei-nördlingen[.]bayern | de | 56 |
staatliches-bauamt-würzburg-strassenmeisterei-ochsenfurt[.]bayern | de | 56 |
wasserwirtschaftsamt-ansbach-seemeisterstelle-altmühlsee[.]bayern | de | 56 |
wasserwirtschaftsamt-kempten-flussmeisterstelle-türkheim[.]bayern | de | 56 |
wasserwirtschaftsamt-münchen-flussmeisterstelle-freising[.]bayern | de | 56 |
wasserwirtschaftsamt-nürnberg-flussmeisterstelle-rothsee[.]bayern | de | 56 |
頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂 頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂頂 頂頂頂頂頂頂頂頂[.]top | zh-Hant | 56 |
顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶 顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶顶 顶顶顶顶顶顶顶顶[.]top | zh | 56 |
Appendix B: List of homoglyph domains for Google
N.B. Examples shown in square brackets are likely to be under the control of the official brand owner.
- googǀe[.]com
- googǀɵ[.]com
- googıe[.]com
- googíe[.]com
- googīe[.]com
- [googɩe[.]com]
- goọgie[.]com
- gọọgie[.]com
- ǥooǥɩe[.]com
- ɡooɡɩe[.]com
- googıɵ[.]com
- googɨɵ[.]com
- gꝏgle[.]com
- googʟe[.]com
- [gooɢle[.]com]
- goᴏgle[.]com
- gᴏogle[.]com
- gᴏᴏgle[.]com
- [ɢoogle[.]com]
- ɢooɢle[.]com
- googlé[.]com
- googlė[.]com
- googlê[.]com
- [googlë[.]com]
- googlě[.]com
- googlĕ[.]com
- googlē[.]com
- googlę[.]com
- [googlẹ[.]com]
- googlǝ[.]com
- googlə[.]com
- googlɘ[.]com
- [googĺe[.]com]
- googľe[.]com
- googļe[.]com
- googƚe[.]com
- googłe[.]com
- googłė[.]com
- googłę[.]com
- googḷe[.]com
- googḷė[.]com
- googḷẹ[.]com
- googɭe[.]com
- googḻe[.]com
- gooġle[.]com
- [gooģle[.]com]
- gooǥle[.]com
- gooǥlė[.]com
- gooǥƚe[.]com
- gooǥłe[.]com
- gooɠle[.]com
- [gooɡle[.]com]
- goᴑgle[.]com
- gᴏᴑgle[.]com
- goógle[.]com
- goóglé[.]com
- [goògle[.]com]
- goòglè[.]com
- [goȯgle[.]com]
- [goôgle[.]com]
- goôglê[.]com
- [goögle[.]com]
- goōgle[.]com
- goõgle[.]com
- goøgle[.]com
- [goơgle[.]com]
- goọgle[.]com
- goọglẹ[.]com
- gᴑogle[.]com
- gᴑᴑgle[.]com
- gᴑȯgle[.]com
- [góogle[.]com]
- góógle[.]com
- góóglè[.]com
- góóglę[.]com
- góògle[.]com
- [gòogle[.]com]
- gòógle[.]com
- gòóglè[.]com
- gòògle[.]com
- [gȯogle[.]com]
- gȯᴑgle[.]com
- gȯȯgle[.]com
- gȯȯglė[.]com
- gôogle[.]com
- gôógle[.]com
- gôôgle[.]com
- [gôôglè[.]com]
- gôôglê[.]com
- [gôōgle[.]com]
- gôõgle[.]com
- göogle[.]com
- göógle[.]com
- göôgle[.]com
- göögle[.]com
- [gööglë[.]com]
- göõgle[.]com
- gōogle[.]com
- gõogle[.]com
- gõôgle[.]com
- gõõgle[.]com
- gøogle[.]com
- gøøglé[.]com
- [gơogle[.]com]
- gơoǥle[.]com
- gơơgle[.]com
- [gọogle[.]com]
- [gọọgle[.]com]
- ġoogle[.]com
- ġooglė[.]com
- ġøøgle[.]com
- ĝoogle[.]com
- [ğoogle[.]com]
- ḡoogle[.]com
- [ḡooḡle[.]com]
- [ģoogle[.]com]
- [ǥoogle[.]com]
- ǥooglė[.]com
- ǥoogłe[.]com
- ǥooġle[.]com
- ǥooǥle[.]com
- ɠoogle[.]com
- [ɡoogle[.]com]
- ɡooɡle[.]com
- ɡᴏᴏɡle[.]com
- googlɞ[.]com
- gꝏglɵ[.]com
- googĺɵ[.]com
- googľɵ[.]com
- googɫɵ[.]com
- googłɵ[.]com
- gooġlɵ[.]com
- goᴑglɵ[.]com
- goȯglɵ[.]com
- gᴑᴑglɵ[.]com
- gᴑᴑgḷɵ[.]com
- gȯoglɵ[.]com
- ǥooglɵ[.]com
- [googlœ[.]com]
- [googlе[.]com]
- gooqłe[.]com
- gȯoqle[.]com
- gooqlɵ[.]com
- [goοgle[.]com]
- [goοglе[.]com]
- [goоgle[.]com]
- [goоglе[.]com]
- [gοogle[.]com]
- [gοoglе[.]com]
- [gοοglе[.]com]
- [gοоgle[.]com]
- [gοоglе[.]com]
- [gоogle[.]com]
- [gоoglе[.]com]
- [gоοgle[.]com]
- [gоοglе[.]com]
- [gооgle[.]com]
- [gооglе[.]com]
- γοογλε[.]com
- göögle[.]biz
- [goögle[.]info]
- [göogle[.]info]
- [göögle[.]info]
- ɢoogle[.]net
- googlé[.]net
- googlè[.]net
- googlê[.]net
- googlë[.]net
- googlə[.]net
- googłe[.]net
- [góogle[.]net]
- góógle[.]net
- gòògle[.]net
- gôôgle[.]net
- göogle[.]net
- göögle[.]net
- gõõgle[.]net
- gøøgle[.]net
- [ɡoogle[.]net]
- ɡooɡle[.]net
- googłe[.]online
- gøøgle[.]online
- googłe[.]org
- göogle[.]org
- [göögle[.]org]
- góógle[.]xyz
Appendix C: Numbers of homoglyph domain names for a series of other heavily featured brand / keyword strings
Brand / keyword string | .com | Other gTLDs | Total |
---|---|---|---|
admiral | 70 | 0 | 70 |
alibaba-inc | 97 | 97 | 194 |
alipay | 42 | 43 | 85 |
alipay-inc | 60 | 60 | 120 |
allgau | 1 | 30 | 31 |
allstate | 117 | 0 | 117 |
allstatecorporation | 221 | 0 | 221 |
allstateinsurance | 256 | 0 | 256 |
allstateinvestments | 249 | 0 | 249 |
anthropic | 36 | 0 | 36 |
aresmgmt | 3,186 | 0 | 3,186 |
arrow | 55 | 0 | 55 |
avril | 64 | 0 | 64 |
bankia | 48 | 0 | 48 |
bitcoin | 41 | 12 | 53 |
boursobank | 130 | 0 | 130 |
brainlab | 131 | 0 | 131 |
calvinklein | 172 | 0 | 172 |
canva | 69 | 0 | 69 |
cignahealthcare | 202 | 0 | 202 |
coinbase | 166 | 10 | 176 |
csileasing | 183 | 92 | 275 |
divvypay | 79 | 0 | 79 |
74 | 9 | 83 | |
getdivvy | 69 | 0 | 69 |
gmail | 49 | 1 | 50 |
greentechrenewables | 429 | 0 | 429 |
gulfstream | 82 | 0 | 82 |
hackerone | 127 | 0 | 127 |
iledefrance | 73 | 66 | 139 |
66 | 3 | 69 | |
investwithconfidence | 64 | 0 | 64 |
janestreet | 198 | 0 | 198 |
ledger | 38 | 8 | 46 |
mailchimp | 133 | 1 | 134 |
mdrbrand | 73 | 0 | 73 |
mdrcyber | 73 | 0 | 73 |
mdrdiscover | 96 | 0 | 96 |
optelgroup | 390 | 0 | 390 |
paypal | 140 | 8 | 148 |
prologis | 72 | 0 | 72 |
retirewithconfidence | 44 | 0 | 44 |
rogers | 67 | 0 | 67 |
rolex | 47 | 0 | 47 |
sailpoint | 40 | 0 | 40 |
snowflake | 62 | 0 | 62 |
sustainabilitywithsubstance | 66 | 0 | 66 |
tailwind | 56 | 56 | 112 |
taitcommunications | 46 | 0 | 46 |
thecignagroup | 127 | 0 | 127 |
thedebtbox | 44 | 0 | 44 |
trustwallet | 30 | 1 | 31 |
twosigma | 59 | 0 | 59 |
united | 2,342 | 0 | 2,342 |
verical | 34 | 0 | 34 |
wakanime | 83 | 0 | 83 |
wellington | 72 | 0 | 72 |
williams-int | 87 | 0 | 87 |
youtube | 46 | 1 | 47 |
zoom | 35 | 1 | 36 |
References
[1] Second-level domain - the part of the name to the left of the dot
[2] https://czds.icann.org; all data based on the versions of the zone files downloaded on 28-Sep-2023 (1,082 TLDs)
[3] Language recognition is as per the 'DETECTLANGUAGE' function available via Google Sheets: https://support.google.com/docs/answer/3093278?hl=en
[4] https://developers.google.com/admin-sdk/directory/v1/languages
[5] https://www.w3schools.com/tags/ref_language_codes.asp
[6] https://www.kantar.com/inspiration/brands/revealed-the-worlds-most-valuable-brands-of-2023
[8] https://www.xudongz.com/blog/2017/idn-phishing/
This article was first published as an e-book on 29 December 2023 at: