Monday, 19 December 2022

The Highest-Threat TLDs – Part 2

by Justin Hartland and David Barnett

In the first article[1] of this two-part blog series, we looked at how frequently domains were used by bad actors for phishing activity across individual top-level domains (TLDs) or domain extensions, using data from CSC's Fraud Protection services, powered by our DomainSec℠ platform[2]. In this second article, we analyse multiple datasets to determine the highest-threat TLDs, based on the frequency with which the domains are used egregiously for a range of cybercrimes.

In this deeper dive, we look at the following datasets:

  1. Spamhaus' 10 most abused TLDs[3], reflecting information in its domain blocking list and containing domains with poor reputations (generally those found to be associated with spam or malware).
  2. Netcraft's 50 TLDs with the highest ratios of cybercrime incidents to active sites[4], generally reflecting phishing and malware incidences.
  3. Palo Alto Networks' 10 TLDs with the highest rates of malicious domains[5], reflecting four categories of malicious content (malware, phishing, command and control (C2), and grayware), and expressed as the median of the absolute deviation from the median (MAD).
  4. Data from CSC's Fraud Protection services, as discussed in Part 1 of this series.

Each dataset measures the proportion of domains across each TLD deemed to be associated with threatening content[6]. For datasets 1, 2 and 3 as outlined above, proportions are expressed relative to the total number of domains analysed for the TLD in question.

Methodology: For ease of comparison, the threat frequency for each TLD within each dataset is again normalised, so that in each case the value for the highest-threat TLD is 1. The overall threat frequency for a TLD is then calculated as the average of the normalised scores across the datasets in which it appears. We excluded any TLDs from the results that were only present in CSC's dataset and where fewer than 50 phishing cases were recorded.
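
The normalise-then-average step described above can be sketched as follows. This is a minimal illustration rather than CSC's actual implementation: the function names, the placeholder TLD labels (.aa, .bb, .cc) and the input figures are all assumptions made for the example and are not taken from any of the four sources.

```python
from statistics import mean

def normalise(dataset):
    """Scale raw threat frequencies so the highest-threat TLD scores 1."""
    top = max(dataset.values())
    return {tld: value / top for tld, value in dataset.items()}

def overall_threat(datasets):
    """Average each TLD's normalised score over the datasets it appears in."""
    scores = {}
    for dataset in datasets:
        for tld, value in normalise(dataset).items():
            scores.setdefault(tld, []).append(value)
    return {tld: mean(values) for tld, values in scores.items()}

# Hypothetical inputs: two datasets measured on different scales.
dataset_a = {".aa": 0.20, ".bb": 0.10}
dataset_b = {".aa": 4.0, ".cc": 8.0}

overall = overall_threat([dataset_a, dataset_b])
ranking = sorted(overall.items(), key=lambda item: item[1], reverse=True)
for tld, score in ranking:
    print(f"{tld} {score:.3f}")
```

Under this scheme, a TLD that tops the only dataset in which it appears receives an overall score of exactly 1, which is one way a value of 1.000 can arise for more than one TLD.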

Analysis and discussion

The above methodology yields the following list in Table 1 for the top 30 highest-threat TLDs, ranked by overall normalised threat frequency.

TLD  Threat freq.  Registry Operator[7]  Region (Country) / Type
  .ci 1.000 Autorité de Régulation des Télécommunications / TIC de Côte d’Ivoire (ARTCI) Autorité de Régulation des Télécommunications / TIC de Côte d’Ivoire (ARTCI) Africa (Ivory Coast)
  .zw 1.000 Postal and Telecommunications Regulatory Authority of Zimbabwe (POTRAZ) TelOne Pvt Ltd Africa (Zimbabwe)
  .sx 0.945 SX Registry SA B.V. Canadian Internet Registration Authority (CIRA) Caribbean (Sint Maarten)
  .mw 0.862 Malawi Sustainable Development Network Programme Malawi Sustainable Development Network Programme Africa (Malawi)
  .am 0.608 "Internet Society" Non-Governmental Organization "Internet Society" Non-Governmental Organization Asia (Armenia)
  .date* 0.506 dot Date Limited GoDaddy Registry New gTLD
  .cd 0.391 Office Congolais des Postes et Télécommunications (OCPT) Office Congolais des Postes et Télécommunications (OCPT) Africa (Democratic Rep. of the Congo)
  .ke 0.381 Kenya Network Information Center (KeNIC) Kenya Network Information Center (KeNIC) Africa (Kenya)
  .app* 0.377 Charleston Road Registry Inc. Google Inc New gTLD
  .bid* 0.361 dot Bid Limited GoDaddy Registry New gTLD
  .ly 0.356 General Post and Telecommunication Company Libya Telecom and Technology Africa (Libya)
  .bd 0.351 Posts and Telecommunications Division Bangladesh Telecommunications Company Limited (BTCL) Asia (Bangladesh)
  .surf* 0.325 Registry Services, LLC GoDaddy Registry New gTLD
  .sbs* 0.250 ShortDot CentralNic New gTLD
  .pw 0.240 Micronesia Investment and Development Corporation Radix FZC Asia (Palau)
  .dev* 0.222 Charleston Road Registry Inc. Google Inc New gTLD
  .quest* 0.209 XYZ.COM LLC CentralNic New gTLD
  .top* 0.196 Jiangsu Bangning Science and Technology Co., Ltd. Jiangsu Bangning Science and Technology Co., Ltd. New gTLD
  .page* 0.195 Charleston Road Registry Inc. Google Inc New gTLD
  .gq 0.192 GETESA Equatorial Guinea Domains B.V. (Freenom) Africa (Equatorial Guinea)
  .cf 0.168 Societe Centrafricaine de Telecommunications (SOCATEL) Centrafrique TLD B.V. (Freenom) Africa (Central African Republic)
  .ga 0.164 Agence Nationale des Infrastructures Numériques et des Fréquences (ANINF) Agence Nationale des Infrastructures Numériques et des Fréquences (ANINF) (Freenom) Africa (Gabon)
  .ml 0.157 Agence des Technologies de l’Information et de la Communication Mali Dili B.V. (Freenom) Africa (Mali)
  .buzz* 0.149 DOTSTRATEGY CO. GoDaddy Registry New gTLD
  .cyou* 0.141 ShortDot CentralNic New gTLD
  .cn 0.130 CNNIC CNNIC Asia (China)
  .monster* 0.106 XYZ.COM LLC CentralNic New gTLD
  .bar* 0.104 Punto 2012 Sociedad Anonima Promotora de Inversion de Capital Variable CentralNic New gTLD
  .host* 0.101 Radix FZC CentralNic New gTLD
  .io 0.085 Internet Computer Bureau Limited Internet Computer Bureau Limited Asia (British Indian Ocean Territory)

*Extensions where there are currently no customer domains under CSC's management.

Table 1: The top 30 TLDs with the highest overall normalised threat frequencies

Table 2 shows the datasets in which each of the top 30 TLDs appear.

TLD  Spamhaus  Netcraft  Palo Alto Networks  CSC (phishing)
  .ci
  .zw
  .sx
  .mw
  .am
  .date
  .cd
  .ke
  .app
  .bid
  .ly
  .bd
  .surf
  .sbs
  .pw
  .dev
  .quest
  .top
  .page
  .gq
  .cf
  .ga
  .ml
  .buzz
  .cyou
  .cn
  .monster
  .bar
  .host
  .io

Table 2: Datasets in which each of the top 30 TLDs by overall threat frequency appear

It is significant that this list is dominated by extensions from Africa, Asia, and the Caribbean, as well as several new gTLDs. The latter is consistent with the observation that new gTLDs tend to be abused disproportionately often compared with legacy TLDs, although they tend to have better processes for tackling infringements[8]. Nearly half of the TLDs in this list (14 of 30) are operated by just three organisations, namely CentralNic (six TLDs), Freenom (four), and GoDaddy Registry (four) - all consumer-grade registrars.

The Anti-Phishing Working Group's (APWG's) comprehensive Global Phishing Survey[9] of 2017, which analysed the TLDs most frequently associated with phishing domains, also showed some similar trends (although the landscape may have changed somewhat since 2017). Its top 10 TLDs by frequency of phishing domains were dominated by African and Asian country-code TLDs (ccTLDs), with three of the top five (.ml, .bd and .ke) featuring in our top 30 list.

The below observations from the analysis are also notable:

  • Some of the TLDs in the list have special significance:
    • .ly - The frequency of this extension's use in conjunction with threatening content is strongly influenced by its appearance in URL-shortening services (e.g. bit.ly, cutt.ly and ow.ly). This means its threat frequency is disproportionately large compared with what would be expected from its use solely as a ccTLD.
    • .io – The .io extension is popularly used in domains with technology-related content, particularly anything associated with the range of Apple (iOS) operating systems. Many of the threat sites in this analysis are on compromised .io domains, or subdomains of sites such as github.io, rather than reflecting any factors related to the British Indian Ocean Territory.
  • The top 30 highest-threat TLDs include four of the five free extensions offered by Freenom (the exception being .tk, where the threat frequency is likely to be diminished by the large absolute number of registrations across the TLD). Freenom's business model allows customers to register domains for free, with the option to make subsequent payments, depending on how the domain will be used. This makes these extensions particularly popular with phishers, who may discard their domains after a few days' use for a phishing attack.

Figure 1 shows how the threat scores compare with the total number of customer domains under CSC's management across the observed TLDs[10].

Figure 1: Total numbers of customer domains under CSC's management (where non-zero) as a function of overall normalised TLD threat frequency, for the top 30 highest-threat TLDs

It is notable that most of the highest-threat TLDs are associated with only small numbers of domains under CSC’s management. Therefore, one clear recommendation is that brand owners may want to consider defensively registering domain names featuring high-relevance brand terms across the high-risk extensions where possible, to prevent them from being fraudulently registered by third parties.

When exploring a defensive registration strategy, brand owners should also consider registering domains containing specific brand variants or keywords that are frequently associated with phishing activity, rather than just registering exact brand matches across TLDs of particular concern. These might include common character replacements, or keywords like 'login', 'jobs', 'invest' or other industry-related keywords.
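
As a rough sketch of how such a candidate list might be assembled, the snippet below combines a brand string with phishing-related keywords across a set of TLDs. The function name, the brand 'examplebrand', the separator choices and the short keyword and TLD lists are illustrative assumptions only; a real strategy would also weigh cost, relevance and registry eligibility rules.

```python
from itertools import product

def candidate_domains(brand, keywords, tlds, separators=("", "-")):
    """Build exact-match and brand+keyword names across the given TLDs."""
    labels = [brand]
    for keyword, sep in product(keywords, separators):
        labels.append(f"{brand}{sep}{keyword}")  # e.g. brand-login
        labels.append(f"{keyword}{sep}{brand}")  # e.g. login-brand
    return [f"{label}{tld}" for label, tld in product(labels, tlds)]

# Hypothetical brand; keywords and TLDs are a small subset for illustration.
names = candidate_domains("examplebrand", ["login", "jobs"], [".ci", ".zw"])
for name in names:
    print(name)
```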

Where relevant domains have already been taken across high-threat TLDs, it may be advantageous to monitor them for possible future changes in content, or to launch enforcement actions or acquisition processes in cases where infringing content is identified.

It is also worth considering the list of top TLDs by the number of customer domains under CSC's management (Figure 2). It is noteworthy that only one of the TLDs from the top 30 highest-threat extensions (.cn) currently appears in this list (alongside .com.cn).

Figure 2: Top TLDs by total numbers of registered customer domains under CSC's management

However, many of the highest-risk TLDs are often observed to experience high levels of registration activity overall, with significant proportions associated with fraudulent use. Much of this activity is via consumer-grade registrars, and enterprise-class providers often see little legitimate activity on these extensions. Previous CSC studies have established that most brand-related domain names on risky domain extensions are typically registered by third parties and are often involved in cybersquatting or malicious use. In one study looking at the .icu 'cousins' of the core domains of several top brands - i.e. the same second-level domain name, but on the .icu extension - around three quarters of the domains used suspect DNS providers that were not under the control of the brand owner.

CSC recommendations

CSC has a short list of recommendations to help brand owners tackle the issues outlined in these articles.

1. Start at the foundations

Everything in cybersecurity comes back to the humble domain name. It is vital to have a comprehensive view of your domain portfolio - what domains you have, and which are business-critical, tactical, or defensive. CSC's Domain Management services allow organisations to manage their portfolios of official corporate domains. Deploying blocking or alerting services provides visibility of attempts by third parties to register domains containing brand-related terms. CSC's Brand Advisory Team can consult on domain registration strategies.

2. Keep them secure

Third-party registration of branded domains is just part of the issue. Another part of the picture is keeping your official domains secure from unauthorised changes to their infrastructure, which can form the basis for targeted attacks like domain hijacking, e-mail spoofing and phishing. As an enterprise-class provider, CSC offers several domain security solutions that allow organisations to secure their corporate domains and maintain a defense-in-depth approach as part of a robust security posture. These measures include CSC's MultiLock, DNSSEC (domain name system security extensions), use of CAA (certificate authority authorisation) records, DMARC (domain-based message authentication, reporting, and conformance), SPF (sender policy framework), and DKIM (DomainKeys identified mail).

3. Monitor closely for potential threats

Domain intelligence is power. Monitoring for the registration, re-registration, and dropping of brand-related domain names is highly recommended, together with using this knowledge to inform when a brand should act. CSC's 3D Domain Security and Enforcement service does just this, encompassing a range of brand variants including fuzzy matches and character replacements. The monitoring covers a wide range of domain extensions, including high-threat TLDs. This service also monitors high-relevance domains 24×7, tracking them for relevant changes in content.

For brands where phishing is a concern, we recommend augmenting domain or Internet content monitoring with a phishing protection service. This will improve coverage over areas that may not otherwise be detected, e.g. non-brand-specific domain names or unindexed Internet content. CSC's phishing detection products use a range of data sources, including spam traps and honeypots, alongside other data feeds such as customer abuse mailbox data and webserver logs. The results are fed into a correlation engine - driven by CSC's machine learning deep search (MLDS) technology - to detect fraudulent sites by analysing URL patterns and comparing the sites with known predictors of fraudulent content.

4. Enforce on infringements

Points 1 to 3 aim to reduce the appearance of cyber-risks, but for existing infringements it is important to have an effective enforcement solution to protect your brand. CSC's Enforcement services include 24×7 rapid takedown of a variety of infringement types. We use a toolkit approach with a wide range of enforcement methodology options, using the most efficient and cost-effective option in any given case, while reserving other options for escalation. Effective use of enforcement enables any brand to protect its reputation, and potentially reclaim lost revenue from fraudulent activity and re-direction to third-party sites.

References

[1] https://www.cscdbs.com/blog/the-highest-threat-tlds-part-1/

[2] https://www.cscdbs.com/en/domainsec/

[3] https://www.spamhaus.org/statistics/tlds/

[4] https://trends.netcraft.com/cybercrime/tlds

[5] https://unit42.paloaltonetworks.com/top-level-domains-cybercrime/

[6] For datasets 1 and 2, all statistics are correct as of 13 June 2022

[7] From iana.org

[8] https://op.europa.eu/en/publication-detail/-/publication/7d16c267-7f1f-11ec-8c40-01aa75ed71a1

[9] https://docs.apwg.org//reports/APWG_Global_Phishing_Report_2015-2016.pdf

[10] Data correct as of June 2022

This article was first published on 16 December 2022 at:

https://www.cscdbs.com/blog/the-highest-threat-tlds-part-2/

Also published at:

https://circleid.com/posts/20230117-the-highest-threat-tlds-part-2

Tuesday, 6 December 2022

Threatening domains targeting the top ten most valuable brands (homoglyph and typo domains)

Key findings and highlights

  • Between August 2021 and August 2022, CSC identified 8,552 unique domain names comprising a very close match to the brand names of the global top ten most valuable companies, with more than 99% of those (where information is available) registered by third parties, leaving those brands vulnerable to targeted attacks by bad actors. Domains definitively determined to be official were excluded from the remainder of the analysis.
  • Where registration information is available, two thirds of the domains used domain privacy services - indicating an intention by the owner to mask their identity - or have redacted information.
  • 56% of the domains still registered at the time of analysis resolved to a live webpage. Among the live sites, we observed a range of high-concern content types, including fraud issues like potential phishing sites, and other brand infringements.
  • 35% of the registered domains were configured with active MX (mail exchange) records, indicating their ability to send and receive e-mails, making them capable of launching phishing attacks.
  • Many of the domain names are chosen to appear deceptively similar to official brand domain names or feature common typo variants to catch misdirected web traffic. Domains using non-Latin characters raise the greatest potential for confusion (and thereby for fraudulent use) as they can appear almost identical to their Latin equivalents, for example:
    • amaƶon.com 
    • ɑpple.com 
    • faceboo.com 
    • gᴑᴑgle.com 
    • mıcrosoft.com

* * *

Methodology of the analysis

In this analysis, we dive into potentially the most egregious and threatening set of domains targeting the top ten most valuable company brands in 2022[1]. These domains are those where the second-level domain name (SLD) - the part of the domain name to the left of the dot - consists solely of an exact or very close match to the targeted brand name.

We used CSC’s 3D Domain Security and Enforcement technology - powered by the DomainSec℠ platform, which uses proprietary Machine Learning Deep Search (MLDS) technology and combines machine learning, artificial intelligence, and clustering technology to identify leading indicators of compromise - to conduct the analysis. We considered new registrations (N), re-registrations (R) and drops (D) - collectively referred to as domain registration activity events - over the period August 2021 to August 2022.

The study focuses on domains containing any of the following brand variants as the SLD:

  • Exact matches - where the SLD is identical to the brand term under consideration, but with a different extension (TLD) to the official brand website (i.e. a 'cousin' domain).
  • Homoglyph matches - where one or more characters in the brand term are replaced by a visually similar, non-Latin character.
  • Fuzzy matches - featuring typos (misspellings) affecting a single character, covering missing characters, additional characters, transposed characters, and other character replacements.
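
The three variant families above can be illustrated with a small classifier that compares a second-level domain against a brand string. This is a simplified sketch and not the matching logic of CSC's technology: the function name and labels are hypothetical, and 'homoglyph' is approximated here as any same-length variant containing a non-ASCII replacement character.

```python
def classify_variant(brand, sld):
    """Label how a second-level domain differs from the brand string."""
    if sld == brand:
        return "exact"
    if len(sld) == len(brand) - 1:
        # Missing character: deleting one position of the brand gives the SLD.
        if any(brand[:i] + brand[i + 1:] == sld for i in range(len(brand))):
            return "missing"
    if len(sld) == len(brand) + 1:
        # Extra character: deleting one position of the SLD gives the brand.
        if any(sld[:i] + sld[i + 1:] == brand for i in range(len(sld))):
            return "extra"
    if len(sld) == len(brand):
        diffs = [i for i, (a, b) in enumerate(zip(brand, sld)) if a != b]
        # Transposition: exactly two adjacent positions hold swapped characters.
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and brand[diffs[0]] == sld[diffs[1]]
                and brand[diffs[1]] == sld[diffs[0]]):
            return "transposed"
        # Assumption: a non-ASCII replacement is treated as a homoglyph.
        if any(not sld[i].isascii() for i in diffs):
            return "homoglyph"
        if len(diffs) == 1:
            return "replaced"
    return "other"

for sld in ["google", "googe", "ggoogle", "googel", "goog1e"]:
    print(sld, classify_variant("google", sld))
```

An exact match on the brand's own official TLD would of course be the official domain itself, so in practice the 'exact' label only indicates a cousin domain once the extension has been checked separately.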

Each of these domain names therefore appears extremely similar to that of the brand's official site and raises significant potential for targeted attacks. These variants may have been deliberately selected by those registering the domains to be confusing or to circumvent detection efforts by brand owners, who may be monitoring only for exact matches to the brand string. Such activity presents significant threat vectors to the targeted brands, as those domains can be used for a variety of purposes, including active fraudulent use (e.g. creating phishing sites) or potential brand confusion and traffic misdirection, by taking advantage of the status of and consumer trust in the brand being infringed, or attracting traffic from mis-typed browser requests or search engine queries.

Findings

1. Domain activity

Over the analysis period, CSC identified more than 11,000 domain registration activity events across the ten brands under consideration, focusing only on the very close matches as described in the methodology. As with a previous CSC study considering deceptive domains with names beginning 'www' or 'http'[2], we saw continuous activity across the period (Figure 1).

Figure 1: Daily numbers of new registrations (N), re-registrations (R) and dropped (D) domains with names consisting of a close match to any of the ten brand names under consideration

In some cases, individual spikes in activity can be tied to specific events or coordinated registration campaigns. For example, a batch of 56 domains featuring misspellings of 'mcdonalds.co.uk' was registered on 27-Aug-2021, followed by 40 domains on 01-Sep-2021 which comprised 'microsoft' typos across the .biz and .one extensions, and then 46 'google.fr' and five 'amazon.fr' typo variants on 16-Oct-2021. Additionally, a set of domains featuring 'amazon' (or typos) and 'ten-cent' as the SLD was registered on 22-Apr-2022 across a range of different new gTLD extensions. Previous research by CSC[3] established that a peak in registrations of domain names featuring a key pharmaceutical brand with energy-related keywords coincided with the launch of the Energize programme[4].

In total, we identified 8,552 unique domain names in the dataset. For the active domains where whois information was available, only 72 domains (<1%) were explicitly registered by the official brand owners - presumably as defensive registrations or acquired domains to prevent third-party use. The remaining domains were registered by third parties. The analysis presented here focuses on these 8,480 third-party domains.

2. Brand variant types

Several types of brand variants (listed below) were present in the set of third-party SLDs, with their relative proportions within the dataset shown in Figure 2.

  • Exact match (i.e. 'cousin' domains)
  • Missing character
  • Extra character
  • Transposed characters (i.e. a swapped pair of adjacent characters)
  • Replaced characters - either non-Latin homoglyphs or other character replacements, with the number of replaced characters shown in Figure 2
N.B. Those domains with multiple character substitutions are generally homoglyph (internationalised-character) domain names, since the fuzzy-match searches (covering other character substitutions) explicitly focus on close matches with a single character differing from the search term.

The high potential for confusion between these domain names and those of the official brands' websites poses a significant threat to their security postures, as well as the risk of infringing use by bad actors for phishing activity, or other traffic misdirection.

Figure 2: Frequency of brand variant types used in the dataset of unique domains

Within the dataset, 3% of the domains featured an exact match to the targeted brand name (i.e. cousin domains), 44% featured a missing or extra character, 3% featured transposed characters, and 50% featured one or more replaced characters. In total, just over 3% of the dataset featured homoglyph domains, i.e. those incorporating non-ASCII characters (characters other than the Latin alphabet and other standard characters).

3. Observations on frequently occurring features within the dataset of unique domains

A. Top TLDs

The following are the most popular TLDs represented within the dataset of 8,480 third-party domains.

TLD  No. domains
  .com 856
  .xyz 564
  .net 302
  .top 295
  .shop 260
  .co.uk 236
  .online 221
  .live 205
  .store 205
  .info 185
  .org 184
  .site 184
  .fr 162
  .club 159
  .work 150

As has been seen with previous studies[5,6,7], this list of extensions is dominated by popular TLDs (e.g. .com), and new gTLDs (e.g. .xyz and .top), which are proven to be popular with infringers.

B. Brand variants consisting of transposed characters

Within the dataset of unique domains, the following are the most observed brand variants where a pair of adjacent characters have been swapped. In total, 253 of the domains were found to incorporate transposed characters.

Brand variant  No. domains
  googel 29
  appel 25
  goolge 23
  micorsoft 17
  faecbook 15
  amzaon 14
  mircosoft 13
  amazno 13
  gogole 13
  amaozn 11

C. Most popular character replacements

The dataset of unique domains encompassed more than 3,000 individual replaced characters[8].

The top ten most common replacements with Latin alphabet characters were:

Character replacement  No. instances
l → i 93
n → m 59
o → i 52
a → o 47
c → k 47
o → a 45
a → e 42
o → g 40
g → d 38
g → b 37

The top ten most common replacements with other characters (non-Latin characters and numbers) were:

Character replacement  No. instances
o → 0 78
l → 1 24
o → õ 20
o → ó 17
o → ö 15
l → ł 15
e → 3 13
e → é 12
z → - 10
o → ᴏ 10

In many of these cases, the characters have been replaced with visually similar alternatives to create a convincing lookalike domain name. There are instances, however - notably in the Latin alphabet character replacements - where characters have been replaced with characters that are adjacent to them on a standard QWERTY keyboard (e.g. n → m, o → i, g → b, etc.), presumably to try to catch misdirected traffic from browser requests containing common typing errors. It is also worth noting that some of the common character replacements observed in the dataset may relate to true third-party brand use or unrelated terms (e.g. all the observed g → m replacements in the dataset appear as the term 'moogle', which could refer to a creature in the Final Fantasy gaming series).
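
One way to separate likely keyboard slips from visual lookalikes is to test whether the two characters sit next to each other on the keyboard. The sketch below uses a simplified model of the three QWERTY letter rows; the row stagger values and the adjacency rule are assumptions made for illustration, and digits and punctuation are deliberately left out.

```python
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
OFFSETS = (0.0, 0.25, 0.75)  # assumed horizontal stagger of each row, in key widths

# Map each letter to (row index, horizontal position).
POSITIONS = {
    key: (row, col + OFFSETS[row])
    for row, keys in enumerate(ROWS)
    for col, key in enumerate(keys)
}

def is_adjacent(a, b):
    """True if two letter keys touch on the simplified staggered grid."""
    if a == b or a not in POSITIONS or b not in POSITIONS:
        return False
    (row_a, x_a), (row_b, x_b) = POSITIONS[a], POSITIONS[b]
    return abs(row_a - row_b) <= 1 and abs(x_a - x_b) <= 1

for original, replacement in [("n", "m"), ("o", "i"), ("g", "b"), ("l", "i")]:
    kind = "adjacent keys" if is_adjacent(original, replacement) else "not adjacent"
    print(f"{original} -> {replacement}: {kind}")
```

In this model the l → i replacement is not a neighbouring-key pair, which is consistent with it being chosen for visual similarity rather than as a typing error.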

D. Top registrant and registrar characteristics

Among the dataset, we found whois (domain registration) information for 4,513 domains, of which two thirds (3,007) used domain privacy services or had redacted registration information. Use of anonymisation services demonstrates a domain owner's attempt to mask their identity and could indicate nefarious intentions[9].

The following tables show the top organisations and countries given in the sets of registration contact details for the domains.

Most common registrant organisations:

Organisation  No. domains
  Domains By Proxy, LLC 752
  Privacy service provided by Withheld for Privacy ehf 401
  See PrivacyGuardian.org 245
  Contact Privacy Inc. Customer 7151571251 153
  Private by Design, LLC 134

Most common registrant countries:

Country  No. domains
  US (United States) 1,809
  CN (China) 624
  IS (Iceland) 447
  CA (Canada) 235
  JP (Japan) 94
  RU (Russia) 91
  DE (Germany) 71
  GB (United Kingdom) 71
  VN (Vietnam) 43
  UA (Ukraine) 41

The following table shows the top registrars through which the domains in the dataset were registered.

Most common registrars:

Registrar  No. domains
  GoDaddy.com, LLC 839
  Namecheap, Inc. 485
  NameSilo, LLC 286
  Dynadot LLC 263
  Alibaba Cloud Computing Ltd. d/b/a HiChina 148
  DNSPod, Inc. 142
  Porkbun LLC 176
  Name.com, Inc. 111
  PDR Ltd. d/b/a PublicDomainRegistry.com 106
  Google LLC 97

The registrar landscape within the dataset is dominated by consumer-grade providers, a trend which has also been previously seen in other CSC studies of domains registered for potentially infringing use[10].

4. Cybersecurity, fraud protection, and brand protection observations

Of the 8,480 unique third-party domains in the dataset, 4,552 (54%) were still registered at the time of analysis (i.e. those domains where the most recent activity event was a registration or re-registration). Just under a third of the domains in the dataset - equivalent to 56% of the registered domains - were found to produce a live website response[11]. Furthermore, 1,590 domains (19% of the dataset, or 35% of the registered domains) were configured with active MX records, indicating that they can send or receive e-mails (e.g. for use in a phishing attack). This is possible even when there is no live site content. Additionally, as noted in previous studies[12], dormant domains also have potential to be used fraudulently in the future, so monitoring for changes to configuration status and site content is advisable.

Live sites in the dataset featured a range of content, including: lookalike sites; websites using potentially unauthorised official branding; third-party sites using similar branding; sites featuring content relating to third parties operating in a similar business area as the infringed brand; and gambling-related or adult material. Additionally, many displayed pay-per-click links - as a means of monetising the web traffic - or holding pages offering to sell the domain names, i.e. potential cybersquatting. Many displayed browser warnings indicating threatening content is or has been present.

Figure 3 shows examples of some of the most significant infringements identified within the dataset, with the associated SLD (domain-name string) shown in square brackets in each case. These include websites attempting to pass themselves off as the brand in question (as part of a fraudulent brand-impersonation attack, for example), unauthorised use of official intellectual property (IP) - particularly concerning if the content provides an undesirable brand association - and sites generating revenue for a third party operating in a similar business area by misdirecting web traffic that was arguably intended for the official brand.

Figure 3: Examples of brand infringements identified within the dataset: (a) Lookalike site [googe]; (b) Potential phishing [mc-donalds]; (c) Potential unauthorised use of branding [aamazon]; (d) Similar branding [armazon]; (e) Third-party content in related business area [fgoogle]; (f) Other brand infringement [fakebook]

Across the dataset, 3% of the domains included at least one character outside a standard Latin (ASCII) character set (i.e. domains that can be represented in Punycode[13] format). Many of these are visually almost identical to the official domain name for the brand in question, with no additional, missing or transposed characters, and therefore could be used as highly convincing attack vectors. Around 10% of these were configured with active MX records, indicating e-mail capability.
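
To show what the Punycode representation of such a domain looks like, the sketch below converts non-ASCII labels to their 'xn--' form and back using the punycode codec in Python's standard library. The helper names are hypothetical, and the code deliberately skips the normalisation and validity checks that full IDNA processing would apply, so it should be read as an illustration only.

```python
def to_ace(domain):
    """Convert each non-ASCII label of a domain to its xn-- (Punycode) form."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)  # plain ASCII labels are left unchanged
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)

def from_ace(domain):
    """Reverse to_ace: decode any xn-- labels back to Unicode."""
    labels = []
    for label in domain.split("."):
        if label.startswith("xn--"):
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)

# A lookalike from the dataset: dotless i (U+0131) in place of 'i'.
lookalike = "m\u0131crosoft.com"
encoded = to_ace(lookalike)
print(encoded, "->", from_ace(encoded))
```

Comparing the encoded form against the official domain makes the difference obvious even when the two names are almost indistinguishable on screen.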

Examples of highly convincing lookalike domain names (replaced characters highlighted in bold):

  • amazoņ.com
  • amaƶon.com
  • aþþle.com
  • ɑpple.com
  • faceboo.com
  • ƭacebook.com
  • googɫe.com
  • gᴑᴑgle.com
  • mcdonaldƨ.com
  • micrsoft.com
  • mıcrosoft.com

As shown earlier in Figure 2, the majority of domains incorporating character replacements featured a single replaced character. In part, this reflects the fact that the fuzzy searches explicitly focused on matches differing by only one character. In the dataset, we identified 33 domains where at least half of the characters were replaced by homoglyphs, e.g. gõõgłé.com and fäcëböök.com.

The full list includes a number of examples where all of the character replacements make use of non-Latin versions that look almost identical to the original characters (e.g. ɡᴏoɡle.com), or where the alternative version simply appears visually to be just in a different font (e.g. Fᴀᴄᴇʙᴏᴏᴋ.com). These could form the basis of extremely deceptive infringements.

We found 183 distinct characters (including non-Latin homoglyphs from numerous alphabets and character sets) being used as character replacements across the full dataset, as shown here.

0 1 2 3 4 5 6 7 8 9 - ǀ a ᴀ á à ȧ â ä ǎ ă ā ã å ą ạ ɑ ɒ æ b ʙ ḃ ɓ ḇ ƅ c ᴄ ć ċ ĉ ç ƈ d e ᴇ é è ė ê ë ě ē ę ɇ ẻ ẹ ᶒ ḛ ǝ ə ɘ ɛ f ꜰ ḟ ƒ g ɢ ǵ ġ ĝ ǧ ğ ḡ ģ ɠ ɡ h i ı í ì ǐ ī į ɨ ỉ ị ɩ j k ᴋ ĸ ḱ ķ ƙ ḵ l ĺ ɫ ļ ł ɭ ḻ ꞁ m ṃ n ń ñ ņ ᶇ ṇ o ᴏ ᴑ ó ò ȯ ô ö ŏ ō õ ő ɵ ø ỏ ơ ọ ở œ p ṗ q r ʀ s ƨ ẝ t ť ƭ ṭ ṯ þ u ü v w ƿ ʍ x y z ᴢ ź ž ƶ ʑ ʘ а е і ї р ꓐ ꓑ ꓖ ꓗ ꓜ ꓝ ꓟ ꓠ ꓡ ꓮ ꓰ ꓳ  

Summary

CSC's findings highlight the degree to which trusted brands can be targeted by bad actors, which presents significant threats to their security postures, revenue, and reputations. They also highlight the need for domain intelligence as part of a layered security approach, including comprehensive domain monitoring as the foundation of a holistic brand protection initiative.

CSC's 3D Domain Security and Enforcement solution (powered by our DomainSec platform) can provide this overview, enabling brand owners to have visibility of domain activity encompassing detection of brand variants, including fuzzy matches like typos and homoglyph domains, and soundalike variations, across a range of domain name extensions. The system also incorporates MLDS technology to intelligently modify monitoring based on previously identified infringements.

The closeness of the match of a third-party domain name to that of the brand owner's official name is also one key input into algorithms that can help determine the threat level that may be posed in the future by that domain. These concepts are central to the idea of threat scoring, which can be used to prioritise infringements for analysis, future monitoring, and enforcement.

References

[1] https://www.kantar.com/inspiration/brands/what-are-the-most-valuable-global-brands-in-2022; the brand terms used in our analysis are apple; google; amazon; microsoft; tencent; mcdonalds; visa; facebook; alibaba; vuitton.

[2] https://www.cscdbs.com/blog/registration-patterns-of-deceptive-domains/

[3] 'Domain registration patterns analysis' (unpublished)

[4] https://www.se.com/ww/en/about-us/newsroom/news/press-releases/10-global-pharmaceutical-companies-launch-first-of-its-kind-supplier-program-to-advance-climate-action-6182848cf01af478b619ddd4

[5] https://www.cscdbs.com/blog/branded-domains-are-the-focal-point-of-many-phishing-attacks/

[6] https://circleid.com/posts/20210908-credential-hinting-domain-names-a-phishing-lure

[7] https://unit42.paloaltonetworks.com/top-level-domains-cybercrime/

[8] In this analysis, we exclude examples where the replaced-character version forms an alternative word in its own right (e.g. 'apply' or 'ample' for 'apple'), and all replacements for the Visa brand (since the small number of characters means that many of the variations are significantly different from the brand name and may pertain to unrelated third-party use), as the replaced versions in these cases may not be intended to be deceptive variants of the brand name in question.

[9] https://www.cscdbs.com/en/resources-news/supply-chain-report/

[10] https://www.cscdbs.com/en/resources-news/impact-of-covid-on-internet-security/

[11] Those returning an HTTP status code of 200

[12] https://unit42.paloaltonetworks.com/strategically-aged-domain-detection/

[13] https://en.wikipedia.org/wiki/Punycode

This article was first published on 6 December 2022 at:

https://www.cscdbs.com/en/resources-news/threatening-domains-targeting-top-brands/
