Thursday, 2 February 2023

Entropy analysis of registered domain names relating to the top ten most valuable brands

Introduction and methodology

In our previous analysis[1] we considered the use of the mathematical concept of Shannon entropy[2] - essentially a measure of the amount of information (or 'randomness') in a domain name - as a way of clustering together related registrations, as part of the analysis process for identifying and prioritising threatening domains for enforcement and future monitoring. 

In this follow-up, I extend the analysis to consider domain registrations with names containing any of the top ten most valuable brands in 2022 (according to Interbrand)[3]. The analysis considers all domains registered in a one-month period (from 28-Dec-2022 to 27-Jan-2023) and focuses only on those domains with names containing an exact match to the brand string, rather than considering typos and other fuzzy-match types. When considering trends and patterns in the dataset, I consider only active domains[4] (i.e. where the most recent activity event is a registration or re-registration). This yields a dataset of 7,714 domains. For simplicity, as in the previous analysis, I exclude any domains containing non-Latin characters (11 domains, or 0.14% of the total). The total numbers of results for each of the ten brands are shown in Table 1.

  Brand string   Domain activity events[5]   Unique domains   Active domains as of
                 in monitoring period        in dataset       date of analysis
  apple          5,821                       5,285            2,491
  microsoft      707                         639              314
  amazon         4,768                       4,186            1,847
  google         1,924                       1,844            777
  samsung        608                         566              244
  toyota         1,140                       1,114            566
  coca(-)cola    65                          63               29
  mercedes       617                         596              359
  disney         1,032                       971              456
  nike           1,595                       1,506            631

Table 1: Numbers of domain activity events and unique domains identified during the one-month monitoring period, and the numbers of active domains as of the date of analysis
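The selection step described in the methodology can be sketched as follows. This is a minimal illustration in Python, not the tooling used for the study. It assumes that matching is done against the SLD only, that 'coca(-)cola' denotes an optional hyphen, and that 'non-Latin' means anything outside a-z, 0-9 and the hyphen. The names BRAND_PATTERNS, sld and matched_brands are hypothetical, as is the second example domain.

```python
import re

# Brand strings from reference [3]; "coca(-)cola" is read here as an optional hyphen.
BRAND_PATTERNS = {
    "apple": "apple",
    "microsoft": "microsoft",
    "amazon": "amazon",
    "google": "google",
    "samsung": "samsung",
    "toyota": "toyota",
    "coca(-)cola": "coca-?cola",
    "mercedes": "mercedes",
    "disney": "disney",
    "nike": "nike",
}

# Assumption: "Latin characters" means letters a-z, digits and the hyphen.
LATIN_SLD = re.compile(r"[a-z0-9-]+")


def sld(domain):
    """Return the part of the domain name before the first dot."""
    return domain.lower().split(".", 1)[0]


def matched_brands(domain):
    """List the brand strings that appear as exact substrings of the SLD."""
    name = sld(domain)
    if not LATIN_SLD.fullmatch(name):
        return []  # excluded: contains non-Latin characters
    return [brand for brand, pattern in BRAND_PATTERNS.items()
            if re.search(pattern, name)]


print(matched_brands("gogooglego.com"))        # ['google']
print(matched_brands("my-cocacola-shop.net"))  # ['coca(-)cola']
```

Fuzzy matches such as character substitutions are deliberately not caught by this sketch, in line with the exact-match scope of the analysis.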

Findings

Overall, the domains occupy a range of entropy values, from 1.722 (for a second-level domain name (SLD)[6] of gogooglego) to 4.548 (for google-site-verificationuj64c5y9-rkbcpqaxydykjdrj1gop8tzij7nfxu). The top 60 names (i.e. those with entropy values of 3.892 or greater) include all domain names within the dataset which appear visually to contain long, apparently-random character strings (i.e. those which might typically arise in automated bulk registrations); these 60 names are listed in the Appendix.
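For readers who want to reproduce values of this kind, the following is a minimal Python sketch of a character-level Shannon entropy calculation, H = sum over distinct characters of p * log2(1/p), where p is the fraction of the string made up of that character. The function name shannon_entropy is my own, and the sketch assumes the entropy is measured in bits over the characters of the SLD.

```python
from collections import Counter
from math import log2


def shannon_entropy(s):
    """Shannon entropy of a string, in bits per character."""
    n = len(s)
    if n == 0:
        return 0.0
    # Each distinct character contributes p * log2(1/p), with p = count / n.
    return sum((count / n) * log2(n / count) for count in Counter(s).values())


print(round(shannon_entropy("gogooglego"), 3))  # 1.722
```

This reproduces the value of 1.722 quoted above for gogooglego, and gives 3.932 for microsoftpurviewday, in agreement with the Appendix.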

Seven of the top eight are very similar domain names (all targeting the Google brand, with names beginning 'google-site-verification') on the .com and .net TLDs (top-level domains, or domain extensions). They are highly likely to represent a coordinated registration campaign, or to have been generated using similar automated registration algorithms. Note that, although all have similar (high) entropy values, the values are not identical (even amongst those with SLDs of the same length), because of the differing numbers of repeated characters in the apparently-random character strings. Although all seven domains are registered using a privacy-protection service, six of the seven appear explicitly to relate to the same entity (with the same 'Contact Privacy Inc.' customer number). These same contact details are actually given for 31 of the top 60 domains in the list, comprising what appears to be a significant cluster of related registrations, all registered via the same registrar (Google LLC), and targeting the Google (6 domains) and Microsoft (25 domains) brands[7].
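The effect of repeated characters on strings of equal length can be seen in a small example. The sketch below is illustrative only: the two eight-character strings are hypothetical and not taken from the dataset, and the entropy function is the usual character-frequency definition. A string of n distinct characters reaches the maximum of log2(n) bits, and every repeat pulls the value below that.

```python
from collections import Counter
from math import log2


def shannon_entropy(s):
    """Shannon entropy of a non-empty string, in bits per character."""
    n = len(s)
    return sum((count / n) * log2(n / count) for count in Counter(s).values())


# Hypothetical strings of identical length (8 characters).
no_repeats = "k3x9qz7w"    # all 8 characters distinct
some_repeats = "k3x9kz3w"  # 'k' and '3' each appear twice

print(shannon_entropy(no_repeats))    # 3.0, i.e. log2(8)
print(shannon_entropy(some_repeats))  # 2.5
```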

Other clusters of potentially related domain names are also apparent within the dataset, such as seven domains all with A-records associated with the same IP address, targeting the Amazon (4 domains), Microsoft (2 domains) and Disney (1 domain) brands, and all resolving to webpages monetised through the inclusion of pay-per-click (PPC) links.

The following other observations (all correct as of 27-Jan-2023) from the dataset of top 60 highest-entropy domain names are also of note:

  • Only five of the 60 domains resolve to any significant site content. All of these relate to the Amazon brand, and have names featuring multiple keywords (with SLDs such as amazonproductreviewblog and amazonmysteryboxtruckload), rather than strings of apparently-random characters (suggesting deliberate choice of domain name, rather than automated registrations).
  • 33 of the domains (55% of the total) display no active website, with the remainder resolving to generally low-threat content, such as pay-per-click (PPC) sites (11 domains; 18%), domain-for-sale pages (2 domains; 3%), or other placeholder pages or pages with no significant content. 
  • 39 of the domains (65%) are configured with MX records, indicating that they are able to send and receive e-mails, and could be associated with phishing activity, even where no site content is present.
  • 58 of the domains (97%) display no whois information, have redacted whois records, or use privacy-protection services. Although this is relatively common since the introduction of GDPR legislation, it can indicate an attempt by the domain owner to conceal their identity, and may be indicative of malicious use[8,9].
  • The dataset is dominated by domains registered via consumer-grade registrars (with a top five of: Google LLC (31 domains); GoDaddy.com, LLC (11); Tucows, Inc. (3); Wix.com Ltd. (3); Register.com, Inc. (2)), a trend which has also previously been noted for domains registered for infringing use[10].
  • 17 of the domains incorporate long (14 characters or more) apparently-random strings of characters (including examples with SLDs such as 2cxqjwitvhtyh0-amazon, googlecb9c4560579f01d3 and microsoftexchange45e6e37e89e2e08e2e). Amongst the remainder of the dataset, there are numerous other domains featuring shorter apparently-random character strings or keyword patterns which may also be indicative of automated and/or bulk registrations (e.g. cloudworkflow-blv14-exec-microsoft365 and cloudworkflow-blv15-exec365-microsoft, or awsnetsuites-mailcloudroam01microsoftechowa and oauthnetsuites-mailcloudroam01promicrosoftechowa).

Conclusion

The findings presented in this study highlight how trusted brands continue to be targeted by infringers registering brand-specific domain names, which may be intended for a range of malicious purposes. The analysis also suggests that these third parties are using automated algorithms to register large numbers of variant domain names in bulk, some of which incorporate apparently-random character strings. This behaviour is consistent with the use of multiple short-lived domain names to create hard-to-detect attacks such as those used in phishing, botnet creation, or other infringements, as has been noted in previous studies of practices such as domain tasting[11,12]. These observations highlight the importance of ongoing, proactive brand and domain monitoring and enforcement by brand owners.

The analysis also provides a further illustration of how the concept of domain-name entropy can be used as one criterion to cluster together related domains, on the basis of common features in their domain-name structure. For example, domain names registered using automated algorithms which generate long, random or pseudo-random character strings will tend to share similar or identical high entropy values. 
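As a rough illustration of entropy used as a clustering criterion, the Python sketch below groups names whose entropy values lie within a small tolerance of each other. The (SLD, entropy) pairs are copied from the Appendix; the function name cluster_by_entropy and the tolerance of 0.005 are assumptions of mine rather than part of the analysis.

```python
def cluster_by_entropy(rows, tol=0.005):
    """Group (sld, entropy) rows whose neighbouring sorted values differ by <= tol."""
    clusters = []
    for name, h in sorted(rows, key=lambda row: row[1]):
        # Extend the current cluster if this value is close to the previous one.
        if clusters and h - clusters[-1][-1][1] <= tol:
            clusters[-1].append((name, h))
        else:
            clusters.append([(name, h)])
    return clusters


# Entropy values as listed in the Appendix.
rows = [
    ("2cxqjwitvhtyh0-amazon", 3.897),
    ("fvnexwopsdqb21-amazon", 3.897),
    ("microsoftpurviewday", 3.932),
    ("blueamazonfirestick", 3.932),
    ("google-site-verificationuj64c5y9-rkbcpqaxydykjdrj1gop8tzij7nfxu", 4.548),
]

for cluster in cluster_by_entropy(rows):
    print([name for name, _ in cluster])
```

The two '-amazon' names with matching random prefixes fall into one group, but so do microsoftpurviewday and blueamazonfirestick, which share an entropy value without any obvious relationship. This is why entropy is best treated as one criterion among several (alongside registrar, whois details, hosting and so on) rather than as a standalone test.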

Appendix: Top 60 domain names in the dataset, by entropy values

  SLD   TLD   Brand   SLD len. (chars)   Shannon entropy
  google-site-verificationuj64c5y9-rkbcpqaxydykjdrj1gop8tzij7nfxu   net   google 63 4.548
  google-site-verification2mxn0odchxhbhbphzzzkmv2rf3cyblumd6wvfbg   com   google 63 4.529
  google-site-verificationdcgyugw3srf3zzz1anas0thyuegawdj2kcxniew   com   google 63 4.389
  google-site-verificationngi9szdebvea6fop2-zalcux1sgrb7ozrdz4ump   com   google 63 4.342
  notallowedscript63cef0c674953googlesyndication   com   google 46 4.320
  google-site-verificationtszjjwyzw7wbjaotv59rpthglhhg4snojfrc4   com   google 61 4.310
  google-site-verification0mqfjnhkxaxvtftff6psxtgzoqyatc-dszzzjq4   net   google 63 4.301
  google-site-verificationiz8rx-lrwrfx8qmjryrlebc5e1acsia2ao8rhyd   net   google 63 4.279
  amazon-ec3kh4ayn7a5a8czcza64jhstim448dd6namway746eca9549cjw0eea   com   amazon 63 4.210
  logworkflow83microsoftexchange365   com   microsoft 33 4.208
  workflow83-microsoftexchange365   com   microsoft 31 4.149
  mail-workflow83-microsoftexchange365   com   microsoft 36 4.105
  servworkflow83microsoftexchangeonline   com   microsoft 37 4.098
  amz-haifhnakojankertyrewiqhhfoovpidjjjmicrosoft   net   microsoft 47 4.093
  officeleadershipus02microsoftexchange   com   microsoft 37 4.091
  microsoftexchange45e6e37e89e2e08e2e   com   microsoft 35 4.086
  admin-mailcloudcitysend01workflowmicrosoftech   com   microsoft 45 4.084
  cloudworkflow-blv14-exec10-microsoft365   com   microsoft 39 4.081
  metanamegoogle-site-verificationcontenteszeodw7p1vrb4pj   com   google 55 4.081
  fgwlockappsecurekjfdnamazon   com   amazon 27 4.060
  amazonproductreviewblog   com   amazon 23 4.056
  servworkflow83microsofttrade365   com   microsoft 31 4.051
  log-workflow83-microsoftexchange365   com   microsoft 35 4.047
  mail-workflow-microsoftexchange365   com   microsoft 34 4.023
  admin-mailregistr2roam01workflowmicrosoftech   com   microsoft 44 4.018
  cloudworkflows-blv14-exec-microsoft365   com   microsoft 38 4.015
  bestamazonproducts2023   com   amazon 22 4.005
  amazonkdpselfpublishing   com   amazon 23 4.002
  cloudworkflow-blv14-exec-microsoft365   com   microsoft 37 4.000
  payroll365microsoftdynamics   cloud   microsoft 27 3.986
  amazonsecurityauth2023   com   amazon 22 3.971
  amazonpublishingservice   com   amazon 23 3.969
  gonetsuites-mailcloudroam01microsoftechpro   com   microsoft 42 3.959
  officeleadership02-microsoftexchange   com   microsoft 36 3.953
  bestsellingproductonamazon   com   amazon 26 3.950
  bestsellingproductonamazon   co   amazon 26 3.950
  oauthnetsuites-mailcloudroam01promicrosoftechowa   com   microsoft 48 3.949
  cloudworkflow-blv15-exec365-microsoft   com   microsoft 37 3.946
  amazonmysteryboxtruckload   com   amazon 25 3.943
  skynetsuites-mailcloudtect01microsoftechserve   com   microsoft 45 3.942
  microsoftpurviewday   com   microsoft 19 3.932
  blueamazonfirestick   com   amazon 19 3.932
  admin-mailcloudroam01workflowmicrosoftech   com   microsoft 41 3.927
  metaversedisneyworldsmagickingdom   com   disney 33 3.923
  amazonkdpselfpublish   com   amazon 20 3.922
  mailcloudotp-10workflowhostmicrosoftech   com   microsoft 39 3.921
  rivianamazondeliverytrucks   com   amazon 26 3.921
  awsnetsuites-mailcloudroam01microsoftechowa   com   microsoft 43 3.919
  amazonpublishingstore   com   amazon 21 3.916
  microsoft365emailplus   com   microsoft 21 3.916
  amazonpublishingworld   com   amazon 21 3.916
  amz-haifhnakoajdjbfhiswiqhhfoovpidjjjmicrosoft   net   microsoft 46 3.916
  googlecb9c4560579f01d3   com   google 22 3.914
  thewbialtdisneycompany   com   disney 22 3.914
  applegatekitchensandbathrooms   uk   apple 29 3.909
  applegatekitchensandbathrooms   co.uk   apple 29 3.909
  partnetsuites-mailcloudroar01microsoftechpros   com   microsoft 45 3.898
  2cxqjwitvhtyh0-amazon   com   amazon 21 3.897
  fvnexwopsdqb21-amazon   com   amazon 21 3.897
  mailweb23-microsofts365   com   microsoft 23 3.892

References

[1] https://www.linkedin.com/pulse/investigating-use-domain-name-entropy-clustering-results-barnett/

[2] https://arxiv.org/ftp/arxiv/papers/1405/1405.2061.pdf

[3] https://interbrand.com/best-global-brands-2022-download-form/; the brand strings considered are: 'apple', 'microsoft', 'amazon', 'google', 'samsung', 'toyota', 'coca(-)cola', 'mercedes', 'disney', 'nike'

[4] As of the date of analysis (27-Jan-2023)

[5] A domain activity event is defined as an instance of a registration, re-registration, or domain drop (lapse)

[6] The SLD is the part of the domain name before the dot

[7] Note that, in general, even if two domain names targeting two different brands utilised identical styles of apparently-random character strings, we would not necessarily expect the domains to have identical entropy values, because the lengths and structures of the brand strings themselves may differ

[8] https://www.cscdbs.com/en/resources-news/supply-chain-report/

[9] https://www.cscdbs.com/en/resources-news/threatening-domains-targeting-top-brands/

[10] https://www.cscdbs.com/en/resources-news/impact-of-covid-on-internet-security/

[11] https://www.cscdbs.com/blog/patterns-and-trends-in-domain-tasting-of-the-top-10-global-brands/

[12] https://www.linkedin.com/pulse/patterns-trends-domain-tasting-top-ten-global-brands-david-barnett/

This article was first published on 2 February 2023 at:

https://www.linkedin.com/pulse/entropy-analysis-registered-domain-names-relating-top-david-barnett/
