Introduction and methodology
In our previous analysis[1] we considered the use of the mathematical concept of Shannon entropy[2] - essentially a measure of the amount of information (or 'randomness') in a domain name - as a way of clustering together related registrations, as part of the analysis process for identifying and prioritising threatening domains for enforcement and future monitoring.
In this follow-up, I extend the analysis to consider domain registrations with names containing any of the top ten most valuable brands in 2022 (according to Interbrand)[3]. The analysis considers all domains registered in a one-month period (from 28-Dec-2022 to 27-Jan-2023) and focuses only on those domains with names containing an exact match to the brand string, rather than considering typos and other fuzzy-match types. When analysing trends and patterns in the dataset, I consider only active domains[4] (i.e. where the most recent activity event is a registration or re-registration). This yields a dataset of 7,714 domains. For simplicity, I again exclude any domains containing non-Latin characters (11 domains, or 0.14% of the total). The total numbers of results for each of the ten brands are shown in Table 1.
Brand string | Number of domain activity events[5] in monitoring period | Number of unique domains represented in dataset | Number of active domains as of date of analysis |
---|---|---|---|
apple | 5,821 | 5,285 | 2,491 |
microsoft | 707 | 639 | 314 |
amazon | 4,768 | 4,186 | 1,847 |
google | 1,924 | 1,844 | 777 |
samsung | 608 | 566 | 244 |
toyota | 1,140 | 1,114 | 566 |
coca(-)cola | 65 | 63 | 29 |
mercedes | 617 | 596 | 359 |
disney | 1,032 | 971 | 456 |
nike | 1,595 | 1,506 | 631 |
Table 1: Numbers of domain activity events and unique domains identified during the one-month monitoring period, and the numbers of active domains as of the date of analysis
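The selection criteria described above (exact brand-string match, active domains only, Latin characters only) can be sketched roughly as follows. This is a minimal illustration rather than the pipeline actually used: the event-tuple layout, the event-type labels, the function name `active_brand_domains`, the choice to match against the SLD, and the sample domains are all assumptions made for the example.

```python
import re
from datetime import date

# Brand strings from reference [3]; "coca-?cola" covers both 'cocacola' and 'coca-cola'.
BRANDS = ["apple", "microsoft", "amazon", "google", "samsung",
          "toyota", "coca-?cola", "mercedes", "disney", "nike"]
BRAND_RE = re.compile("|".join(BRANDS))

# Assumption: a "Latin-only" SLD consists of a-z, 0-9 and hyphens.
LATIN_SLD_RE = re.compile(r"[a-z0-9-]+")

# Assumption: event types are labelled with these strings.
ACTIVE_EVENTS = {"registration", "re-registration"}


def active_brand_domains(events):
    """Return brand-matching, Latin-only domains whose latest event is a (re-)registration.

    `events` is an iterable of (domain, event_type, event_date) tuples.
    """
    latest = {}
    for domain, event_type, event_date in events:
        domain = domain.lower()
        # Keep only the most recent activity event per domain.
        if domain not in latest or event_date > latest[domain][0]:
            latest[domain] = (event_date, event_type)

    selected = []
    for domain, (_, event_type) in latest.items():
        sld = domain.split(".", 1)[0]  # part of the name before the first dot
        if event_type not in ACTIVE_EVENTS:
            continue  # most recent event was a drop
        if not LATIN_SLD_RE.fullmatch(sld):
            continue  # contains non-Latin characters
        if BRAND_RE.search(sld):
            selected.append(domain)
    return sorted(selected)


# Hypothetical sample events, for illustration only.
sample_events = [
    ("bestamazonproducts2023.com", "registration", date(2023, 1, 5)),
    ("applepie.com", "registration", date(2023, 1, 2)),
    ("applepie.com", "drop", date(2023, 1, 20)),
    ("coca-cola-shop.net", "re-registration", date(2023, 1, 9)),
    ("дисней-disney.com", "registration", date(2023, 1, 11)),
    ("example.com", "registration", date(2023, 1, 12)),
]
print(active_brand_domains(sample_events))
```

In this sample, the dropped domain, the non-Latin domain and the non-brand domain are all filtered out, leaving only the two active brand-matching names.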
Findings
Overall, the domains occupy a range of entropy values, from 1.722 (for a second-level domain name (SLD)[6] of gogooglego) to 4.548 (for google-site-verificationuj64c5y9-rkbcpqaxydykjdrj1gop8tzij7nfxu). The top 60 names (i.e. those with entropy values of 3.892 or greater) include all domain names in the dataset which appear visually to contain long, apparently-random character strings (i.e. those which might typically arise in automated bulk registrations); these 60 names are listed in the Appendix.
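For reference, the entropy of an SLD can be computed from its character frequencies using the standard Shannon formula, H = -Σ p(c) log2 p(c), where p(c) is the fraction of the string made up of character c. The sketch below is a generic implementation (the function name `shannon_entropy` is my own); it reproduces the quoted value for gogooglego, but the article does not spell out how hyphens are handled in the original calculation, so hyphenated names may not reproduce the Appendix values exactly.

```python
import math
from collections import Counter


def shannon_entropy(name: str) -> float:
    """Shannon entropy, in bits per character, of the characters in `name`."""
    if not name:
        return 0.0
    total = len(name)
    counts = Counter(name)  # number of occurrences of each distinct character
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# 'gogooglego' has only four distinct characters (g x4, o x4, l, e), hence the low value.
print(round(shannon_entropy("gogooglego"), 3))  # 1.722
```

Names with many distinct characters and few repeats score higher; for example, the same function gives 3.932 for microsoftpurviewday, matching the Appendix.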
Seven of the top eight comprise very similar domain names (all targeting the Google brand, and with names beginning ‘google-site-verification’), on the .com and .net TLDs (top-level domains, or domain extensions), and are highly likely to represent a coordinated registration campaign, or to have been generated using similar automated registration algorithms. Note that, although all have similar (high) entropy values, the values are not identical (even amongst those with SLDs of the same length), because of the differing numbers of repeated characters in the apparently-random character strings. Although all seven domains are registered using a privacy-protection service, six of the seven appear explicitly to relate to the same entity (with the same 'Contact Privacy Inc.' customer number). These same contact details are actually given for 31 of the top 60 domains in the list, comprising what appears to be a significant cluster of related registrations, all registered via the same registrar (Google LLC), and targeting the Google (6 domains) and Microsoft (25 domains) brands[7].
Other clusters of potentially related domain names are also apparent within the dataset, such as seven domains all with A-records associated with the same IP address, targeting the Amazon (4 domains), Microsoft (2 domains) and Disney (1 domain) brands, and all resolving to webpages monetised through the inclusion of pay-per-click (PPC) links.
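Clusters of this kind (domains sharing a privacy-service customer number, or an A-record IP address) can be surfaced with a simple group-by on the attribute in question. The following is an illustrative sketch only: the record layout, the field names `domain`, `ip` and `contact_id`, and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def cluster_by(records, key):
    """Group domains by a shared attribute, keeping only groups with 2+ members."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(key)
        if value:  # skip records where the attribute is missing or empty
            groups[value].append(record["domain"])
    return {value: domains for value, domains in groups.items() if len(domains) > 1}


# Hypothetical records; the IP addresses are from a documentation-only range.
sample_records = [
    {"domain": "amazon-deals-a.example", "ip": "192.0.2.10", "contact_id": "C-1001"},
    {"domain": "microsoft-login-b.example", "ip": "192.0.2.10", "contact_id": "C-2002"},
    {"domain": "disney-fans-c.example", "ip": "192.0.2.10", "contact_id": None},
    {"domain": "nike-store-d.example", "ip": "192.0.2.99", "contact_id": "C-1001"},
]

print(cluster_by(sample_records, "ip"))
print(cluster_by(sample_records, "contact_id"))
```

Running the same function over several attributes in turn (IP address, registrant identifier, registrar) gives overlapping views of which registrations may be related.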
The following other observations (all correct as of 27-Jan-2023) from the dataset of top 60 highest-entropy domain names are also of note:
- Only five of the 60 domains resolve to any significant site content. All of these relate to the Amazon brand, and have names featuring multiple keywords (with SLDs such as amazonproductreviewblog and amazonmysteryboxtruckload), rather than strings of apparently-random characters (suggesting deliberate choice of domain name, rather than automated registrations).
- 33 of the domains (55% of the total) display no active website, with the remainder resolving to generally low-threat content, such as pay-per-click (PPC) sites (11 domains; 18%), domain-for-sale pages (2 domains; 3%), or other placeholder pages or pages with no significant content.
- 39 of the domains (65%) are configured with MX records, indicating that they are able to send and receive e-mails, and could be associated with phishing activity, even where no site content is present.
- 58 of the domains (97%) display no whois information, have redacted whois records, or use privacy-protection services. Although this is relatively common since the introduction of GDPR legislation, it can indicate an attempt by the domain owner to conceal their identity, and may be indicative of malicious use[8,9].
- The dataset is dominated by domains registered via consumer-grade registrars (with a top five of: Google LLC (31 domains); GoDaddy.com, LLC (11); Tucows, Inc. (3); Wix.com Ltd. (3); Register.com, Inc. (2)), a trend which has also previously been noted for domains registered for infringing use[10].
- 17 of the domains incorporate long (14 characters or more) apparently random strings of characters (including examples with SLDs such as 2cxqjwitvhtyh0-amazon, googlecb9c4560579f01d3 and microsoftexchange45e6e37e89e2e08e2e). Amongst the remainder of the dataset, there are numerous other domains featuring shorter apparently-random character strings or keyword patterns which may also be indicative of automated and/or bulk registrations (e.g. cloudworkflow-blv14-exec-microsoft365 and cloudworkflow-blv15-exec365-microsoft, or awsnetsuites-mailcloudroam01microsoftechowa and oauthnetsuites-mailcloudroam01promicrosoftechowa).
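The percentages quoted in the observations above follow directly from the counts out of the 60 top-entropy domains. As a quick arithmetic check (the dictionary labels and the helper name `share` are my own shorthand):

```python
TOP_N = 60  # size of the top-entropy set

# Counts as stated in the observations above.
counts = {
    "no active website": 33,
    "pay-per-click site": 11,
    "domain-for-sale page": 2,
    "MX records configured": 39,
    "no, redacted or privacy-protected whois": 58,
}


def share(count, total=TOP_N):
    """Percentage of the total, rounded to the nearest whole number."""
    return round(100 * count / total)


for label, count in counts.items():
    print(f"{label}: {count} of {TOP_N} ({share(count)}%)")
```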
Conclusion
The findings presented in this study highlight how trusted brands continue to be targeted by infringers registering brand-specific domain names, which may be intended for a range of malicious purposes. The analysis also suggests that these third parties are using automated algorithms to register large numbers of variant domain names - which may incorporate apparently random character strings - in bulk. This behaviour is consistent with the use of multiple short-lived domain names to create hard-to-detect attacks such as those used in phishing, botnet creation, or other infringements, as has been noted in previous studies of practices such as domain tasting[11,12]. These observations highlight the importance of ongoing, proactive brand and domain monitoring and enforcement by brand owners.
The analysis also provides a further illustration of how the concept of domain-name entropy can be used as one criterion to cluster together related domains, on the basis of common features in their domain-name structure. For example, domain names registered using automated algorithms which generate long, random or pseudo-random character strings will tend to share similar or identical high entropy values.
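As a rough illustration of entropy as a clustering criterion, the sketch below groups SLDs by their entropy value rounded to three decimal places. The grouping rule, the function names and the choice of three decimals are assumptions made for the example; the entropy function is the standard Shannon formula, and the sample names are hyphen-free SLDs taken from the Appendix.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(name: str) -> float:
    """Shannon entropy, in bits per character, of the characters in `name`."""
    if not name:
        return 0.0
    total = len(name)
    return -sum((n / total) * math.log2(n / total) for n in Counter(name).values())


def cluster_by_entropy(slds, decimals=3):
    """Group SLDs whose entropy values are identical after rounding."""
    clusters = defaultdict(list)
    for sld in slds:
        clusters[round(shannon_entropy(sld), decimals)].append(sld)
    return dict(clusters)


slds = [
    "amazonpublishingstore",
    "amazonpublishingworld",
    "microsoft365emailplus",
    "blueamazonfirestick",
    "microsoftpurviewday",
    "amazonproductreviewblog",
]

for entropy, names in sorted(cluster_by_entropy(slds).items(), reverse=True):
    print(entropy, names)
```

Note that the 3.916 group contains both the two amazonpublishing names and microsoft365emailplus: an identical entropy value reflects a similar character-frequency structure, not necessarily a common registrant, which is why entropy is best treated as one clustering criterion amongst several.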
Appendix: Top 60 domain names in the dataset, by entropy values
SLD | TLD | Brand | SLD len. (chars) | Shannon entropy |
---|---|---|---|---|
google-site-verificationuj64c5y9-rkbcpqaxydykjdrj1gop8tzij7nfxu | net | google | 63 | 4.548 |
google-site-verification2mxn0odchxhbhbphzzzkmv2rf3cyblumd6wvfbg | com | google | 63 | 4.529 |
google-site-verificationdcgyugw3srf3zzz1anas0thyuegawdj2kcxniew | com | google | 63 | 4.389 |
google-site-verificationngi9szdebvea6fop2-zalcux1sgrb7ozrdz4ump | com | google | 63 | 4.342 |
notallowedscript63cef0c674953googlesyndication | com | google | 46 | 4.320 |
google-site-verificationtszjjwyzw7wbjaotv59rpthglhhg4snojfrc4 | com | google | 61 | 4.310 |
google-site-verification0mqfjnhkxaxvtftff6psxtgzoqyatc-dszzzjq4 | net | google | 63 | 4.301 |
google-site-verificationiz8rx-lrwrfx8qmjryrlebc5e1acsia2ao8rhyd | net | google | 63 | 4.279 |
amazon-ec3kh4ayn7a5a8czcza64jhstim448dd6namway746eca9549cjw0eea | com | amazon | 63 | 4.210 |
logworkflow83microsoftexchange365 | com | microsoft | 33 | 4.208 |
workflow83-microsoftexchange365 | com | microsoft | 31 | 4.149 |
mail-workflow83-microsoftexchange365 | com | microsoft | 36 | 4.105 |
servworkflow83microsoftexchangeonline | com | microsoft | 37 | 4.098 |
amz-haifhnakojankertyrewiqhhfoovpidjjjmicrosoft | net | microsoft | 47 | 4.093 |
officeleadershipus02microsoftexchange | com | microsoft | 37 | 4.091 |
microsoftexchange45e6e37e89e2e08e2e | com | microsoft | 35 | 4.086 |
admin-mailcloudcitysend01workflowmicrosoftech | com | microsoft | 45 | 4.084 |
cloudworkflow-blv14-exec10-microsoft365 | com | microsoft | 39 | 4.081 |
metanamegoogle-site-verificationcontenteszeodw7p1vrb4pj | com | google | 55 | 4.081 |
fgwlockappsecurekjfdnamazon | com | amazon | 27 | 4.060 |
amazonproductreviewblog | com | amazon | 23 | 4.056 |
servworkflow83microsofttrade365 | com | microsoft | 31 | 4.051 |
log-workflow83-microsoftexchange365 | com | microsoft | 35 | 4.047 |
mail-workflow-microsoftexchange365 | com | microsoft | 34 | 4.023 |
admin-mailregistr2roam01workflowmicrosoftech | com | microsoft | 44 | 4.018 |
cloudworkflows-blv14-exec-microsoft365 | com | microsoft | 38 | 4.015 |
bestamazonproducts2023 | com | amazon | 22 | 4.005 |
amazonkdpselfpublishing | com | amazon | 23 | 4.002 |
cloudworkflow-blv14-exec-microsoft365 | com | microsoft | 37 | 4.000 |
payroll365microsoftdynamics | cloud | microsoft | 27 | 3.986 |
amazonsecurityauth2023 | com | amazon | 22 | 3.971 |
amazonpublishingservice | com | amazon | 23 | 3.969 |
gonetsuites-mailcloudroam01microsoftechpro | com | microsoft | 42 | 3.959 |
officeleadership02-microsoftexchange | com | microsoft | 36 | 3.953 |
bestsellingproductonamazon | com | amazon | 26 | 3.950 |
bestsellingproductonamazon | co | amazon | 26 | 3.950 |
oauthnetsuites-mailcloudroam01promicrosoftechowa | com | microsoft | 48 | 3.949 |
cloudworkflow-blv15-exec365-microsoft | com | microsoft | 37 | 3.946 |
amazonmysteryboxtruckload | com | amazon | 25 | 3.943 |
skynetsuites-mailcloudtect01microsoftechserve | com | microsoft | 45 | 3.942 |
microsoftpurviewday | com | microsoft | 19 | 3.932 |
blueamazonfirestick | com | amazon | 19 | 3.932 |
admin-mailcloudroam01workflowmicrosoftech | com | microsoft | 41 | 3.927 |
metaversedisneyworldsmagickingdom | com | disney | 33 | 3.923 |
amazonkdpselfpublish | com | amazon | 20 | 3.922 |
mailcloudotp-10workflowhostmicrosoftech | com | microsoft | 39 | 3.921 |
rivianamazondeliverytrucks | com | amazon | 26 | 3.921 |
awsnetsuites-mailcloudroam01microsoftechowa | com | microsoft | 43 | 3.919 |
amazonpublishingstore | com | amazon | 21 | 3.916 |
microsoft365emailplus | com | microsoft | 21 | 3.916 |
amazonpublishingworld | com | amazon | 21 | 3.916 |
amz-haifhnakoajdjbfhiswiqhhfoovpidjjjmicrosoft | net | microsoft | 46 | 3.916 |
googlecb9c4560579f01d3 | com | google | 22 | 3.914 |
thewbialtdisneycompany | com | disney | 22 | 3.914 |
applegatekitchensandbathrooms | uk | apple | 29 | 3.909 |
applegatekitchensandbathrooms | co.uk | apple | 29 | 3.909 |
partnetsuites-mailcloudroar01microsoftechpros | com | microsoft | 45 | 3.898 |
2cxqjwitvhtyh0-amazon | com | amazon | 21 | 3.897 |
fvnexwopsdqb21-amazon | com | amazon | 21 | 3.897 |
mailweb23-microsofts365 | com | microsoft | 23 | 3.892 |
References
[1] https://www.linkedin.com/pulse/investigating-use-domain-name-entropy-clustering-results-barnett/
[2] https://arxiv.org/ftp/arxiv/papers/1405/1405.2061.pdf
[3] https://interbrand.com/best-global-brands-2022-download-form/; the brand strings considered are: 'apple', 'microsoft', 'amazon', 'google', 'samsung', 'toyota', 'coca(-)cola', 'mercedes', 'disney', 'nike'
[4] As of the date of analysis (27-Jan-2023)
[5] A domain activity event is defined as an instance of a registration, re-registration, or domain drop (lapse)
[6] The SLD is the part of the domain name before the dot
[7] Note that, in general, even if two domain names targeting two different brands utilised identical styles of apparently-random character strings, we would not necessarily expect the domains to have identical entropy values, because the lengths and structures of the brand strings themselves may differ
[8] https://www.cscdbs.com/en/resources-news/supply-chain-report/
[9] https://www.cscdbs.com/en/resources-news/threatening-domains-targeting-top-brands/
[10] https://www.cscdbs.com/en/resources-news/impact-of-covid-on-internet-security/
[11] https://www.cscdbs.com/blog/patterns-and-trends-in-domain-tasting-of-the-top-10-global-brands/
[12] https://www.linkedin.com/pulse/patterns-trends-domain-tasting-top-ten-global-brands-david-barnett/
This article was first published on 2 February 2023 at:
https://www.linkedin.com/pulse/entropy-analysis-registered-domain-names-relating-top-david-barnett/