Monday, 11 September 2023

The randomest domain names: entropy as an indicator of TLD threat level

by David Barnett and Richard Ferguson

Introduction

Domain registrations and abuse have had something of a renaissance in recent years, with increases in the numbers of people working from home and shopping online giving rise to countless opportunities for scammers. However, with almost 1,600[1] different top-level domains (TLDs, or domain extensions) to choose from, it can be difficult for brand owners to identify which TLDs to register across - indeed, the annual cost of owning a domain portfolio can soon spiral. Beyond the simple consideration of which TLDs are the 'best fit' for a brand's area of interest based on name alone (e.g. .shop for an online retailer), a statistical analysis of the most extensively abused TLDs can also provide further insights.

This post analyses a wide set of TLDs to assess whether patterns in the length and randomness of domain names show any correlation with other independent estimates of the level of threat associated with different domain extensions.

Primer

The universe of registered domains includes large numbers in which the domain name consists just of long, apparently random strings of characters. Several previous studies have suggested that these types of domains are often associated with fraudulent or malicious activity, such as phishing (where the domains can be used in the generation of deceptive URLs) or the distribution of malware. In many cases, these domain names are created by bad actors using automated domain name generation algorithms and associated automatic registrations[2,3].

The existence of domains potentially set up for underhand purposes can be analysed through consideration of a parameter known as Shannon entropy, which provides a measure of the amount of information stored in a string of characters - broadly, long domain names, and/or those containing large numbers of distinct characters (such as the random domain names discussed here), will have high entropy[4].
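As a minimal sketch of the calculation, assuming the conventional per-character definition of Shannon entropy (H = -Σ p·log2(p), where p is the relative frequency of each distinct character in the string), the entropy of a SLD could be computed as follows. The function name is illustrative and is not taken from the original analysis.

```python
import math
from collections import Counter


def shannon_entropy(sld: str) -> float:
    """Return the Shannon entropy of a string, in bits per character.

    Assumption: the standard per-character definition is used; the
    original analysis does not spell out its exact formula.
    """
    if not sld:
        return 0.0
    n = len(sld)
    counts = Counter(sld)
    # Sum p * log2(1/p) over the distinct characters in the string.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# A string of 25 distinct characters scores log2(25).
print(round(shannon_entropy("abcdefghijklmnopqrstuvwxy"), 4))  # 4.6439
```

Under this definition, a string of 25 distinct characters has an entropy of log2(25), or approximately 4.64, whereas a string consisting of a single repeated character has an entropy of zero.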

The entropy of domains differs between TLDs, with some showing a markedly greater frequency of long, random domain names than others. For example, in a previous blog post[5], we discussed how the set of new .zip domains contains many more high-entropy (long, random) names than other TLDs. All other factors being equal, this might suggest that TLDs such as .zip are more prone to abuse by online bad actors.

Analysis

For the study, we consider the set of domain zone files published by ICANN[6], which covers gTLDs (.com, .net, etc.) and new-gTLDs (.top, .xyz, .online, etc.). In total, the dataset covers approximately 1,050 TLDs. For each TLD, the mean domain name entropy value, across all domains registered with that extension, is calculated (noting that small TLDs - where fewer than 100 domains are registered - have been excluded from the analysis, as the results are deemed to be of lower significance; this leaves a dataset of 576 TLDs). The results are shown in Table 1 and Figures 1 and 2.
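The per-TLD aggregation described above can be sketched roughly as follows. This is an illustration only: it assumes the input is a simple list of names of the form sld.tld (real zone files need parsing first), it assumes the per-character Shannon entropy definition, and the function names and the min_domains parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(s: str) -> float:
    """Per-character Shannon entropy in bits (assumed definition)."""
    if not s:
        return 0.0
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def mean_entropy_by_tld(domains, min_domains=100):
    """Map each TLD to (mean SLD entropy, number of domains).

    TLDs with fewer than `min_domains` names are dropped, mirroring
    the exclusion of small TLDs described in the text.
    """
    entropies = defaultdict(list)
    for domain in domains:
        labels = domain.lower().rstrip(".").split(".")
        if len(labels) < 2:
            continue  # skip anything that is not at least sld.tld
        sld, tld = labels[-2], labels[-1]
        entropies[tld].append(shannon_entropy(sld))
    return {
        tld: (sum(values) / len(values), len(values))
        for tld, values in entropies.items()
        if len(values) >= min_domains
    }
```

Sorting the resulting dictionary by its mean value, in descending order, would give a ranking of the kind shown in Table 1.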

  TLD   Mean entropy   N
  bayern 3.578820 60,318  
  crs 3.556059 1,144  
  man 3.548192 361  
  nrw 3.543092 36,313  
  xn--mgbca7dzdo 3.533396 117  
  gov 3.524858 19,542  
  goog 3.470524 543  
  med 3.461878 69,735  
  page 3.461800 102,978  
  eus 3.444771 27,950  
  mov 3.419044 6,724  
  esq 3.417947 3,565  
  amsterdam 3.416103 41,989  
  rsvp 3.415646 4,572  
  channel 3.408561 631  
  swiss 3.404208 37,801  
  dev 3.396982 769,971  
  app 3.394302 1,274,223  
  abudhabi 3.390945 2,060  
  zip 3.389665 30,223  
  google 3.380865 318  
  top 3.362711 4,512,204  
  komatsu 3.359931 133  
  day 3.353672 20,345  
  kyoto 3.326108 2,042  
  nexus 3.323493 2,250  
  how 3.320968 7,987  
  radio 3.319183 5,793  
  soy 3.317902 3,467  
  phd 3.312976 2,793  

Table 1: Top 30 TLDs with greatest mean domain name entropy (N = no. of domains in dataset)

Figure 1: Top 30 TLDs with greatest mean domain name entropy

Figure 2: Bottom 30 TLDs by mean domain name entropy

The highest-entropy TLDs can indeed be seen through visual inspection to contain disproportionately high numbers of long, random domain names, with significant numbers of 32-character examples (Figure 3). The reason for this exact number (compared with the absolute maximum possible number for a SLD[7] of 63 characters) is not clear; however, it was the greatest length historically considered to be 'good practice'[8] for a domain name and can (depending on usage and provider) be a value beyond which functionality limitations may apply. The value may also be related to the type of algorithm(s) used to automatically generate the domain names, or the functionality available through the registrars utilised.

The alphabetical list of .bayern domains (the highest-entropy TLD in the dataset), for example, begins:

000.bayern
0008cp8d8h7jgqmddh0kciot4gousac0.bayern
002s0ldfq8l8uo0qr63fbtnjirgc2058.bayern
003v242nno6b91ppgtfr54rc820dvkqu.bayern
0057tcga35h7en9cro4vtbqr2sual0ju.bayern
0070fq4boldtihbvangusggq5r4jc8u7.bayern
0077bcqmb64p5odoa0pfhedmuv8nrdo9.bayern
007dqkp5jvh8qn7b8m5i3tlrgcm3t5cl.bayern
007dv5edpr3rgpam4lnlq6v6147hdbub.bayern
0081mlfvlec3qj5m508633l9sjvbsiph.bayern
00846bmbh82ovq0n1kr78jc97c3dhh7e.bayern
009a705ptm7dfi1uk37kfmkp5dqec1lo.bayern
00a71os7ja4mrjcg32hvs4tcgephthpr.bayern
00amv24rasudpcoj4ddniqujf4qd00ha.bayern
00b8jv3gs972inad2cipm20gqvohmn0v.bayern
00bu3lvu54afr3egplojrpamqu4onhck.bayern
00clcm817v8sra5aqpcru0u8t5lrcjti.bayern
00dfkkjfmhpqll6ladjs3tqlpaqhuijc.bayern
00espnkvp4ohdq7dm35o7v4po4rpm4bp.bayern
00f2n0s19mqn3s34ij3rpnju85arfth8.bayern

Figure 3: Numbers of .bayern domains, by domain name (SLD) length
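A length distribution of this kind can be produced with a simple tally. The sketch below is illustrative only; it assumes a list of names of the form sld.tld, and the function name is hypothetical.

```python
from collections import Counter


def sld_length_counts(domains):
    """Count domains by the length of their SLD (the part before the dot)."""
    return Counter(
        len(domain.lower().rstrip(".").split(".")[0]) for domain in domains
    )


sample = [
    "000.bayern",
    "0008cp8d8h7jgqmddh0kciot4gousac0.bayern",
    "002s0ldfq8l8uo0qr63fbtnjirgc2058.bayern",
]
counts = sld_length_counts(sample)
print(sorted(counts.items()))  # [(3, 1), (32, 2)]
```

Applied to the first three .bayern names listed above, the tally gives one 3-character SLD and two 32-character SLDs.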

It is also instructive to compare the mean entropy for each TLD with previous estimates of the general level of risk associated with that TLD, considering factors such as the frequency of their use in phishing, spam, and malware. In one such study[9], TLDs were allocated a normalised 'threat frequency' score (between 0 and 1), based on threat statistics taken from a range of independent datasets. Figure 4 shows a comparison between the mean entropy of the domains for each TLD, and the threat score from this previous study, for all TLDs present in both datasets.

Figure 4: Comparison between mean domain name entropy (this study) and normalised threat frequency score (previous study) for each TLD

Whilst there is no strong correlation between the two datasets (though there is a weak positive correlation, with a coefficient of +0.07), there is a suggestion that the highest-entropy TLDs (those with a mean entropy value of > 3.2) do tend to sit at the higher end of the risk spectrum (threat score > approx. 0.2). This is at least suggestive of some self-consistency in terms of the assertion that higher-entropy domain names (and the TLDs with which they are more frequently associated) tend to be more likely to be linked to a range of classes of fraudulent and malicious activity.
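The comparison can be sketched as follows. The text does not state which correlation coefficient was used, so the example assumes the Pearson product-moment coefficient, computed only over TLDs present in both datasets. The function names are hypothetical, and the input dictionaries stand in for the two sets of per-TLD scores.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlation_for_shared_tlds(mean_entropy, threat_score):
    """Correlate two {tld: value} mappings over the TLDs they share."""
    shared = sorted(set(mean_entropy) & set(threat_score))
    return pearson(
        [mean_entropy[t] for t in shared],
        [threat_score[t] for t in shared],
    )
```

A coefficient close to zero, such as the +0.07 reported here, indicates that any linear relationship across the full set of TLDs is very weak, which is why the discussion focuses on the behaviour of the highest-entropy TLDs specifically.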

Conclusions

Previous research suggests that long, random (high entropy) domain names are more likely to be associated with automated algorithmic registrations, and to be used for malicious activity. It is also noteworthy that many of the most suspicious domain names are (exactly) 32 characters in length.

Certain domain extensions are associated with greater proportions of high entropy domains, and the top 30 TLDs (by mean entropy) include a number of popular extensions like .top (4.5m domains), .app (1.3m) and .page (103k). The additional finding that many of these same TLDs are generally found more frequently to be associated with phishing, spam, and malware is suggestive of a correspondence between mean domain entropy and overall level of risk for a particular TLD.

Quantitative studies such as this can help inform and validate brand protection strategies, especially when overlaid with qualitative analysis (such as consideration of what string the domain extension itself actually is, in terms of a keyword or description). This assessment provides guidance not just on which domains to register, but also which domain extensions warrant attention when monitoring, and prioritisation when enforcing. The Internet isn’t getting any smaller, but combining metrics can help with zoning in on targets.

References

[1] https://www.iana.org/domains/root/db

[2] https://circleid.com/posts/20230703-an-overview-of-the-concept-and-use-of-domain-name-entropy

[3] https://www.splunk.com/en_us/blog/security/random-words-on-entropy-and-dns.html

[4] https://www.linkedin.com/pulse/investigating-use-domain-name-entropy-clustering-results-barnett/

[5] https://www.iamstobbs.com/opinion/un-.zip-ping-and-un-.box-ing-the-risks-associated-with-new-tlds

[6] https://czds.icann.org/home

[7] The SLD (second-level domain name) is the part of the domain name before the dot

[8] https://docs.oracle.com/cd/E19683-01/806-4077/6jd6blbdi/index.html

[9] https://circleid.com/posts/20230117-the-highest-threat-tlds-part-2

This article was first published on 11 September 2023 at:

https://www.iamstobbs.com/opinion/the-randomest-domain-names-entropy-as-an-indicator-of-tld-threat-level

Thursday, 24 August 2023

Un-.zip-ping and un-.box-ing the risks associated with new TLDs

Introduction

A few weeks on from the launch of the .zip domain extension (an example of a 'top-level domain', or TLD), and just as the .box TLD is set to launch, we consider the cybersecurity and infringement risks presented by the new registrations.

.zip is one of the most recent in a long line of new TLDs launched since the start of the new-gTLD programme in 2012[1], entering its General Availability phase (in which domain registrations are open to all) on 10-May-2023[2].

The reason for concern with this particular extension is the potential for confusion with a digital file suffix commonly used for compressed or archive data files ('zip files') and the possibility that this confusion may be exploited by bad actors to drive Internet users to their own content, distribute malware, and/or create brand infringements.

These types of abuse can be manifested in a range of different ways:

  • Many platforms and e-mail clients will automatically convert certain types of string into URLs, so a legitimate filename such as document.zip could be interpreted as a URL which, when clicked, may drive users to the corresponding domain name if registered[3,4]. Similarly, if a user searches for a non-existent zip-file name, file explorer applications may instead perform an online search directing the user to a corresponding .zip domain name.
  • The DNS queries associated with a link-click can provide information to the site owner on the name of the file being requested, which could correspondingly result in a leakage of sensitive information[5]. This may be particularly effective if the second-level name ('SLD') of the registered domain (i.e. the part of the domain name before the dot) is a file extension (such as .doc) in its own right - e.g. a domain such as doc.zip might allow the site owner to see that a file such as sensitivedocumentname.doc.zip has been requested.
  • The TLD presents the possibility for a link to a potentially malicious .zip domain to easily be disguised as a link to a zip file on a trusted website[6], or as content embedded in a malicious e-mail.
  • Domains hosted on the .zip TLD may be more likely to be trusted by users based on their familiarity with regular zip files.
  • Conversely, as the .zip extension becomes more well-known, users may unknowingly download a zip file - which can contain arbitrary content of unknown legitimacy - thinking that they are simply clicking on a link to a regular website[7].

Domains on the .zip extension are being offered by Google Domains[8], together with a number of others - including .mov, which launched on the same day, and is subject to similar security concerns due to the possibility of confusion with the video-file format suffix. Despite the claim that the domain extension is intended to represent content from providers who are "fast, efficient, and ready to move", the risks - combined with other Google offerings which are attractive to would-be attackers, such as a whois privacy service and subdomain forwarding - mean that the domains on this new TLD may warrant careful scrutiny.

In a similar vein, the .box domain extension is set to enter its Sunrise phase - where brand owners can apply for new domains, prior to General Availability - on 09-Aug-2023[9]. Whilst not a file suffix in the same way as .zip, the .box extension is also likely to be subject to abuse, in part due to the possible scope for confusion with content relating to the Dropbox hosting and file-sharing service. Other brand names incorporating the term 'box' (such as Xbox and Birchbox) may also find themselves particularly targeted by attacks, and we anticipate that this additional new TLD may also be worth closely watching once general registrations commence.

.zip registrations in the first two months of activity

The .zip extension has seen a rapid growth in the numbers of registrations in the weeks since its launch - in part, presumably, due to its attractiveness to bad actors. Within the first month, it was already the most popular of Google’s eight new registration offerings by a significant margin[10]. However, it is worth noting that some of the registered domains feature warnings of the potential for abuse, or have been registered so as to block use by bad actors.

In this article, we use DNS zone-file information to conduct a comprehensive study of registered domains across the TLD, to analyse potential indicators of intention for nefarious use. This work follows on from previous studies, which already found five active phishing sites - targeting the Microsoft, Google, and Okta brands - within a week of launch of the TLD[11] and numerous other domains featuring keywords (such as 'install' or 'update', other brand-related terms, or long, non-sensical strings) of concern, due to the potential of their association with filenames or downloadable tools, and/or the corresponding phishing and malware risks.

As of 21-Jul-2023, there were 29,664 distinct .zip domains registered. In 266 of these, the SLD consisted solely of a string which is also used as a filename suffix[12]; the following common examples were all found to have been registered: apk, css, doc, docx, exe, htm, html, gz, jpeg, jpg, mov, mp3, mp4, php, ppt, pptx, rar, sql, tar, tmp, wav, xls, xlsx, xml, and zip itself (as apk.zip, css.zip, etc.).

The following statistics illustrate the numbers of domains with SLDs featuring keywords of particular interest or concern:

  • 359 domains feature the term 'file', 280 'update', 170 'install', 112 'download', and 53 'invoice'.
  • The top four most valuable global brands in 2023[13] are all technology brands, and therefore compelling candidates for infringements using the .zip extension. Of these, 'apple' features in 12 domains, 'google' in 49, 'microsoft' in 49, and 'amazon' in 7. Other related product names also feature in the dataset, with 82 'windows' domains and 31 'chrome'.
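A keyword screen of this kind can be sketched as below. The example assumes simple case-insensitive substring matching on the SLD, which may differ from the matching rules actually used in the study; the function name is hypothetical, and the suffix and keyword lists are taken from the examples quoted in this article rather than from the full reference list.

```python
# Filename suffixes quoted in the article as registered .zip SLDs.
FILE_SUFFIXES = {
    "apk", "css", "doc", "docx", "exe", "htm", "html", "gz", "jpeg",
    "jpg", "mov", "mp3", "mp4", "php", "ppt", "pptx", "rar", "sql",
    "tar", "tmp", "wav", "xls", "xlsx", "xml", "zip",
}

# Keywords, brand names and product names discussed in the article.
KEYWORDS = (
    "file", "update", "install", "download", "invoice",
    "apple", "google", "microsoft", "amazon", "windows", "chrome",
)


def classify_domain(domain: str) -> dict:
    """Flag a domain whose SLD is a file suffix or contains a keyword."""
    sld = domain.lower().rstrip(".").split(".")[0]
    return {
        "sld": sld,
        "is_file_suffix": sld in FILE_SUFFIXES,
        "keywords": [k for k in KEYWORDS if k in sld],
    }


for name in ("doc.zip", "emergencyupdate.zip", "chrome-browser.zip"):
    print(classify_domain(name))
```

Substring matching will over-match in places (for example, 'file' also matches 'profile'), so results from a screen like this would normally be reviewed manually before being treated as high-risk.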

Overall, this yields a dataset of 1,093 domains (3.7% of the total) containing one or more of the above high-risk keywords. Of these, 415 (38.0%) return an HTTP status code of 200 (i.e. some sort of live website response). Some of these provide a relatively light-hearted proof-of-concept illustration of the risk of misdirection, with twenty-three (including archivedfile[.]zip, chrome-browser[.]zip, emergencyupdate[.]zip, and important-files[.]zip) re-directing to videos of Rick Astley's 'Never Gonna Give You Up' - the Internet practice known as 'Rickrolling'[14] - although a number of more concerning examples were identified, such as those outlined below, each of which has the potential to be distributing malicious content:

  • Figure 1(i): Microsoft-related domain name resolving to a website displaying a 'file explorer'-style page referencing downloadable files
  • Figure 1(ii): Website which automatically downloads an archive file named quarterly_figures_q2_2023.invoicestuff.zip
  • Figure 1(iii): Website purportedly offering the download of a number of software applications

Figure 1: Examples of live websites with content of potential concern hosted on .zip domain names

Altogether, 38 of the domains in the dataset of 1,093 high-risk domains included the keyword 'login' at some location within their HTML (site content), indicating possible use for phishing activity.

Other examples of domains re-directing to apparently-unrelated third-party sites were also identified - these may be taking advantage of misdirection tactics, even if not explicitly malicious.

However, very few of the domains appear to have been registered by official brand owners for legitimate use or to protect customers, with just four re-directing to URLs on the microsoft.com site, two on google.com, one on office.com, one on ubuntu.com, one on malwarebytes.com, one on archive.org, and one on square-enix.com.

Another key observation is the fact that the dataset of all .zip domains contains disproportionately many names consisting of long, apparently non-sensical strings of characters, compared with the general domain population. These types of domains have been noted previously as commonly being associated with phishing activity, through such tactics as the construction of deceptive URLs. The observation can be shown quantitatively by calculating the distribution of domain-name entropy values ('Shannon entropy', a method of quantifying the amount of randomness, or unpredictability, of a SLD string) within the .zip dataset, compared with the distribution amongst a set of all domain name registrations from a particular day, from a previous study[15] (Figure 2).

Figure 2: Distribution of domain-name entropy values for the dataset of .zip domains (red), compared with a set of general domains from a previous study (blue)

This analysis shows that the .zip domain distribution is significantly more weighted towards the high-entropy end of the spectrum (with a second peak at values above 4, and an average entropy value across the whole dataset of 3.39), compared with the domains from the general dataset (average entropy = 2.86).

Within the set of .zip domains, virtually all of the domains with entropy values above 3.85 (14,659 domains, or 49.4% of the total) consist visually of apparently-random strings (see Table 1).

  Domain name   Entropy value
  g0kfctpdb18t7vkidqj2me5ls9rjo46g.zip 4.6875
  r5s0mo4tl315achnpvrkie76j84unba2.zip 4.6875
  abcdefghijklmnopqrstuvwxy.zip 4.6439
  98lgdq7c064nmbs1olvuejsnvhbt82ri.zip 4.6250
  cph1ukfm2n1bvd8jsaqetc3o47a7lfq6.zip 4.6250
  cr9qpcoiaklt1f53m6bj0u07r3eud2k4.zip 4.6250
  g4umroti85bj0vfes01d3oqau2n74fpj.zip 4.6250
  hj23qhtvgcsd4pqcs765r8meuf014dba.zip 4.6250
  ke6h76jnpefh2s2aivau98mc453ogtb7.zip 4.6250
  l5eujm8vksnetqd1714fm2o3a3hgrpkd.zip 4.6250
  mlf7v0nmbhia9rgil68jsp15qk2s0ech.zip 4.6250
  piuvk9qg4indoljemab245fks3cn075b.zip 4.6250
  1cd7as0m8kpv1l0j5tnfqih2ot5tqge3.zip 4.6014
  3uav01gor6482mj2t6k9bp50ofkl7qio.zip 4.6014
  9q7f61obtugmpn8tj0i3r1bcmahsk5ft.zip 4.6014
  apnv6golm5r3kp4f3jst744qbuh218n6.zip 4.6014
  lms1acrubko51qqht7lf94138v0i0ndh.zip 4.6014
  obdpfj3t963u7rltac095lmp1hi3g82q.zip 4.6014
  so5eip1av0krpe3pthq7dnngd3bumfcl.zip 4.6014
  to7liok38ijgud5hchs0rvmtiab9e2fe.zip 4.6014

Table 1: Top 20 .zip domains by entropy values

None of the above domains was found to resolve to any live content as of the time of analysis (24-Jul-2023).

Conclusions

By the nature of its potential confusion with a filename suffix, the .zip TLD presents significant risk for both brand owners and Internet users, in terms of the possibility for brand infringements and potential association with phishing activity and malware distribution - and the risk for brand damage which this entails. Already, the registration patterns across this domain extension are indicative that the TLD is likely to be popular with bad actors, by virtue of the keywords and domain-name structures observed in the current dataset, together with the presence of live content of concern in some cases. We also anticipate that the .box domain extension, set to see its initial launch on 09-Aug, may transpire to be subject to similar types of abuse.

These observations highlight the importance of brand owners taking a proactive approach to monitoring and enforcement with domains, allowing timely detection of - and action against - threatening registrations, through a programme of brand protection which is able to tackle new TLDs as soon as they launch, and identify new domain registrations on a daily basis.

References

[1] https://newgtlds.icann.org/en/program-status/delegated-strings

[2] https://tld-list.com/launch-schedule

[3] https://circleid.com/posts/20230517-new-google-domains-spark-cybersecurity-concerns-risks-and-reactions-to-.zip-and-.mov-top-level-domains

[4] https://tech.slashdot.org/story/23/05/19/1228215/google-pushes-new-domains-onto-the-internet-and-the-internet-pushes-back

[5] https://blog.talosintelligence.com/zip-tld-information-leak/

[6] https://medium.com/@bobbyrsec/the-dangers-of-googles-zip-tld-5e1e675e59a5

[7] https://www.iptwins.com/en/2023/05/25/domain-names-in-zip-beware-of-security-threats/

[8] https://domains.google/tld/zip/

[9] https://newgtlds.icann.org/en/program-status/sunrise-claims-periods

[10] https://blog.talosintelligence.com/zip-tld-information-leak/

[11] https://www.netcraft.com/blog/phishing-attacks-already-using-the-zip-tld/

[12] https://gist.github.com/securifera/e7eed730cbe1ce43d0c29d7cd2d582f4

[13] https://www.kantar.com/inspiration/brands/revealed-the-worlds-most-valuable-brands-of-2023

[14] https://en.wikipedia.org/wiki/Rickrolling

[15] https://www.linkedin.com/pulse/investigating-use-domain-name-entropy-clustering-results-barnett/

This article was first published on 22 August 2023 at:

https://www.iamstobbs.com/opinion/un-.zip-ping-and-un-.box-ing-the-risks-associated-with-new-tlds

Tuesday, 1 August 2023

X (trade)marks the spot: *not* a textbook example of a successful rebranding exercise

by David Barnett and Ernie Bell

It is fair to say that Elon Musk’s rebranding of Twitter as 'X' (announced on Monday 24 July 2023) - attempting to reimagine it as a 'super app' analogous to Tencent's WeChat[1] - has not gone smoothly. From a purely practical point of view, the removal of the physical lettering from the headquarters in San Francisco was interrupted by police when it emerged that appropriate permissions had not been sought, leaving the building just reading 'er'[2].

Conventional wisdom (rightly) dictates that a new brand name should ideally be novel and distinctive. This not only aids with the acquisition of relevant intellectual property protection, but also makes the subsequent process of monitoring for, and enforcing against, brand infringements much more straightforward. The new X brand is neither of these things, with even the logo itself appearing very similar to a standard Unicode[3] digital character[4,5] (Figure 1). The Executive Creative Director at Monotype (the font-set in question) has publicly commented that Musk's 'X' is not taken from their 'Special Alphabets 4'[6,7]; however, it would have been a risky strategy not to do proper due diligence and clear this matter with Unicode prior to launch. Elon Musk had asked his followers to find an 'X' logo "good enough" to rebrand Twitter[8] and they obliged. This type of 'crowdsourcing' is definitely not the recommended way of creating the intellectual property which a brand owner intends to use as the public face of their organisation. Many questions arise as to ownership, use, goodwill, royalties, rights of enforcement and the many more legal challenges which can result from using someone else's intellectual property.

Figure 1: The new profile page for Twitter / X (top), and a description page for Unicode character U+1D54F (bottom)

Furthermore, in IP parlance, a single letter has 'no semantic content'. Nevertheless, this has not prevented a number of 'domain brokers' and other speculators from attempting to take advantage of the buzz and monetise a range of Internet assets - even those which are apparently totally irrelevant - containing the 'x' (Figure 2).

Figure 2: Example of a posting by a domain broker attempting to monetise (purportedly) X-related domain names

The situation is further complicated by the fact that a number of other companies - including Meta and Microsoft - already have intellectual property rights covering the same letter (relevant to the Xbox brand in the latter case), with over 900 active US trademarks already registered, making it unclear how Twitter / X might be able to defend this brand[9,10].

It is also important to bear in mind that the Twitter brand has a great deal of legacy familiarity and goodwill associated with it, which Musk risks losing - with associated damage to brand value - following the renaming[11].

Beyond this, the online landscape is - frankly - a mess. The launch of a new brand should generally be preceded by the acquisition of relevant online 'real estate' (potentially anonymously, to avoid the pre-emptive leakage of information relating to the brand's identity). The relevant content might include domain names - both 'core' domains to be used for the general infrastructure of the website and business, and 'strategic' domains comprising defensive registrations and names to be used for future extensions of the business - and other relevant assets, such as social-media profiles.

However, at the time of launch, Twitter was not even in possession of the @x username on its own platform, subsequently seizing it from its long-term owner without warning or compensation[12]. Of course, social media sites make it very clear that they have the right to revoke username handles at any time, but this example could set a concerning precedent for other sites to do the same, as and when they may choose to rebrand. We note that, three days after the supposed brand launch, Twitter / X had not rebranded its official presence on the Meta platforms Facebook and Instagram, which may cause some brand confusion. Additionally, the main corporate website was also showing a confusing mix of branding (Figure 3), and the mobile app was still branded as Twitter.

Figure 3: The desktop log-in page of Twitter / X as of 27 July 2023

Additionally, as of the day of launch, the x.com domain - despite having been acquired by the organisation - had not seen the relevant DNS changes successfully propagated across the Internet, meaning that many users were seeing just a GoDaddy parking page featuring sponsored ads for third-party websites and services. It has also been noted that a number of other relevant domain names - including examples such as xsafety.com, which could potentially be confused with official Twitter / X sites - are currently owned by third parties or listed for sale[13]. Beyond these, any unregistered relevant domain names are ripe for purchase by cybersquatters, who may attempt to sell them back to the official corporation, or by other bad actors for phishing, malware distribution, brand impersonation or other types of attack or infringement.

Indeed, zone-file analysis of the set of .com domains only (the most popular domain extension by a significant margin)[14], and considering only domains with 'x' at the start, gives some indication of the scale of the problem. As of 27 July 2023 (three days after Twitter's rebrand), the .com zone file contains around 1.04 million registered domain names beginning with 'x' (excluding internationalised domain names, which are encoded as (Punycode) strings beginning 'xn--'). Of these, around 6,500 were not present in the zone file on 21 July (three days before the rebranding) and have therefore been registered in the intervening six-day period. Of course, many of these are clearly unrelated to the Twitter / X brand, but many do feature keywords suggesting that they may have been registered with the rebranding in mind - to take advantage of the online buzz, misdirect users, or to cybersquat - or could be confused with domains falling into these categories.
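The snapshot comparison described above amounts to a set difference between two lists of names. The sketch below is illustrative only: it assumes that each zone file has already been reduced to a list of domain names, and the function names are hypothetical.

```python
def x_domains(names):
    """Normalise names and keep those starting with 'x', excluding
    Punycode-encoded internationalised names (prefix 'xn--')."""
    selected = set()
    for name in names:
        name = name.lower().rstrip(".")
        if name.startswith("x") and not name.startswith("xn--"):
            selected.add(name)
    return selected


def newly_registered_x_domains(before, after):
    """Names present in the later snapshot but not in the earlier one."""
    return sorted(x_domains(after) - x_domains(before))


before = ["xbox.com", "example.com"]
after = ["xbox.com", "xcoin.com", "xn--abc.com", "example.com", "XPay.com."]
print(newly_registered_x_domains(before, after))  # ['xcoin.com', 'xpay.com']
```

A difference of this kind cannot distinguish between names registered for the first time and names which lapsed and were re-registered between the two snapshots, so the result is best read as an upper bound on genuinely new registrations.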

At least 300 new domains were found to be of potential high relevance, including examples featuring keywords such as 'app', 'download', 'musk', 'coin', 'invest', 'help' or 'service'. Of these, around one in five were found to have active MX (mail exchange) records, indicating that they have been configured to be able to send and receive e-mails and - even in the absence of any live site content - could potentially be associated with phishing activity.

Furthermore, many of the high-relevance domains were already found to resolve to live websites, including several examples which appear potentially to be infringing against the X brand, and some which may be associated with active scams. Some examples are shown in Figure 4.

Figure 4: Examples of potentially infringing new domain registrations relating to the X brand (top to bottom: xcoinerc[.]com; x-coin-x[.]com; xdogeeth[.]com; xelonerc[.]com; xbluetoken[.]com; xpay-project[.]com; xmoontoken[.]com; xwifecoin[.]com)

The experiences surrounding the X rebrand illustrate a number of key points an organisation should bear in mind when considering the launch of a new brand name. Some of the most significant lessons include:

  • Select the brand name carefully, and ideally choose one which is distinctive and not under prior use by third parties. Marks which are more distinctive generally afford greater degrees of protection.
  • Ensure that IP rights are protected through the registration of appropriate trademarks in relevant classes and jurisdictions. Although rights generally arise through the use of a mark in the course of trade, applications can be submitted on the basis of intention to use.
  • Ensure that key online assets - including branded domain names and social-media profile handles - are acquired, available for use, and configured correctly.
  • Following the launch of the brand, continued proactive ongoing monitoring for infringements, and enforcement against concerning content, is crucial.

References

[1] https://www.cnbc.com/2023/07/26/elon-musks-x-rebrand-reignites-goal-to-turn-twitter-into-chinas-wechat.html

[2] https://www.theguardian.com/technology/2023/jul/25/elon-musk-x-rebrand-twitter-sign-removal

[3] https://home.unicode.org/

[4] https://www.hitc.com/en-gb/2023/07/25/twitters-mathematical-double-struck-capital-x-logo-mocked-for-unicode-resemblance/

[5] https://twitter.com/EliotHiggins/status/1683427725892042753

[6] https://www.ft.com/content/da262b2a-f39a-466b-9b2f-2f8fa84f0117

[7] https://www.businessinsider.com/elon-musk-made-new-x-twitter-logo-says-will-change-again-2023-7

[8] https://www.theverge.com/2023/7/26/23809087/elon-musk-x-logo-twitter-trademark

[9] https://www.reuters.com/technology/problem-with-x-meta-microsoft-hundreds-more-own-trademarks-new-twitter-name-2023-07-25/

[10] https://www.cbsnews.com/news/twitter-trademark-x-com-rebrand/

[11] https://www.linkedin.com/posts/vaibhavsisinty_elon-musk-just-killed-twitter-he-is-rebranding-activity-7089862486773956608--uJy

[12] https://techcrunch.com/2023/07/26/twitter-now-x-took-over-the-x-handle-without-warning-or-compensating-its-owner/

[13] https://www.forbes.com/sites/barrycollins/2023/07/24/the-x-rated-problem-with-twitters-new-name-for-millions-of-users/

[14] https://research.domaintools.com/statistics/tld-counts/

This article was first published on 1 August 2023 at:

https://www.iamstobbs.com/opinion/x-trademarks-the-spot-not-a-textbook-example-of-a-successful-rebranding-exercise

Wednesday, 26 July 2023

Trends in Web3 – Part 1: A look at blockchain domains

by David Barnett, Rebecca Newman, Tom Ambridge and Richard Ferguson

Introduction

The world of Web3 - the so-called 'third generation' of web content which has emerged over the last few years in response to a series of technological developments - continues to generate significant amounts of discussion and speculation. A number of the associated applications present the potential for new types of infringements, brand abuse and online risks, and warrant close attention by brand owners and general Internet users alike.

In this article we consider trends in the Web3 landscape - with a particular focus on blockchain domains - and how these may be driven by external factors, such as developments in artificial intelligence (AI). We also discuss the implications for brand owners and the associated brand protection considerations.

Overview of Web3 concepts

i. Technical definitions

Web3 (a.k.a. 'Web 3.0') is a general term referring to decentralised content on the Internet (i.e. organised on a peer-to-peer basis and without reliance on authoritative hosting providers), with a particular focus on blockchain technologies. The term 'Web3' reflects the emergence of new technologies which provide immersive experiences for users and encourage freedom of speech. It follows on from the definition of Web2 in the early-2000s, to indicate a transition from the 'read-only' days of the early Internet into an ecosystem more dominated by user-generated content.

A blockchain is a publicly accessible digital ledger in which transactions are recorded. It is cryptographically sealed and cannot be modified after its contents are recorded. Blockchains form the basis of many digital currencies (such as Bitcoin), but also have a number of other applications, such as supply-chain control by brand owners. In terms of brand protection considerations, two related concepts are of particular relevance:

  • NFTs (non-fungible tokens)[1] - NFTs are cryptographic collectibles comprising any of several types of asset or media. Any digital file can be converted into an NFT through a process known as minting, whereby ownership is recorded on a blockchain. NFTs can take a number of different forms, but are most commonly associated with graphics files (e.g. artworks, branded imagery, etc.) and other types of digital content (such as audio or music files). Brand owners are increasingly incorporating NFTs into their business models, for example through the production and trade of virtual branded items (e.g. items to be worn by avatars in virtual-reality environments - part of the 'metaverse', the name given to a generalised connected environment of 3D virtual worlds).
  • Blockchain domains - Like regular domains, blockchain domains consist of a second-level domain name and an extension (with specific examples including .eth, .crypto, and .bit), and can be used in a number of different ways, including the construction of decentralised websites (which have special access requirements, such as the use of a dedicated browser like Brave, or a browser plug-in), as memorable wallet addresses for sending and receiving cryptocurrency, or as hosting infrastructure for programs to be run as apps. Blockchain domains are recorded, together with their ownership details, on a blockchain (i.e. are not hosted on a server, or recorded in a regular registry zone file) and, unlike regular domain names, are not governed or regulated by ICANN (the Internet Corporation for Assigned Names and Numbers). They are offered by specialist providers and, although the costs may be higher than for traditional domains, are in many cases offered for registration for a longer period than gTLDs, or involve only a one-off cost to own for ever.

Blockchain domains are attractive to many users because of the inherent security associated with the blockchain infrastructure, and their resistance against traditional blocking or censorship methods. For those with interests in other areas of the Web3 ecosystem, use of blockchain domains may be a natural choice.

Future development of native support of blockchain domains by mainstream web browsers is also likely to significantly drive increased adoption by users. Already, blockchain domain operators are seeking technical workarounds to drive the interoperability of blockchain domains with regular browsers. One example is eth.link, a service allowing .eth blockchain domains to be accessed via DNS, by appending '.link' to the blockchain domain name[2].

ii. Brand protection implications

From a brand monitoring point of view, blockchain domains are generally difficult to identify, both because of the absence of zone files (which, for regular domains, provide comprehensive lists of registered domains across the individual domain extensions, or TLDs (top-level domains)), and because of the specific website access requirements. One commonly-used technique to circumvent this difficulty can be to search for references to the blockchain domain names being traded in NFT marketplaces (for example, where the current owner can offer the sale of a domain to another interested party, which may be a brand owner or a would-be infringer), and some blockchain domain providers also provide searchable databases of registered domains. However, more robust methods are likely to require direct monitoring of the content of the blockchains themselves, or searches across databases of transactions which have occurred on a specific blockchain.

Enforcement options are also currently limited, with one option being just to take down infringing listings offering the sale of a blockchain domain from the Web3 marketplace - although this does not deactivate the domain name itself or change its ownership. Some blockchain domain providers are becoming more mindful of the risks posed by cybersquatters[3], and offer brand owners the ability to block third-party registrations (similar to the Trademark Clearinghouse (TMCH) programme for new gTLDs) or to claim ownership of trademarked names. However, these blocks are at the discretion of the domain providers, making them subject to change and in need of periodic monitoring.

Additionally, wallet addresses associated with specific blockchain domain or NFT owners - as might be available through public records, Web3 marketplaces, or blockchain domain providers - can be used as the basis of an investigation to identify additional associated information relating to the entity in question. In some cases, it may also be possible to submit a court order to the service provider for the disclosure of collected data.

As a further brand protection initiative, brand owners may also wish to consider the proactive defensive registration of key domain-name keyword strings across relevant extensions. This approach is generally more cost-effective than attempting to subsequently acquire domain names of interest.

The changing landscape

A number of recent factors - primarily driven by developments in AI technologies - may very well impact on the role of Web3 within the wider Internet landscape, even if (as we shall see) we have not yet seen any major new growth in the level of uptake of the relevant technologies.

The developers of AI products and services such as ChatGPT are increasingly using the data held by platforms such as Reddit and Twitter as an input for their training models, resulting in the introduction of initiatives by these platforms to prevent or monetise this activity. In April 2023, Reddit announced plans to start charging for use of its data API[4], resulting in a number of subreddits (communities) - or associated applications - being made private, or shutting down altogether[5] ('going dark') in protest[6,7]. In July, Twitter introduced a measure to limit the number of posts a user could read per day[8], to prevent "extreme levels of data scraping and system manipulation"[9]. The actions taken by online platforms to remove and restrict access to content are widely seen as being contrary to a fundamental characteristic of Web2, namely the ability of individual users to create and curate their own content, and choose which content to consume from other users.

It is interesting to note that Web3 and the metaverse have recently been declared 'dead' by some commentators[10], following moves by Meta and Mark Zuckerberg - together with other industry leaders - away from these areas, in favour of development of generative AI[11]. However, it is perhaps that very trend - and the associated reactions by service providers - which may point us back to the need for a Web3, being inherently decentralised and less prone to regulation and restriction. Could these technologies see a new lease of life as a safe harbour for the community-owned content which used to be the province of Web2[12]? The answer is a resounding 'yes', according to AI commentator Alex Valaitis, who recently tweeted that "AI becomes stronger the more centralized it gets, which is exactly why we need Web3 as a counterweight[;] think of public blockchains as the last bastion of the open internet"[13].

The existence of the Web3 Domain Alliance[14] is also noteworthy. It features many of the major Web3 service providers as members, and is intended to drive "consumer protection, preventing naming collisions, fair and open use of intellectual property in the industry, and interoperability of blockchain naming systems"[15]. As part of this initiative, Unstoppable Domains announced in February 2023 that it would not enforce a key patent - relating to the use of smart contracts[16] in the blockchain domain naming process - against other members of the group[17].

The Web3 ecosystem also presents its own problems, however, such as the ongoing US litigation between Unstoppable Domains and Wallet Inc. over rights to offer domains across the .wallet extension[18].

Observed trends in blockchain domains and Web3 content

There is evidence of a significant amount of (at the very least, legacy) interest in Web3 concepts; searches across the (gTLD) zone files provided by ICANN show that, as of July 2023, there are over 300,000 registered (regular) domain names containing the Web3-related keywords 'web3', 'nft', 'blockchain' or 'metaverse'.

Currently, there are around 7 million blockchain domains registered, with two of the most popular providers - Ethereum Name Service ('ENS') (which offers .eth domains on the blockchain associated with the Ethereum cryptocurrency) and Unstoppable Domains (offering blockchain domains across a range of more than ten extensions) - having provided 2.7 million[19] and 3.6 million[20] registrations, respectively.

In the remainder of this article, we consider activity surrounding .eth registrations as a proxy for the overall blockchain domain landscape, both because of the popularity of the Ethereum Name Service and because of the ready availability of associated statistics available through information and tools provided by Dune Analytics.

Dune states that, as of 17 July 2023, there are 2,719,569 active ENS (.eth) blockchain domains[21]. The registration history of these domains is available back to May 2019, and is shown in Figure 1.

Figure 1: Monthly numbers of .eth blockchain domain registrations

The statistics show a very large peak in activity covering roughly the calendar year of 2022, after which levels of registrations appear to have dropped off for now. This is consistent with the flurry of activity where available three- and four-digit domains were being rapidly registered, with the monthly trading volume peaking at $44.3 million in May 2022[22].

It is also possible to extract more granular data, looking at the individual blockchain domain registrations (names and registration dates), for the last year (July 2022 – July 2023)[23] (Figure 2).

Figure 2: Daily numbers of .eth blockchain domain registrations (July 2022 – July 2023)

Within this dataset of 1.47 million blockchain domains, we consider the prevalence of domains with names containing each of the top ten most valuable global brands in 2023 (according to data provided in the latest Kantar BrandZ study[24]). This dataset (see Table 1 and Figure 3) provides a measure of the likely level of potential brand infringement across the blockchain domain landscape.

  Brand string     No. registered .eth domains (July 2022 - July 2023)
  apple            902
  google           625
  microsoft        249
  amazon           926
  mcdonalds        136
  visa             301
  tencent          43
  vuitton          119
  mastercard       66
  coca(-)cola *    160

* The hyphen in the brand string is optional, so the data considers examples containing 'cocacola' or 'coca-cola'.

Table 1: Total numbers of .eth blockchain domains registered between July 2022 and July 2023 with names containing each of the top ten most valuable global brands in 2023
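The counting step behind Table 1 can be sketched as a simple substring search across the list of registered names. The following is a minimal illustration rather than the method actually used in the study: the function name, the pattern dictionary and the sample input are all assumptions, and the last two sample names are made up purely to show the optional hyphen.

```python
import re
from collections import Counter

# Brand strings from Table 1; the optional hyphen is written as "-?".
BRAND_PATTERNS = {
    "apple": "apple",
    "google": "google",
    "microsoft": "microsoft",
    "amazon": "amazon",
    "mcdonalds": "mcdonalds",
    "visa": "visa",
    "tencent": "tencent",
    "vuitton": "vuitton",
    "mastercard": "mastercard",
    "coca(-)cola": "coca-?cola",
}

def count_brand_matches(names, patterns=BRAND_PATTERNS):
    """Count how many names contain each brand string (case-insensitive)."""
    compiled = {brand: re.compile(rx) for brand, rx in patterns.items()}
    counts = Counter()
    for name in names:
        label = name.lower()
        if label.endswith(".eth"):
            label = label[:-4]  # strip the extension before matching
        for brand, rx in compiled.items():
            if rx.search(label):
                counts[brand] += 1
    return counts

# Illustrative input: the first four names appear in Figure 5,
# the last two are hypothetical.
sample = [
    "googleisadog.eth",
    "whalevisa.eth",
    "tencentglobal.eth",
    "googlenoodle.eth",
    "cocacola.eth",
    "coca-cola-fans.eth",
]
counts = count_brand_matches(sample)
```

A plain substring match of this kind will also count names in which the brand string appears by coincidence (whalevisa.eth being a possible example), so the totals are best read as an upper bound on brand-related registrations.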

Figure 3: Monthly numbers of .eth blockchain domain registrations with names containing each of the top ten most valuable global brands in 2023

It is also worth noting that the inclusion of Unicode support in the blockchain domain infrastructure allows special characters such as emojis to be included in the domain names[25]. Consequently, the dataset includes examples such as those shown in Figure 4, many of which have the potential to be used to create highly deceptive and/or purportedly official websites.

Figure 4: Examples of branded .eth blockchain domain names with names including special characters
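As a rough illustration of how names containing special characters might be flagged for review, the sketch below lists every non-ASCII character in a name together with its Unicode name. The function name and the example inputs are hypothetical, and the approach is only a first-pass filter; it does not attempt any of the name normalisation applied by blockchain domain providers.

```python
import unicodedata

def describe_non_ascii(name):
    """Return (character, Unicode name) pairs for each non-ASCII character."""
    return [
        (ch, unicodedata.name(ch, "UNNAMED"))
        for ch in name
        if not ch.isascii()
    ]

# Hypothetical examples: a look-alike Cyrillic 'o' and an emoji prefix.
lookalike = describe_non_ascii("g\u043e\u043egle.eth")
emoji = describe_non_ascii("\U0001F34Estore.eth")
```

Names for which the returned list is non-empty could then be prioritised for manual inspection, since look-alike letters and emojis are both easy to miss by eye.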

A similar previous study published on the DNS Research Federation blog has highlighted the potential for such domains to be used fraudulently or for other infringing purposes, and identified multiple instances of prolific serial registrants, each in ownership of over 100 branded domain names[26]. The difficulties with monitoring and enforcement across the blockchain landscape also make these domains attractive to cybersquatters[27].

Currently, very few of the branded blockchain domains in the dataset resolve to any significant content, although a small number of live sites or active website responses were identified (Figure 5). These include one webpage (Figure 5(iii)) analogous to a server index page, which is sometimes seen with sites under development as a precursor to subsequent, more significant website content, or when 'hidden' (potentially harmful) content is present in one of the subdirectories.

(i)

(ii)

(iii)

(iv)

Figure 5: Examples of live websites or other active webpage responses associated with .eth blockchain domains with names containing any of the top ten most valuable global brands in 2023 - (i) googleisadog.eth; (ii) whalevisa.eth (potentially unrelated to the Visa brand); (iii) tencentglobal.eth; (iv) googlenoodle.eth

Conclusions

These observations raise a number of questions about the likely direction of Web3 trends going forward. Arguably, there is a case to be made that the peak in activity in blockchain domain registrations took place in 2022, and has now greatly subsided. However, it may be that this period simply represented a 'golden age' for registrations, when significant numbers of the highly desirable available domain names were snapped up by prospectors, creating a pre-existing landscape to which future activity will be added. It is also possible that the nascent nature of Web3, and uncertainty by brand owners over which department should take responsibility for blockchain domains, has resulted in reluctance by corporations to embrace and adopt the technologies.

Furthermore, there must also be questions surrounding the use of an analysis of domains with just a single extension, on a single blockchain, as a proxy for the whole Web3 ecosystem.

Overall, however, it seems reasonable to assert that AI technologies - and other technological and societal developments - may well drive a resurgence of interest in Web3. Given the scale of legacy activity in the associated areas, and the numbers and nature of associated infringements, it seems advisable for brand owners to be mindful of blockchain domains targeting their brands, and to carefully consider their own brand-protection strategies in the Web3 arena.

References

[1] https://www.linkedin.com/pulse/rise-nft-david-barnett

[2] https://eth.link/

[3] https://www.brandsec.com.au/blockchain-domains-and-cybersquatting/

[4] https://techcrunch.com/2023/04/18/reddit-will-begin-charging-for-access-to-its-api/

[5] https://www.reddit.com/r/apolloapp/comments/144f6xm/apollo_will_close_down_on_june_30th_reddits/

[6] https://news.sky.com/story/reddit-blackout-thousands-of-communities-are-doing-dark-today-heres-why-12899280

[7] https://www.theverge.com/2023/6/30/23779519/reddit-third-party-app-shut-down-apollo-sync-baconreader-api-protest

[8] https://www.bbc.co.uk/news/technology-66093324

[9] https://twitter.com/elonmusk/status/1675187969420828672

[10] https://www.splunk.com/en_us/blog/learn/blockchain-web3-dead.html

[11] https://www.businessinsider.com/metaverse-dead-obituary-facebook-mark-zuckerberg-tech-fad-ai-chatgpt-2023-5

[12] https://cointelegraph.com/news/how-adoption-of-a-decentralized-internet-can-improve-digital-ownership

[13] https://twitter.com/alex_valaitis/status/1674840503861248018

[14] https://www.web3domainalliance.com/

[15] https://cointelegraph.com/news/web3-domain-alliance-expands-with-51-new-members

[16] https://www.investopedia.com/terms/s/smart-contracts.asp

[17] https://fortune.com/crypto/2023/02/22/unstoppable-pledges-patent-non-aggression-pact-across-expanded-web3-domain-alliance/

[18] Unstoppable Domains, Inc. v. Wallet Inc. et al. 1:2022cv01231

[19] https://ens.domains/

[20] https://unstoppabledomains.com/

[21] https://dune.com/makoto/ens

[22] https://dappradar.com/blog/best-blockchain-web3-domain-names-services

[23] https://dune.com/makoto/ens-released-to-be-released-names

[24] https://www.kantar.com/inspiration/brands/revealed-the-worlds-most-valuable-brands-of-2023

[25] https://nptacek.medium.com/experimenting-with-ens-c88bfe7ed246

[26] https://dnsrf.org/blog/brand-names-in-blockchain-domains---new-frontier-for-brand-owners/index.html

[27] https://www.thefashionlaw.com/the-rise-in-blockchain-domains-presents-risks-opportunities-for-brands/

This article was first published on 26 July 2023 at:

https://www.iamstobbs.com/opinion/trends-in-web3-part-1-a-look-at-blockchain-domains

Monday, 3 July 2023

An overview of the concept and use of domain-name entropy

Introduction

In this article, I present an overview of a series of 'proof-of-concept' studies looking at the application of domain-name entropy as a means of clustering together related domain registrations, and serving as an input into potential metrics to determine the likely level of threat which may be posed by a domain.

In our previous studies, we utilised the mathematical concept of Shannon entropy[1], providing a measure of the amount of information stored in a string of characters (or, equivalently, the number of bits required to optimally encode the string). The idea was applied to the second-level domain name (SLD) part of each domain (i.e. the portion of the domain name before the dot - such as 'google' in 'google.com'), and broadly means that short domain names, or those with large numbers of repeated characters, will have low entropy values, whereas longer domain names, or those with large numbers of distinct characters, will have higher entropy.
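For reference, the per-character Shannon entropy of a string with character frequencies p_i is H = -Σ p_i log2(p_i). A minimal sketch of the calculation is given below; the function names are illustrative, and the SLD is assumed to be everything before the first dot, which only holds for simple names of the form 'name.tld'.

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    # Sum of p * log2(1/p) over the distinct characters in the string.
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())

def sld_entropy(domain):
    """Entropy of the SLD, assumed here to be the label before the first dot."""
    sld = domain.lower().split(".", 1)[0]
    return shannon_entropy(sld)

google = round(sld_entropy("google.com"), 3)
twitter = round(sld_entropy("twitter.com"), 3)
```

Under this definition, a name consisting of a single repeated character has an entropy of zero, and a name using each of the 26 letters exactly once has an entropy of log2(26), or approximately 4.700.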

The background to this analysis is the fact that domains registered for egregious purposes (such as spamming, malware distribution, or botnet creation) may be more likely to be registered in bulk by bad actors using automated algorithms[2]. This typically results in long, nonsensical (i.e. high-entropy) domain names, which have the added benefit, for the registrant, of not containing brand-related keywords, and are therefore typically harder to detect using classic brand-monitoring techniques. The idea is that domains registered by a particular infringer for a specific campaign are likely all to be generated using the same algorithm, and may therefore have similar or identical entropy values.

Overview of previous studies

In our initial proof of concept[3], we considered the set of all domains registered on a particular day - a sample of around 205,000 domains. An advantage of considering a set of domains with a common registration date is that the set may contain one or more groups of automated bulk registrations (which are typically all registered at the same time).

Within the dataset, a range of domain entropy values was present, from a minimum of 0.000, to a maximum of 4.700, and with 92.3% of the dataset having values below 3.500 (see Figure 1). The top 1,000 highest-entropy domains (i.e. the top 0.49%) had entropy values in excess of 3.823, and accounted for the majority of examples which appeared visually to feature 'random' SLD strings. Within this high-entropy subset, a number of additional characteristics were indicative that many may have been registered for nefarious purposes, including the prominence of use of consumer-grade registrars and privacy-protection services, and the extent of the presence of active MX records amongst these new registrations (in 27.5% of the cases - indicating that these domains have been configured to be able to send and receive e-mails and therefore could potentially be associated with phishing activity).

Figure 1: Cumulative proportion of domains with entropy less than the value shown on the horizontal axis, from the dataset in the initial proof-of-concept study

Indeed, at least one apparent 'cluster' of suspicious registrations was found to be present within the dataset, comprising a group of 125 .buzz ('dot-buzz') domains, all with an identical high entropy value (3.907), registered via a common registrar and associated with groups of similar IP addresses. At the time of analysis, many of the domains resolved to Chinese-language, gambling-related websites, likely representing either an affiliate revenue generation scheme, or 'dummy' content serving to 'mask' higher-threat content which may only have been visible in specific geographic regions, or which may have been planned for subsequent upload.
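One simple way of surfacing candidate clusters of this kind is to group domains by extension and (rounded) entropy value, and keep only the larger groups. The sketch below is an assumption about how this could be done, not a description of the original analysis; the function names, the minimum group size and the sample domains are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def shannon_entropy(s):
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())

def cluster_by_entropy(domains, min_size=3, ndigits=3):
    """Group domains sharing an extension and a rounded SLD entropy value."""
    groups = defaultdict(list)
    for domain in domains:
        # Assumes simple 'name.tld' domains: split at the first dot.
        sld, _, tld = domain.lower().partition(".")
        key = (tld, round(shannon_entropy(sld), ndigits))
        groups[key].append(domain)
    # Keep only groups large enough to be worth a closer look.
    return {k: v for k, v in groups.items() if len(v) >= min_size}

# Made-up sample: three .buzz names with eight distinct characters each.
sample = [
    "abcdefgh.buzz",
    "hgfedcba.buzz",
    "bcdefgha.buzz",
    "aabbccdd.buzz",
    "google.com",
]
clusters = cluster_by_entropy(sample)
```

A shared entropy value is only a starting point: many unrelated names of the same length will coincide by chance, so corroborating signals such as a common registrar or similar IP addresses would still be needed before treating a group as related.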

In a follow-up study[4], I considered a month's worth of registrations of domains with names containing any of the top ten most valuable brands in 2022. Similarly, the high entropy domain names within this dataset included groups of apparently related, coordinated 'clusters' of domains, several of which appeared intended for fraudulent use and were consistent with registration via automated generation algorithms. For example, seven of the top eight domains in the dataset (by entropy values) had similar names of the form 'google-site-verificationXXXXXX.com' (or .net) (where 'XXXXXX' was a long string of apparently random characters), and a series of groups of 'microsoft' examples was identified, including keywords such as 'cloudworkflow', 'netsuites' and 'cloudroam'.

Comparison with other work

Other studies taking similar approaches to the analysis of domain entropy also reach similar conclusions. For example, an analysis outlined in a blog posting by Tiberium[5] states that the use of an entropy threshold of >3.1 (as an indicator of potential concern) correctly classifies 80% of NCSC malicious domains, and incorrectly classifies only 8% of the top 1000 most popular (legitimate!) domains overall (cf. Table 1).

  Domain name      Entropy value
  google.com       1.918
  youtube.com      2.522
  facebook.com     2.750
  twitter.com      2.128
  instagram.com    2.948
  baidu.com        2.322
  wikipedia.org    2.642
  yandex.ru        2.585
  yahoo.com        1.922
  whatsapp.com     2.500

Table 1: Entropy values of the SLDs of the top ten most popular websites according to Similarweb[6]

Additionally, an article published by Splunk[7], looking at the entropy values of fully qualified domain names (i.e. also including subdomain names), also states that high-entropy examples are consistent with the use of domain generation algorithms, and may be indicative of association with malware (e.g. in 'beaconing') and other web exploits. Comparable approaches and conclusions can also be found in a range of other studies[8,9,10], with some finding improvements in the reliability of threat determination through the use of alternative measures such as relative entropy (essentially, a comparison against the character distribution observed in a dataset of known legitimate domains, so as to provide a better measure of the randomness arising from automated algorithmic registrations)[11].
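Relative entropy is commonly formalised as the Kullback-Leibler divergence, D(P||Q) = Σ p_i log2(p_i / q_i), where P is the character distribution of the name under test and Q is the baseline distribution. The sketch below is one possible implementation, not the approach of any of the cited studies. The function names, the permitted alphabet and the add-one smoothing are assumptions, and the baseline is built from just the ten SLDs in Table 1, which is far too small for practical use.

```python
import math
from collections import Counter

# Assumed alphabet of characters permitted in a (non-internationalised) label.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"

def char_distribution(names, alpha=1.0):
    """Smoothed character distribution over ALPHABET for a list of names."""
    counts = Counter(ch for name in names for ch in name.lower() if ch in ALPHABET)
    total = sum(counts.values()) + alpha * len(ALPHABET)
    # Additive smoothing keeps every probability non-zero.
    return {ch: (counts[ch] + alpha) / total for ch in ALPHABET}

def relative_entropy(sld, baseline):
    """Kullback-Leibler divergence (in bits) of the SLD from the baseline."""
    chars = [ch for ch in sld.lower() if ch in baseline]
    if not chars:
        return 0.0
    n = len(chars)
    return sum(
        (c / n) * math.log2((c / n) / baseline[ch])
        for ch, c in Counter(chars).items()
    )

# Toy baseline: the SLDs of the ten sites listed in Table 1.
legit = ["google", "youtube", "facebook", "twitter", "instagram",
         "baidu", "wikipedia", "yandex", "yahoo", "whatsapp"]
baseline = char_distribution(legit)
score_known = relative_entropy("google", baseline)
score_random = relative_entropy("qzxjvk", baseline)
```

With this toy baseline, the made-up string 'qzxjvk' scores higher than 'google', because it is built from characters which are rare in the reference set; a realistic baseline would be drawn from a much larger list of known legitimate domains.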

Conclusions

Domain-name entropy analysis has applications in at least two key areas of brand protection. The first of these is the ability to 'cluster' together related infringements, which has a number of benefits, including the ability to identify serial infringers and instances of bad-faith activity, for targeted and effective bulk enforcement actions. The second key area is as an input into algorithms to quantify the likely level of threat which may be posed by an online feature such as a new domain registration. Threat determination is essential in allowing prioritisation of results for analysis, enforcement, or content-change tracking.

All other factors being equal, there is some indication that high-threat domains - particularly those associated with automated registrations by domain-name generation algorithms - may have a tendency to sit at the higher-entropy end of the spectrum (and, furthermore, that domain names generated using a particular algorithm may be likely to have similar entropy values). This statement runs alongside the assertion that legitimate domains may (in general) be more likely to have lower entropy values, particularly where there is a desire for legitimate businesses to utilise strongly branded, short, memorable web addresses - as can be seen in many of the globally most popular websites.

References

[1] https://arxiv.org/ftp/arxiv/papers/1405/1405.2061.pdf

[2] https://interisle.net/sub/CriminalDomainAbuse.pdf

[3] https://www.linkedin.com/pulse/investigating-use-domain-name-entropy-clustering-results-barnett/

[4] https://www.linkedin.com/pulse/entropy-analysis-registered-domain-names-relating-top-david-barnett/

[5] https://www.tiberium.io/blog/chapter-2-classifying-domains-through-string-entropy/

[6] https://www.similarweb.com/top-websites/

[7] https://www.splunk.com/en_us/blog/security/random-words-on-entropy-and-dns.html

[8] https://hurricanelabs.com/blog/dns-entropy-hunting-and-you/

[9] https://www.logpoint.com/en/blog/embracing-randomness-to-detect-threats-through-entropy/

[10] https://suleman-qutb.medium.com/use-of-shannon-entropy-estimation-for-dga-detection-9ded275795ca

[11] https://redcanary.com/blog/threat-hunting-entropy/

This article was first published on 3 July 2023 at:

https://circleid.com/posts/20230703-an-overview-of-the-concept-and-use-of-domain-name-entropy
