by David Barnett and Tom Ambridge
BLOG POST
Web3 - the general name given to the newest generation of decentralised Internet technologies (following on from the earlier phases of a landscape dominated by read-only, and then by user-generated, content) - continues to see new developments. These are driven in part by the growth of AI technologies, which is fuelling greater demand for privacy and for unrestricted content[1].
One key area is the realm of blockchain domains, Web3 domain names operating on the same underlying technology as cryptocurrencies such as Bitcoin. These domain names can be utilised for a number of purposes, including the construction of decentralised (peer-to-peer) websites, and wallet addresses for sending and receiving cryptocurrency.
In this study, we update earlier work looking at a set of 1.47 million (.eth) domains on the Ethereum blockchain (comprising a year's worth of registrations, and around one half of the total on this particular blockchain) to identify additional trends and patterns in the dataset, and indicators of brand infringements.
Amongst the main findings are the following:
- Well-known trusted brands are heavily subject to infringement. This takes the form of blockchain domain names containing the name of the brand in question, or 'fuzzy matches', where one or more characters are replaced by alternative characters which may appear visually extremely similar, creating the potential for highly deceptive lookalike names.
- A new trend in blockchain domain name use appears to be emerging, in the form of instances where the domain name itself (which can incorporate a wide range of special characters) comprises an artwork - similar to an NFT - when the characters in the domain name are displayed in a suitable grid.
In response to these observations, brand owners are advised that:
- It is crucial to be aware of developments in the new arenas of Web3, including blockchain domains and NFTs. Large numbers of short blockchain domain names are already taken, in addition to a wide range of others which may be infringing brand names. This highlights the requirement for brand owners to be early adopters for securing blockchain domain names of which they may wish to make use, and for monitoring activity on domains which are already taken.
- The above risks may be further augmented by the emergence of a new trend in 'collectible' domain names, perhaps representing an evolution in the pre-existing ecosystem of domain 'clubs'. A related risk may turn out to be the emergence of blockchain domain names incorporating branded IP (imagery or logos) within the domain name itself. The identification of this type of infringement may require the development of new (potentially AI-based) image-recognition technologies.
- Within the blockchain domain landscape more generally, new trends and risks might include:
  - The possibility of naming collisions, arising from the same domain extension being used across multiple blockchains, or in both Web3 and Web2 (i.e. 'classic' domain name) contexts[2].
  - Disputes over control of new extensions. Both this and the risk of naming collisions are likely to be exacerbated by the launches of new providers, and of new extensions as part of the new-gTLD programme[3]. Brand owners wishing to apply for dot-brand extensions may need to be particularly mindful of these issues.
  - Name wrapping, or the emergence of providers offering tradeable sub-domains of blockchain domain names.
  - Decentralised Autonomous Organisations (DAOs), entities managed via blockchain-based programs for tracking business functions.
These last two points are discussed in more detail in reference [2].
References
[1] https://www.iamstobbs.com/opinion/trends-in-web3-part-1-a-look-at-blockchain-domains
[3] https://www.iamstobbs.com/opinion/the-new-new-gtlds
This article was first published on 17 November 2023 at:
https://www.iamstobbs.com/opinion/trends-in-web3-part-2-a-look-at-blockchain-domains
* * * * *
WHITE PAPER
Executive Summary
Blockchain domains are a key element of the Web3 ecosystem of technologies, enabling users to create decentralised websites and memorable cryptocurrency wallet addresses. Developments in artificial intelligence (AI) - particularly the scraping of data from social media platforms and the resulting restrictions in public access to associated content - may well drive a push towards increased adoption of Web3, with its emphasis on personalisation and lack of regulation and restriction.
In this study, we take a deep dive into a dataset of 1.47 million (.eth) domains on the Ethereum blockchain, as a proxy for the overall blockchain domain landscape, to identify trends and patterns, and consider indicators of potential brand infringement.
The main findings of the study are as follows:
- Numbers of blockchain domain registrations have dropped off following a peak in 2022, though activity continues. The median length of the domain names dipped between around August and October 2022. Length here is measured by the SLD, or second-level domain name - the part of the name to the left of the dot. This dip suggests that the period was a 'golden age' for the registration of shorter, desirable domain names.
- 99.8% of the blockchain domain names within the dataset have SLD lengths of 32 characters or fewer, but there is a 'long tail' of longer domain names, with the longest domains having lengths of over 38,000 characters.
- Many of these longer domain names (which can include a range of special characters such as emojis) appear to comprise artworks in their own right, with the domain names themselves appearing as images when the component characters are displayed as 'pixels' in a suitable grid. This observation is indicative of a new use-case for Web3 domain names, suggesting that they are being treated as collectible items similar to NFTs.
- Overall, nearly 3,000 distinct characters were found to have been utilised in the blockchain domain names across the whole dataset.
- Several of the other long domain names consist simply of large numbers of repeated and/or nonsensical characters. In many cases this is reflected in their low Shannon entropy values. The exact purpose of these names is not clear. It is possible that these types of blockchain domains are associated with domain 'clubs': groups of collectible domain names sharing particular characteristics.
- Many hundreds of blockchain domain names were identified featuring exact or 'fuzzy' matches to the names of the top ten most valuable global brands. Fuzzy matches are names with replaced, missing, additional or transposed characters. These names present serious potential for use in deceptive websites, or may constitute instances of cyber- or typosquatting. Some of the most potentially confusing examples include ạpple.eth, googߊe.eth, Microsoft.eth, amɑzon.eth, and mcdönalds.eth, each containing a non-ASCII lookalike character.
Introduction
In the previous article in this series[1], we considered the ecosystem of blockchain domains as an example of the newly-emerging Web3 landscape of decentralised technologies. We saw how - as in comparable areas of the traditional Internet - they can be associated with brand infringements and be utilised for other types of cyberattack. However, unlike with regular domains, monitoring and enforcement are relatively difficult across the blockchain infrastructure, which means that blockchain domains may prove to be a popular choice with bad actors.
In the initial study, we considered a dataset of all .eth blockchain domains registered in the previous year (1.47 million domains, or 54% of the total set of active ENS domains), as a proxy for the overall blockchain domain landscape. In this follow-up, we provide an overview of a more detailed analysis of this same dataset, to identify trends and patterns in the registered domains, with a focus on potential indicators of brand infringement.
The Deep Dive
Deep Dive Part 1 - Trends over time in domain-name length
Our previous analysis showed that the numbers of new blockchain registrations have dropped off since a peak in 2022. This may reflect that the earlier period was when users and prospectors were registering the shorter available domain names. To test this theory, we consider the mean length of the blockchain domain names registered each day across the same period (Figure 1). Length is measured as the number of characters in the second-level domain name (SLD, i.e. the part of the domain name before the dot).
Figure 1: Mean SLD length of the set of daily .eth blockchain domain registrations (July 2022 - July 2023)
This analysis shows no meaningful trend, with the mean domain-name length within the daily sets of registrations not changing significantly across the one-year period. However, the dataset does include some very long domain names (see Deep Dive Part 2), which may skew the daily mean values. Accordingly, it may be more instructive to consider the median length of the domains registered each day (Figure 2).
Figure 2: Median SLD length of the set of daily .eth blockchain domain registrations (July 2022 - July 2023)
Whilst there is no significant overall change between the start and end of the period, this analysis shows more clearly that the blockchain domains registered between August and October 2022 generally had shorter names than those registered across the rest of the year.
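The daily aggregation behind Figures 1 and 2 can be sketched in a few lines of Python. This minimal sketch assumes the registrations are available as (date, name) pairs. The function names are hypothetical, and 'characters' are counted as Unicode code points, which may differ from how the study counted multi-code-point emoji. The toy example shows why the median is the more robust statistic: a single very long name drags the daily mean far above the typical length, while the median is unaffected.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

def sld_of(name: str) -> str:
    """Return the second-level label of a .eth name (the part before the final dot)."""
    return name.rsplit(".", 1)[0]

def daily_length_stats(registrations):
    """Group (registration_date, name) pairs by day and return
    {day: (mean_sld_length, median_sld_length)}.

    Length is len() of the SLD, i.e. a count of Unicode code points (an assumption).
    """
    lengths_by_day = defaultdict(list)
    for day, name in registrations:
        lengths_by_day[day].append(len(sld_of(name)))
    return {day: (mean(ls), median(ls)) for day, ls in sorted(lengths_by_day.items())}

# Hypothetical toy data for a single day, including one outlier name.
example = [
    (date(2022, 8, 1), "abc.eth"),
    (date(2022, 8, 1), "wxyz.eth"),
    (date(2022, 8, 1), "a" * 1000 + ".eth"),
]
mean_len, median_len = daily_length_stats(example)[date(2022, 8, 1)]
print(round(mean_len, 1), median_len)  # 335.7 4
```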
Deep Dive Part 2 - Distribution of domain-name lengths within the full dataset
Figure 3 shows that the vast majority of blockchain domains registered across the one-year period have names which are relatively short, with 99.8% of the dataset having SLD lengths of 32 characters or fewer.
Figure 3: Distribution of SLD length in the full dataset of .eth blockchain domain registrations (July 2022 - July 2023)
However, there is a very long tail of longer domain names. The longest domains in the dataset (three instances) have an SLD length of 38,894 characters, as shown in Figure 4.
Figure 4: Cumulative proportion of domains with SLD length less than or equal to the value shown on the horizontal axis, for the full dataset
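The 99.8% figure quoted above is the value of this empirical cumulative distribution at an SLD length of 32. The following minimal sketch shows that calculation, assuming the SLD lengths have already been computed. The function name and the toy data are hypothetical.

```python
from bisect import bisect_right

def cumulative_share(lengths, thresholds):
    """For each threshold x, return the proportion of SLD lengths <= x,
    i.e. the empirical cumulative distribution plotted in Figure 4."""
    ordered = sorted(lengths)
    n = len(ordered)
    return {x: bisect_right(ordered, x) / n for x in thresholds}

# Toy data (hypothetical, not from the study).
lengths = [3, 5, 7, 9, 32, 38894]
print(cumulative_share(lengths, [9, 32]))  # {9: 0.6666666666666666, 32: 0.8333333333333334}
```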
Deep Dive Part 3 - Long domain names and domain-name entropy
As noted above, there are several domains in the dataset with extremely long names, with 14 instances of SLDs with 8,000 characters or more. The first twenty characters of each of the longest names in the dataset are shown in Table 1.
Table 1: First twenty characters in each of the top-fourteen longest blockchain domain names in the dataset
A number of the domain names consist largely or wholly of emojis or other special characters. The dataset also includes examples in which the domain name itself appears to be an artwork in its own right. One example is shown in Figure 5: the domain name consists of 1,024 coloured-square characters and forms an image when displayed in a grid of 32 x 32 characters. Further similar examples are shown in Figures 6 and 7.
Figure 5: A 1,024-character domain name from within the dataset, displayed in a grid of 32 x 32 characters
Figure 6: Other examples of domain names comprising artworks (with SLD lengths (in characters) of 357, 381, 410, 415, 551, 597, 621 and 4,759)
Figure 7: A group of five domain names with SLD lengths of 577 characters, apparently comprising a set of numbered artworks
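The grid display used for Figure 5 amounts to breaking the name into fixed-width rows. The following minimal sketch uses a hypothetical helper and a made-up example name rather than any domain from the dataset. It assumes that each 'pixel' is a single Unicode code point.

```python
def as_grid(sld: str, width: int = 32) -> str:
    """Lay out the characters of an SLD in rows of `width` characters.

    Assumes every 'pixel' is a single Unicode code point, as with the
    coloured-square emojis (e.g. U+1F7E5); names built from multi-code-point
    emoji sequences would need grapheme-aware splitting instead.
    """
    return "\n".join(sld[i:i + width] for i in range(0, len(sld), width))

# Hypothetical 1,024-character name: a red diagonal on a white background.
RED, WHITE = "\U0001F7E5", "\u2B1C"
name = "".join(RED if row == col else WHITE for row in range(32) for col in range(32))
print(as_grid(name))
```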
The emergence of these 'artwork' domain names may indicate a new trend in the Web3 domain market more generally. The market may be moving away from the Web2-style use of domains, and even from the core purposes of Web3 domains. Those core purposes are facilitating asset transfers without the requirement for a wallet address, hosting decentralised websites, and mail exchange and communication. The shift is towards a more collectible role, as was seen previously with NFTs. These differences may reflect the lack of centralised governance in the blockchain domain landscape.
It is also noteworthy that the domain names shown in Figure 7 look very similar to (and may be infringing) the CryptoPunks NFT collection owned by Yuga Labs. Yuga Labs recently won a US lawsuit against artist Ryder Ripps in relation to the sale of NFTs infringing its Bored Ape Yacht Club mark[2]. This raises a more general question about brand owners' need for technologies able to track and interpret graphical domain names, for example where branded imagery or logos are represented.
It is interesting to note that, outside the set of domain names which constitute some kind of graphical design, many of the longest domain names consist of very long strings of repeated characters. This can be shown by comparing the domain-name length against the entropy ('Shannon entropy') of the SLD string (a measure of the amount of randomness, or information, contained within it)[3], as shown in Figure 8 and Table 2. The data shows that many of the longest names have very low entropy, or zero - as is the case when the SLD consists only of a single character repeated.
Figure 8: Scatter plot of SLD length against SLD Shannon entropy, for the full dataset
SLD length (characters) | Shannon entropy (bits) |
---|---|
38,894 | 0.000 |
38,894 | 0.000 |
38,894 | 0.000 |
37,058 | 4.225 |
33,792 | 2.000 |
32,745 | 0.000 |
32,713 | 0.000 |
29,425 | 0.000 |
11,111 | 0.000 |
10,245 | 0.000 |
10,000 | 0.000 |
10,000 | 0.000 |
8,613 | 6.216 |
8,000 | 0.000 |

Table 2: SLD length and Shannon entropy values for the top-fourteen longest blockchain domain names in the dataset (as referenced in Table 1)
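Entropy values of the kind shown in Table 2 can be computed with a few lines of code. The sketch below makes two assumptions, since the study does not state them: it uses base-2 logarithms (so values are in bits), and it treats each Unicode code point as one symbol. It shows why a single repeated character scores exactly zero. It also shows that an entropy of exactly 2.000 is what, for example, four distinct characters used equally often would give.

```python
from collections import Counter
from math import log2

def shannon_entropy(sld: str) -> float:
    """Shannon entropy of an SLD's character distribution, in bits:
    H = sum over distinct characters of p * log2(1/p)."""
    n = len(sld)
    if n == 0:
        return 0.0
    return sum((count / n) * log2(n / count) for count in Counter(sld).values())

print(shannon_entropy("a" * 38894))    # 0.0  (one character repeated)
print(shannon_entropy("abcd" * 8448))  # 2.0  (four characters, equally frequent)
```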
Overall, the spread of entropy values within the dataset is shown in Figure 9. The highest-entropy domain name has a value of 9.081, for a name consisting of 929 special characters (with only a small number of repeated characters) (Figure 10).
Figure 9: Distribution of Shannon entropy values, for the full dataset
Figure 10: The highest-entropy domain name in the dataset
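The 9.081 value can be put in context with a simple bound. A string containing k distinct characters has an entropy of at most log2(k) bits, and k can be no larger than the string's length. A 929-character name can therefore reach at most log2(929) ≈ 9.86 bits, so 9.081 sits near the top of its possible range. Under natural logarithms the ceiling would be only about 6.83, below the reported value, which is why base 2 seems the more plausible assumption. The following sketch uses a hypothetical function name.

```python
from math import log, log2

def entropy_ceiling_bits(sld: str) -> float:
    """Maximum possible entropy (bits) for the distinct characters in `sld`:
    log2(k), reached only when all k distinct characters are equally frequent."""
    return log2(len(set(sld))) if sld else 0.0

n = 929  # length of the highest-entropy name reported above
print(f"{log2(n):.2f} bits")  # 9.86 bits: ceiling for any 929-character name
print(f"{log(n):.2f} nats")   # 6.83 nats: a natural-log ceiling would sit below 9.081
```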
Deep Dive Part 4 - Distribution of characters within the full dataset
Within the full dataset, 2,979 distinct characters were found to have been used in the domain names. The numbers of instances of each of the top 50 most frequently occurring characters are shown in Figure 11, and the full set of the top 1,000 characters (in descending order of frequency) is shown in Figure 12.
Figure 11: Number of instances of each of the top 50 most frequently occurring characters in the dataset
Figure 12: Top-1000 most frequently occurring characters in the dataset (in descending order of frequency)
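Counting distinct characters and their frequencies is a straightforward tally. The following minimal sketch uses made-up input. The study does not state how emoji sequences built from several code points were counted, so counting per code point here is an assumption.

```python
from collections import Counter

def character_frequencies(slds):
    """Count how often each character occurs across a collection of SLDs.
    Each Unicode code point is counted as one character (an assumption)."""
    counts = Counter()
    for sld in slds:
        counts.update(sld)
    return counts

counts = character_frequencies(["abc", "aab", "\U0001F7E5\U0001F7E5"])
print(len(counts))            # 4 distinct characters
print(counts.most_common(2))  # [('a', 3), ('b', 2)]
```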
Deep Dive Part 5 - Domain names containing fuzzy matches to the top ten most valuable global brand names
In the first article in this series, we considered the numbers of the domains in the dataset containing the names of each of the top ten most valuable global brands in 2023[4]. In this follow-up, we extend this analysis of exact matches to also consider 'fuzzy matches' of the following types:
- 'replaced character' - where any one character in the brand name is replaced by any other character
- 'missing character' - where any one character in the brand name is excluded
- 'additional character' - where any one additional character is inserted within the brand name
- 'transposed characters' - where any pair of adjacent characters in the brand name are swapped
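An SLD can be tested for these four match types without enumerating every possible replacement character (the dataset alone uses nearly 3,000 distinct characters). One approach is to compare sliding windows of the SLD against the brand name. The sketch below is illustrative only, not the method used in the study, and the function name is hypothetical. The types are tested independently, so a single name can register several of them. For instance, any name containing the exact brand also contains the brand with its last character removed, and so also registers as a 'missing character' match.

```python
def fuzzy_match_types(sld: str, brand: str) -> set[str]:
    """Return which match types (exact or fuzzy) occur as substrings of `sld`."""
    types = set()
    n = len(brand)
    if brand in sld:
        types.add("exact")
    # Replaced / transposed: compare every n-character window with the brand.
    for i in range(len(sld) - n + 1):
        window = sld[i:i + n]
        diffs = [j for j in range(n) if window[j] != brand[j]]
        if len(diffs) == 1:
            types.add("replaced character")
        elif (len(diffs) == 2 and diffs[1] == diffs[0] + 1
              and window[diffs[0]] == brand[diffs[1]]
              and window[diffs[1]] == brand[diffs[0]]):
            types.add("transposed characters")
    # Missing: the brand with any one character deleted appears in the SLD.
    deletions = {brand[:j] + brand[j + 1:] for j in range(n)}
    if any(d in sld for d in deletions):
        types.add("missing character")
    # Additional: some (n+1)-character window equals the brand after one deletion.
    for i in range(len(sld) - n):
        window = sld[i:i + n + 1]
        if any(window[:j] + window[j + 1:] == brand for j in range(n + 1)):
            types.add("additional character")
            break
    return types

print(fuzzy_match_types("mcd\u00f6nalds", "mcdonalds"))  # {'replaced character'}
```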
Numerous previous studies have noted that fuzzy-match domains are commonly used by infringers to create deceptive websites, with names which can easily be confused with the official website for the brand in question, or in other types of infringement or traffic misdirection[5]. In other cases, domain names which are a close match to those of an official brand owner, and which are actively being used by a third party, can constitute cases of potential brand confusion.
Table 3 shows the total number of domains containing each of the above types of fuzzy match, for each of the top ten brands, within the full dataset of .eth blockchain domains.
* As in Part 1 of the article series
Table 3: Total numbers of .eth blockchain domains registered between July 2022 and July 2023 with names containing exact or fuzzy matches to the names of each of the top ten most valuable global brands in 2023
In some cases, particularly where the brand name is short or generic, some of the fuzzy-match types are unlikely to be relevant to the brand in question. They may instead relate to other terms or third-party brand names. For example, 'visa' will fuzzy-match for 'via', 'invisible', 'lisa', etc., and 'apple' will fuzzy-match for 'rapper', 'maple', 'ripple', 'apply', etc. However, the dataset does include a significant number of domain names with very considerable potential for confusion with the name of an official site for the brand in question. These include several cases where a character has been replaced by a visually very similar non-ASCII character (a 'homoglyph') (Figure 13). Some of the most concerning examples include ạpple.eth, googߊe.eth, Microsoft.eth, amɑzon.eth, and mcdönalds.eth, each containing a non-ASCII lookalike character. None of these domains currently resolves to any live site content, apart from one example which redirects to an error page on the official Coca-Cola website. However, wherever they are under the control of third parties, they present the risk of deceptive content being added subsequently. They may also be speculative registrations intended to be offered for sale to the brand owner.
Figure 13: Examples from the dataset of potentially deceptive domain names featuring exact or fuzzy matches to any of the top ten most valuable global brand names
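A simple first-pass check for homoglyph substitutions is to list every character outside the ASCII range together with its Unicode name. The following hypothetical helper is not the study's method. It will also flag legitimate names written in non-Latin scripts, so a fuller approach would compare names against the Unicode confusables data (Unicode Technical Standard #39).

```python
import unicodedata

def non_ascii_characters(sld: str):
    """Return (position, character, code point, Unicode name) for every
    character outside the ASCII range."""
    return [
        (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(sld)
        if ord(ch) > 0x7F
    ]

for hit in non_ascii_characters("\u1ea1pple"):
    print(hit)  # (0, 'ạ', 'U+1EA1', 'LATIN SMALL LETTER A WITH DOT BELOW')
```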
Discussion
There are several key observations to be drawn from this deep-dive analysis:
- Whilst the overall numbers of registrations of .eth blockchain domains have dropped off since the earlier peak in 2022, there is some indication that the earlier flurry of activity was associated with the registration of shorter domain names. Amongst the daily sets of registrations, there was a definite dip in median second-level domain-name (SLD) length prior to around October 2022. Subsequently, registrations continue (albeit at lower absolute levels), with a median SLD length consistently around 9 characters.
- The vast majority (99.8%) of .eth domains registered in the last year have SLD lengths of 32 characters or fewer. However, there is a long tail of longer domain names. The dataset contains 14 domains of 8,000 characters or more in length, up to a maximum of 38,894 characters. Many of these long domain names are nonsensical or consist solely of repeated characters. However, in a subset of the long domain names, the name itself consists of special graphical characters. When these characters are arranged in an appropriate grid, they display digital art similar to examples sometimes sold as NFTs.
- Outside of the set of 'artwork' blockchain domain names, the exact purpose of many of the very long domain names is not clear. It may simply be that these are part of a project by some users to build a portfolio of 'unusual' domain names. Alternatively they may be associated with domain 'clubs', consisting of groups of domain names featuring particular characteristics, which are collectible by virtue of their limited supply (e.g. '999 Club' - SLDs consisting of three digits; '10k Club' - domains with four digits; 'Single Ethmoji Club', etc.)[6]. In these cases, the registrations may have been carried out by bots which are configured to pre-emptively register relevant domain names. The presence in the dataset of the 'artwork' domains is further suggestive that many of the domains have been registered for their collectability and attractiveness to traders.
- Many hundreds of names containing exact or fuzzy matches to the names of any of the top ten most valuable global brands appear in the dataset, significant numbers of which present very considerable potential for confusion with official brand sites. Whilst generally not currently found to resolve to any live site content, they present serious risk of being associated with deceptive content in the future, or may be under the ownership of cybersquatters, further highlighting the importance of brand owners maintaining an awareness of activity in this continually evolving space.
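For the digit-based clubs, the 'limited supply' that makes club names collectible is easy to quantify: an n-digit club has 10^n possible names, so 1,000 for three digits and 10,000 for four. The following sketch shows how such names might be classified, using the two digit-based definitions given above. The patterns and function names are our own, hypothetical choices. The 'Single Ethmoji Club' is omitted because reliable emoji detection needs Unicode data beyond Python's standard library.

```python
import re

# Hypothetical club definitions, following the descriptions in the text:
# '999 Club' = SLDs of exactly three digits, '10k Club' = exactly four digits.
# [0-9] is used rather than \d, because \d also matches non-ASCII digits in Python.
CLUB_PATTERNS = {
    "999 Club": re.compile(r"[0-9]{3}"),
    "10k Club": re.compile(r"[0-9]{4}"),
}

def clubs_for(sld: str) -> list[str]:
    """Return the names of the (hypothetically defined) clubs an SLD belongs to."""
    return [club for club, pattern in CLUB_PATTERNS.items() if pattern.fullmatch(sld)]

def club_supply(digits: int) -> int:
    """Total number of possible names in an n-digit club: 10**n."""
    return 10 ** digits

print(clubs_for("042"), clubs_for("1234"), clubs_for("abc"))  # ['999 Club'] ['10k Club'] []
print(club_supply(3), club_supply(4))                         # 1000 10000
```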
References
[1] https://www.iamstobbs.com/opinion/trends-in-web3-part-1-a-look-at-blockchain-domains
[2] https://news.artnet.com/art-world/yuga-labs-lawsuit-tradmark-bayc-ryder-ripps-2290175
[3] https://www.linkedin.com/pulse/investigating-use-domain-name-entropy-clustering-results-barnett/
[4] https://www.kantar.com/inspiration/brands/revealed-the-worlds-most-valuable-brands-of-2023
[5] https://www.cscdbs.com/en/resources-news/threatening-domains-targeting-top-brands/
This paper was first published as an e-book on 17 November 2023 at: