Tuesday, 28 November 2023

Finding the right size: Measuring the prominence of fashion brands online

BLOG POST

The online prominence of brands can be a key metric for brand owners, and can serve as a data input for a number of areas, including search-engine optimisation (SEO), web-traffic analysis and brand valuation. Overall, it provides a measure of the amount of accessible brand-related online content - both official and third-party - and can also provide an indication of the likelihood of a brand being targeted by infringers.

We have developed a simple methodology for quantifying the relative online prominence of brands, based on the concept of 'brand content score' (a metric for measuring the amount of brand-related content on an individual webpage), using a sample of the webpages ranked most highly by a search engine in response to relevant search terms. We have applied this methodology to the top twenty fashion brands of 2023.

The analysis finds that the top five most prominent fashion brands are Nike, Gucci, Skims, Dior, and Burberry. Significantly, no strong correlation was found between online prominence and brand rankings (as measured by the Lyst Index), though this is perhaps unsurprising given that the Lyst Index attempts to reflect the frequency of brand-specific searches by Internet users, as well as mentions and engagement on other platforms such as social media. By contrast, our measure of prominence focuses specifically on search-engine visibility and the extent of brand-related content on the most highly-ranked webpages (i.e. factors more closely connected with SEO).

However, the methodology does provide meaningful insights in its own right, and extensions to the technique could be used to produce deeper dives into content relating to specific industry areas, and to analyse other characteristics, such as brand sentiment.

This article was first published on 28 November 2023 at:

https://www.iamstobbs.com/opinion/finding-the-right-size-measuring-the-brand-prominence-of-fashion-brands-online

* * * * *

WHITE PAPER

Measuring online brand prominence: a proof-of-concept for the top twenty fashion brands

Introduction

The online prominence of brands can be a key metric for brand owners, and can serve as a data input for a number of areas, including search-engine optimisation, web-traffic analysis and brand valuation. Overall, it provides a measure of the amount of accessible brand-related online content - both official and third-party - and can also provide an indication of the likelihood of a brand being targeted by infringers.

In this study, we present an overview of a simple methodology for quantifying the relative online prominence of a selection of brands, using the top twenty fashion brands in 2023 (Appendix A) as a case study.

Methodology

The basic principle behind the methodology is to obtain a representative sample of webpages of potential relevance (e.g. to the business area of the brands concerned) and then determine the number and prominence of mentions of each of the brands of interest on each webpage, across the dataset.

One of the main points to note in this type of analysis is that it is important not to search explicitly for any of the brand names in question. The reason is that, by definition, all of the results returned by a search engine for a given query will relate to the search term being used. Furthermore, even if the analysis paginates through all of the available results until no more are returned, a search engine will usually only return a limited maximum number of results (typically a few hundred) for any given query. If, therefore, we simply searched for each brand name separately, this would yield a broadly similar number of results for each brand, and the brands would artificially appear to have similar online prominences. Instead, it is preferable to use generic search queries to bring back sets of pages relevant to the industry area of the brands in question (or to business in general), and then to count the mentions of the brands (and measure their prominence in each case) which happen to appear in this overall representative sample of pages.

In this analysis, we consider six generic fashion-related search terms ('fashion', 'fashion brands', 'clothes', 'clothing', 'designer' and 'trends'), and analyse only the first page of results (approx. 100) from google.com in each case. The list of results is then de-duplicated (to account for the fact that some pages are returned by more than one of these queries), yielding a dataset of 545 webpages of high relevance to fashion brands, for the proof-of-concept analysis.
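As a rough illustration of the de-duplication step, the sketch below merges the result lists returned for several queries and keeps the first occurrence of each page. It assumes the results are already available as lists of URLs; the URL normalisation used (lower-casing the scheme and host, dropping any fragment and trailing slash) is our own assumption rather than a documented part of the study, and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url: str) -> str:
    """Hypothetical normalisation: lower-case scheme/host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def deduplicate(results_by_query: dict[str, list[str]]) -> list[str]:
    """Merge the result lists for all queries, keeping the first occurrence of each page."""
    seen: set[str] = set()
    merged: list[str] = []
    for urls in results_by_query.values():
        for url in urls:
            key = normalise_url(url)
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Hypothetical example: the same shop page is returned by two different queries.
results = {
    "clothing": ["https://example.com/shop/", "https://example.org/guide"],
    "clothes": ["https://EXAMPLE.com/shop", "https://example.net/trends#top"],
}
pages = deduplicate(results)
print(len(pages))  # 3
```

Whether two URLs should count as the same page (e.g. with and without tracking parameters) is a judgement call; a stricter or looser normalisation would change the size of the final dataset.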

The next stage is to measure, for each of the twenty brands under consideration, the number and prominence of the brand's mentions on each of the pages in the dataset. In general, prominence is determined by the context in which the brand is mentioned (e.g. in the URL vs. the page title vs. a level-1 or level-2 heading vs. any other mention on the page); this analysis is carried out using the full HTML source code of each webpage.
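One way of separating a page into these areas is sketched below, using Python's built-in html.parser module. It is a simplified illustration only: it collects the visible text of the title, level-1 and level-2 heading elements plus all other text, ignoring scripts and styles, whereas the study considered the full HTML source (so mentions in, for example, attributes or meta tags are not picked up here). The class and function names are hypothetical.

```python
from html.parser import HTMLParser

TRACKED = ("title", "h1", "h2")  # areas given their own weighting
SKIPPED = ("script", "style")    # non-visible text, ignored in this sketch


class ContextExtractor(HTMLParser):
    """Splits the text of a page into title, h1, h2 and 'other' (body) text."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so '&amp;' becomes '&'
        self.sections: dict[str, list[str]] = {name: [] for name in (*TRACKED, "body")}
        self._open: list[str] = []  # stack of currently open tracked tags
        self._skip_depth = 0        # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1
        elif tag in TRACKED:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth or not text:
            return
        area = self._open[-1] if self._open else "body"
        self.sections[area].append(text)


def extract_contexts(url: str, html: str) -> dict[str, str]:
    """Return the text found in each area of the page, plus the URL itself."""
    parser = ContextExtractor()
    parser.feed(html)
    parser.close()
    contexts = {area: " ".join(parts) for area, parts in parser.sections.items()}
    contexts["url"] = url
    return contexts


page = (
    "<html><head><title>Gucci bags | Shop</title><script>var x = 'gucci';</script></head>"
    "<body><h1>New Gucci season</h1><p>Also see Prada.</p></body></html>"
)
contexts = extract_contexts("https://example.com/gucci-bags", page)
print(contexts["title"], "/", contexts["h1"], "/", contexts["body"])
```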

In earlier formulations of similar methodologies[1,2,3], the analysis was carried out simply by counting the number of pages on which there was at least one brand mention in each of the key areas of content (URL, title, etc.). However, this approach is somewhat unsatisfactory, as it fails to distinguish (for example) a page which mentions a brand once in its title from one which features multiple mentions in the title (and which, correspondingly, contains a greater degree of brand-related content). In this study, we present an improved methodology, utilising the concept of a 'brand content score' for each brand on each page.

Brand content score

The brand content score (which can be calculated for a specific brand or keyword on any given webpage) is a useful metric in its own right, with general applications in a range of areas. Its calculation involves counting each mention of the brand on the page, and weighting each one according to its prominence - e.g. a mention in the URL 'scores' more highly than a mention in the page title, which scores more highly than a mention in a level-1 heading, and so on. In our formulation, we also have the option to 'cap' the contribution to the total score from each specific area, to prevent the results being skewed by 'junk' pages which may be 'stuffed' with very large numbers of mentions of random terms.

The range of scores obtained by this analysis will depend on the relative weightings and data caps used, as well as on the types of search queries used to generate the results and the keywords being matched.
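A minimal sketch of the calculation follows, taking as input a dictionary mapping each area of a page to its text. The weights and caps are hypothetical placeholders, since the study does not state the values it used; they simply respect the ordering described above. Unlike the earlier page-count approach, two mentions in the title here contribute twice.

```python
import re

# Hypothetical weights and caps: the study does not publish its values. The weights only
# follow the ordering described above (URL > title > level-1 heading > level-2 heading > other).
WEIGHTS = {"url": 10, "title": 5, "h1": 3, "h2": 2, "body": 1}
# Maximum number of mentions counted per area, i.e. each area's contribution is capped
# at WEIGHTS[area] * CAPS[area] to limit the influence of keyword-stuffed 'junk' pages.
CAPS = {"url": 2, "title": 3, "h1": 3, "h2": 5, "body": 20}


def brand_content_score(contexts: dict[str, str], pattern: str, capped: bool = True) -> int:
    """Weighted (and optionally capped) count of brand mentions across the areas of one page."""
    regex = re.compile(pattern, re.IGNORECASE)
    score = 0
    for area, weight in WEIGHTS.items():
        mentions = len(regex.findall(contexts.get(area, "")))
        if capped:
            mentions = min(mentions, CAPS[area])
        score += weight * mentions
    return score


page = {
    "url": "https://example.com/gucci-bags",
    "title": "Gucci bags | Shop Gucci",
    "h1": "New Gucci season",
    "h2": "",
    "body": "Also see Prada. " + "gucci " * 100,  # a 'stuffed' run of 100 mentions
}
print(brand_content_score(page, "gucci"))                # 43 = 10 + 2*5 + 3 + 20
print(brand_content_score(page, "gucci", capped=False))  # 123 = 10 + 2*5 + 3 + 100
print(brand_content_score(page, "prada"))                # 1
```

Comparing the capped and uncapped results shows how a keyword-stuffed page would otherwise dominate; in practice the caps would need to be tuned to the dataset.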

In brand monitoring, brand content score can also be used as a basis for prioritising results - for example, when large numbers of webpages are identified using a monitoring tool. Those pages assigned the highest scores - i.e. those of greatest potential relevance to the brand in question - are typically the primary targets for further analysis, may be priorities for further monitoring (e.g. content tracking) and enforcement, and can provide insight into keywords and TLDs (domain extensions) most used in relevant content, which can help inform domain registration policies[4,5].
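The prioritisation step amounts to ranking pages by score, and the TLD insight to counting domain extensions among the top-ranked pages. The following sketch, using hypothetical URLs and scores, shows both; taking the last label of the hostname is a simplification which treats, for example, .co.uk as .uk.

```python
from collections import Counter
from urllib.parse import urlsplit


def prioritise(scores: dict[str, int], top_n: int) -> list[tuple[str, int]]:
    """The top_n (url, score) pairs, highest brand content score first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


def tld_counts(urls: list[str]) -> Counter:
    """Count the last label of each hostname: a rough proxy for the TLD ('.co.uk' -> 'uk')."""
    return Counter(urlsplit(url).hostname.rsplit(".", 1)[-1] for url in urls)


# Hypothetical monitoring results for one brand: URL -> brand content score.
scores = {
    "https://brand-outlet.shop/sale": 42,
    "https://news.example.com/article": 3,
    "https://brand-store.shop/": 37,
    "https://blog.example.org/post": 12,
}
top = prioritise(scores, top_n=3)
print(top[0])                               # ('https://brand-outlet.shop/sale', 42)
print(tld_counts([url for url, _ in top]))  # Counter({'shop': 2, 'org': 1})
```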

In this analysis, we calculate the brand content score for each brand under consideration, for each webpage in the dataset, and use the mean value across all pages for each brand as the basis for comparing the relative online prominence of the twenty brands.
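The averaging step is sketched below with hypothetical scores. The point it illustrates is that the mean is taken over every page in the dataset, with a brand scoring zero on pages which do not mention it; this is why mean scores can be well below 1 even though individual pages can score much higher.

```python
def prominence(scores_by_page: dict[str, dict[str, int]], brands: list[str]) -> dict[str, float]:
    """Mean brand content score per brand across *all* pages (no mention counts as zero)."""
    n_pages = len(scores_by_page)
    return {
        brand: sum(page.get(brand, 0) for page in scores_by_page.values()) / n_pages
        for brand in brands
    }


# Hypothetical scores for a four-page dataset (pages with no entry for a brand score 0).
scores_by_page = {
    "page-1": {"Nike": 6, "Gucci": 2},
    "page-2": {"Nike": 0},
    "page-3": {"Gucci": 1},
    "page-4": {},
}
result = prominence(scores_by_page, ["Nike", "Gucci", "Skims"])
for brand, value in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{brand}: {value:.2f}")  # prints Nike: 1.50, Gucci: 0.75, Skims: 0.00 in turn
```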

Findings

The distribution of the full set of brand content scores, for all brands across all webpages in the dataset, is shown in Figure 1.

Figure 1: Distribution of all (non-zero) brand content scores, for all twenty brands and all webpages in the dataset

The vast majority of calculated scores across the dataset are relatively low (with a score of 10 or less calculated in 99.7% of cases), which is perhaps unsurprising given that the set of pages is - by design - dominated by content about fashion generally, rather than comprising webpages about any of the relevant brands specifically. The highest score observed in the dataset was 76 (one instance): the brand content score calculated for the Skims brand, for a page on the official Skims website[6]. This was the only instance identified in the dataset of a page from the official website of one of the brands under consideration being returned in response to one of the generic, fashion-related search queries (it was returned as result #23 on google.com in response to the query 'clothing', and as #99 for 'clothes').

Overall, the relative online prominence of each of the top twenty fashion brands is expressed as the mean of the individual brand content scores for that brand, across all pages in the dataset (Figure 2 and Table 1).

Figure 2: Mean brand content scores (i.e. overall online prominence scores) for the top twenty fashion brands (using the set of 545 webpages from the proof-of-concept study)

Position   Brand              Mean brand content score
1          Nike               0.47
2          Gucci              0.40
3          Skims              0.24
4          Dior               0.23
5          Burberry           0.23
6          Balenciaga         0.19
7          Versace            0.19
8          Prada              0.18
9          Louis Vuitton      0.17
10         Saint Laurent      0.15
11         Loewe              0.14
12         Fendi              0.13
13         Valentino          0.11
14         Dolce & Gabbana    0.11
15         Moncler            0.09
16         JW Anderson        0.08
17         Diesel             0.07
18         Bottega Veneta     0.04
19         Miu Miu            0.03
20         Jacquemus          0.02

Table 1: Mean brand content scores (i.e. overall online prominence scores) for the top twenty fashion brands (using the set of 545 webpages from the proof-of-concept study)

The analysis gives the top five brands as Nike, Gucci, Skims (the only brand in the set for which a page from the official website was returned in the first page of Google results in response to at least one of the generic, fashion-related queries), Dior, and Burberry.

The data can be alternatively visualised in terms of the distribution of brand content scores across the dataset of webpages, for each brand (Figures 3 and 4).

Figures 3 and 4: Distribution of brand content scores for the top five (top) and bottom five (bottom) brands, by mean brand content score

As would be expected, the brands with the greatest prominence generally appear on more of the pages within the dataset, and have greater numbers of pages with higher brand content scores.

It is also informative to compare the mean brand content scores (i.e. overall online prominence scores) against the positions of the brands in the original Lyst Index ranking (Figure 5).

Figure 5: Comparison of mean brand content score against Lyst Index ranking for the top twenty fashion brands

In general, there is no strong correlation between overall prominence and brand ranking (Lyst Index), with many of the higher-ranked brands having relatively low levels of prominence in the dataset. It is noteworthy that only one (Gucci) of the six most prominent brands appears in the top ten of the Lyst ranking. This discrepancy is presumably because the two approaches use rather distinct methodologies and aim to measure different brand characteristics: the Lyst Index aims to provide a more general measure of brand popularity, incorporating the frequency of brand-specific searches by Internet users, and mentions and engagement in content on other platforms such as social media[7]. Conversely, the approach used in our analysis more purely measures brand prominence and visibility, making it more closely connected with factors relevant to search-engine optimisation.
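No correlation coefficient is quoted above. As one way of putting a number on the comparison, the sketch below computes Spearman's rank correlation (our choice of statistic, not necessarily the study's) between the positions in Table 1 and the Lyst Index positions listed in Appendix A, breaking the ties in Table 1 by their listed order. On these inputs it comes out at about -0.14, i.e. weak, in line with the absence of a strong correlation.

```python
# Rankings taken from Table 1 (mean brand content score) and Appendix A (Lyst Index);
# 1 = top. Tied scores in Table 1 (e.g. Dior and Burberry, both 0.23) keep their listed order.
PROMINENCE_ORDER = [
    "Nike", "Gucci", "Skims", "Dior", "Burberry", "Balenciaga", "Versace", "Prada",
    "Louis Vuitton", "Saint Laurent", "Loewe", "Fendi", "Valentino", "Dolce & Gabbana",
    "Moncler", "JW Anderson", "Diesel", "Bottega Veneta", "Miu Miu", "Jacquemus",
]
LYST_ORDER = [
    "Prada", "Miu Miu", "Moncler", "Valentino", "Loewe", "Bottega Veneta",
    "Dolce & Gabbana", "Versace", "Gucci", "Saint Laurent", "Dior", "Nike",
    "Louis Vuitton", "Diesel", "Burberry", "Fendi", "Skims", "Balenciaga",
    "Jacquemus", "JW Anderson",
]


def spearman_rho(order_a: list[str], order_b: list[str]) -> float:
    """Spearman's rank correlation for two tie-free rankings of the same items:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is each item's rank difference."""
    rank_b = {name: rank for rank, name in enumerate(order_b, start=1)}
    n = len(order_a)
    d_squared = sum((rank - rank_b[name]) ** 2 for rank, name in enumerate(order_a, start=1))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rho = spearman_rho(PROMINENCE_ORDER, LYST_ORDER)
print(f"{rho:.3f}")  # -0.135
```

Breaking the ties differently (or using average ranks) would shift the value slightly, but not enough to suggest a strong relationship.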

Conclusions and discussion

The methodology described in this article represents a simple approach for comparing the online prominence of different brands, focusing on the most highly-visible online content (i.e. the webpages appearing near the top of the search-engine rankings).

The same approach can also be applied to more comprehensive studies, which could incorporate larger datasets of webpages, potentially drawn from a wider range of search sources targeting different geographical markets, and utilising as many relevant search queries as appropriate. In any study of this type, it is important to deal correctly with brand names which are relatively generic, to ensure that references are interpreted properly and to avoid false positives. For example, a reference to 'Diesel' in a set of webpages relating to fashion is likely to refer to the clothing brand; however, this may not be the case in a set of pages relating to business more generally. In such cases, it may be necessary to make use of keyword-based filtering to distinguish the relevant mentions from other uses of the brand name.
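Keyword-based filtering could take many forms; one simple possibility, sketched below, is to count a mention of a generic brand name only if a context keyword appears within a few words of it. The keyword list and window size are illustrative assumptions, not values from the study, and the sketch deliberately ignores sentence boundaries.

```python
import re

# Hypothetical context keywords: illustrative values, not the study's.
FASHION_CONTEXT = {"jeans", "denim", "clothing", "fashion", "jacket", "collection"}


def relevant_mentions(text: str, brand: str, context: set[str], window: int = 5) -> int:
    """Count mentions of `brand` (a single lower-case word) with at least one context
    keyword within `window` words. Sentence boundaries are ignored."""
    words = re.findall(r"[\w&']+", text.lower())
    count = 0
    for i, word in enumerate(words):
        if word == brand:
            nearby = words[max(0, i - window): i] + words[i + 1: i + 1 + window]
            if context.intersection(nearby):
                count += 1
    return count


text = (
    "Diesel jeans are back. Separately, analysts say the price of diesel "
    "at the pump has risen sharply this month."
)
print(relevant_mentions(text, "diesel", FASHION_CONTEXT))  # 1: only the first mention counts
```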

Furthermore, provided that a consistent approach is used for any given series of studies, the methodology offers the potential for tracking trends and changes over time in relative prominence (without the need to 'normalise' the scores to a consistent baseline, as was the case in some earlier studies)[8,9,10], allowing factors such as the impact of marketing initiatives or news stories to be tracked.

It would also be possible to extend the brand content scoring approach to measure other characteristics of webpage content - for example, using the proximity and strength of sentiment-related keywords to the brand mentions in order to measure brand-related sentiment.
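As an illustration of how such an extension might work, the sketch below sums the scores of sentiment words found near each brand mention, weighting each by the inverse of its distance (in words) from the mention. The tiny lexicon, the window size and the 1/distance weighting are all hypothetical choices; a real implementation would rely on an established sentiment lexicon or model.

```python
import re

# Hypothetical sentiment lexicon (word -> score); a real study would use an established resource.
LEXICON = {"love": 2, "great": 1, "stylish": 1, "fake": -2, "poor": -1, "disappointing": -2}


def brand_sentiment(text: str, brand: str, window: int = 5) -> float:
    """Sum of lexicon scores near each mention of `brand` (a single lower-case word),
    each weighted by 1 / distance in words, so closer sentiment words count for more."""
    words = re.findall(r"[\w']+", text.lower())
    total = 0.0
    for i, word in enumerate(words):
        if word != brand:
            continue
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i and words[j] in LEXICON:
                total += LEXICON[words[j]] / abs(j - i)
    return total


print(brand_sentiment("Nike trainers are great", "nike"))  # about 0.333: 'great' is 3 words away
print(brand_sentiment("This fake Nike", "nike"))           # -2.0: 'fake' is adjacent
```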

Appendix A: The top twenty fashion brands

The table below shows the top twenty fashion brands in 2023, according to the Lyst Index, together with the text (regular expression, or Regex) strings used in our analysis to identify a 'mention' of the brand.

Notes:

| denotes ‘or’ - so, for example, ‘gabbana|d&g’ will match ‘gabbana’ or ‘d&g’
.? denotes any one optional character
.* denotes any number of optional characters

No.  Brand              Detection string
1    Prada              prada
2    Miu Miu            miu.?miu
3    Moncler            moncler
4    Valentino          valentino
5    Loewe              loewe
6    Bottega Veneta     bottega.*veneta
7    Dolce & Gabbana    gabbana|d&g
8    Versace            versace
9    Gucci              gucci
10   Saint Laurent      yves.*laurent|ysl
11   Dior               dior
12   Nike               nike
13   Louis Vuitton      vuitton
14   Diesel             diesel
15   Burberry           burberry
16   Fendi              fendi
17   Skims              skims
18   Balenciaga         balenciaga
19   Jacquemus          jacquemus
20   JW Anderson        jw.?anderson|jonathan.?anderson
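The detection strings can be applied directly with Python's re module, as in the sketch below. Case-insensitive matching is our assumption (the strings are given in lower case). Some caveats follow from the patterns as written: there are no word boundaries, so 'dior' also matches within 'diorama'; in raw HTML source an ampersand is often encoded as '&amp;', which 'd&g' will not match unless entities are decoded first; and '.*' is greedy, so 'yves.*laurent' can span a long stretch of text.

```python
import re

# Detection strings from the table above; the case-insensitive flag is our assumption.
DETECTION = {
    "Prada": r"prada",
    "Miu Miu": r"miu.?miu",
    "Moncler": r"moncler",
    "Valentino": r"valentino",
    "Loewe": r"loewe",
    "Bottega Veneta": r"bottega.*veneta",
    "Dolce & Gabbana": r"gabbana|d&g",
    "Versace": r"versace",
    "Gucci": r"gucci",
    "Saint Laurent": r"yves.*laurent|ysl",
    "Dior": r"dior",
    "Nike": r"nike",
    "Louis Vuitton": r"vuitton",
    "Diesel": r"diesel",
    "Burberry": r"burberry",
    "Fendi": r"fendi",
    "Skims": r"skims",
    "Balenciaga": r"balenciaga",
    "Jacquemus": r"jacquemus",
    "JW Anderson": r"jw.?anderson|jonathan.?anderson",
}
PATTERNS = {brand: re.compile(rx, re.IGNORECASE) for brand, rx in DETECTION.items()}


def count_mentions(text: str) -> dict[str, int]:
    """Number of non-overlapping matches of each brand's detection string in `text`."""
    return {brand: len(rx.findall(text)) for brand, rx in PATTERNS.items()}


counts = count_mentions("Miu-Miu and MIUMIU at YSL; Yves Saint Laurent and D&G")
print(counts["Miu Miu"], counts["Saint Laurent"], counts["Dolce & Gabbana"])  # 2 2 1
```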

References

[1] https://www.businessweekly.co.uk/news/hi-tech/9121-online-research-gives-insight-damage-banks-brands

[2] https://www.trademarksandbrandsonline.com/news/luxury-brands-not-doing-enough-to-protect-themselves-online-4482 (cache available at https://webcache.googleusercontent.com/search?q=cache:9oyTNc1E1AwJ:https://www.trademarksandbrandsonline.com/news/luxury-brands-not-doing-enough-to-protect-themselves-online-4482&sca_esv=580550388)

[3] 'The Digital Brand Risk Index: A NetNames Report'; PDF available at https://silo.tips/download/the-digital-brand-risk-index-a-netnames-report

[4] 'Technical aspects of brand monitoring', internal Stobbs training presentation

[5] https://www.iamstobbs.com/opinion/strategies-for-constructing-a-domain-name-registration-and-management-policy

[6] https://skims.com/collections/clothing

[7] https://www.lyst.com/data/the-lyst-index/q323/

[8] https://www.tyrepress.com/2011/09/michelin-still-top-online-brand-but-the-gaps-narrowing/

[9] https://www.tyrepress.com/2016/10/michelin-returns-to-the-top-of-online-brand-ranking/

[10] https://www.tyrepress.com/2017/09/michelin-tops-online-brand-prominence-table/

This paper was first published as an e-book on 28 November 2023 at:

https://www.iamstobbs.com/measuring-brand-prominence-of-fashion-brands-ebook

Friday, 17 November 2023

Trends in Web3 - Part 2: A deeper dive into the blockchain domain dataset

by David Barnett and Tom Ambridge

BLOG POST

Web3 - the general name given to the newest generation of decentralised Internet technologies (following on from the earlier phases of a landscape dominated by read-only, and then by user-generated, content) - continues to see new developments, driven in part by the growth of AI technologies, which is creating a greater demand for privacy and freedom from content restrictions[1].

One key area is the realm of blockchain domains, Web3 domain names operating on the same underlying technology as cryptocurrencies such as Bitcoin. These domain names can be utilised for a number of purposes, including the construction of decentralised (peer-to-peer) websites, and wallet addresses for sending and receiving cryptocurrency.

In this study, we update earlier work looking at a set of 1.47 million (.eth) domains on the Ethereum blockchain (comprising a year's worth of registrations, and around one half of the total on this particular blockchain) to identify additional trends and patterns in the dataset, and indicators of brand infringements.

The main findings include the following:

  • Well-known trusted brands are heavily subject to infringement, in the form of blockchain domain names containing the name of the brand in question, or 'fuzzy matches', where one or more characters are replaced by alternative characters which may appear visually extremely similar, thereby presenting the potential for highly deceptive lookalike names.
  • A new trend in blockchain domain name use appears to be emerging, in the form of instances where the domain name itself (which can incorporate a wide range of special characters) comprises an artwork - similar to an NFT - when the characters in the domain name are displayed in a suitable grid.

In response to these observations, brand owners are advised that:

  • It is crucial to be aware of developments in the new arenas of Web3, including blockchain domains and NFTs. Large numbers of short blockchain domain names are already taken, in addition to a wide range of others which may be infringing brand names. This highlights the need for brand owners to be early adopters, securing any blockchain domain names which they may wish to use, and to monitor activity on domains which are already taken.
  • The above risks may be further augmented by the emergence of a new trend in 'collectible' domain names, perhaps representing an evolution in the pre-existing ecosystem of domain 'clubs'. A related risk may turn out to be the emergence of blockchain domain names incorporating branded IP (imagery or logos) within the domain name itself. The identification of this type of infringement may require the development of new (potentially AI-based) image-recognition technologies.
  • Within the blockchain domain landscape more generally, new trends and risks might include:
    1. The possibility of naming collisions (arising from the same domain extension being used across multiple blockchains, or in both Web3 and Web2 (i.e. 'classic' domain name) contexts)[2].
    2. Disputes over control of new extensions - both this and point 1 are likely to be exacerbated by the launches of new providers, and of new extensions as part of the new-gTLD programme[3]. Brand owners wishing to apply for dot-brand extensions may need to be particularly mindful of these issues.
    3. Name wrapping, or the emergence of providers offering tradeable sub-domains of blockchain domain names.
    4. Decentralised Autonomous Organisations (DAOs), entities managed via blockchain-based programs which track business functions.

These last two points are discussed in more detail in reference [2].

References

[1] https://www.iamstobbs.com/opinion/trends-in-web3-part-1-a-look-at-blockchain-domains

[2] https://www.iamstobbs.com/opinion/the-iotex-case-domain-naming-collisions-and-other-emerging-risks-in-the-blockchain-ecosystem

[3] https://www.iamstobbs.com/opinion/the-new-new-gtlds

This article was first published on 17 November 2023 at:

https://www.iamstobbs.com/opinion/trends-in-web3-part-2-a-look-at-blockchain-domains

* * * * *

WHITE PAPER

Executive Summary

Blockchain domains are a key element of the Web3 ecosystem of technologies, enabling users to create decentralised websites and memorable cryptocurrency wallet addresses. Developments in artificial intelligence (AI) - particularly the scraping of data from social media platforms and the resulting restrictions in public access to associated content - may well drive a push towards increased adoption of Web3, with its emphasis on personalisation and lack of regulation and restriction. 

In this study, we take a deep dive into a dataset of 1.47 million (.eth) domains on the Ethereum blockchain, as a proxy for the overall blockchain domain landscape, to identify trends and patterns, and consider indicators of potential brand infringement. 

The main findings of the study are as follows:

  • Numbers of blockchain domain registrations have dropped off following a peak in 2022, though continuing levels of activity are observed. The median length of the domain names (as measured by the SLD, or second-level domain name - the part of the name to the left of the dot) showed a dip between around August and October 2022, suggesting that this period was a 'golden age' for the registration of shorter, desirable domain names.
  • 99.8% of the blockchain domain names within the dataset have SLD lengths of 32 characters or fewer, but there is a 'long tail' of longer domain names, with the longest domains having lengths of over 38,000 characters. 
  • Many of these longer domain names (which can include a range of special characters such as emojis) appear to comprise artworks in their own right, with the domain names themselves appearing as images when the component characters are displayed as 'pixels' in a suitable grid. This observation is indicative of a new use-case for Web3 domain names, suggesting that they are being treated as collectible items similar to NFTs.
  • Overall, nearly 3,000 distinct characters were found to have been utilised in the blockchain domain names across the whole dataset. 
  • Amongst the other long domain names, several consist simply of large numbers of repeated and/or nonsensical characters (as shown in many cases by their low Shannon entropy values). The exact purpose of these names is not clear, but it is possible that they are associated with domain 'clubs' (groups of collectible domain names with particular characteristics).
  • Many hundreds of blockchain domain names featuring exact or 'fuzzy' matches (replaced, missing, additional or transposed characters) to the names of the top ten most valuable global brands were identified. These present serious potential for use in producing deceptive websites, or may constitute instances of cybersquatting or typosquatting, with some of the most potentially confusing examples including ạpple.eth, googߊe.eth, Microsoft.eth, amɑzon.eth, and mcdönalds.eth (with non-ASCII characters highlighted in bold).

Introduction

In the previous article in this series[1], we considered the ecosystem of blockchain domains as an example of an area of the newly-emerging Web3 landscape of decentralised technologies, and saw how - as in comparable areas of the traditional Internet - they can be associated with instances of brand infringement and be utilised for other types of cyberattack. However, compared with regular domains, the relative difficulty of monitoring and enforcement across the blockchain infrastructure means that blockchain domains may prove to be a popular choice with bad actors.

In the initial study, we considered a dataset of all .eth blockchain domains registered in the previous year (1.47 million domains, or 54% of the total set of active ENS (Ethereum Name Service) domains), as a proxy for the overall blockchain domain landscape. In this follow-up, we provide an overview of a more detailed analysis of this same dataset, to identify trends and patterns in the registered domains, with a focus on potential indicators of brand infringement.

The Deep Dive

Deep Dive Part 1 - Trends over time in domain-name length

Our previous analysis showed that the numbers of new blockchain registrations have dropped off since a peak in 2022, potentially reflecting the fact that this earlier period was when the shorter available domain names were being registered by users and prospectors. In order to test this theory, we consider the mean length (in terms of the number of characters in the second-level domain name (SLD) - i.e. the part of the domain name before the dot) of the blockchain domain names registered each day, across the same period (Figure 1).

Figure 1: Mean SLD length of the set of daily .eth blockchain domain registrations (July 2022 - July 2023)

This analysis shows no meaningful trend, with the mean domain-name length within the daily sets of registrations not changing significantly across the one-year period. However, the dataset does include some very long domain names (see Deep Dive Part 2), which may skew the daily mean values. Accordingly, it may be more instructive to consider the median length of the domains registered each day (Figure 2).

Figure 2: Median SLD length of the set of daily .eth blockchain domain registrations (July 2022 - July 2023)

Whilst there are no significant overall variations between the start and end of the period, this analysis does show more clearly that the blockchain domains registered between August and October 2022 generally had shorter names than those registered across the rest of the year.
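The contrast between the daily mean and median series can be reproduced with a short sketch: a single very long name pulls a day's mean far upwards while barely affecting the median. The registration records below are hypothetical, and counting length with len() (i.e. in Unicode code points) is our assumption about how characters are counted.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median


def daily_sld_stats(registrations: list[tuple[date, str]]) -> dict[date, tuple[float, float]]:
    """Mean and median SLD length of the .eth names registered on each day.

    Length is counted in Unicode code points via len() (an assumption), and names are
    assumed to be second-level only (e.g. 'alice.eth', not 'pay.alice.eth').
    """
    lengths: dict[date, list[int]] = defaultdict(list)
    for day, name in registrations:
        lengths[day].append(len(name.removesuffix(".eth")))
    return {day: (mean(v), median(v)) for day, v in sorted(lengths.items())}


d1, d2 = date(2022, 8, 1), date(2023, 3, 1)  # hypothetical days
registrations = [
    (d1, "abc.eth"), (d1, "defgh.eth"), (d1, "xyzxyzxyz.eth"), (d1, "a" * 38894 + ".eth"),
    (d2, "aaaa.eth"), (d2, "bbbbbb.eth"),
]
stats = daily_sld_stats(registrations)
print(stats[d1])  # (9727.75, 7.0): one very long name drags the mean far above the median
```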

Deep Dive Part 2 - Distribution of domain-name lengths within the full dataset

Figure 3 shows that the vast majority of blockchain domains registered across the one-year period have names which are relatively short, with 99.8% of the dataset having SLD lengths of 32 characters or fewer.

Figure 3: Distribution of SLD length in the full dataset of .eth blockchain domain registrations (July 2022 - July 2023)

However, there is a very long tail of longer domain names, with the longest domains in the dataset (three instances) having an SLD length of 38,894 characters, as shown in Figure 4.

Figure 4: Cumulative proportion of domains with SLD length less than or equal to the value shown on the horizontal axis, for the full dataset
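The cumulative proportion plotted in Figure 4 (and the 99.8% figure for names of up to 32 characters) is simply the share of names at or below each length. A minimal sketch, using hypothetical lengths:

```python
from bisect import bisect_right


def cumulative_share(lengths: list[int], thresholds: list[int]) -> dict[int, float]:
    """Proportion of domains with SLD length <= each threshold (the curve in Figure 4)."""
    ordered = sorted(lengths)
    return {t: bisect_right(ordered, t) / len(ordered) for t in thresholds}


# Hypothetical SLD lengths, including a 'long tail' of very long names.
lengths = [3, 5, 7, 9, 12, 32, 33, 577, 1024, 38894]
print(cumulative_share(lengths, [32, 1024, 38894]))  # {32: 0.6, 1024: 0.9, 38894: 1.0}
```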

Deep Dive Part 3 - Long domain names and domain-name entropy

As noted above, there are several domains in the dataset with extremely long names, with 14 instances of SLDs with 8,000 characters or more. The first twenty characters of each of the longest names in the dataset are shown in Table 1.

Table 1: First twenty characters in each of the top-fourteen longest blockchain domain names in the dataset

A number of the domain names consist largely or wholly of emojis or other special characters, and the dataset also includes examples in which the domain name itself appears to be an artwork in its own right. One example is shown in Figure 5, where the domain name - consisting of 1,024 coloured-square characters - takes on a distinctive appearance when displayed in a grid of 32 x 32 characters. Further similar examples are shown in Figures 6 and 7.

Figure 5: A 1,024-character domain name from within the dataset, displayed in a grid of 32 x 32 characters

Figure 6: Other examples of domain names comprising artworks (with SLD lengths (in characters) of 357, 381, 410, 415, 551, 597, 621 and 4,759)

Figure 7: A group of five domain names with SLD lengths of 577 characters, apparently comprising a set of numbered artworks
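Displaying a name as an image only requires splitting it into rows of fixed width. The sketch below does this for a hypothetical 16-character name made of coloured-square emoji, laid out 4 x 4; the name in Figure 5 would use a width of 32. It splits on Unicode code points, an assumption which holds for single-code-point emoji such as the coloured squares.

```python
RED, YELLOW = "\U0001F7E5", "\U0001F7E8"  # LARGE RED SQUARE, LARGE YELLOW SQUARE


def as_grid(sld: str, width: int = 32) -> str:
    """Lay out an SLD as rows of `width` characters, treating each character as one 'pixel'.

    Splits on Unicode code points; emoji built from several code points (e.g. with ZWJ or
    skin-tone modifiers) would need grapheme-aware splitting instead.
    """
    return "\n".join(sld[i:i + width] for i in range(0, len(sld), width))


# Hypothetical 16-character name which forms a 4 x 4 checkerboard.
name = ((RED + YELLOW) * 2 + (YELLOW + RED) * 2) * 2
print(as_grid(name, width=4))
```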

The emergence of these 'artwork' domain names may be indicative of a new trend in the Web3 domain market more generally, moving away from the Web2-style use of domains (and even from the core purposes of Web3 domains, namely facilitating asset transfers without the need for a wallet address, hosting decentralised websites, and mail exchange and communication) towards a more collectible nature, as was seen previously with NFTs. These distinctions may reflect the lack of centralised governance in the blockchain domain landscape.

It is also noteworthy that the domain names shown in Figure 7 look very similar to (and may be infringing) the CryptoPunks NFT collection owned by Yuga Labs, who recently won a US lawsuit against artist Ryder Ripps in relation to the sale of NFTs infringing their Bored Ape Yacht Club mark[2]. This raises the more general question of whether brand owners will require technologies able to track and interpret graphical domain names - for example, where branded imagery or logos are represented.

It is interesting to note that, outside the set of domain names which constitute some kind of graphical design, many of the longest domain names consist of very long strings of repeated characters. This can be shown by comparing the domain-name length against the entropy ('Shannon entropy') of the SLD string (a measure of the amount of randomness, or information, contained within it)[3], as shown in Figure 8 and Table 2. The data shows that many of the longest names have very low entropy, or zero - as is the case when the SLD consists only of a single character repeated.

Figure 8: Scatter plot of SLD length against SLD Shannon entropy, for the full dataset

SLD length (characters)   Shannon entropy
38,894                    0.000
38,894                    0.000
38,894                    0.000
37,058                    4.225
33,792                    2.000
32,745                    0.000
32,713                    0.000
29,425                    0.000
11,111                    0.000
10,245                    0.000
10,000                    0.000
10,000                    0.000
8,613                     6.216
8,000                     0.000

Table 2: SLD length and Shannon entropy values for the top-fourteen longest blockchain domain names in the dataset (as referenced in Table 1)
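Shannon entropy can be computed from character frequencies alone. The sketch below assumes entropy is measured in bits (base-2 logarithm), which is consistent with the values in Table 2 (for example, 2.000 is the entropy of a string made up of four characters in equal proportions); the study does not state its exact implementation.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string in bits: H = sum over distinct characters of p * log2(1 / p),
    where p is the proportion of the string made up of that character."""
    if not s:
        return 0.0
    n = len(s)
    return sum((count / n) * math.log2(n / count) for count in Counter(s).values())


print(f"{shannon_entropy('a' * 38894):.3f}")    # 0.000: a single repeated character
print(f"{shannon_entropy('abcd' * 8448):.3f}")  # 2.000: four characters in equal proportions
```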

Overall, the spread of entropy values within the dataset is shown in Figure 9. The highest-entropy domain name has a value of 9.081, for a name consisting of 929 special characters (with only a small number of repeated characters) (Figure 10).

Figure 9: Distribution of Shannon entropy values, for the full dataset

Figure 10: The highest-entropy domain name in the dataset

Deep Dive Part 4 - Distribution of characters within the full dataset

Within the full dataset, 2,979 distinct characters were found to have been used in the domain names. The number of instances of each of the top 50 most frequently occurring characters is shown in Figure 11, and the full set of the top 1,000 characters (in descending order of frequency) is shown in Figure 12.

Figure 11: Number of instances of each of the top 50 most frequently occurring characters in the dataset

Figure 12: Top-1000 most frequently occurring characters in the dataset (in descending order of frequency)
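Counting character usage across the dataset is a straightforward frequency count. In the hypothetical sketch below, each Unicode code point counts as one character; this is our assumption, and emoji composed of several code points would be split into their components.

```python
from collections import Counter


def character_frequencies(slds: list[str]) -> Counter:
    """Number of occurrences of each character (Unicode code point) across all SLDs."""
    counts: Counter = Counter()
    for sld in slds:
        counts.update(sld)  # iterating a str yields its code points
    return counts


# Hypothetical SLDs standing in for the 1.47 million in the dataset.
counts = character_frequencies(["apple", "pay", "\U0001F7E5\U0001F7E5"])
print(len(counts))            # 6 distinct characters
print(counts.most_common(2))  # [('p', 3), ('a', 2)]
```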

Deep Dive Part 5 - Domain names containing fuzzy matches to the top ten most valuable global brand names

In the first article in this series, we considered the numbers of the domains in the dataset containing the names of each of the top ten most valuable global brands in 2023[4]. In this follow-up, we extend this analysis of exact matches to also consider 'fuzzy matches' of the following types:

  • 'replaced character' - where any one character in the brand name is replaced by any other character
  • 'missing character' - where any one character in the brand name is excluded
  • 'additional character' - where any one additional character is inserted within the brand name
  • 'transposed characters' - where any pair of adjacent characters in the brand name are swapped
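The four fuzzy-match types can be tested by sliding a window of the appropriate length along the SLD and comparing each substring with the brand name. The sketch below is one possible implementation, not the study's own. Two choices in it are our assumptions: an exact match takes precedence (otherwise every exact match would also register as a 'missing character' match, since dropping the brand's last letter leaves a substring of it), and an 'additional' character must fall strictly inside the brand name.

```python
def _replaced(s: str, b: str) -> bool:
    """Same length, exactly one position differs."""
    return len(s) == len(b) and sum(x != y for x, y in zip(s, b)) == 1


def _missing(s: str, b: str) -> bool:
    """s is b with one character removed."""
    return len(s) == len(b) - 1 and any(b[:i] + b[i + 1:] == s for i in range(len(b)))


def _additional(s: str, b: str) -> bool:
    """s is b with one extra character inserted strictly inside it (not at either end)."""
    return len(s) == len(b) + 1 and any(s[:i] + s[i + 1:] == b for i in range(1, len(s) - 1))


def _transposed(s: str, b: str) -> bool:
    """s is b with one pair of adjacent, different characters swapped."""
    return len(s) == len(b) and any(
        b[i] != b[i + 1] and b[:i] + b[i + 1] + b[i] + b[i + 2:] == s
        for i in range(len(b) - 1)
    )


# match type -> (length difference of the matched substring, test function)
FUZZY_CHECKS = {
    "replaced": (0, _replaced),
    "missing": (-1, _missing),
    "additional": (1, _additional),
    "transposed": (0, _transposed),
}


def match_types(sld: str, brand: str) -> set[str]:
    """Which match types occur anywhere in the SLD; an exact match takes precedence."""
    sld, brand = sld.casefold(), brand.casefold()
    if brand in sld:
        return {"exact"}
    found = set()
    for label, (delta, check) in FUZZY_CHECKS.items():
        width = len(brand) + delta
        if any(check(sld[i:i + width], brand) for i in range(len(sld) - width + 1)):
            found.add(label)
    return found


print(match_types("via", "visa"))                  # {'missing'}
print(sorted(match_types("\u1ea1pple", "apple")))  # ['missing', 'replaced']
```

The ạpple example also registers as a 'missing' match, because 'pple' is 'apple' with its first letter removed; this overlap between types is one reason why short or generic brand names produce many incidental matches.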

Numerous previous studies have noted that fuzzy-match domains are commonly used by infringers to create deceptive websites, with names which can easily be confused with the official website for the brand in question, or in other types of infringement or traffic misdirection[5]. In other cases, domain names which are a close match to those of an official brand owner, and which are actively being used by a third party, can constitute cases of potential brand confusion.

Table 3 shows the total number of domains containing each of the above types of fuzzy match, for each of the top ten brands, within the full dataset of .eth blockchain domains.

* As in Part 1 of the article series

Table 3: Total numbers of .eth blockchain domains registered between July 2022 and July 2023 with names containing exact or fuzzy matches to the names of each of the top ten most valuable global brands in 2023

In some cases, particularly where the brand name is short or generic, some of the fuzzy-match types are unlikely to be relevant to the brand in question, and may instead relate to other terms or third-party brand names (for example, 'visa' fuzzy-matches 'via', 'invisible', 'lisa', etc., and 'apple' fuzzy-matches 'rapper', 'maple', 'ripple', 'apply', etc.). However, the dataset does include a significant number of domain names with very considerable potential for confusion with the name of an official site for the brand in question, including several cases where a character has been replaced by a visually near-identical non-Latin character (a ‘homoglyph’) (Figure 13). Some of the most concerning examples include ạpple.eth, googߊe.eth, Microsoft.eth, amɑzon.eth, and mcdönalds.eth (where non-Latin characters are highlighted in bold). Whilst none of these domains currently resolves to any live site content (apart from one example which redirects to an error page on the official Coca Cola website), wherever they are under the control of third parties they present a risk of deceptive content being added subsequently, or may be speculative registrations intended to be offered for sale to the brand owner.

Figure 13: Examples from the dataset of potentially deceptive domain names featuring exact or fuzzy matches to any of the top ten most valuable global brand names
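Homoglyph variants like those in Figure 13 can be partly flagged with Python's standard `unicodedata` module. The sketch below (with hypothetical helper names) lists each non-ASCII character in a name alongside its Unicode character name, and reduces accented Latin letters to their base letters, so that ạpple and mcdönalds collapse to apple and mcdonalds. It is a partial approach: look-alikes with no Unicode decomposition, such as the Latin small letter alpha in amɑzon.eth or the non-Latin character in googߊe.eth, survive unchanged and would need a confusables mapping (for example the data published with Unicode Technical Standard #39), which is not shown here.

```python
import unicodedata


def non_ascii_characters(name: str) -> list[tuple[str, str]]:
    """List each non-ASCII character in a name with its Unicode character name."""
    return [(ch, unicodedata.name(ch, "UNNAMED")) for ch in name if ord(ch) > 127]


def ascii_skeleton(name: str) -> str:
    """Strip diacritics: NFKD-decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for sld in ["\u1ea1pple", "mcd\u00f6nalds", "am\u0251zon"]:
        print(sld, ascii_skeleton(sld), non_ascii_characters(sld))
```

Comparing `ascii_skeleton(sld)` against a brand list catches the diacritic cases cheaply; the non-ASCII listing is a complementary signal for manual review of the characters the skeleton cannot resolve.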

Discussion

There are several key observations to be drawn from this deep-dive analysis:

  • Whilst the overall numbers of registrations of .eth blockchain domains have dropped off since the earlier peak in 2022, there is some indication that the earlier flurry of activity was associated with the registration of shorter domain names, with a definite dip in median second-level domain-name (SLD) length amongst the daily sets of registrations prior to around October 2022. Subsequently, registrations have continued (albeit at lower absolute levels), with a median SLD length consistently of around 9 characters.
  • The vast majority (99.8%) of .eth domains registered in the last year have SLD lengths of 32 characters or fewer. However, there is a long tail of longer domain names, with 14 instances in the dataset of domains greater than 8,000 characters in length, up to a maximum value of 38,894 characters. Many of these long domain names are nonsensical or consist solely of repeated characters. That said, there is a subset of long domain names in which the domain name itself consists of special graphical characters which, when arranged in an appropriate grid, display digital art similar to examples sometimes sold as NFTs.
  • Outside of the set of 'artwork' blockchain domain names, the exact purpose of many of the very long domain names is not clear. It may simply be that these are part of a project by some users to build a portfolio of 'unusual' domain names. Alternatively, they may be associated with domain 'clubs', consisting of groups of domain names featuring particular characteristics, which are collectible by virtue of their limited supply (e.g. '999 Club' - SLDs consisting of three digits; '10k Club' - domains with four digits; 'Single Ethmoji Club', etc.)[6]. In these cases, the registrations may have been carried out by bots configured to pre-emptively register relevant domain names. The presence in the dataset of the 'artwork' domains further suggests that many of the domains have been registered for their collectability and attractiveness to traders.
  • Many hundreds of names containing exact or fuzzy matches to the names of any of the top ten most valuable global brands appear in the dataset, significant numbers of which present very considerable potential for confusion with official brand sites. Whilst generally not currently found to resolve to any live site content, they present serious risk of being associated with deceptive content in the future, or may be under the ownership of cybersquatters, further highlighting the importance of brand owners maintaining an awareness of activity in this continually evolving space.
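The two length statistics in the first bullets above (the daily median SLD length, and the share of names of 32 characters or fewer) are straightforward to compute given a list of registrations. A minimal Python sketch, assuming hypothetical (registration date, SLD) pairs and measuring length in Unicode code points, which may differ from how lengths were counted in the study:

```python
from collections import defaultdict
from datetime import date
from statistics import median


def daily_median_sld_length(records):
    """Median SLD length per registration day, in date order.

    `records` is an iterable of (registration_date, sld) pairs; length is
    measured in Unicode code points.
    """
    by_day = defaultdict(list)
    for day, sld in records:
        by_day[day].append(len(sld))
    return {day: median(lengths) for day, lengths in sorted(by_day.items())}


def share_at_most(records, max_len: int = 32) -> float:
    """Fraction of SLDs whose length is at most `max_len` characters."""
    lengths = [len(sld) for _, sld in records]
    if not lengths:
        return 0.0
    return sum(length <= max_len for length in lengths) / len(lengths)


if __name__ == "__main__":
    sample = [  # hypothetical records, for illustration only
        (date(2022, 9, 1), "abc"),
        (date(2022, 9, 1), "abcde"),
        (date(2022, 11, 1), "abcdefghi"),
        (date(2022, 11, 1), "x" * 40),
    ]
    print(daily_median_sld_length(sample))
    print(share_at_most(sample))
```

The median is used rather than the mean because a handful of extremely long names (up to 38,894 characters in this dataset) would otherwise dominate the daily averages.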

References

[1] https://www.iamstobbs.com/opinion/trends-in-web3-part-1-a-look-at-blockchain-domains

[2] https://news.artnet.com/art-world/yuga-labs-lawsuit-tradmark-bayc-ryder-ripps-2290175

[3] https://www.linkedin.com/pulse/investigating-use-domain-name-entropy-clustering-results-barnett/

[4] https://www.kantar.com/inspiration/brands/revealed-the-worlds-most-valuable-brands-of-2023

[5] https://www.cscdbs.com/en/resources-news/threatening-domains-targeting-top-brands/

[6] https://ens.vision/market

This paper was first published as an e-book on 17 November 2023 at:

https://www.iamstobbs.com/trends-in-web3-ebook

Wednesday, 15 November 2023

Can't stop the Grok: domain infringements following X's AI brand launch

On 4 November[1], Elon Musk's xAI corporation announced the launch of its new artificial intelligence chatbot, 'Grok'[2], named after a concept introduced in 1961 by author Robert Heinlein. Currently, the Grok product - which sources information from the content of postings on the X platform (formerly Twitter)[3] - is only offering early access to verified users.

As is almost inevitably the case following a new brand launch - particularly one which appeals to followers of the current online buzz surrounding AI products - the announcement was almost immediately followed by a flurry of brand-related domain registrations, many of which resolved to live websites incorporating other types of brand infringements. In addition, large numbers of pre-existing domains containing the Grok name suddenly came to be regarded as 'premium' commodities, commanding high prices on the domain brokerage market. An article from two days after the launch of the brand stated that the Grok name had already been taken (as a second-level domain name; the part of the name to the left of the dot) across 162 different domain extensions (TLDs)[4].

In this article, we take a deeper dive into the landscape, considering all domain names containing the Grok name (utilising wildcard, rather than just exact-match, searches), based on an analysis carried out five days after the brand launch (09-Nov-2023)[5].

As of 9 November, the analysis found over 5,700 domains containing 'grok', including pre-existing registrations. Of these, over 3,100 featured the brand name at the start - i.e. the domains most likely to be relevant to the brand, and excluding 'false positives' such as 'agrokemical', etc. Significant numbers of these also contained high-relevance keywords relating to AI or Web3 concepts, with common terms including 'ai', 'bot', 'gpt', 'chain', 'coin' and 'token'. Of the 1,292 domains beginning with 'grok' and registered in 2023[6], 1,078 (83%) were registered in the five-day period between 4 and 8 November (Figure 1).

Figure 1: Daily numbers of registrations of domains beginning with 'grok', since the start of October 2023

360 such domains were registered on 5 November alone; before 4 November, the highest daily number of registrations during 2023 had been five. Even the apparent end of the 'spike' after 6 November may be an artefact: there is sometimes a delay of up to a few days between the registration of a domain and its addition to the respective zone file, so some of the most recent registrations may not be reflected in the analysis.
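The filtering steps described above (a wildcard match on 'grok', the narrower set of names beginning with 'grok', and a keyword screen) and the daily registration counts in Figure 1 can be sketched in Python as follows. This is a hypothetical illustration, not the tooling used for the analysis: it assumes zone-file owner names such as those distributed via CZDS, creation dates already obtained from a whois look-up as `datetime.date` values, and a keyword list limited to the common terms named in the text.

```python
from collections import Counter
from datetime import date

# Illustrative keyword list, taken from the common terms named in the text;
# the full list of 'high-relevance' keywords used in the analysis is not given.
KEYWORDS = ("ai", "bot", "gpt", "chain", "coin", "token")


def sld_of(domain: str) -> str:
    """Second-level label of a gTLD domain, e.g. 'grokai.com.' -> 'grokai'."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else ""


def classify(domains):
    """Split domains into: contains 'grok', starts with 'grok', starts with 'grok' plus a keyword."""
    contains = [d for d in domains if "grok" in sld_of(d)]
    starts = [d for d in contains if sld_of(d).startswith("grok")]
    # Plain substring matching is deliberately crude: 'ai' also matches 'grokmail'.
    with_keyword = [d for d in starts if any(k in sld_of(d)[4:] for k in KEYWORDS)]
    return contains, starts, with_keyword


def daily_counts(creation_dates, start: date, end: date) -> Counter:
    """Number of registrations per day within [start, end], inclusive."""
    return Counter(d for d in creation_dates if start <= d <= end)


if __name__ == "__main__":
    sample = ["grokai.com.", "agrokemical.net.", "grokcoins.com.", "mygrok.app."]  # hypothetical
    contains, starts, with_keyword = classify(sample)
    print(len(contains), len(starts), len(with_keyword))  # 4 2 2
    print(f"{1078 / 1292:.0%}")  # share of 2023 'grok...' registrations falling in 4-8 Nov: 83%
```

Restricting to names that begin with 'grok' is what removes false positives such as 'agrokemical' in the sample, at the cost of missing relevant names like 'mygrok'.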

Overall, the peak in activity immediately following the launch is unsurprising, and follows similar trends seen previously for many other brand launches and news stories[7], where bad actors take advantage of the emerging brand 'buzz' to construct related infringements, carry out fraud, and misdirect web users to their own content.

Amongst the approximately 360 'grok' domains which also contained keywords deemed to be of high relevance, a wide range were already resolving to live website content as of the date of analysis. Many of these had been monetised through the inclusion of pay-per-click links or resolved to pages explicitly offering the domain names for sale. Several of these had asking prices of several thousand dollars, with some of the most expensive including groknfts[.]com ($109k), grokcloud[.]com ($500k), and grokcoins[.]com ($889k). Within the set of other live websites, one of the most common themes was cryptocurrency-related content (e.g. sites offering AI-powered cryptocurrency exchanges, or offering users the purported opportunity to 'invest' in Grok-related currencies) (see Figure 2). Overall, over 20 distinct cryptocurrency-related sites were already live within the dataset, many of them making use of xAI branding. Many of these were hosted on 'niche' new-gTLD extensions (including .app, .live, .site, .space, .tech, .top, .vip, and .world), many of which have previously been noted as popular with infringers[8,9].

Figure 2: Examples of cryptocurrency-related sites (potential fraud and/or infringements) hosted on 'grok' domains

Other sites of potential concern include those purporting to offer AI products of their own (whether branded as Grok or otherwise) (Figure 3), and others distributing malware (Figure 4), offering Grok downloads or providing information on the product, together with a smaller number of uses of the same brand name by what appear to be unrelated third parties.

Figure 3: Examples of websites hosted on 'grok' domains and offering their own AI products

Figure 4: An example of a malware site hosted on a 'grok' domain

Looking also specifically at the blockchain domain ecosystem (Web3 domains based on the same underlying technology as cryptocurrencies, and providing potential for the construction of decentralised websites)[10], the analysis on 9 November found that 25 .eth blockchain domains containing 'grok' had been registered via the ENS provider in the previous 30-day period[11], all of which had been registered since 5 November (many of which also contained crypto-related keywords). As of the date of analysis, none were found to resolve to any live website content, though they do present clear potential for fraudulent use in the future.

This analysis highlights the speed with which infringements can spring up following a high-profile brand launch, and shows the importance of a proactive programme of monitoring and enforcement for brand owners. The findings also provide a sobering illustration of the importance of general Internet users being wary of online content, particularly where it relates to new and potentially unfamiliar brands. 

References

[1] https://www.youtube.com/watch?v=faP1skdVoDQ

[2] https://grok.x.ai/

[3] https://www.theguardian.com/technology/2023/nov/05/elon-musk-unveils-grok-an-ai-chatbot-with-a-rebellious-streak

[4] https://domaingang.com/domain-news/grok-com-musks-grok-ai-bot-stirs-new-registration-craze-among-domain-investors/

[5] Analysis is carried out using zone-file data from ICANN's Centralized Zone Data Service (https://czds.icann.org/), covering over 1,000 gTLD extensions. All results were obtained using the versions of the zone files downloaded on 09-Nov-2023.

[6] Considering the set for which domain creation dates were available from an automated whois look-up

[7] https://www.iamstobbs.com/opinion/wilko-a-target-for-scams-following-administration

[8] https://circleid.com/posts/20230117-the-highest-threat-tlds-part-2

[9] 'Brand Protection in 2023: Trends, Challenges and Opportunities: Developments in Web3, AI and new-gTLDs', forthcoming webinar

[10] https://www.iamstobbs.com/opinion/trends-in-web3-part-1-a-look-at-blockchain-domains

[11] https://dune.com/queries/7507/14878

This article was first published on 15 November 2023 at:

https://www.iamstobbs.com/opinion/cant-stop-the-grok-domain-infringements-following-xs-ai-brand-launch

Tuesday, 14 November 2023

Shaping the future of online brand protection: interview with David Barnett, Brand Protection Strategist, Stobbs

by DomainCrawler

David Barnett is an expert online brand protection consultant and analyst. He has almost 20 years’ experience in the online brand-protection industry, serving clients across a range of sectors and industries. He started his career at Envisional in 2004, subsequently moving to NetNames (2007) and CSC (2016). Since July 2023, David has been working as Brand Protection Strategist at Stobbs. He is an experienced thought leader, with an extensive portfolio of articles and experience of speaking at industry events and is author of 'Brand Protection in the Online World'.

In preparation for the webinar 'Summarizing the year for online brand protection', we sit down with David to talk about the trends, challenges and opportunities in the sector.

Tell us a bit about your journey as an online brand protection expert. Based on your experience it seems that you have been working within this sector from the day it emerged. 

I started in the industry as a Brand Protection Analyst at Envisional, a small, Cambridge-based start-up, in 2004. At that time, the Internet was rather different, not least in terms of the much smaller number of online channels to be monitored. However, some of the types of infringement, and the methods used by bad actors, have persisted to the current day. Much of my work during this time has involved configuring Internet monitoring tools and analysing the results, so as to identify the findings of greatest potential concern to our clients. This has enabled me to keep a close eye on trends and threats in the overall landscape as they have evolved.

What are the most prevalent threats to online brands today, and how have these evolved over the past years? 

In some senses, the threats to brands are the same as they have always been, and depend on the type of company. Issues such as counterfeiting (which is relevant to manufacturers of physical goods), fraud (relevant to financial service providers and other entities holding customers' personal details) and digital piracy (relevant to providers of content which may be distributed in digital form) remain significant areas of concern - since they result most directly in financial loss for the brand owner - even if the ways in which these threats are manifested have changed over time. Really, much of the evolution of the Internet - such as the emergence and development of channels such as social media, mobile apps, etc. - has resulted in changes to the ways in which online crime can be carried out, even if the overall goals of bad actors (usually driven by financial gain) have remained similar.

Organisations will also be targeted through other types of brand infringement (traffic misdirection, false claims of affiliation, negative brand association, negative comment, etc.), although these overlap into areas where reputational - rather than directly financial - issues are at stake. However, this can of course also affect a company's financial performance, through damage to brand value and customer behaviours. 

At the current time, newer technologies such as those related to Web3 (notably cryptocurrency, blockchain domains and NFTs) and artificial intelligence are often foremost in brand owners' minds, even if the landscape here is relatively new and the nature and scale of the associated threats remain to be seen.

Can you provide an overview of how the new gTLD programme works and why it's important for brands to be aware of it? 

The new gTLD programme initially launched in 2012, and essentially consists of the addition to the Internet of a wide range of new generic domain-name extensions (top-level domains, or TLDs). A little over 1,000 new extensions have been launched in the intervening 11 years. Many of these were intended to be descriptive of the type of website content, and to reduce customer confusion, although uptake has not been at as large a scale as anticipated, and in fact the new-gTLDs have been disproportionately utilised by the creators of fraudulent sites, partly due to low-cost registrations with relatively lax requirements. 

A new round of applications is set to launch in 2026, and it is essential for brands to keep a close eye on this space as it develops. Some organisations may wish to apply to run their own branded domain-name extension (a 'dot-brand'), which will give them full control over all registrations under this TLD, although it is a fairly costly and technically-demanding project to do so. Failing that, brands will want to utilise blocking or alerting mechanisms where possible, and to monitor the landscape, to gain visibility of potentially-threatening third-party registrations as they appear. Companies may also want to consider defensive registrations across the new extensions, or taking advantage of the new TLD space for their website presence - especially as the availability of short unregistered domain names across traditionally popular extensions such as .com is already significantly limited.

How has the introduction of new gTLDs impacted the domain name landscape from a brand protection perspective? What can we expect to see with the future introduction of the new top-level domains? 

Much of the risk is really that there are now more options available to would-be registrants of fraudulent domains, and defensive registrations by brand owners are increasingly yielding diminishing returns. This is resulting in a landscape where proactive monitoring in real time is increasingly becoming the most effective way of targeting infringements. We have also seen that many of the TLDs from the first round are highly utilised by infringers, and the same may be true in subsequent rounds. It is also noteworthy that some of the new extensions have potential for security concerns in their own right, with examples such as .zip, which has the potential for abuse through confusion with the file-name suffix.

How do you see the intersection of emerging technologies, like artificial intelligence and machine learning, as well as web3, shaping the future of online brand protection? 

One of the main issues is the fact that the development of AI - often involving the scraping of data from social-media and other platforms, and thereby driving an increased push towards a restriction of access to this content - may itself drive an increased adoption by users of Web3 technologies, which are inherently decentralised and have an emphasis on users being able to curate their own content and avoid regulation and restriction.  

AI itself raises a number of open questions, such as the legal implications and uncertainty over ownership of AI-generated content.  

AI is also likely to facilitate the production of fraud-related content, even if it does not necessarily change the types of attack which are possible.

Web3 developments may also significantly complicate the online landscape, through the appearance of a range of distributed channels which may be extremely difficult to monitor, combined with factors such as the risk of domain-name collisions (i.e. the same domain name potentially appearing on multiple different blockchains, or being common to both Web3 and Web2 ('classic' domain name) content). We are also seeing the emergence of new types of threats, such as blockchain domain names comprising imagery or artworks; brand protection technologies will need to evolve in order to address these types of content.

Overall, though, AI is also likely to bring benefits for brand-protection service providers, raising the possibility for much more sophisticated monitoring and enforcement technologies - such as improved capabilities for image recognition, video analysis, sentiment analysis, automated prioritisation of results, clustering analysis, learning of patterns in new infringement trends, and so on. 

Could you share some best practices or actionable steps for businesses, both large and small, to bolster their online brand protection efforts? 

Really, much of the top-level advice is the same as it has always been: specifically, ensuring that brand owners maintain an IP portfolio which is in good shape, and using this as the basis for a proactive, ongoing programme of brand monitoring and enforcement, combined with a robust domain security posture. It is also essential to keep on top of new trends as they emerge, which is where partnering with a strong brand-protection service provider can help.

Other factors which have traditionally been seen as good practice also continue to hold true - engaging with customers, encouraging them to serve as brand 'ambassadors' and report infringements, responding positively to negative feedback without taking an overly heavy-handed approach to enforcement, operating with clarity about the use of official online channels, utilisation of product verification and tracking technologies, and so on, will always be beneficial.

This article was first published on 14 November 2023 at:

https://domaincrawler.com/shaping-the-future-of-online-brand-protection/

Unregistered Gems Part 6: Phonemizing strings to find brandable domains

Introduction The UnregisteredGems.com series of articles explores a range of techniques to filter and search through the universe of unregis...