Thursday, 25 September 2025

Revisiting the calculation of brand protection return-on-investment

by David Barnett, Richard Ferguson and Sheena Yonker

August 2025 saw the publication of INTA's Anticounterfeiting Committee Policy Global Project Team report, 'Anticounterfeiting and Return on Investment'[1,2], considering the issue of the calculation of return on investment (ROI) for brand-protection initiatives (with a specific focus on counterfeiting activity).

Whilst much of the content echoes previous research, the new report does describe some of the relevant factors in slightly different ways, thereby providing some alternative frameworks which can be used to consider relevant ideas. The publication of the report therefore offers a suitable opportunity to revisit and expand on some of the key points relating to the general subject area. 

Pre-existing research and concepts 

In general, the ability to quantify ROI for brand protection initiatives is a long-standing requirement in many organisations, frequently associated with a desire to demonstrate success and secure funding for future similar projects.  

The key ideas necessary for constructing calculation frameworks have been outlined in a range of prior pieces of research, with some of the most significant points[3,4] outlined below. 

  • For e-commerce marketplace enforcement specifically, ROI is most traditionally assessed by considering the numbers of infringing items removed through takedown actions, applying a customer substitution rate (i.e. the assumed proportion of customers who will buy a legitimate item once the infringing version is made unavailable), potentially also applying an additional conversion factor (to account for the proportion of available items which translate to a sale), and then using the legitimate item price to estimate the increase in profit generated through additional legitimate sales as a result of the enforcement.
  • Modifications or enhancements to the above simple approach can be made by considering:
    • Variability in substitution rates; specifically, the effects of e.g. item price (either absolute, or differential between infringing or legitimate goods), or the degree of deception involved in the sale of the infringing items. 
    • Use of 'data caps' to prevent unrealistically high calculated ROI values. 
    • The long-term impact of the enforcement programme, rather than just basing the calculation on the short-term numbers of ongoing enforcements carried out on a regular basis - i.e. considering the scale of the infringement landscape as compared with that observed at the start of the service, before active takedowns were being carried out. Related to this idea is an assessment of how 'clean' the search results are (for infringing vs. legitimate items), or the degree to which the brand owner is able to achieve 'ownership of the buy button' - i.e. be the top-listed seller in response to searches for relevant product terms. 
    • The impact of the brand protection initiative on brand value - essentially, considering intangible (though still quantifiable) impacts associated with factors such as consumer brand awareness, loyalty / churn, and reputation. Also relevant in this type of determination are areas such as reductions in the cost of capital for the brand owner (i.e. perceived brand risk), and in operating costs required to address ongoing issues. 
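
The simple substitution-rate approach described above can be sketched as follows. This is only a minimal illustration: the function name, the parameter names, all of the input figures and the way in which the 'data cap' is applied (as a ceiling on the estimated benefit) are assumptions made for the example, rather than values or definitions taken from the cited research.

```python
def estimated_roi(items_removed, substitution_rate, conversion_rate,
                  unit_profit, programme_cost, benefit_cap=None):
    """Estimate ROI for marketplace takedowns using a substitution-rate model."""
    # Proportion of removed items that would otherwise have resulted in a sale,
    # and proportion of those buyers assumed to switch to the legitimate item
    recovered_sales = items_removed * conversion_rate * substitution_rate
    benefit = recovered_sales * unit_profit
    if benefit_cap is not None:
        # Assumed form of 'data cap': limit the benefit to a plausible maximum
        benefit = min(benefit, benefit_cap)
    return (benefit - programme_cost) / programme_cost


# Hypothetical inputs, for illustration only
roi = estimated_roi(items_removed=10_000, substitution_rate=0.1,
                    conversion_rate=0.5, unit_profit=40.0,
                    programme_cost=10_000.0)
print(f"Estimated ROI: {roi:.0%}")
```

Because the result scales linearly with both the substitution rate and the conversion factor, small changes in these assumed values have a large effect on the output, which is one reason why such figures are best used for like-for-like comparisons.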

The importance of considering additional factors, such as 'real-world' impacts (e.g. increases in numbers of visitors to physical stores, or information from 'on-the-ground' actions such as raids and seizures), direct monetary gains (such as proceeds from fines or damages arising from successful legal actions), and volumes of infringements avoided in the future as a result of a proactive brand protection programme, has also been noted.  

However, it is also clear that there is no 'one-size-fits-all' approach to ROI calculation, and any framework will have a number of associated caveats, such that the algorithms are really most appropriate only for like-for-like comparisons. 

Additional ideas from the new report 

INTA's new report stresses the idea of considering brand protection as a strategic business imperative, and also reiterates the importance of taking a conservative approach (i.e. to prevent unrealistic estimates which can affect the credibility of ROI assessments).  

The report also emphasises the importance of integrating online and offline components, particularly in investigative strategies (e.g. to identify relevant physical locations and supply chains). This idea is also familiar from previous research looking at activity 'hotspots'[5,6,7,8], which can allow efforts to be focused in key locations, in a cost-effective way. Also previously noted is the importance of data analysis in establishing the existence of clusters of infringements associated with individual high-volume key infringers, to better inform where enforcement should be targeted[9,10,11]. Additionally, in brand protection programmes, offline measures can include raids and seizures, and a range of legal proceedings.

It is also noted that careful monitoring of relevant online channels can provide early warning of new threat types, and insights into market trends. Furthermore, impacts resulting from brand protection efforts can potentially also similarly be assessed, by monitoring for factors such as changes in volumes of customer complaints.  

Furthermore, the implementation of initiatives such as product tracking or verification measures can, as well as contributing to securing the supply chain and reducing the ease of counterfeiting in their own right, result in a positive effect on perception for brands seen to be protecting their customer base and leading in innovation. Brand protection initiatives can also fulfil compliance requirements and mitigate legal risk.  

Also explicitly noted is the distinction between 'soft' and 'hard' ROI - essentially, the difference between the assessment of potential benefits and 'direct' monetary gains (e.g. through fines or damages) outlined above. One key point made in the report is that the assessment of 'soft' ROI is made particularly difficult by the fact that the nature and scale of significant proportions of infringing activity is frequently unclear (i.e. 'illicit trade obscurity').  

It is stressed that the aim of brand protection is ultimately to facilitate legitimate product sales, in an ecosystem where counterfeiting is currently associated with an annual economic cost of almost half-a-trillion dollars[12]. The report outlines some possible ROI calculation frameworks, including a 'substitution rate' approach for marketplace enforcement, similar to that outlined above, in addition to other simple calculation methods for assessing ROI from physical seizures, and from settlements achieved through legal actions. 

A final key point to note is the importance of adopting a cross-functional collaborative approach to brand protection within organisations, together with regular processes of review and strategy adaptation. 

Further thoughts 

Overall, we echo the assertion that an idealised framework for assessing ROI for brand protection should capture issues relating to consumer sentiment, as well as just (hard and soft) financial recovery - and, even just considering explicit monetary impacts, it is worth reiterating the point that any simple mathematical formulation will generally only provide part of the picture. Beyond this area, an ability to track complaints, reviews and social sentiment - both before and after enforcement - can provide more comprehensive insights, though this requires adequate technical capabilities from the brand protection service provider.  

The early phases of product development and launch are often a key pressure point for brand owners. Counterfeits often appear quickly, and therefore the capability to measure the rate of appearance of infringing goods, in addition to quantifying takedown speeds and tracking reductions in customer complaint volumes within these initial stages, can help to make ROI more tangible. Similarly, intensified bursts of online and offline enforcement during key marketing campaigns may also be appropriate. As noted in the INTA report, internal collaboration within an organisation is key, and should include input from (at least) sales, marketing and social management functions. 

Within companies, boards are also likely to expect to see demonstration of online vs offline cost efficiency. The construction of a simple metric (along the lines of 'cost-per-pound-of-protected-value') could help explain spend allocation. This determination may not be straightforward, however; for example, civil litigation often retrieves only limited financial proceeds, and can (in isolation) appear to be associated with a negative ROI, though additional factors (such as deterrent effects on future infringements) should also be considered.
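
As a rough illustration of how a 'cost-per-pound-of-protected-value' metric might be constructed, the sketch below simply divides spend by the estimated value protected for each channel. The function name and all of the figures are hypothetical, and the estimate of 'protected value' itself (which is the difficult part) is assumed to be available from elsewhere.

```python
def cost_per_pound_protected(spend_gbp, protected_value_gbp):
    """Return the spend required to protect one pound of estimated value."""
    if protected_value_gbp <= 0:
        raise ValueError("protected value must be positive")
    return spend_gbp / protected_value_gbp


# Hypothetical (spend, estimated protected value) figures per channel, in GBP
channels = {
    "online": (50_000.0, 400_000.0),
    "offline": (120_000.0, 600_000.0),
}

for name, (spend, value) in channels.items():
    print(f"{name}: £{cost_per_pound_protected(spend, value):.3f} per £1 protected")
```

A lower figure indicates more value protected per pound spent, though a metric of this type would not capture deterrent effects or other intangible benefits.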

Crossover into the offline space is generally an important part of any brand protection initiative, but often comes with its own difficulties. For example, it may be difficult to achieve sufficient collaborative input from law enforcement agencies, unless case evidence is well established and clear cut. Offline efforts may also be made more difficult in cases where relevant IP rights have not been established. However, raising public awareness of infringement 'hot spots' (in addition to providing guidance regarding the dangers of counterfeits and how to spot them) can also be advantageous. 

In general terms, both online and offline measures need to be properly accounted for, in a joined-up fashion, in any comprehensive assessment of ROI. One key element is the use of effective case-management systems which are able to incorporate insights from the full extent of the brand owner's supply chain, using data on levels of infringements throughout (including customs seizures, levels of online saturation, and so on). Going forward, it would be extremely instructive to see a range of anonymised case studies, ideally involving data sharing across key industry bodies (INTA, IACC, ACG, etc.), so that appropriate ROI benchmarks could be created. 

As a final point, it is also important to note that the discussion in this article has focused primarily on counterfeiting specifically. This is of course only one of the areas a brand protection programme should address; others include issues as diverse as phishing, malware distribution, and executive impersonation. Even just considering the sale of physical goods, other areas of concern - such as the trade in 'grey goods' (i.e. official items, but distributed through unapproved channels) - must generally also be addressed.

References

[1] https://www.inta.org/news-and-press/inta-news/new-inta-report-offers-guidance-for-measuring-roi-in-anticounterfeiting/

[2] https://www.inta.org/wp-content/uploads/public-files/advocacy/committee-reports/2025-ACC-COMMITTEE-REPORT-081825.pdf

[3] https://www.iamstobbs.com/opinion/brand-protection-return-on-investment-an-overview-of-calculation-frameworks-and-methodologies

[4] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 11: 'Quantifying brand protection return-on-investment'

[5] https://www.iamstobbs.com/opinion/tracking-the-uk-trade-in-fakes-counterfeit-hotspots

[6] https://www.iamstobbs.com/opinion/tracking-the-uk-trade-in-fakes-ins-and-outs

[7] https://www.iamstobbs.com/opinion/think-globally-act-locally-an-overview-of-infringement-hotspots-around-the-world

[8] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 15: 'Links to offline data'

[9] https://circleid.com/posts/braive-new-world-part-1-brand-protection-clustering-as-a-candidate-task-for-the-application-of-ai-capabilities

[10] https://www.iamstobbs.com/insights/e-mail-address-extraction-from-webpages-a-quick-case-study-in-result-clustering

[11] https://www.iamstobbs.com/insights/further-explorations-in-clustering-use-of-google-advertising-tracking-links

[12] https://www.oecd.org/en/publications/mapping-global-trade-in-fakes-2025_94d3b29f-en/full-report.html

This article was first published on 25 September 2025 at:

https://www.iamstobbs.com/insights/revisiting-the-calculation-of-brand-protection-return-on-investment

Thursday, 28 August 2025

WyrdBrandz: Misspelling in brand names - is it an effective marketing tactic?

Introduction

In the modern world of branding, consumers are becoming increasingly familiar with the use of 'wacky' misspellings (sometimes described as 'sensational spellings'), with familiar examples including Flickr, Reddit, Tumblr, and many more. 

There are many reasons why this approach to selecting a brand name can be an effective one, including the facts that these types of misspellings can attract customer attention and aid memorability, and can make it easier to obtain a protective trademark and secure an available domain name (as compared to the use of a dictionary word or proper name, for example). In conflict with these considerations, however, are the facts that misspellings generally are understood[1] to generate adverse brand attitudes (including questions regarding brand sincerity) and are frequently perceived as a marketing 'gimmick'.

A deep analysis of the wider world of consumer attitude to 'misspelled' brands has recently been published by researchers from the Universities of Arkansas and Tennessee[2,3]. The study finds that the negative impact associated with the use of a minor misspelling can be offset through the careful choice of a particular spelling (or other branding features) to aid with brand-name interpretation by consumers. These types of insight are key to success in the areas of brand selection and marketing and, by extension, to considerations such as domain name registration (an area which was covered in previous work, providing overviews of techniques applicable to the discovery of available unregistered 'brandable' domain names[4,5,6]).

It is also important to note that the types of misspelled names considered here are completely distinct from other cases involving the deliberate use of deceptive (brand or domain) names similar to those of third-party established trusted brands, for the purposes of impersonation and fraud[7,8].

Definitions and a deeper dive

The study of consumer reactions to misspellings in brand names is based fundamentally on the concepts of linguistic fluency (i.e. the ease of processing written content into language) and conceptual fluency (the ease with which associated meaning is brought to mind). Central ideas include the propositions that, overall, consumers process names more effectively when they are more similar to familiar terms (such as dictionary words), and that greater processing fluency tends (in general) to lead to improved brand perception. Lower degrees of 'orthographic' similarity to familiar strings can also be counteracted (to a degree) through the use of phonetic cues (i.e. those which are apparent when the brand name is sounded out) to the intended meaning. 

A core idea within this overall framework is that the use of only minor misspellings can have a small or negligible negative impact on brand perception, which can then be more than compensated through the clever application of other marketing-related mediating factors. This approach thereby allows the brand owner to take advantage of the other attractive features of misspellings, such as greater trademark and domain name availability.

As part of the analysis presented in the recent study, the misspellings used by brands were categorised into a set of 'types'. Strikingly, these types together accounted for over half of all names in a curated list (from 2023) of 100 recent start-ups, which was used as an example dataset in the study; only 29% of the names used 'correct' spellings, and another 20% used wholly new terms (neologisms) or proper names.

The categories of brand misspellings, as defined in the study (and illustrated with examples taken from it) were:

  • Compound - combining words together by (just) removing intervening spaces (e.g. 'AutoZone')
  • Lengthening - adding or repeating a letter (e.g. 'Mixx')
  • Foreign - substituting a character with one from an alternative language (e.g. 'Røde')
  • Letteronym - substituting a letter for a word or part of a word (e.g. 'La-Z-Boy')
  • Portmanteau - blending two or more other words (and removing parts of the component words), to create a new term (e.g. 'Duracell')
  • Phonetic - using a misspelling pronounced in the same way as the ‘intended’ term (e.g. 'Froot Loops')
  • Abridgment - shortening the term through the removal of one or more letters (e.g. 'Crumbl')
  • Alphanumeric - substituting a numeral for a word or part of a word (e.g. '4ward')
  • Leetspeak - substituting a numeral or special character for a visually similar letter (but with no modification to pronunciation) (e.g. 'E11EVEN')

Another key idea from the study is the fact that not all types of misspelling elicit similar responses; some decrease the 'processing fluency' of readers / consumers more than others - for example, abridgments, alphanumerics, and ‘leetspeak’ are generally found to be harder to process than compounds and lengthening. More predictably, fluency was also found to be decreased further in cases where there are higher degrees of misspelling (of a particular type). Other factors are also important, such as the effect on processing fluency of the proximity of an 'incorrect' character to the start of the word. This concept is related to a similar idea which is familiar from previous work on mark similarity measurement[9].
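
One possible way of putting numbers on the 'degree' of a misspelling, and on how close the first 'incorrect' character is to the start of the word, is sketched below. The choice of Levenshtein edit distance is an assumption made for illustration; it is a standard string-similarity measure, but is not necessarily the approach used in the study or in the previous work on mark similarity, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]


def first_difference(a: str, b: str):
    """Zero-based position of the first differing character (None if identical)."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


# (intended term, misspelled form) pairs based on examples in the text
for intended, brand in [("quick", "quik"), ("fruit", "froot"), ("crumble", "crumbl")]:
    print(intended, brand, levenshtein(intended, brand), first_difference(intended, brand))
```

A smaller edit distance corresponds to a more 'minor' misspelling, and a larger first-difference position indicates that the change occurs further from the start of the word.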

Counteracting the adverse effects of a misspelling may be other factors which can aid in conceptual fluency for the brand name in question, such as similarity to other familiar terms with well-understood meanings, the use of additional visual cues in the associated brand presentation, or the alignment of the spelling with characteristics such as the owner’s name, or the product or business type (e.g. a use of 'Quik' for 'quick', invoking an association with the underlying sentiment (quickness) through a shortening of the word; the use of the 'oo' for 'Froot Loops', resembling the shape of the actual cereal; or the use of a term such as 'Scentimental' for a brand in a relevant product area, such as a florist). These types of enhancement are found to positively influence brand attitudes, level of brand preference, and favourability to word-of-mouth recommendations, and can be particularly effective if other subjective criteria are also met (such as through the use of a misspelling considered to be 'fun' or 'cute'). 

However, there are likely also to be other factors which must be considered, such as whether the industry area of the brand in question is traditionally perceived to be associated with trust or accuracy (for example), in which case a misspelled brand name may be deemed less acceptable.

Conclusions

An understanding of the nature of, and of consumer reactions to, misspellings in brand names, can be a key component of an effective marketing strategy, particularly when a new brand name is being selected. 

The use of a misspelling can be a compelling solution to problems associated with pre-existing IP rights and poor domain name availability, but can have an adverse effect on brand perception. 

However, it appears to be the case that an optimum approach can be the selection of a 'minor' misspelling, combined with the use of other tactics to counteract its potentially negative impact (in terms of ease of word processing and customer perception). It is also worth noting that the 'degree' of misspelling can be quantified using previously explored ideas on mark similarity measurement.

Examples of successful approaches to improve the effectiveness of a candidate brand name might include the careful selection of a particular misspelling, or the use of appropriate visual branding elements, to convey other aspects of the intended brand values or message.

References

[1] J.P. Costello, J. Walker and R.W. Reczek (2023). "Choozing" the Best Spelling: Consumer Response to Unconventionally Spelled Brand Names. J. Marketing, 87 (6), pp. 889-905, https://doi.org/10.1177/00222429231162367. (Available at: https://journals.sagepub.com/doi/abs/10.1177/00222429231162367)

[2] L.W. Smith and A. Abell (2025). The Art of Misspelling: Unraveling the Diverging Effects of Misspelled Brand Names on Consumer Responses. J. Consumer Research, ucaf020, https://doi.org/10.1093/jcr/ucaf020. (Available at: https://academic.oup.com/jcr/advance-article-abstract/doi/10.1093/jcr/ucaf020/8106524)

[3] https://domainnamewire.com/2025/08/20/new-research-reveals-which-misspelled-brand-names-work-best/

[4] https://www.linkedin.com/pulse/overview-brandable-domain-name-discovery-techniques-so3ye/

[5] https://circleid.com/posts/20240911-further-explorations-in-brandable-domain-names-sensational-spellingz

[6] https://circleid.com/posts/availability-analysis-of-brandable-variant-string-domain-names

[7] https://www.iamstobbs.com/opinion/you-spelled-it-wrong-exploring-typo-domains

[8] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 7: 'Creation of deceptive URLs'

[9] https://www.linkedin.com/posts/dnbarnett2001_measuring-the-similarity-of-marks-activity-7331669662260224000-rh-R/

This article was first published on 28 August 2025 at:

https://www.iamstobbs.com/insights/wyrdbrandz-misspelling-in-brand-names-is-it-an-effective-marketing-tactic

Thursday, 14 August 2025

Further explorations in clustering - use of Google advertising tracking links

Part of the 'Patterns in Brand Monitoring: Brand Protection Data is Beautiful' series of articles[1,2,3,4]

Introduction

'Clustering' in brand protection is the process of discovering features shared in common between distinct findings (such as websites), as a means of establishing a connection between those results. In general terms, this type of analysis is beneficial as it allows for the identification of the most significant infringements (i.e. those associated with extensive networks of activity) and can provide investigative insights into the underlying entity(ies) (i.e. the owners / administrators of the content in question).   

Our previous discussion on 'clustering' analysis considered the case of e-mail addresses as a potential basis for establishing a link[5], and it is similarly possible to use other features, such as telephone numbers (though this is made more complicated by the wide range of formats in which the details can be presented) or hyperlinks to (for example) associated social-media pages. It is also worth noting that the general process of establishing data clusters is one compelling potential application for AI functionality, theoretically able to address issues such as being able to interpret data which may be presented in a wide range of different ways and contexts[6].

In this new article, we consider the case of Google tracking links as a suitable feature for establishing connections between websites. The analysis in this study is based specifically on the Google Tag Manager system, which is used for functionality relating to website tracking and marketing / advertising, and utilises links incorporating identity ('ID') codes unique to the account of the owner of the website[7,8]. Previous analysis has established that many infringers tend to utilise the same ID code across large numbers of sites under their operation, to monitor the performance of their portfolio, rather than using a unique code / Google account for each site. Accordingly, the ability to identify the same tracker code on multiple different sites provides a means of definitively establishing a connection between these sites.

The analysis consists of utilising 'scraper' functionality to identify these links in the HTML source code of websites of potential interest (for those examples utilising the Google tracking functionality) and extracting the user-specific tracker-ID codes from them. We consider links of the general form googletagmanager[.]com/***?id=XXXXX (where '***' is an arbitrary string of characters, and 'XXXXX' is the tracker ID-code, written as an alphanumeric string).
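
A minimal sketch of this extraction step, using a regular expression applied to page source which has already been retrieved, is shown below. The pattern, the function name and the sample HTML (including the placeholder ID values) are assumptions made for illustration, and do not represent the actual scraper used in the study.

```python
import re

# Matches links of the form googletagmanager.com/<path>?id=<ID> and captures the ID
TRACKER_LINK = re.compile(
    r"""googletagmanager\.com/[^\s"'<>?]*\?id=([A-Za-z0-9_-]+)""",
    re.IGNORECASE,
)


def extract_tracker_ids(html: str) -> set[str]:
    """Return the distinct tracker-ID codes found in a page's HTML source."""
    return set(TRACKER_LINK.findall(html))


# Hypothetical page source with placeholder tracker IDs
sample_html = """
<script async src="https://www.googletagmanager.com/gtag/js?id=G-EXAMPLE123"></script>
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-EXAMPLE"></iframe></noscript>
"""

print(sorted(extract_tracker_ids(sample_html)))
```

This pattern only captures IDs which are written out literally within the link; codes which are assembled by JavaScript at load time would need separate handling.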

Furthermore, open-source databases such as that offered by publicwww[.]com make it possible to carry out wider searches for other appearances of the same code, and therefore build a bigger picture of infringer activity.

Analysis

The analysis considers the same set of around 4,500 websites considered in the previous article; these are brand-specific domain names resolving to live web content, pertaining to a particular fashion brand.

In this new study, Google tracking links were identified on around 900 of the sites in question, and 50 of the identified tracking-code IDs were found on multiple (i.e. more than just one) sites, thereby providing criteria for establishing clusters. 
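
The step of identifying which tracking codes appear on more than one site can be sketched as a simple inversion of the site-to-codes mapping. The site names and codes below are hypothetical placeholders, as is the function name.

```python
from collections import defaultdict


def build_clusters(site_codes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each tracking code to its sites, keeping only codes seen on 2+ sites."""
    sites_by_code = defaultdict(set)
    for site, codes in site_codes.items():
        for code in codes:
            sites_by_code[code].add(site)
    return {code: sites for code, sites in sites_by_code.items() if len(sites) > 1}


# Hypothetical extraction results: site -> tracker IDs found on that site
site_codes = {
    "brand-outlet.example": {"GTM-AAA111"},
    "brand-sale.example": {"GTM-AAA111", "G-BBB222"},
    "cheap-brand.example": {"G-BBB222"},
    "brand-fans.example": {"UA-CCC333"},
}

for code, sites in sorted(build_clusters(site_codes).items()):
    print(code, sorted(sites))
```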

Upon deeper analysis, certain clusters turn out not to reveal any significant insights - for example, one of the tracking codes, which actually appears on 167 distinct sites, seems just to relate to a particular web-hosting service provider (whose parking page appears in association with many of the domain names in question), rather than actually pertaining to the underlying website owners.

However, many of the clusters do seem to reveal meaningful links, such as a group of 14 sites (the largest other cluster in the dataset) all featuring the same tracking code, but which would not otherwise easily have been known to be linked (Figures 1 and 2). Information from publicwww[.]com shows that this same code actually appears on over 232,000 distinct websites across the wider Internet.

Figure 1: (Redacted) examples of websites from the cluster of 14 all determined to be linked by virtue of the use of the same tracking code

Figure 2: (Redacted) website source code snippet present on all sites shown in Figure 1

In order to carry out deeper dives into the data, the dataset can be processed in a range of different ways to reveal and visualise the nature of the clusters. One convenient first step is the production of an 'adjacency matrix' (Figure 3) for the sets of sites (vertical axis) and distinct tracking codes (horizontal axis) in the dataset, in which a row/column intersection is marked with a '1' (red highlighting) if the code appears on the site in question, and '0' otherwise. Even from this raw data, some insights can be drawn, such as the identification of the large cluster associated with the tracking code shown third from the right in the screenshot, for which many of the row entries (corresponding to distinct associated websites) are highlighted in red.

Figure 3: Screenshot of the 'adjacency matrix' for the distinct sites and tracking codes present within any of the clusters in the dataset
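
An 'adjacency matrix' of the type described above can be built with a few lines of code, as in the following sketch, which places sites on the rows and tracking codes on the columns. The data and function name are hypothetical, and plain nested lists are used here purely for simplicity.

```python
def adjacency_matrix(site_codes: dict[str, set[str]]):
    """Return (sites, codes, matrix), where matrix[i][j] is 1 if code j is on site i."""
    sites = sorted(site_codes)
    codes = sorted({code for found in site_codes.values() for code in found})
    matrix = [[1 if code in site_codes[site] else 0 for code in codes]
              for site in sites]
    return sites, codes, matrix


# Hypothetical extraction results: site -> tracker IDs found on that site
site_codes = {
    "brand-outlet.example": {"GTM-AAA111"},
    "brand-sale.example": {"GTM-AAA111", "G-BBB222"},
    "cheap-brand.example": {"G-BBB222"},
}

sites, codes, matrix = adjacency_matrix(site_codes)
print("codes:", codes)
for site, row in zip(sites, matrix):
    print(site, row)
```

Summing down each column gives the number of sites on which a code appears, so the largest clusters correspond to the columns with the highest totals.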

This matrix can then be used as the basis for creating further visualisations of the data. For example, a number of standard Python libraries[9] can be used for the creation of visual 'networks' showing the connections within the dataset (Figures 4 and 5). These types of clusters show us that the websites in question in each case (represented by the nodes in blue) are all associated with each other, and could potentially be addressed in single bulk enforcement actions, thereby building efficiencies into the takedown process.

Figure 4: (Obfuscated[10]) visualisation of the cluster of 14 sites from which the examples in Figure 1 were taken (websites shown as blue nodes, tracking codes as green nodes)

Figure 5: (Obfuscated) examples of other clusters within the dataset which are of particular interest because of the presence of multiple interconnections between the sites / tracking codes in question (websites shown as blue nodes, tracking codes as green nodes)
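
The grouping of sites into interconnected clusters, of the kind shown in the figures, amounts to finding the connected components of a graph in which sites and tracking codes are the nodes. The figures in this study were produced with NetworkX and Matplotlib; the sketch below instead uses only the Python standard library (a breadth-first search) to show the underlying idea, with hypothetical data and function names.

```python
from collections import defaultdict, deque


def connected_clusters(site_codes: dict[str, set[str]]) -> list[set[str]]:
    """Group sites that are linked, directly or indirectly, by shared codes."""
    # Undirected graph with two kinds of node: ("site", name) and ("code", id)
    neighbours = defaultdict(set)
    for site, codes in site_codes.items():
        site_node = ("site", site)
        neighbours[site_node]  # register the site even if it has no codes
        for code in codes:
            code_node = ("code", code)
            neighbours[site_node].add(code_node)
            neighbours[code_node].add(site_node)

    seen, clusters = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:  # breadth-first search over one connected component
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        sites = {name for kind, name in component if kind == "site"}
        if len(sites) > 1:  # a single unlinked site is not a cluster
            clusters.append(sites)
    return clusters


# Hypothetical data: sites a, b and c are chained together by two shared codes
site_codes = {
    "a.example": {"GTM-AAA111"},
    "b.example": {"GTM-AAA111", "G-BBB222"},
    "c.example": {"G-BBB222"},
    "d.example": {"UA-CCC333"},
}
print(connected_clusters(site_codes))
```

In this example, sites 'a' and 'c' share no code directly, but are placed in the same cluster because both are linked to site 'b', which is the type of multiple interconnection highlighted in Figure 5.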

Conclusion

The concept of clustering is a key component of the analysis process for websites and other results identified through a programme of brand monitoring. As part of a holistic brand protection initiative, it can help identify key infringers for prioritised action and enforcement, and help identify other linked content, through which a fuller picture of the underlying entities and their associated activities can be established.

The use of Google advertising tracking codes is a compelling basis for identifying connections, as they are generally specific to a particular user, are frequently utilised across multiple different sites in the portfolio, appear to be relatively ubiquitous across web content generally, can be readily extracted from the source code of webpages, and can often be tied to additional related material through the use of insights drawn from open-source databases.

References

[1] https://www.linkedin.com/pulse/brand-protection-data-beautiful-david-barnett-c66be/

[2] https://www.linkedin.com/pulse/brand-protection-data-still-beautiful-part-1-year-domains-barnett-juwhe/

[3] https://www.linkedin.com/pulse/brand-monitoring-data-niblet-5-law-firm-scam-websites-david-barnett-ap5de/

[4] https://www.iamstobbs.com/insights/notorious-ip-addresses-and-initial-steps-towards-the-formulation-of-an-overall-threat-score-for-websites

[5] https://www.iamstobbs.com/insights/e-mail-address-extraction-from-webpages-a-quick-case-study-in-result-clustering

[6] https://circleid.com/posts/braive-new-world-part-1-brand-protection-clustering-as-a-candidate-task-for-the-application-of-ai-capabilities

[7] https://support.google.com/tagmanager/answer/6102821?hl=en

[8] https://www.analyticsmania.com/post/google-tag-manager-vs-google-analytics/

[9] The figures in this study utilise the Python libraries NetworkX (https://networkx.org/; A.A. Hagberg, D.A. Schult and P.J. Swart (2008). "Exploring network structure, dynamics, and function using NetworkX". In: Proceedings of the 7th Python in Science Conference (SciPy2008), G. Varoquaux, T. Vaught and J. Millman (Eds.) (Pasadena, CA USA), pp. 11–15.) and Matplotlib (https://matplotlib.org/). 

[10] In the visualisations of the clusters, the brand name (as it appears in the domain names) is replaced by the string '[brand]', and an encoded ('hashed') form of the tracking codes (which generally exist in the raw data in the form 'GTM-XXXXX', 'G-XXXXX', 'AW-XXXXX' or 'UA-XXXXX', where 'XXXXX' is an alphanumeric string) is shown in each case.

This article was first published on 14 August 2025 at:

https://www.iamstobbs.com/insights/further-explorations-in-clustering-use-of-google-advertising-tracking-links

Monday, 4 August 2025

E-mail address extraction from webpages: a quick case study in result 'clustering'

Introduction

The concept of result 'clustering' - that is, the ability to establish connections between online brand monitoring findings not previously known to be linked - has been discussed previously as a key element of the analysis process in brand protection. 

It can allow the identification of key targets for further investigation or enforcement, and assist in building a fuller picture of the identity and activities of the entity(ies) behind the web-content in question, as part of an open-source intelligence (OSINT)-style investigative approach[1,2,3].

In this article, we focus specifically on the case of e-mail addresses as the data points on which clustering analysis can be based. The presented findings are derived from a process of data analysis involving the automated extraction of contact e-mail addresses from a series of webpages of potential interest, and the associated discussion shows how insights can be derived from the dataset.

Analysis

The dataset used in this case study is a set of domains of potential interest to a fashion brand, as identified through analysis of domain name zone files, which are data files containing the names of all registered domains across each TLD (top-level domain, or domain extension). The search was run using an analysis script configured to identify all domains containing the name of the brand in question, thereby simulating the process of collection of results by a full formal automated domain-monitoring service. 
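
A heavily simplified sketch of this type of zone-file search is shown below. It assumes a zone file for a single-label TLD in which each record sits on one line and begins with a fully qualified owner name, and it checks only the label immediately to the left of the TLD; the function name, sample records and brand string are hypothetical, and a real zone file would require more robust parsing.

```python
def brand_domains(zone_lines, brand: str) -> list[str]:
    """Return the distinct domains whose second-level label contains the brand."""
    brand = brand.lower()
    matches = set()
    for line in zone_lines:
        line = line.strip()
        # Skip blank lines, comments (';') and directives such as $TTL
        if not line or line.startswith((";", "$")):
            continue
        owner = line.split()[0].rstrip(".").lower()
        labels = owner.split(".")
        if len(labels) < 2:
            continue
        # Reduce host names (e.g. ns1.domain.tld) to the registered domain
        if brand in labels[-2]:
            matches.add(".".join(labels[-2:]))
    return sorted(matches)


# Hypothetical zone-file records
sample_zone = [
    "; sample records",
    "examplebrand-outlet.com. 172800 IN NS ns1.hosting.example.",
    "shop-examplebrand.com. 172800 IN NS ns2.hosting.example.",
    "unrelated.com. 172800 IN NS ns1.hosting.example.",
    "ns1.examplebrand-outlet.com. 172800 IN A 192.0.2.1",
]
print(brand_domains(sample_zone, "examplebrand"))
```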

For the brand under consideration (the name of which has simply been replaced, for confidentiality, by the string '[brand]' in all examples which follow), the initial searches generated over 16,000 brand-specific domain names of potential interest. Simple analysis techniques (as discussed in previous articles) can be used to carry out an initial stage of filtering and prioritisation of these results, to identify those sites most likely to be of interest. These techniques might typically include the calculation of 'risk scores' based on characteristics of the domains themselves, or of the content of any associated websites (in cases where a live site is present)[4,5]. This initial analysis allowed the production of a focused sub-dataset of around 4,500 domains most likely to be of greatest interest to the brand owner in question, based on the presence and prominence of the brand name and associated relevance keywords in the domain name itself and/or on the associated website.

The basic step of the subsequent analysis was to inspect the (HTML) content of each of the domains from the prioritised subset and (using an automated script) extract from the page any text-string(s) matching the format of an e-mail address (where present), with a view to identifying any contact addresses cited on each of the sites, and thereby identify any commonalities or similarities in usage.

At least one e-mail address was identified in the content of just over 1,000 of the sites in question (focusing specifically on the homepages of the sites in each case). The analysis focused on those e-mail addresses in which the 'host' part of the e-mail address (i.e. the part after the '@') was different from the domain name of the particular website on which the e-mail address was identified; addresses hosted on the site's own domain were deemed to be 'site-specific' contact details and set aside.
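As a rough illustration of the extraction step described above, the following minimal sketch pulls e-mail-like strings from a page's HTML and discards those hosted on the site's own domain. The regular expression and the function names (extract_emails, off_site_emails) are assumptions made for this example, not the script used in the study; a simple pattern of this kind will also pick up some non-address strings (such as certain image filenames), which would need to be filtered in practice.

```python
import re

# Simplified e-mail pattern (an assumption); real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(html: str) -> set[str]:
    """Return the distinct, lower-cased e-mail-like strings found in the page source."""
    return {match.lower() for match in EMAIL_RE.findall(html)}


def off_site_emails(site_domain: str, html: str) -> set[str]:
    """Keep only addresses whose host part differs from the site's own domain."""
    site = site_domain.lower().removeprefix("www.")
    return {e for e in extract_emails(html) if e.split("@", 1)[1] != site}
```

Lower-casing the addresses before comparison means that the same contact address written with different capitalisation on two sites is still treated as a single data point.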

The most obvious links which can be established are those cases in which the same e-mail address was found to be used on more than one distinct site in the dataset, which may otherwise not obviously have been known to be linked. 

In some of these cases, the distinct sites on which a particular e-mail address was found were themselves found to share a common SLD (second-level name, i.e. the part of the domain name to the left of the dot), such that it would have been relatively straightforward to establish a link even in the absence of the common e-mail address. Some such examples from the dataset (with domain names and e-mail addresses obfuscated in each case) include:

  • [brand]bag.vip and [brand]bag.store - e-mail address: camarendale9XXX[at]gmail.com
  • [brand]vix.com and [brand]vix.shop - e-mail address: ryanmi0XXX[at]gmail.com

However, in other cases, the common e-mail address may be the only basis on which a link between the sites in question could easily be established, e.g.:

  • my[brand]photos.com and omaha[brand].com - e-mail address: whatsyour[brand][at]gmail.com (Figure 1)
  • i-[brand]lightingonline.com and [brand]malls.com - e-mail address: 2853583XXX[at]qq.com 

Figure 1: Screenshots from two sites found to be linked on the basis of the use of a common e-mail address

In other cases, the 'host' part of the common e-mail address may also reveal the identity of an additional domain name which is linked to the first two, e.g.:

  • art-[brand].com and art[brand]dz.com - e-mail address: contact[at]art[brand].com
  • XX[brand]nails.eu and XX[brand]usa.com - e-mail addresses: helpdesk[at]XX[brand]nails.com and james[at]XX[brand]nails.com
  • e-casa[brand].com and casa[brand]contract.gr - e-mail address: info[at]casa[brand].gr
  • [brand]zeitde.com and [brand]zeitde.shop - e-mail address: info[at][brand]zeit.com
  • [brand]tailorhk.com and [brand]tailors.com - e-mail address: [brand][at][brand]tailor.com
  • n-[brand].com and n-[brand].net - e-mail address: care[at]usaglobalXXX.org
  • ceramica[brand].com and ceramica[brand].it - e-mail address: info[at]gruppobarXXX.com
  • [brand]movies.com and [brand]sf.com - e-mail address: 94115adam[at]cinemaXXX.com

It may also then be possible to determine further information on the underlying entity, by carrying out further searches for other online references to the common pieces of information (i.e. OSINT research). It is, however, worth noting that some e-mail addresses appearing on multiple sites may simply relate to (say) a particular service provider which just happens to have been used by the owners of each of the websites in question, but where the sites themselves may be otherwise unrelated. One such example might be the presence of contact details pertaining to the associated domain registrar, such as (from the dataset used) filler[at]godaddy.com or support[at]goldenname.com. This point highlights the importance of reviewing individual findings for relevance and significance, before asserting the presence of a definitive link.
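The grouping of sites by shared e-mail address, together with the exclusion of generic service-provider contacts, might be sketched as follows. This is an illustrative outline only: the function name cluster_by_email and the PROVIDER_HOSTS exclusion list are hypothetical (the two hosts shown are simply those mentioned above), and any real exclusion list would need to be built up through manual review.

```python
from collections import defaultdict

# Hypothetical exclusion list: hosts of generic service-provider addresses (assumption).
PROVIDER_HOSTS = {"godaddy.com", "goldenname.com"}


def cluster_by_email(site_emails: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each e-mail address to the set of sites on which it appears,
    keeping only addresses seen on two or more distinct sites."""
    sites_by_email: dict[str, set[str]] = defaultdict(set)
    for site, emails in site_emails.items():
        for email in emails:
            if email.split("@", 1)[1] in PROVIDER_HOSTS:
                continue  # likely a registrar or other provider contact
            sites_by_email[email].add(site)
    return {e: s for e, s in sites_by_email.items() if len(s) > 1}
```

Each returned entry is a candidate cluster only; as noted above, individual findings still need to be reviewed before a definitive link is asserted.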

In certain cases where an e-mail username (i.e. just the part of the e-mail address to the left of the '@') is particularly distinctive, searches based on this characteristic alone might be sufficient to establish a link.

Finally, it is also worth noting that the identity of the e-mail address provider can yield its own insights in some cases, with addresses from webmail providers such as yahoo.com and outlook.com, or messaging services such as qq.com, found less frequently to be utilised by larger legitimate businesses.
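A simple way of capturing this provider-type signal is to tag each address according to its host. The sketch below is illustrative only: the two host lists are short assumed examples rather than a definitive classification, and provider_type is a hypothetical helper name.

```python
# Illustrative lists only (assumption); a production list would be far longer.
WEBMAIL_HOSTS = {"gmail.com", "yahoo.com", "outlook.com"}
MESSAGING_HOSTS = {"qq.com"}


def provider_type(email: str) -> str:
    """Classify an address as 'webmail', 'messaging' or 'other' by its host part."""
    host = email.rsplit("@", 1)[-1].lower()
    if host in WEBMAIL_HOSTS:
        return "webmail"
    if host in MESSAGING_HOSTS:
        return "messaging"
    return "other"
```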

Conclusion

This brief case study has highlighted the potential usefulness of e-mail addresses - features which are essentially unique to a particular entity, and which can be extracted directly from the content of a website through the use of a simple script or 'scraper' - as a means of establishing links between results. The identification of connections between findings can be a key part of the process of identifying serial infringers, or entities warranting prioritised analysis, and can serve as 'start-points' for deeper open-source investigations into entities and their associated activities. 

Beyond this, insights drawn from the e-mail addresses themselves can also feed into more general algorithms used for quantifying the overall level of potential risk (e.g. non-authenticity) of a website. Characteristics such as the use of e-mail addresses from webmail providers and instant messaging services, for example, are less usually associated with mainstream corporate entities, and can be indicators of higher risk. 

References

[1] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 6: 'Result clustering'

[2] https://circleid.com/posts/braive-new-world-part-1-brand-protection-clustering-as-a-candidate-task-for-the-application-of-ai-capabilities

[3] https://www.iamstobbs.com/insights/using-clustering-and-investigation-techniques-to-connect-and-identify-scam-law-firm-websites

[4] https://circleid.com/posts/towards-a-generalised-threat-scoring-framework-for-prioritising-results-from-brand-monitoring-programmes

[5] https://www.iamstobbs.com/insights/exploring-a-domain-scoring-system-with-tricky-brands

This article was first published on 31 July 2025 at:

https://www.iamstobbs.com/insights/e-mail-address-extraction-from-webpages-a-quick-case-study-in-result-clustering


Friday, 25 July 2025

The commonest domain features: constructing look-up tables for use as part of a domain risk scoring system

Many previous pieces of research have focused on the desirability of a comprehensive scoring system, to be used for ranking results identified as part of a brand-protection solution, according to their potential level of threat. Such scoring systems offer the capability for identifying prioritised targets for further analysis, content tracking or enforcement actions[1, 2].

In a recent Stobbs study[3], we considered the case of a basic scoring system for domain-name results - a key category of findings because of the possibility for relatively comprehensive monitoring, the high online visibility of related infringements, the explicit nature of any associated IP abuse and the greater range of options for enforcement[4]. The algorithm presented in the initial study focused on characteristics of the domain name itself, taking account of factors such as the location and context within the domain name of the brand name of interest, the presence of relevance or non-relevance keywords, and the proportion of the domain name composed of other characters. This technique allows for an initial filtering of the list of candidate domain names of interest, and can be augmented by a second stage of filtering to take account of the content of any associated webpage, considering factors such as the number and prominence of mentions of the brand name, and the presence in the site content of relevance keywords.

What this initial algorithm does not encompass is any consideration of other technical or configuration factors associated with the domain, comprising any of a number of features which can also provide some indication of its likely potential level of risk. Examples of such characteristics considered in previous studies include the TLD (top-level domain, or domain extension) (working on the basis that some TLDs are more popular with infringers than others, due to factors such as cost, ease of registration, the presence of IP protection programmes, and the ease of enforcement)[5], and the host IP address (based on the assertion that websites hosted at (or near) IP addresses containing a large number of other 'bad' or blacklisted websites are themselves more likely to pose a risk)[6].

Overall, the set of relevant potential characteristics for assessing possible risk includes the TLD, and the identity of associated domain service providers such as the registrar, hosting provider and nameserver host. These types of providers are typically associated with differing levels of 'trust', connected to factors such as compliance with enforcement requests and popularity with infringers[7]. As such, the use of a provider showing a greater degree of association with previously known 'bad' sites arguably provides an indication that any other arbitrary site associated with the same provider is - other factors being equal - more likely to be associated with a greater degree of risk.

The basic methodology for constructing a threat-score algorithm on this basis thereby involves collating a large database of known bad sites (identified by - for example - comparison of website templates with those used by previously identified infringing sites, or by analysis and verification (as infringing) of results identified through a brand monitoring service), and extracting the features of interest for these known 'bad' sites. This process makes it possible to create 'league tables' of the top features and providers which tend more frequently to be associated with infringing sites.

One key point to note, however, is that the mere association of large numbers of infringing sites with a particular domain characteristic does not necessarily mean that that characteristic conveys higher risk. One particular reason why this may be the case is that certain characteristics are simply more common generally, and would therefore be associated with larger numbers of 'bad' sites even if the rate of association (i.e. the number as a proportion of the total) with such sites was not disproportionate. As an illustration of this point, we can consider the TLD; the .com domain extension, for example, will almost always be associated with large numbers of infringements, due simply to the large total number of domains registered on this extension. Accordingly, there will normally be a requirement to 'normalise' the raw numbers, by dividing the number of observed infringements by the total number of registered domains associated with the same instance of the particular feature (i.e. in the case of TLD, the total number of registered .com domains), to generate a measure of infringement frequency or 'hit rate' associated with the instance in question. Domain characteristics with greater infringement frequencies are generally more likely to be associated with higher risk.
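The normalisation step can be illustrated with a short sketch. The totals used are the .com and .top figures from Table 1 of this article; the infringement counts are invented purely for illustration and do not represent real findings, and hit_rates is a hypothetical helper name.

```python
def hit_rates(bad_counts: dict[str, int], totals: dict[str, int]) -> dict[str, float]:
    """Infringement frequency = known-bad count / total registered, per feature value.
    Feature values with no (or a zero) total are skipped."""
    return {k: bad_counts[k] / totals[k] for k in bad_counts if totals.get(k)}


# Totals taken from Table 1; the infringement counts are invented for illustration.
totals = {".com": 155_728_200, ".top": 5_326_770}
bad_counts = {".com": 10_000, ".top": 2_000}

rates = hit_rates(bad_counts, totals)
ranked = sorted(rates, key=rates.get, reverse=True)  # highest hit rate first
```

With these made-up counts, .com has five times as many 'bad' sites as .top in absolute terms, but .top ranks first once the counts are divided by the size of each TLD, which is exactly the distinction the normalisation is intended to draw out.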

In order to be able to carry out this type of analysis, it is necessary to compile 'look-up tables' of the (proportion of the) total numbers of registered domains which are associated with each possible option, for each feature of interest - i.e. ranked lists (by total (or relative) numbers) of the possible domain TLDs, registrars, hosting providers and nameserver hosts. The remainder of this article considers the process of compiling these lists and is illustrated by tables of the top entries (i.e. the most commonly-appearing options within the datasets) in each case. Whilst this has clear applications in threat scoring, it can also provide general insights in its own right, in terms of showing general trends within the domain name landscape.

Individual domain features

1. TLD

The total number of domains by TLD is a relatively simple statistic to obtain, as it can be trivially extracted from analysis of domain name zone files (at least for gTLDs (i.e. generic TLDs), for which the corresponding registries publish the data files and make them publicly accessible). A more comprehensive dataset (with significant additional ccTLD (i.e. country-code TLD) coverage) is that provided by DomainTools[8], from which the top ten TLDs are shown in Table 1.

  TLD     No. domains    % of dataset
  .com    155,728,200    43.86%
  .de      17,378,724     4.89%
  .net     12,346,352     3.48%
  .cn      11,975,245     3.37%
  .org     11,226,231     3.16%
  .uk       9,752,126     2.75%
  .nl       5,973,733     1.68%
  .ru       5,795,959     1.63%
  .top      5,326,770     1.50%
  .br       4,989,115     1.41%

Table 1: The top ten TLDs by number of registered domains (N = 355,069,958) (DomainTools, 08-Jul-2025)

2. Registrars

For domain registrars, the ideal statistic would be the total numbers of domains under management by each registrar. One estimate of this total statistic is that provided by DomainNameStat[9], although some degree of 'post-processing' is required in order to obtain a 'clean' dataset, due to the existence of a range of variations by which some of the individual distinct registrars are referred to (e.g. with or without '.com', 'Inc.', 'Ltd', 'LLC', and the existence of other variations - e.g. there are over 1,200 distinct entries for DropCatch.com in DomainNameStat's list, mostly of the form 'DropCatch.com XXX LLC', where 'XXX' is a three- or four-digit string). The 'cleansed' list consists of over 1,100 distinct entities, of which the top ten are shown in Table 2.

  Registrar                                    No. domains    % of dataset
  GoDaddy.com                                   87,123,338    26.93%
  NameCheap                                     24,389,502     7.54%
  Tucows Domains                                13,256,889     4.10%
  Squarespace Domains                           12,352,131     3.82%
  Dynadot                                        9,019,705     2.79%
  NameSilo                                       7,393,306     2.29%
  GMO Internet Group, Inc. d/b/a Onamae.com      7,268,309     2.25%
  IONOS                                          6,749,280     2.09%
  Gname.com                                      6,659,851     2.06%
  HOSTINGER operations                           6,055,416     1.87%

Table 2: The top ten registrars by total number of domains under management (N = 323,498,496) (DomainNameStat, 08-Jul-2025)
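The 'cleansing' step described above (collapsing the naming variants of a single registrar into one entry) might be approached along the following lines. This is a sketch under stated assumptions: the suffix list, the handling of numbered entities and the function name normalise_registrar are all illustrative choices, and not the rule set actually applied to the DomainNameStat data. The output is a grouping key rather than a display name.

```python
import re

# Suffix tokens treated as noise (assumption; not the authors' actual rule set).
_SUFFIX_RE = re.compile(r"[,\s]+(inc|ltd|llc|corp)\.?$", re.IGNORECASE)
# Trailing three- or four-digit identifier, as in 'DropCatch.com 1234'.
_NUMBERED_RE = re.compile(r"^(.*?)\s+\d{3,4}$")


def normalise_registrar(name: str) -> str:
    """Reduce a registrar name to a lower-case key shared by its common variants."""
    cleaned = name.strip()
    # Strip trailing corporate suffixes, repeating in case several are stacked.
    while True:
        stripped = _SUFFIX_RE.sub("", cleaned)
        if stripped == cleaned:
            break
        cleaned = stripped
    # Collapse numbered entities onto their parent name.
    match = _NUMBERED_RE.match(cleaned)
    if match:
        cleaned = match.group(1)
    # Drop a trailing '.com' and lower-case so that variants compare equal.
    cleaned = re.sub(r"\.com$", "", cleaned, flags=re.IGNORECASE)
    return cleaned.lower()
```

Domain counts can then be summed per key; any rules of this kind should be checked against the raw list, since over-aggressive stripping could merge registrars which are genuinely distinct.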

As a 'sanity-check', it is informative to compare these statistics with those identified through an explicit look-up process. In order to reduce the number of look-ups required, though still maintaining a representative sample of the overall domain universe, we consider a set of domains taken by extracting every 500th domain from each of the domain name zone files. Broadly, domains are contained within the individual zone files in alphabetical order, so this equally-spaced sample should essentially provide a 'random' representative set of domains, which should not correlate obviously with any other characteristic. The only significant bias is that the zone-file analysis will exclude ccTLD domain results.
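The equally-spaced sampling can be expressed very compactly. The sketch below assumes that each zone file has already been reduced to an ordered sequence of distinct domain names (raw zone files typically contain several records per domain); sample_every_nth is a hypothetical helper name.

```python
from itertools import islice
from typing import Iterable, Iterator


def sample_every_nth(domains: Iterable[str], n: int = 500) -> Iterator[str]:
    """Yield every n-th item (the n-th, 2n-th, ...) from an ordered iterable."""
    return islice(domains, n - 1, None, n)
```

Because islice consumes its input lazily, the same function can be applied to a generator reading a very large file line by line, without holding the full list of domains in memory.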

The sampling process described above generates a dataset of just under half a million domains (actually around 484,000), from the total set of around 350 million registered domains. Carrying out a whois look-up on each domain in the sample dataset (where information is available on an automated basis) makes it possible to extract the registrar identity in around 390,000 cases. Following a similar data 'cleansing' process to that described previously, the top ten registrars from this dataset are shown in Table 3.

  Registrar               No. domains    % of dataset
  GoDaddy.com                 116,640    29.80%
  NameCheap                    28,389     7.25%
  Squarespace Domains          17,511     4.47%
  Tucows Domains               17,007     4.34%
  Network Solutions             8,939     2.28%
  IONOS                         8,802     2.25%
  Gname.com                     8,431     2.15%
  Dynadot                       8,155     2.08%
  GMO Internet                  7,633     1.95%
  HOSTINGER operations          6,474     1.65%

Table 3: The top ten registrars by number of domains under management, based on a zone-file 'sampling' exercise (N = 391,416)

Overall, there is a good degree of similarity between these two lists (i.e. that provided by DomainNameStat and that provided by the sampled zone-file dataset), and the datasets do correlate with each other very well (correlation coefficient = 0.9923) (Figure 1).

Figure 1: Comparison of the numbers of domains under management for each registrar as given by DomainNameStat and the zone-file sampling exercise
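A comparison of this type can be reproduced by matching the two datasets on registrar name and computing a correlation coefficient over the registrars common to both. The sketch below assumes the Pearson coefficient (the article does not state which measure was used) and assumes each dataset is held as a mapping from cleansed registrar name to domain count; the function names are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_counts(a: dict[str, int], b: dict[str, int]) -> float:
    """Correlate two count tables over the keys they have in common."""
    common = sorted(a.keys() & b.keys())
    return pearson([a[k] for k in common], [b[k] for k in common])
```

Note that a coefficient of this kind is heavily influenced by the largest entries, so a high value mainly confirms agreement on the relative sizes of the biggest registrars.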

In this case, the statistics from DomainNameStat probably constitute a better dataset for use in threat scoring analysis (not least because of the vastly increased number of data points), but the high degree of correlation with the zone-file sample does provide some confidence that the latter dataset constitutes a robust data-source for analysis in extracting alternative domain features, such as those discussed below, in cases where no definitive third-party data overviews are available.

3. Nameserver hosts

The nameserver host (defined as the domain name given as the end-section of the nameserver (NS) record for the domain in question - e.g. 'cloudflare.com' in the case of 'aaden.ns.cloudflare.com') can easily be extracted for any given domain via a simple whois look-up. The statistics given for this feature relate to the first (primary) nameserver record for each domain, based on the dataset obtained from the zone-file sampling exercise (Table 4).

  Nameserver host          No. domains    % of dataset
  domaincontrol.com             88,073    22.61%
  cloudflare.com                35,422     9.09%
  googledomains.com             16,290     4.18%
  registrar-servers.com         14,286     3.67%
  wixdns.net                    11,344     2.91%
  afternic.com                  10,278     2.64%
  dns-parking.com                9,082     2.33%
  hichina.com                    7,250     1.86%
  share-dns.com                  6,324     1.62%
  namebrightdns.com              6,251     1.60%

Table 4: The top ten nameserver hosts by number of domains, based on the zone-file sample dataset (N = 389,584)
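The reduction of a full nameserver record to its 'host', as defined above, can be sketched as taking the final labels of the record. This is a deliberately naive illustration: it assumes that the last two labels form the registrable domain, which will not hold for nameservers under multi-label suffixes such as .co.uk, and nameserver_host is a hypothetical helper name.

```python
def nameserver_host(ns_record: str, labels: int = 2) -> str:
    """Return the last `labels` dot-separated labels of a nameserver record,
    lower-cased and with any trailing root dot removed."""
    parts = ns_record.strip().rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])
```

Applied to the example given above, 'aaden.ns.cloudflare.com' reduces to 'cloudflare.com'.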

4. Hosting providers

The hosting provider for a domain is defined as the operator of the webserver associated with the (primary) IP address at which the domain is hosted. In this case, the 'top' hosting providers could be calculated on a per-IP address or a per-domain basis; however, in this analysis, the latter approach is taken (since, in general, different IP addresses will be associated with differing numbers of hosted domains, so a per-domain approach provides a more representative overview), again using the sampled zone file dataset (Table 5).

  Hosting provider    No. domains    % of dataset
  Amazon                  115,151    37.22%
  Cloudflare[10]           33,945    10.97%
  Squarespace              13,534     4.37%
  Namecheap                11,604     3.75%
  Google                    8,894     2.87%
  Shopify                   8,116     2.62%
  GoDaddy.com               4,918     1.59%
  Unified Layer             4,604     1.49%
  PSINet                    4,398     1.42%
  Newfold Digital           4,298     1.39%

Table 5: The top ten hosting providers by number of domains under management, based on the zone-file sample dataset (N = 309,409)
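The difference between the per-domain and per-IP address counting approaches mentioned above can be seen in a small sketch. The record layout (domain, IP address, provider) and the function name provider_counts are assumptions made for illustration.

```python
from collections import Counter


def provider_counts(records):
    """Count hosting providers per domain and per distinct IP address.

    records: iterable of (domain, ip_address, provider) tuples.
    """
    records = list(records)
    per_domain = Counter(provider for _, _, provider in records)
    distinct_ips = {(ip, provider) for _, ip, provider in records}
    per_ip = Counter(provider for _, provider in distinct_ips)
    return per_domain, per_ip
```

A provider hosting many domains on a single shared IP address will rank highly on the per-domain measure but not on the per-IP measure, which is why the per-domain view is the more representative of the two for this purpose.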

Conclusion

Whilst the statistics presented in this article provide some insights regarding the sets of top domain service providers in their own right, the most obvious application is (using the full datasets in each case, rather than just the top-tens shown in this overview) as 'look-up' tables, for the purposes of normalisation of statistics of those features most commonly associated with infringing or otherwise 'bad' sites, as part of an overall threat-scoring approach. A fuller formulation of such an approach - which is key to identifying priority targets from (potentially very large) sets of brand-monitoring results - will also require a dataset of known 'bad' sites, which should itself be as large as possible so as to provide the most meaningful statistics. Ultimately, it is likely that other domain characteristics (such as registrant characteristics, SSL providers, etc.), in addition to other features such as the presence of MX records, web traffic, etc., will also feed into the construction of an overall comprehensive algorithm.

References

[1] https://circleid.com/posts/towards-a-generalised-threat-scoring-framework-for-prioritising-results-from-brand-monitoring-programmes

[2] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 5: 'Prioritisation criteria for specific types of content'

[3] https://www.iamstobbs.com/insights/exploring-a-domain-scoring-system-with-tricky-brands

[4] https://www.worldtrademarkreview.com/global-guide/anti-counterfeiting-and-online-brand-enforcement/2022/article/creating-cost-effective-domain-name-watching-programme

[5] https://circleid.com/posts/20230117-the-highest-threat-tlds-part-2

[6] https://www.iamstobbs.com/insights/notorious-ip-addresses-and-initial-steps-towards-the-formulation-of-an-overall-threat-score-for-websites

[7] https://circleid.com/posts/notorious-hosting-providers-an-overview-of-the-highest-threat-hosts-from-ip-address-blacklist-analysis

[8] https://research.domaintools.com/statistics/tld-counts/

[9] https://domainnamestat.com/statistics/registrar/others

[10] It is worth noting that Cloudflare offers 'pass-through' services, such that many websites simply utilising Cloudflare services will be associated with Cloudflare as the listed hosting provider. In such cases, the 'true' hosting provider can generally be determined only by contacting Cloudflare directly. 

This article was first published on 25 July 2025 at:

https://circleid.com/posts/the-commonest-domain-features-constructing-look-up-tables-for-use-as-part-of-a-domain-risk-scoring-system

Thursday, 24 July 2025

you/talk/fast/med: another forthcoming batch of new gTLDs

Following our previous discussion[1] of a new batch of domain-name extensions to be launched as part of the ongoing first phase of the new-gTLD programme, we present some follow-up comments relating to the next set of TLDs to be released; namely, .fast, .talk, .you, and .med. 

.fast, .talk and .you, all offered through Amazon Registry, are set to enter their sunrise periods on 26 August, before going into general availability most probably in October. The .med extension is to relaunch on an unrestricted basis with a pre-registration phase from 31 August, and open registrations from 2 September[2].

As with any new TLDs, the launches offer an opportunity for brand owners to review their domain registration policies, with a view potentially to registering relevant names for brand-related use, making defensive registrations or, at the very least, proactively monitoring the landscape and/or considering blocking mechanisms in order to defend against third-party abuse. 

Given the nature of the four new extensions, there is potential for a wide range of use-cases. Some of the TLDs in particular could be relevant to specific business areas (particularly for .talk, which may be appropriate for communications service providers or for content relating to marketing or reviews; and .med, which has potential in medical or pharmaceutical applications - an area of specific risk for counterfeit products - or businesses with connections to the Mediterranean). This set of TLDs generally also offers options for utilisation as part of a tagline, or to convey specific brand messaging. 

As in our previous study on new TLD launches, we consider the current landscape of similar domain names, as a proxy for the sorts of activity which may manifest themselves following the launches of the new extensions. Specifically, we consider the set of domain names ending with the respective terms.

As of the start of July 2025, zone-file analysis reveals over 252k domains with names ending with 'you', 153k with 'med', 61k with 'talk' and 58k with 'fast' (Figure 1). Table 1 shows the top five (pre-existing) TLDs represented in each of these four datasets.

Figure 1: Total numbers of (pre-existing) domains ending with 'you', 'med', 'talk' and 'fast' (split by (legacy) TLD)

Table 1: Numbers of domains associated with each of the top five (legacy) TLDs in each of the current sets of domains ending with 'you', 'med', 'talk' and 'fast'

Some trends are immediately apparent, such as the apparent popularity of those terms most directly relevant to the English language ('you', 'talk' and 'fast') with the UK market in particular (as evidenced by the extensive use of .co.uk), and the potential relevance of 'fast' to e-commerce (given the frequency of use of the .shop extension). 

In some cases, the data is complicated by the presence of 'false positives' (which obscure any insights relating to the likely future use of the more explicitly relevant new-gTLD extensions), particularly for 'med' (which frequently occurs as a sub-string of other terms such as 'armed', 'formed', 'groomed', 'teamed' and a wide range of others, plus names such as 'muhammed') and 'fast' ('breakfast'). 

Having removed any obvious false positives, some insights can be gained by looking at the types of terms which most frequently appear immediately prior to the terms in question, within the set of domain names. In this analysis, we consider the strings of different lengths preceding the keywords under consideration, to assess any obvious patterns of usage.
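One way in which this kind of preceding-string analysis might be carried out is sketched below. It assumes a list of SLDs (the part of each domain name to the left of the dot) and a reference vocabulary of English words; the function name preceding_words and the vocabulary source are hypothetical, and the sketch is not necessarily the method used to produce the lists in this article.

```python
from collections import Counter
from typing import Iterable


def preceding_words(slds: Iterable[str], term: str, length: int,
                    vocabulary: set[str]) -> Counter:
    """Count the `length`-character strings immediately preceding `term`
    at the end of each SLD, keeping only those found in `vocabulary`."""
    counts: Counter = Counter()
    for sld in slds:
        if sld.endswith(term) and len(sld) >= len(term) + length:
            prefix = sld[-(len(term) + length):-len(term)]
            if prefix in vocabulary:
                counts[prefix] += 1
    return counts
```

Running the function once per string length (3, 4, 5 and so on) and ranking each result with most_common() would give lists of the general form shown below, although a fixed-length window will also pick up words which are merely the tail of a longer term.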

The following lists show the most common English language words[3] present in these datasets, for each of the terms considered:

Domains ending with 'you':

  • 9-character words preceding 'you':
    • 'beautiful' (256 instances) (i.e. domains ending with 'beautifulyou')
    • 'healthier' (151)
  • 8-character words preceding 'you':
    • 'welcomes' (180)
  • 7-character words preceding 'you':
    • 'healthy' (317)
    • 'without' (144)
  • 6-character words preceding 'you':
    • 'better' (532)
    • 'within' (447)
    • 'around' (331)
  • 5-character words preceding 'you':
    • 'loves' (1,427)
    • 'thank' (1,294)
    • 'about' (1,015)
    • 'moves' (443)
    • 'round' (348)
    • 'comes' (276)
    • 'found' (246)
    • 'bless' (157)
    • 'helps' (141)
    • 'power' (122)
    • 'teach' (116)
    • 'happy' (116)
  • 4-character words preceding 'you':
    • 'with' (3,615)
    • 'near' (2,521)
    • 'love' (2,111)
    • 'like' (984)
    • 'help' (765)
    • 'best' (377)
    • 'meet' (358)
    • 'find' (276)
    • 'miss' (237)
    • 'move' (220)
    • 'plus' (190)
    • 'real' (187)
    • 'from' (185)
    • 'need' (177)
    • 'told' (170)
    • 'know' (168)
    • 'true' (164)
    • 'dare' (163)
    • 'hear' (159)
    • 'want' (153)
  • 3-character words preceding 'you':
    • 'for' (40,991)
    • 'and' (4,496)
    • 'are' (1,257)

Domains ending with 'med':

  • 11-character words preceding 'med':
    • 'integrative' (123)
  • 10-character words preceding 'med':
    • 'functional' (171)
  • 9-character words preceding 'med':
    • 'concierge' (73)
    • 'lifestyle' (72)
    • 'precision' (60)
  • 8-character words preceding 'med':
    • 'internal' (238)
    • 'wellness' (42)
  • 7-character words preceding 'med':
    • 'natural' (101)
  • 6-character words preceding 'med':
    • 'sports' (1,220)
    • 'family' (675)
    • 'health' (273)
    • 'global' (129)
    • 'beauty' (110)
    • 'dental' (86)
    • 'mobile' (74)
    • 'travel' (63)
    • 'pharma' (62)
    • 'physio' (58)
    • 'techno' (57)
    • 'cardio' (47)
    • 'future' (41)
    • 'social' (39)
    • 'gastro' (39)
  • 5-character words preceding 'med':
    • 'sport' (177)
    • 'sleep' (97)
    • 'laser' (84)
    • 'smart' (80)
    • 'ortho' (79)

Domains ending with 'talk':

  • 10-character words preceding 'talk':
    • 'realestate' (76)
    • 'webhosting' (34)
  • 9-character words preceding 'talk':
    • 'marketing' (40)
  • 8-character words preceding 'talk':
    • 'straight' (272)
    • 'business' (118)
    • 'football' (64)
  • 7-character words preceding 'talk':
    • 'bitcoin' (40)
    • 'teacher' (39)
    • 'toolbox' (37)
    • 'fashion' (33)
    • 'english' (28)
  • 6-character words preceding 'talk':
    • 'sports' (428)
    • 'coffee' (171)
    • 'health' (166)
    • 'pillow' (143)
    • 'travel' (98)
    • 'street' (67)
    • 'beauty' (63)
    • 'people' (61)
    • 'social' (50)
    • 'family' (48)
  • 5-character words preceding 'talk':
    • 'small' (330)
    • 'table' (319)
    • 'could' (288)
    • 'money' (233)
    • 'trash' (114)
    • 'cross' (98)
    • 'power' (90)
  • 4-character words preceding 'talk':
    • 'tech' (842)
    • 'real' (517)
    • 'lets' (453)
    • 'talk' (270)
    • 'shop' (255)
    • 'body' (242)
    • 'news' (205)
    • 'girl' (180)
    • 'self' (150)

Domains ending with 'fast':

  • 9-character words preceding 'fast':
    • 'insurance' (59)
    • 'solutions' (24)
    • 'marketing' (23)
    • 'followers' (16)
    • 'customers' (15)
  • 8-character words preceding 'fast':
    • 'property' (98)
    • 'business' (67)
    • 'mortgage' (28)
    • 'websites' (20)
    • 'anything' (19)
    • 'approved' (18)
    • 'delivery' (17)
    • 'patients' (15)
  • 7-character words preceding 'fast':
    • 'funding' (54)
    • 'capital' (47)
    • 'clients' (36)
    • 'nowhere' (35)
    • 'blazing' (34)
    • 'service' (33)
    • 'english' (27)
    • 'finance' (27)
    • 'website' (26)
    • 'results' (26)
    • 'digital' (26)
    • 'connect' (21)
    • 'closure' (21)
    • 'forward' (20)
    • 'tickets' (19)
    • 'healthy' (19)
    • 'freedom' (19)
  • 6-character words preceding 'fast':
    • 'houses' (382)
    • 'weight' (136)
    • 'online' (94)
    • 'health' (31)
    • 'crypto' (29)
    • 'repair' (28)
    • 'quotes' (28)
    • 'better' (28)
    • 'travel' (23)
    • 'pounds' (22)
    • 'ticket' (21)
    • 'shirts' (21)
    • 'strong' (21)
    • 'design' (21)
  • 5-character words preceding 'fast':
    • 'house' (715)
    • 'super' (291)
    • 'homes' (238)
    • 'loans' (131)
    • 'parts' (92)
    • 'trade' (74)
    • 'learn' (63)
    • 'stand' (62)
    • 'ultra' (61)
    • 'think' (60)
    • 'smart' (58)
    • 'funds' (51)
  • 4-character words preceding 'fast':
    • 'home' (477)
    • 'cash' (273)
    • 'sold' (166)
    • 'food' (126)
    • 'grow' (98)
    • 'tech' (78)
    • 'shop' (76)
    • 'care' (73)
    • 'ship' (68)
    • 'very' (67)
    • 'read' (67)
    • 'help' (67)
    • 'easy' (67)
  • 3-character words preceding 'fast':
    • 'use' (727)
    • 'and' (426)
    • 'old' (321)
    • 'are' (146)
    • 'buy' (143)
    • 'car' (131)
    • 'pay' (118)

This type of analysis may be able to help inform strategic considerations by brand owners on possible use-cases for the new gTLDs when they launch, and specifically whether there are any good fits for product types, slogans and potential wider marketing initiatives. Similarly, however, the same characteristics of these new domain extensions can make them attractive to infringers, highlighting the importance of a proactive approach to monitoring and enforcement as the landscape continues to develop.

References

[1] https://www.iamstobbs.com/insights/free-hot-spot-an-exploration-of-three-new-gtld-launches

[2] https://iptwins.com/2025/06/19/new-gtlds-med-talk-you-fast-set-to-launch-in-august-september/

[3] Neglecting any expletives, or words which seem to provide no potential for a phrase making grammatical sense

This article was first published on 24 July 2025 at:

https://www.iamstobbs.com/insights/you-talk-fast-med-another-forthcoming-batch-of-new-gtlds
