Friday, 21 November 2025

AI and web search (Part 2) - Optimising for web search in a world of AI

Notes from the B2B Marketing Live UK event, 19-20 November 2025, ExCeL, London

Following on from our recent work[1] on the potential impacts of artificial intelligence (AI) on web search, web traffic and trademarks, these notes from the recent B2B Marketing Live event present additional detail on current opinions relating to Generative Engine Optimisation ('GEO'): the suite of techniques which brand owners can employ to try to ensure that their brand and website content are referenced as strongly as possible in results from AI-powered search.

Key points

The search landscape is fundamentally changing

AI overviews (AIO) in search results have recently seen huge growth; AIO are now used by 1.5B users per month in over 200 countries, with anywhere between 18% and 47%[2] (estimates vary) of searches generating AIO, and the prevalence of AIO having increased by around 116% since May 2025[3]. Furthermore, AIO typically cover 70% of the SERP (search-engine results page) screen space. Complex, decision-related queries (such as those associated with B2B business) may also be more likely to generate AIO.

Accompanying these trends is the observation that, although brand owners typically find website impressions (i.e. appearances in organic search results) to be up, there have been significant decreases in click-through rates (CTR) to websites (down by between 32% and 35% for the highest-ranked results)[4,5], with the CTR for the top-ranked organic search-engine result having decreased from around 28% to 20%.

In addition, 'standalone' AI tools are seeing large increases in usage, with many users increasingly not using 'classic' search tools at all, but instead relying on obtaining their answers 'in platform'. The most popular such tool, ChatGPT, now receives 5.24B visits per month, and 89% of B2B buyers use gen-AI as part of their purchase decision-making process. Search engines are also beginning to introduce embedded functionality analogous to a gen-AI tool, such as Google's 'AI Mode' (though the Google example is currently only seeing 1-2% adoption).

Marketing success can no longer be judged (just) by click-through rates (CTR)

The evolutions in the search landscape are driving an increased acceptance by brand owners of 'zero-click' results and reduced website traffic. Conversely, however, the extent of citation by AI tools is becoming an ever-more important metric, and is itself associated with factors such as brand visibility, authority, customer engagement, and trust.

Also of key importance is the fact that inbound brand referrals from generative AI ('gen-AI') tools or AIO tend to be more likely to convert to successful sales than click-throughs from classic search (due to an implicit trust by users of AI sources), with brands typically seeing sales grow more quickly than their observed increases in AI-driven traffic[6]. Overall, brands cited in AIO receive 35% more organic clicks and 91% more paid clicks than those not cited. Furthermore, customers asking complex (i.e. later decision-stage) business queries via LLMs are also likely to be associated with higher conversion rates, because of their inherently greater degree of 'pre-qualification'; overall, it is noted that 61% of purchasers are influenced by AI responses. These observations are leading to the emergence of a dual objective for brands: (i) increasing AI traffic (i.e. GEO) and (ii) monetising more of this traffic (the easier of the two problems, because of the greater inherent conversion rate of AI traffic).

Discoverability by humans and AI is key

It is becoming increasingly clear that brand and website content needs to be optimised for discoverability by both human searchers and AI systems (agents, LLMs, etc.). Whereas human users traditionally have preferences towards website clarity, usability and value proposition, AI systems tend to favour a different set of characteristics when selecting websites to use as trusted sources. These might typically include:

  • Data being presented in a structured format - including mark-up, schemas, APIs, etc., plus an emphasis on plain-text content (and reduced usage of features such as JavaScript rendering), use of semantic HTML, XML site-maps, technical signposting for crawlers (e.g. allowing access in the robots.txt file), etc. (an illustrative mark-up sketch follows this list)
  • Indicators of trust - e.g. author mark-up, source references, authoritative in-links, etc. Trust and credibility for brands can be boosted through initiatives such as blogging, providing guest posts for external websites, and publishing content in other formats (such as on YouTube).
  • Sites incorporating extensive volumes of original information - including comprehensive FAQ sections, etc. There are also suggestions that "best" or "top" lists ('listicles') are favoured by LLMs in some cases. Brands can also increase the likelihood of being referenced in AI responses by including 'actionable IP' (brand-specific) content on their websites, particularly where this addresses key questions asked by buyers in their decision-making process and provides sales qualification information. The production of specific, textual content to address specific requirements for landing pages from pay-per-click (PPC) ads may also be beneficial.
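
By way of illustration, the structured mark-up referred to in the first bullet above is often expressed as schema.org JSON-LD embedded in a page's HTML. The following minimal Python sketch (standard library only; all organisation details are hypothetical) renders an illustrative 'Organization' block of this type:

    import json

    # Illustrative schema.org 'Organization' mark-up (all details are
    # hypothetical); the rendered block would be embedded in the page's
    # <head> inside a <script type="application/ld+json"> tag.
    organization = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": "ExampleBrand Ltd",
        "url": "https://www.examplebrand.com",
        "sameAs": [
            "https://www.linkedin.com/company/examplebrand",
            "https://www.youtube.com/@examplebrand",
        ],
    }

    snippet = ('<script type="application/ld+json">\n'
               + json.dumps(organization, indent=2)
               + '\n</script>')
    print(snippet)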

Increasingly, brands are utilising segmented websites, incorporating both human-focused areas with extensive brand-heavy, visual and interactive ('conversion optimised') elements, and (often human-invisible) AI-favourable content, which is text-heavy and detail-, explanation- and answer-oriented. Traditional marketing techniques, such as offering downloadable information sheets (especially if 'gated' with a requirement for users to submit contact details) are becoming less popular, with users instead tending increasingly to source answers from (ungated) gen-AI tools.

Understanding how AI systems process queries is crucial - and highlights a continuing importance of classic SEO

When an AI tool is presented with a prompt or query, it typically splits it into smaller sub-queries for individual processing ('query fan-out'). Frequently, however, these tools will utilise classic search engines to source responses to these sub-queries (e.g. Gemini uses Google, ChatGPT uses Bing), although some make use of proprietary bots. It is also noteworthy that Google remains the market leader for organic search by a significant margin (with AI engines still seeing a market share of less than 1%).

As such, traditional search-engine optimisation (SEO) techniques are still of relevance, and a high volume of branded web mentions is still found to be the factor with the highest degree of correlation with the extent of AI references[7]. Overall, "good digital marketing" - i.e. the aim of achieving an extensive online range of structured, contextual brand references which address E-E-A-T (experience / expertise / authoritativeness / trustworthiness) criteria - still sits at the intersection of SEO and GEO.

Overall, however, having a brand presence across a wide range of channels and content types is key to being cited in AI responses. Key areas appear to be 'rich content' channels such as YouTube (particularly if titles, descriptions, etc. have been optimised for LLM readability) and user-generated-content or community channels, such as review sites. These types of insight can be drawn by analysing which sites are most frequently cited in AI responses, and this analysis consistently reveals that platforms such as Reddit are favoured by LLMs. A diversification in platform focus for branded content is also particularly important in an era when many (particularly younger) users are becoming increasingly focused on 'social search' - i.e. utilising the native search functions of platforms such as TikTok, Pinterest, Reddit and Instagram to source answers to queries. Around one-quarter of users discover brands for the first time on social platforms, and many users search across a range of platform types before making a final purchase decision. The aim for brand owners is therefore to optimise their own content across the same set(s) of platforms as are being used by their customer base - an initiative which is doubly beneficial given the increasing frequency with which classic engines such as Google are themselves also returning results from these types of platform. A key idea is the concept of the 'Day One' list, reflecting the fact that the most effective brand awareness is that generated by the results of users' initial sets of searches.

A key associated measurement factor when considering the likelihood of citation in AI responses is 'share of model'. It may be informative for brands to track their presence and sentiment in LLMs by posing direct queries to the associated tools ('AI model reporting'), which can include specific questions relating to areas known to influence customer preference, such as price, customer service, product features, ease of use, etc. A useful follow-up to this type of analysis can be for brands to publish their own remedial content online, to address any issues identified.

However, branded content must be authentic and non-generic in order to build credibility (both with human users and with AI crawlers). Google also increasingly rewards material with 'personal' and 'expert' content. As such, online placement by brands of authored content can be beneficial, but should ideally also be accompanied by positive references from brand advocates, influencers and employees - noting that the employees of a company typically have a combined following around twelve times the size of the company's official profile page.

AI responses themselves are becoming increasingly commercialisable

Some AI tools (including AIO / Google AI Mode) are starting to offer the option for paid-ad placement within them - in many cases, this is currently only available in the US, but is likely to expand in scope. Initial use-cases are likely to be focused on e-commerce, as certain categories of products and services (such as pharmaceuticals and gambling) may be considered 'restricted verticals'. Increasingly, we are also likely to see more flexibility in targeting options for these ads (such as Google's 'Broad Match', 'AI Max' and 'Performance Max'), relating to the ways in which they are served up in response to particular query types (rather than being keyword-based).

Care must, however, be taken by brand owners with AI ad placement, as brand references in AI responses are less amenable to control over the context in which the brand is mentioned, which could be problematic if (for example) there are regulatory requirements regarding the way in which the brand must be promoted, or concerns about being referenced alongside competitors.

Additionally, some AI providers may be reluctant to offer paid boosting of brands, due to implications regarding trust, especially that of paying customers. OpenAI's Sam Altman recently stated that "ads on a Google search are dependent on Google doing badly; if [Google] were giving you the best answer, there’d be no reason ever to buy an ad above it"[8].

AI can itself also be leveraged to optimise brand content

Marketing success in the world of GEO is dependent on having brand content structured and presented in appropriate ways, with one estimate claiming that 90% of branded content will be synthetic by 2026. A final point to note is that AI can itself assist with many of these areas, including:

  • Segmenting content by relevance to specific audience demographics - e.g. tailoring to local language (noting that 76% of B2B buyers prefer to purchase in their native language) or adapting websites or paid ads to local audiences - though these types of initiative invariably also require an element of human QA. Increasingly, users expect content to be highly personalised, rather than being tailored to broader market segments.
  • Offering capabilities for AI agents to carry out highly personalised tasks (such as brand audits, or ROI or pricing calculators).
  • Optimising paid media (pay-per-click links / sponsored ads), through prediction of queries and keywordless targeting.

References

Event presentations

  • 'Five tactics for driving more leads in an AI-powered global search landscape', C. McKenna and S. Oakford, Oban International
  • 'AI traffic is money traffic', J. Kelleher, SpotDev
  • 'Navigating the era of AI search', B. Wood, Hallam (hallam[.]agency)
  • 'The impact of AI on B2B marketing and five expectations for 2026', G. Stolton, ROAST
  • 'The future of organic search and SEO', J. Powley, Blue Array SEO
  • 'Decoding the future: AI and the impact on B2B marketing', A. Moon, FutureEdge Academy

Other

[1] To be published as: 'AI's potential impact on web search, traffic and trademarks', Stobbs blog [link TBC]

[2] https://www.bcg.com/x/the-multiplier/the-future-of-discoverability

[3] https://ahrefs.com/blog/ai-overview-growth/

[4] https://ahrefs.com/blog/ai-overviews-reduce-clicks/

[5] https://ahrefs.com/blog/the-great-decoupling/

[6] N. Patel, NP Marketing (see e.g. https://www.instagram.com/p/DQz_jczEjlI/)

[7] https://www.semrush.com/blog/ai-mentions/

[8] https://www.techinasia.com/sam-altmans-lesson-google-trust-ads

This article was first published on 21 November 2025 at:

https://www.linkedin.com/pulse/ai-web-search-part-2-optimising-world-david-barnett-g7eme/

Thursday, 20 November 2025

XARF-way To (Enforcement) Paradise

The process of enforcement is a key element of many brand-protection programmes. Often it involves the submission of some sort of complaint, report, or notice to a platform or service provider requesting the removal or takedown of infringing content. However, this process can be extremely time-consuming, particularly in cases where there are large numbers of enforcements to be processed, or if the specifics of each case require significant amounts of customisation of the individual notices in question. 

As part of a drive towards greater efficiency, it is often beneficial to investigate approaches such as automation or bulk actions, typically in conjunction with efforts to identify top-priority targets. For 'standard' takedowns involving the submission of a notice to a registrar or hosting provider, it is often possible to make use of a fixed letter template, in which only case-specific details need to be varied between instances. This type of enforcement process is highly amenable to automation - using some sort of scripting approach to action what is essentially a 'mail-merge' process - which can greatly aid efficiency.

Increasingly, however, many Internet service providers (ISPs) are refusing to accept letter-based complaints and are instead directing brand owners or their representatives towards webform-based abuse-reporting systems. Whilst superficially this may appear efficient, the completion of these webforms can actually be a more time-consuming endeavour. Many such reporting systems require complaints to be submitted one by one, or include components such as a need to complete CAPTCHA codes, which can frustrate enforcement efforts.

One promising development, however, which may aid the process of complaint submission and the protection of IP rights, is an increasing level of support for 'XARF' ('eXtended Abuse Reporting Format'), an alternative abuse-reporting framework comprising a standardised format for submitting abuse reports to ISPs[1]. It packages the relevant pieces of information in a strictly defined document format, JSON (JavaScript Object Notation), which can be communicated via e-mail or other routes (such as an API, or Application Programming Interface). XARF documents are particularly amenable to automated production, such that the protocol also lends itself to more efficient enforcement workflows. Furthermore, certain ISPs explicitly reference XARF as a supported and/or preferred communication type (Figure 1).

Figure 1: Example of a snippet from a response to a standard takedown notice type, from an ISP referencing XARF as a supported communication type for submitting abuse reports

A number of code libraries exist for generating reports in XARF format and creating JSON documents, and these resources frequently include sample templates showing the fields required for abuse reports of a range of types (e.g. trademark infringement, copyright infringement, phishing, malware, child-abuse content, etc.)[2,3,4], which greatly aids with the generation of the required content. 
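
As a minimal sketch, a trademark report of this type might be generated in Python along the following lines. The field names follow the general shape of the published sample templates[2,3] but should be verified against the current schema before use, and all case details shown are hypothetical:

    import json
    from datetime import datetime, timezone

    # Indicative XARF-style trademark report; field names should be checked
    # against the current schema, and all case details are hypothetical.
    report = {
        "Version": "3",
        "ReporterInfo": {
            "ReporterOrg": "Example Brand Protection Ltd",
            "ReporterOrgDomain": "example-bp.com",
            "ReporterOrgEmail": "abuse-reports@example-bp.com",
        },
        "Disclosure": True,
        "Report": {
            "ReportClass": "Content",
            "ReportType": "Trademark",
            "Date": datetime.now(timezone.utc).isoformat(),
            "SourceType": "Url",
            "Source": "https://infringing-site.example/fake-goods",
            "TrademarkInfo": "Registered mark 'EXAMPLEBRAND' (hypothetical)",
        },
    }

    with open("xarf_report.json", "w") as f:
        json.dump(report, f, indent=2)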

As such, a scripted automation approach makes it a relatively simple matter to produce these reports in bulk. The output can then be combined with bulk-e-mailer scripts, to allow the bulk submission of XARF-format takedown notices to ISPs, utilising just an input document containing the details of the targets and the case-specific pieces of information to be varied between the individual reports (Figure 2).
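
The bulk-submission step might then be sketched as follows, with the SMTP details and input-file layout both hypothetical, and with build_xarf_report() standing in (in trimmed form) for the report construction shown in the previous sketch:

    import csv
    import json
    import smtplib
    from email.message import EmailMessage

    SMTP_HOST = "mail.example-bp.com"           # hypothetical mail server
    FROM_ADDR = "abuse-reports@example-bp.com"  # hypothetical sender

    def build_xarf_report(url, case_ref):
        # Trimmed stand-in for the full report construction shown above
        return {"Version": "3",
                "Report": {"ReportType": "Trademark",
                           "Source": url,
                           "ReporterCaseID": case_ref}}  # indicative field

    # Input CSV with one row per target: abuse_email,infringing_url,case_ref
    with open("targets.csv", newline="") as f, smtplib.SMTP(SMTP_HOST) as smtp:
        for row in csv.DictReader(f):
            msg = EmailMessage()
            msg["From"] = FROM_ADDR
            msg["To"] = row["abuse_email"]
            msg["Subject"] = f"Abuse report (XARF) - case {row['case_ref']}"
            msg.set_content(json.dumps(
                build_xarf_report(row["infringing_url"], row["case_ref"]),
                indent=2))
            smtp.send_message(msg)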

Figure 2: (Redacted) example of an e-mail-based enforcement notice in XARF format, generated using an automated script

This approach is already being tested using live client services, and is generating responses from ISPs which are essentially identical to those produced by 'classic', letter-based submissions. Whilst this alternative approach is not likely to be applicable in all cases - particularly where highly-bespoke, complex notices are required - it does offer the potential for increased automation and efficiency in the brand-protection process, in cases where repeatable styles of notice for standard infringement types would normally be required.

References

[1] https://abusix.com/xarf-abuse-reporting-standard/

[2] https://github.com/abusix/xarf/blob/master/samples/positive/3/trademark_sample.json

[3] https://github.com/abusix/xarf/blob/master/schemas/3/trademark.schema.json

[4] https://www.w3schools.com/python/python_json.asp

This article was first published on 20 November 2025 at:

https://www.iamstobbs.com/insights/xarf-way-to-enforcement-paradise

Thursday, 23 October 2025

Playing with a simple revisitor script for monitoring changes to website content

Introduction

A key part of the analysis workflow in brand monitoring services is often the maintenance of a 'watchlist' of sites. This requirement arises most frequently in services comprising domain monitoring, which detect newly-registered names containing a brand name of interest, but which may not yet feature significant or infringing content.

In these cases, enforcement action may not immediately be possible or appropriate, but there might be a concern that higher-threat content may appear in the future. There is therefore often a need to monitor the domains for changes to their content and provide an alert when a significant change is identified. At that point, a decision can be made regarding appropriate follow-up action. Requirements for 'revisitor' functionality along these lines can also arise in other brand-protection contexts, such as when enforcement action has already been taken against an infringing target (such as a website or marketplace listing), and the targeted page is then tracked to verify compliance with the takedown action.

There exist a number of automated tools which track content in this way, but key components of a highly effective version include the ability to analyse an appropriate set of different characteristics of the websites in question, and options to set the sensitivity appropriately - it is not generally desirable (for example) for an alert to be generated every time any change to website content is identified, since many websites incorporate dynamic features which differ every time the webpage is called. Conversely, sometimes a change which is only small, or of a particular type (e.g. the appearance of an explicit brand reference) can be significant.

In this article, I briefly explore the development and use of a Python-based revisitor script to inspect and then subsequently review a set of domain names of potential interest (using data from a domain monitoring service for a retail brand, as a case study). Having a simple, easily deployed script of this nature can be advantageous, in terms of being quick and efficient to roll out, and being fully customisable regarding the specific website characteristics analysed and the sensitivity thresholds to be used. Tools of this type can be highly useful for watchlists featuring many hundreds or thousands of URLs to be reviewed, and can, of course, also be expanded to cover other website features and more complex types of site analysis.

Script specifics

The workflow is built on the basis of a 'site visitor' script, which inspects each of the domains in the watchlist and extracts the following features, which are 'dumped' to a readable database file (a minimal code sketch of this step follows the list):

  • HTTP status[1] - a numerical code corresponding to the type of response received when the domain name is queried; a code of '200' indicates a live website response (i.e. potentially an active webpage)
  • Page title[2] (as defined in the HTML source code of the page)
  • Full webpage content[3] (all text, plus formatting features and other content such as embedded scripts - i.e. the full HTML content)
  • Presence / absence of each of a set of pre-defined keywords[4] - applicable keywords for analysis might typically include brand terms or other relevance keywords (e.g. for a retail brand, terms indicating that e-commerce content is present ('buy', 'shop', 'cart', etc.))
  • Final URL[5] - i.e. the destination URL (after following any site re-direct)
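
As a minimal sketch of this visitor step - assuming the specific library calls given in references [1] to [5] below, and using a JSON file as the 'readable database file' - the extraction might look as follows (error handling for time-outs and HTTP errors is omitted for brevity):

    import json
    import re
    import urllib.request
    from bs4 import BeautifulSoup

    KEYWORDS = ["buy", "shop", "cart"]  # illustrative relevance keywords

    def inspect_site(url):
        # Fetch one site and extract the five features described above
        response = urllib.request.urlopen(url, timeout=30)
        html = response.read().decode("utf-8", errors="replace")
        soup = BeautifulSoup(html, "html.parser")
        return {
            "status": response.status,
            "title": soup.title.text if soup.title else "",
            "content": html,
            "keywords": {k: bool(re.search(k, html, re.IGNORECASE))
                         for k in KEYWORDS},
            "final_url": response.url,
        }

    # 'Dump' the extracted features for the whole watchlist to a JSON file
    watchlist = ["http://example-watched-domain.com"]  # hypothetical list
    snapshot = {url: inspect_site(url) for url in watchlist}
    with open("snapshot.json", "w") as f:
        json.dump(snapshot, f, indent=2)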

The basic function of the revisitor is then to inspect the same list of sites at subsequent times, as required (or on a regular basis if configured to run accordingly), extract the same set of features, and compare these with the corresponding features from the previous round of analysis (as read from the database file). In an initial simple implementation of the script, the following are deemed to be significant changes, denoting that the site is now worthy of further (manual) inspection and consideration for follow-up action (a code sketch of these rules follows the list):

  • A change to an HTTP status of 200 (i.e. the appearance of a live website response)[6]
  • Any change to the page title
  • Any(*) change to the webpage content
  • Any instance of the appearance of a keyword of interest (where not previously present)
  • Any change to the final URL (e.g. the appearance or disappearance of a re-direct)
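
Continuing the sketch above, these alerting rules might be implemented along the following lines, where old and new are two successive snapshots of the same site as produced by the inspect_site() function:

    def is_significant_change(old, new):
        # Apply the alerting rules listed above to two successive snapshots
        if new["status"] == 200 and old["status"] != 200:
            return True            # a live website response has appeared
        if new["title"] != old["title"]:
            return True            # the page title has changed
        if new["content"] != old["content"]:
            return True            # any(*) change to the webpage content
        if any(new["keywords"][k] and not old["keywords"][k]
               for k in new["keywords"]):
            return True            # a keyword of interest has appeared
        return new["final_url"] != old["final_url"]   # re-direct changed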

Of course, none of these changes guarantees that the website is now definitively of concern or infringing, but the approach does generate a 'shortlist' of sites then generally requiring manual review for a definitive determination of appropriate next steps (much more efficiently than having to review the whole watchlist manually on a regular basis).

Considering content-change thresholds

As discussed above, one of the trickiest features is the determination of an appropriate 'threshold' for alerting to changes to webpage content. The simplest configuration is simply to trigger a notification for any change(*), but in some cases this option may turn out to be too 'sensitive' and might generate too many candidate sites for convenient further manual review (depending on the size of the watchlist and the interval between successive inspections).

As a further exploration, it is instructive to investigate a numerical basis for quantifying degrees of webpage change, and what these differing degrees 'look like' in practice. There are a number of potential algorithms for quantifying the degree of difference between two passages of text (as discussed, for example, in previous work on mark comparison[7]); however, the simple script discussed in this article employs the Python library module difflib.SequenceMatcher[8] applied to the full HTML of the page (split across spaces into individual 'words') to calculate a difference score. This simple score is based on the ratio of the number of 'similar matches' (i.e. words in common) between the two versions of the page in question, to the total number of elements (words). Furthermore, the script has been configured to also provide a more granular view of the exact nature of the change, comprising a summary of which elements (i.e. words in the HTML) have been removed from the (HTML of the) page between the two successive inspections, and which have been added (Figure 1).
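
A sketch of this scoring step, including the more granular removed/added summary, might look as follows:

    import difflib

    def content_change(old_html, new_html):
        # Score the change between two versions of a page's HTML, treating
        # the page as a sequence of space-separated 'words'
        old_words = old_html.split(" ")
        new_words = new_html.split(" ")
        matcher = difflib.SequenceMatcher(None, old_words, new_words)
        # ratio() is (2 x matching elements) / (total elements); the
        # 'difference score' is then taken as its complement
        score = 1.0 - matcher.ratio()
        removed, added = [], []
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op in ("delete", "replace"):
                removed.extend(old_words[i1:i2])
            if op in ("insert", "replace"):
                added.extend(new_words[j1:j2])
        return score, removed, added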

Figure 1: Examples / illustrations of identified content changes for specific individual webpages between successive inspections: 

  • a) a change to a single dynamically generated string (in this case, Javascript elements) 
  • b) a change from showing an error message to featuring distinct (Javascript) content

Discussion and Conclusions

The examples in Figure 1 provide some initial illustration that the nature of the identified changes is potentially much more important in any determination of significance than (for example) a numerical quantification of the extent of the change (as a proportion of the website as a whole). The first example (i.e. 'a') - a change to a dynamically generated string - is potentially something which might be seen on every occasion the site is inspected and might not correspond to any material change to the page (the visible site content may be entirely unaffected, for example). Conversely, the second example ('b'), representing a change from a simple error message (which, in this case, comprised essentially the entire content of the website) to the appearance of some sort of live, script-generated content (potentially wholly different website content), might be much more significant.

However, these differences may not be apparent from an inspection of just the numerical 'size' of the change on the page (i.e. the 'difference score'); a variation in a piece of scripted content (such as in Figure 1a) might, for example, pertain to just a small element on a much larger page, or could constitute the dominant component of the webpage as a whole. For example, in a sample dataset, single changes similar to that shown in Figure 1a were found to be equivalent to anywhere from less than 5% to more than 50% of the whole content of the website in question.

For these reasons, there is always some danger in specifying a fixed threshold below which changes to the page are disregarded. In some senses, it is safer to conduct a more detailed inspection of all pages which show any change in content between successive revisits, so as to avoid missing significant cases. However, depending on the number of sites under review, this may not be feasible. Accordingly, in future developments or more sophisticated versions of the script, it may be appropriate to refine the scoring algorithm to reflect the nature and/or content of any change.

However, regardless of the specifics, the approach discussed in this article is generally able to build efficiency into the review process for sites of possible future concern, potentially filtering large numbers of sites down into much smaller 'shortlists' of candidates identified for deeper inspection and analysis on any given occasion.

References

[1] Using Python library module: urllib.request.urlopen([URL]).status

[2] Using Python library module: bs4.BeautifulSoup([page HTML], 'html.parser').title.text

[3] Using Python library module: urllib.request.urlopen([URL]).read()

[4] Using Regex matching (Python library module: re.search) as applied to the full webpage (HTML) content

[5] Using Python library module: urllib.request.urlopen([URL]).url

[6] However, care must also be taken to distinguish a 'real' change in site status from an 'apparent' change, which can arise in instances where (for example) the connection speed to the site is slow, and a connectivity time-out may be mistaken for a real case of site inactivity.

[7] https://www.linkedin.com/posts/dnbarnett2001_measuring-the-similarity-of-marks-activity-7331669662260224000-rh-R/

[8] https://www.geeksforgeeks.org/python/compare-sequences-in-python-using-dfflib-module/

This article was first published on 23 October 2025 at:

https://www.iamstobbs.com/insights/playing-with-a-simple-revisitor-script-for-monitoring-changes-to-website-content

Friday, 10 October 2025

How the growth of AI may drive a fundamental step-change in the domain name landscape

by David Barnett and Lars Jensen (ShortDot)

Introduction

The rate of adoption of artificial intelligence (AI) systems over the last few years, particularly in online and technology-related contexts, has been striking. Automated web-based queries now account for over half of all traffic (51% as of 2024)[1], and nearly three-quarters (74%) of webpages now include some AI-generated content[2]. Overall, traffic generated by AI technologies saw a growth of over 500% in the five months to May 2025[3], and a 2025 study of 3,000 websites found that 63% of them already receive traffic from AI-generated referrals[4]. Looking forward, it is predicted that, by 2028, AI-powered search and recommendation engines will drive more web traffic than traditional search[5].

Looking more generally at the landscape, it is estimated by Gartner and other sources that, by 2026 or 2028, 20% of online transactions will be carried out by AI agents[6,7,8,9]. Furthermore, by the end of 2026, 40% of enterprise applications may be integrated with task-specific AI agents, potentially generating 30% of enterprise application software revenue by 2035[10]. Additionally, by 2030, there may be in the region of 500 billion to 1 trillion connected devices, comprising the wider ecosystem of the 'Internet of Things' (IoT)[11,12,13] and (in the absence of mediating factors[14]) this will almost invariably result in an enormous growth in the proportion of DNS traffic categorised as 'machine-to-machine' communication.

It is likely that a significant proportion of these connected entities will require unique DNS identifiers, and many industry commentators are increasingly of the opinion that there will be a desire for many - particularly agentic AI systems - to be associated with unique domain names[15]. These names could serve as a 'birth certificate' or 'trusted identity' for the systems in question, helping to establish user confidence and familiarity. Any evolution along these lines would have an enormous impact on the overall size of the domain landscape (currently around 350 million names), and it may not be unreasonable to suggest that, by 2050, there may be of the order of 10 to 50 billion registered domains. This propounded evolution of the landscape echoes previous studies suggesting that, in the future, the growth of agentic AI will demand a new layer of verifiable identity infrastructure[16] and that it may be desirable for each distinct AI agent to be tied to an 'immutable root' (i.e. identifier)[17]. This trend would be in some ways analogous to the transition from the IPv4 to the IPv6 system for allocating IP addresses, which created a step-change in capacity from 2^32 (around 4 billion) to 2^128 (around 3.4 × 10^38) possible combinations.

Of course, the shape of the AI-related domain name landscape is already changing. Numbers of .ai domains (for example) have massively spiked since the launch of ChatGPT (notably also driving a fundamental boost to the revenues of parent country Anguilla)[18]. Across the full domain name landscape more generally, there are many tens of thousands of examples featuring keywords pertaining to popular and emerging technologies ('ai', 'crypto', etc.), and this demand is only likely to grow. Such trends may emerge in parallel with the forthcoming second phase of the new-gTLD (generic top-level domain) programme, which might see a push towards the availability of much larger numbers of new brand-, industry- or technology-specific domain-name extensions. Other possible evolutions in business behaviour - such as a possible move towards technology entrepreneurs taking advantage of greater opportunities for AI use and automation, so as to establish and run much larger numbers of businesses - may also drive increased demand in the domain-name landscape.

These comments must also be considered against the backdrop of the fact that the current domain landscape is already - in some regards - beginning to run low on capacity. Whilst the total proportion of all possible domain names which are actually registered is still extremely tiny, there is a relative shortage of short, memorable domain names (particularly those comprising dictionary terms) across popular domain name extensions (TLDs). For example, there are currently essentially no .com domains of 4 characters or fewer available for registration, and very few (short) dictionary terms[19]. These observations are already generating a push towards the use of alternative domain name styles and emerging TLDs, in addition to distinct channels altogether (such as blockchain domains and the Web3 environment)[20].

In terms of the overall landscape of web addresses associated with (agentic) AI systems specifically, what might these trends look like? Two possible directions for development include: (a) the emergence and growth of dedicated domain names for specific AI agents (potentially of the form (for example) [role]AI.[TLD]), with the name signifying the function of the system in question; or (b) the increasing use of AI-specific subdomains (say, AI.[site].[TLD]) within the trusted webspaces (i.e. hosted on the primary domain names) of popular companies, to host agentic systems or other AI functionality. Companies are likely predominantly to continue to use popular legacy TLDs such as .com for the foreseeable future but - as part of these evolving trends - may start to branch out into other existing TLDs, or new extensions emerging from phase two of the new-gTLD programme. Exactly which extensions do succeed will ultimately depend on issues around usability and trust (rather than necessarily just comprising an AI-specific label).

Case studies - the current landscape

As illustrations of the current state of the landscape pertaining to the two specific possibilities discussed above, we consider two datasets, as outlined below.

1. Agentic-AI-style domain names

For this analysis, we consider a list of 100 keywords relating to professions or industry areas (with a specific focus, where possible, on examples where AI applications may be relevant). For each of these, we consider whether a domain name consisting of the keyword, either prefixed or suffixed by the string 'ai', is registered, across each of the top-50 largest existing gTLDs (by size of the domain name zone file, i.e. the data file containing the names and configuration information of all registered domains). Therefore, for 'accountant' (for example), on .com, the analysis looks to determine whether accountantai[.]com or aiaccountant[.]com are registered as domain names. This methodology thereby yields 200 possible (or 'candidate') domain names for consideration, across each of the 50 TLDs, or 10,000 candidate domain names in total.
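
A minimal sketch of this candidate-generation and lookup step is shown below; the keyword and TLD lists are illustrative stand-ins for the full sets of 100 and 50 used in the analysis, and the zone-file contents would in practice be parsed from the relevant zone files:

    # Illustrative stand-ins for the 100 profession keywords and 50 gTLDs
    KEYWORDS = ["accountant", "analyst", "doctor"]
    TLDS = ["com", "net", "org"]

    # Set of registered second-level names per TLD, parsed from zone files
    registered_slds = {tld: set() for tld in TLDS}
    registered_slds["com"].update({"accountantai", "aiaccountant"})  # example

    results = {}
    for tld in TLDS:
        for kw in KEYWORDS:
            for sld in (kw + "ai", "ai" + kw):   # suffixed and prefixed
                results[(sld, tld)] = sld in registered_slds[tld]

    registered_count = sum(results.values())
    print(f"{registered_count} of {len(results)} candidate names registered")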

The analysis shows that, of the 10,000 possible domain names of this format, 2,053 (20.5%, or just over one in five) are already registered. A more granular analysis is shown in Figure 1, showing a 'registration map' of which names are already registered (shown in red), versus those which are absent from the zone file (and therefore potentially unregistered and available) (in green).

Figure 1: 'Registration map' for 'agentic-AI-style' domain names (red = registered, green = absent from zone file), where the second-level name (SLD) (i.e. the part of the domain name to the left of the dot) is shown on the vertical axis and the TLD (domain name extension) is shown on the horizontal axis. The dataset is sorted by (vertically, decreasing from top to bottom) the number of TLDs (out of 50) across which the SLD exists as a registered domain, and (horizontally, decreasing from left to right) the total number of SLDs (out of 100) which exist as a registered domain across the TLD in question. Results are shown for the top 50 most commonly registered SLDs.

The top five most commonly registered SLD strings in the dataset are aiagent (with 'agent' likely referring to its technical, AI-related definition in most cases), agentai, aiart, aimusic, and aimarketing, existing as registered domains across 47, 41, 38, 38, and 36 (respectively) of the 50 TLDs considered in the analysis. Only three of the 200 strings do not appear as the SLDs of registered domain names across any of the 50 TLDs.

The top TLD in the dataset is .com (for which 197 of the 200 considered strings exist as the SLDs of registered domains), followed by .net (144), .org (139), .xyz (134), and .app (107). Only one TLD of the 50 (.ovh) does not feature any of the considered SLD strings as registered domains.

Some examples of some of the registered .com domains which also resolve to live website content are shown in Figure 2. Many of the remainder resolve to lower-threat content such as placeholder and parking pages, suggesting perhaps that they have been proactively registered for future intended use, or may be being held as tradable commodities in their own right, given the potential use-cases for these types of name. aiagent[.]com (for example) resolves to a page offering the domain name for sale and requesting offers in excess of $1.5 million, and aibanking[.]com, aibarrister[.]com, aicontroller[.]com, aidesigner[.]com, and aiinvestment[.]com are all explicitly soliciting offers in excess of $100k.

Figure 2: Examples of 'agentic-AI-style' .com domain names resolving to live website content: aiaccountant[.]com, aianalyst[.]com, aidoctor[.]com, aiparalegal[.]com, aiphotographer[.]com, aireceptionist[.]com

2. AI-specific subdomains

The second piece of analysis considers the extent of the existence of AI-related subdomains (taking the specific example of URLs of the form AI.[site].[TLD]), on each of a series of the most popular (i.e. highest traffic) websites across the Internet. In particular, we consider the 47 most highly visited websites generally, derived from data from Similarweb and Semrush[21] (truncated from a top-50 list, but considering only examples comprising full, second-level domain names), and a dataset of the top 20 information technology (IT) company websites (according to Semrush[22]) - i.e. one example of an industry vertical where AI may be particularly relevant (noting that two domains, live.com and office.com, appear in both lists).

The analysis shows that a specific hostname of the form AI.[site].[TLD] was found to resolve (i.e. is configured with an active DNS entry) for 20 of the top 47 websites globally (i.e. 43%, with 19 of these explicitly also generating a live HTTP (i.e. website) response) (Figure 3), and for 8 of the top 20 IT websites (40%, with 6 also showing a live HTTP response). This does not, of course, preclude the existence of other AI-specific areas of the websites which may use alternative naming conventions, such that these figures represent very much a lower limit on the proportion of these sites already featuring dedicated AI-related sections.
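
A minimal sketch of this resolution test, using only the Python standard library (with an illustrative subset of sites), might be:

    import socket
    import urllib.request

    SITES = ["google.com", "baidu.com", "microsoft.com"]  # illustrative subset

    for site in SITES:
        host = f"ai.{site}"
        try:
            socket.gethostbyname(host)   # active DNS entry?
            dns_ok = True
        except socket.gaierror:
            dns_ok = False
        http_ok = False
        if dns_ok:
            try:
                urllib.request.urlopen(f"https://{host}", timeout=10)
                http_ok = True           # live HTTP (website) response
            except Exception:
                pass
        print(f"{host}: DNS={dns_ok}, HTTP={http_ok}")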

Figure 3: Examples of AI-specific subdomains (of the form AI.[site].[TLD]) on domains within the top-50 list of most popular websites: ai.google.com (re-directs to ai.google), ai.facebook.com (re-directs to ai.meta.com), ai.baidu.com, ai.microsoft.com (re-directs to microsoft.com/en-us/ai)

Discussion and conclusions

Many of the points discussed in this article are reminiscent of terminology used in the futurology study 'From Malthus to Mars'[23]; the work describes certain emerging capabilities as '10x technologies', referring to their capacity to be ten times more effective than their predecessors, and expand accessibility to a far wider audience. Furthermore, some of the predictions referenced in this article are even more significant, and potentially have the ability to push 'from 10x to 100x' growth, representing a fundamental step-change in capabilities and with the power to drive fundamental evolutions of the online landscape.

As AI continues to evolve in an ever-more-interconnected online ecosystem, it is likely that domain names will remain a foundational component of the overall landscape, comprising a permanent, trusted layer which is able to give every connected entity a unique identifier.

Some of these trends are already being observed, even across the existing legacy infrastructure, with significant growth in the numbers of registered domains with specific relevant name structures and/or containing relevant keywords. It will be interesting to see how near-future developments, such as the forthcoming second phase of the new-gTLD programme, the inevitable continued growth and evolution of AI technologies, the increasing interconnectedness of online channels, and the ongoing emergence of new AI use-cases and other areas of online technology, will contribute to this overall picture.

References

[1] https://www.imperva.com/blog/2025-imperva-bad-bot-report-how-ai-is-supercharging-the-bot-threat/

[2] https://ahrefs.com/blog/what-percentage-of-new-content-is-ai-generated/

[3] https://searchengineland.com/ai-traffic-up-seo-rewritten-459954

[4] https://ahrefs.com/blog/ai-traffic-study/

[5] https://www.semrush.com/blog/ai-search-seo-traffic-study/

[6] https://www.linkedin.com/pulse/2026-one-five-retail-transactions-completed-ai-agent-question-amit-6wl1e/

[7] https://onereach.ai/blog/agentic-ai-adoption-rates-roi-market-trends/

[8] https://www.gartner.com/en/documents/6894066

[9] https://www.pymnts.com/artificial-intelligence-2/2024/ai-to-power-personalized-shopping-experiences-in-2025/

[10] https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025

[11] https://www.cisco.com/c/dam/global/fr_fr/solutions/data-center-virtualization/big-data/solution-cisco-sas-edge-to-entreprise-iot.pdf

[12] https://pmc.ncbi.nlm.nih.gov/articles/PMC11085491/

[13] N. Quadar, A. Chehri, G. Jeon, M.M. Hassan, G. Fortino (2022). Cybersecurity Issues of IoT in Ambient Intelligence (AmI) Environment. IEEE Internet Things Mag., 5, pp. 140-145. doi: 10.1109/IOTM.001.2200009.

[14] https://pdfs.semanticscholar.org/f6fb/3f56f29f23cb8724fce2a7667f08e1641eb4.pdf

[15] For example, commentary from Domain Summit Europe 2025.

[16] https://www.kuppingercole.com/watch/future-of-identity

[17] 'A Novel Zero-Trust Identity Framework for Agentic AI: Decentralized Authentication and Fine-Grained Access Control'; https://arxiv.org/html/2505.19301v2

[18] https://www.imf.org/en/News/Articles/2024/05/15/cf-an-ai-powered-boost-to-anguillas-revenues

[19] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 9: 'Domain landscape analysis'

[20] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 13: 'Analyzing trends in Web3'

[21] https://en.wikipedia.org/wiki/List_of_most-visited_websites

[22] https://www.semrush.com/website/top/global/information-technology/

[23] https://frommalthustomars.com/

This article was first published on 9 October 2025 at:

https://circleid.com/posts/how-the-growth-of-ai-may-drive-a-fundamental-step-change-in-the-domain-name-landscape

Thursday, 25 September 2025

Revisiting the calculation of brand protection return-on-investment

by David Barnett, Richard Ferguson and Sheena Yonker

August 2025 saw the publication of INTA's Anticounterfeiting Committee Policy Global Project Team report, 'Anticounterfeiting and Return on Investment'[1,2], considering the issue of the calculation of return on investment (ROI) for brand-protection initiatives (with a specific focus on counterfeiting activity).

Whilst much of the content echoes previous research, the new report does describe some of the relevant factors in slightly different ways, thereby providing some alternative frameworks which can be used to consider relevant ideas. The publication of the report therefore offers a suitable opportunity to revisit and expand on some of the key points relating to the general subject area. 

Pre-existing research and concepts 

In general, the ability to quantify ROI for brand protection initiatives is a long-standing requirement in many organisations, frequently associated with a desire to demonstrate success and secure funding for future similar projects.  

The key ideas necessary for constructing calculation frameworks have been outlined in a range of prior pieces of research, with some of the most significant points[3,4] outlined below. 

  • For e-commerce marketplace enforcement specifically, ROI is most traditionally assessed by considering the number of infringing items removed through takedown actions, applying a customer substitution rate (i.e. the assumed proportion of customers who will buy a legitimate item once the infringing version is made unavailable - potentially with an additional conversion factor to account for the proportion of available items which translate to a sale), and then using the legitimate item price to estimate the additional profit generated through legitimate sales as a result of the enforcement (a hypothetical worked example follows this list).
  • Modifications or enhancements to the above simple approach can be made by considering:
    • Variability in substitution rates; specifically, the effects of e.g. item price (either absolute, or the differential between infringing and legitimate goods), or the degree of deception involved in the sale of the infringing items.
    • Use of 'data caps' to prevent unrealistically high calculated ROI values. 
    • The long-term impact of the enforcement programme, rather than just basing the calculation on the short-term numbers of ongoing enforcements carried out on a regular basis - i.e. considering the scale of the infringement landscape as compared with that observed at the start of the service, before active takedowns were being carried out. Related to this idea is an assessment of how 'clean' the search results are (for infringing vs. legitimate items), or the degree to which the brand owner is able to achieve 'ownership of the buy button' - i.e. be the top-listed seller in response to searches for relevant product terms. 
    • The impact of the brand protection initiative on brand value - essentially, considering intangible (though still quantifiable) impacts associated with factors such as consumer brand awareness, loyalty / churn, and reputation. Also relevant in this type of determination are areas such as reductions in the cost of capital for the brand owner (i.e. perceived brand risk), and in operating costs required to address ongoing issues. 
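
By way of illustration, the traditional substitution-rate calculation described in the first bullet above might be expressed as in the following sketch; all of the figures used are invented for the purposes of the example:

    # Hypothetical worked example of the traditional marketplace-enforcement
    # ROI calculation; all figures are invented.
    items_removed     = 50_000   # infringing listings taken down
    substitution_rate = 0.25     # assumed proportion of buyers who switch
    conversion_factor = 0.10     # proportion of listings converting to a sale
    unit_profit       = 30.00    # profit per legitimate sale (GBP)
    programme_cost    = 15_000.00

    recovered_profit = (items_removed * substitution_rate
                        * conversion_factor * unit_profit)
    roi = (recovered_profit - programme_cost) / programme_cost
    print(f"Recovered profit: £{recovered_profit:,.2f}; ROI: {roi:.0%}")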

The importance of considering additional factors, such as 'real-world' impacts (e.g. increases in numbers of visitors to physical stores, or information from 'on-the-ground' actions such as raids and seizures), direct monetary gains (such as proceeds from fines or damages arising from successful legal actions), and volumes of infringements avoided in the future as a result of a proactive brand protection programme, has also been noted.  

However, it is also clear that there is no 'one-size-fits-all' approach to ROI calculation, and any framework will have a number of associated caveats, such that the algorithms are really most appropriate only for like-for-like comparisons. 

Additional ideas from the new report 

INTA's new report stresses the idea of considering brand protection as a strategic business imperative, and also reiterates the importance of taking a conservative approach (i.e. to prevent unrealistic estimates which can affect the credibility of ROI assessments).  

The report also emphasises the importance of integrating online and offline components, particularly in investigative strategies (e.g. to identify relevant physical locations and supply chains). This idea is also familiar from previous research looking at activity 'hotspots'[5,6,7,8], which can allow efforts to be focused in key locations, in a cost-effective way. Also previously noted is the importance of data analysis in establishing the existence of clusters of infringements associated with individual high-volume key infringers, to better inform where enforcement should be targeted[9,10]. Additionally, in brand protection programmes, offline measures can include raids and seizures, and a range of legal proceedings.  

It is also noted that careful monitoring of relevant online channels can provide early warning of new threat types, and insights into market trends. Furthermore, impacts resulting from brand protection efforts can potentially also similarly be assessed, by monitoring for factors such as changes in volumes of customer complaints.  

Furthermore, the implementation of initiatives such as product tracking or verification measures can, as well as contributing to securing the supply chain and reducing the ease of counterfeiting in their own right, result in a positive effect on perception for brands seen to be protecting their customer base and leading in innovation. Brand protection initiatives can also fulfil compliance requirements and mitigate legal risk.  

Also explicitly noted is the distinction between 'soft' and 'hard' ROI - essentially, the difference between the assessment of potential benefits and 'direct' monetary gains (e.g. through fines or damages) outlined above. One key point made in the report is that the assessment of 'soft' ROI is made particularly difficult by the fact that the nature and scale of significant proportions of infringing activity is frequently unclear (i.e. 'illicit trade obscurity').  

It is stressed that the aim of brand protection is ultimately to facilitate legitimate product sales, in an ecosystem where counterfeiting is currently associated with an annual economic cost of almost half-a-trillion dollars[12]. The report outlines some possible ROI calculation frameworks, including a 'substitution rate' approach for marketplace enforcement, similar to that outlined above, in addition to other simple calculation methods for assessing ROI from physical seizures, and from settlements achieved through legal actions. 

A final key point to note is the importance of adopting a cross-functional collaborative approach to brand protection within organisations, together with regular processes of review and strategy adaptation. 

Further thoughts 

Overall, we echo the assertion that an idealised framework for assessing ROI for brand protection should capture issues relating to consumer sentiment, as well as (hard and soft) financial recovery - and, even considering explicit monetary impacts alone, it is worth reiterating the point that any simple mathematical formulation will generally only provide part of the picture. Beyond this area, an ability to track complaints, reviews and social sentiment - both before and after enforcement - can provide more comprehensive insights, though this requires adequate technical capabilities from the brand protection service provider.

The early phases of product development and launch are often a key pressure point for brand owners. Counterfeits often appear quickly, and therefore the capability to measure the rate of appearance of infringing goods, in addition to quantifying takedown speeds and tracking reductions in customer complaint volumes within these initial stages, can help to make ROI more tangible. Similarly, intensified bursts of online and offline enforcement during key marketing campaigns may also be appropriate. As noted in the INTA report, internal collaboration within an organisation is key, and should include input from (at least) sales, marketing and social management functions. 

Within companies, boards are also likely to expect to see demonstration of online vs offline cost efficiency. The construction of a simple metric (along the lines of 'cost-per-pound-of-protected-value') could help explain spend allocation. This determination may not be straightforward, however; for example, civil litigation often retrieves only limited financial proceeds, and can (in isolation) appear to be associated with a negative ROI, though additional factors (such as the deterrent effect on future infringements) should also be considered.

Crossover into the offline space is generally an important part of any brand protection initiative, but often comes with its own difficulties. For example, it may be difficult to achieve sufficient collaborative input from law enforcement agencies, unless case evidence is well established and clear cut. Offline efforts may also be made more difficult in cases where relevant IP rights have not been established. However, raising public awareness of infringement 'hot spots' (in addition to providing guidance regarding the dangers of counterfeits and how to spot them) can also be advantageous. 

In general terms, both online and offline measures need to be properly accounted for, in a joined-up fashion, in any comprehensive assessment of ROI. One key element is the use of effective case-management systems which are able to incorporate insights from the full extent of the brand owner's supply chain, using data on levels of infringements throughout (including customs seizures, levels of online saturation, and so on). Going forward, it would be extremely instructive to see a range of anonymised case studies, ideally involving data sharing across key industry bodies (INTA, IACC, ACG, etc.), so that appropriate ROI benchmarks could be created. 

As a final point, it is also important to note that the discussion in this article has focused primarily on counterfeiting specifically. This is, of course, only one of the areas a brand protection programme should address; others include issues as diverse as phishing, malware distribution, and executive impersonation. Even just considering the sale of physical goods, other areas of concern - such as the trade in 'grey goods' (i.e. official items, but distributed through unapproved channels) - must generally also be addressed.

References

[1] https://www.inta.org/news-and-press/inta-news/new-inta-report-offers-guidance-for-measuring-roi-in-anticounterfeiting/

[2] https://www.inta.org/wp-content/uploads/public-files/advocacy/committee-reports/2025-ACC-COMMITTEE-REPORT-081825.pdf

[3] https://www.iamstobbs.com/opinion/brand-protection-return-on-investment-an-overview-of-calculation-frameworks-and-methodologies

[4] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 11: 'Quantifying brand protection return-on-investment'

[5] https://www.iamstobbs.com/opinion/tracking-the-uk-trade-in-fakes-counterfeit-hotspots

[6] https://www.iamstobbs.com/opinion/tracking-the-uk-trade-in-fakes-ins-and-outs

[7] https://www.iamstobbs.com/opinion/think-globally-act-locally-an-overview-of-infringement-hotspots-around-the-world

[8] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 15: 'Links to offline data'

[9] https://circleid.com/posts/braive-new-world-part-1-brand-protection-clustering-as-a-candidate-task-for-the-application-of-ai-capabilities

[10] https://www.iamstobbs.com/insights/e-mail-address-extraction-from-webpages-a-quick-case-study-in-result-clustering

[11] https://www.iamstobbs.com/insights/further-explorations-in-clustering-use-of-google-advertising-tracking-links

[12] https://www.oecd.org/en/publications/mapping-global-trade-in-fakes-2025_94d3b29f-en/full-report.html

This article was first published on 25 September 2025 at:

https://www.iamstobbs.com/insights/revisiting-the-calculation-of-brand-protection-return-on-investment

Thursday, 28 August 2025

WyrdBrandz: Misspelling in brand names - is it an effective marketing tactic?

Introduction

In the modern world of branding, consumers are becoming increasingly familiar with the use of 'wacky' misspellings (sometimes described as 'sensational spellings'), with familiar examples including Flickr, Reddit, Tumblr, and many more. 

There are many reasons why this approach to selecting a brand name can be an effective one, including the facts that these types of misspellings can attract customer attention and aid memorability, and can make it easier to obtain a protective trademark and secure an available domain name (as compared with the use of a dictionary word or proper name, for example). Set against these advantages, however, is the fact that misspellings are generally understood[1] to generate adverse brand attitudes (including questions regarding brand sincerity) and are frequently perceived as a marketing 'gimmick'.

A deep analysis of the wider world of consumer attitudes to 'misspelled' brands has recently been published by researchers from the Universities of Arkansas and Tennessee[2,3]. The study finds that the negative impact associated with the use of a minor misspelling can be offset through the careful choice of a particular spelling (or of other branding features) to aid with brand-name interpretation by consumers. These types of insight are key to success in the areas of brand selection and marketing and, by extension, to considerations such as domain name registration (an area covered in previous work, which provided overviews of techniques for discovering available unregistered 'brandable' domain names[4,5,6]).

It is also important to note that the types of misspelled names considered here are completely distinct from cases involving the deliberate use of deceptive brand or domain names similar to those of established, trusted third-party brands, for the purposes of impersonation and fraud[7,8].

Definitions and a deeper dive

The study of consumer reactions to misspellings in brand names is based fundamentally on the concepts of linguistic fluency (i.e. the ease of processing written content into language) and conceptual fluency (the ease with which associated meaning is brought to mind). Central ideas include the propositions that, overall, consumers process names more effectively when they are more similar to familiar terms (such as dictionary words), and that greater processing fluency tends (in general) to lead to improved brand perception. Lower degrees of 'orthographic' similarity to familiar strings can also be counteracted (to a degree) through the use of phonetic cues (i.e. those which are apparent when the brand name is sounded out) to the intended meaning. 

A core idea within this overall framework is that the use of only a minor misspelling can have a small or negligible negative impact on brand perception, which can then be more than offset through the clever application of other marketing-related mediating factors. This approach thereby allows the brand owner to take advantage of the other attractive features of misspellings, such as greater trademark and domain name availability.

As part of the analysis presented in the recent study, the misspellings used by brands were categorised into a set of 'types'. Strikingly, these together accounted for over half of all names in a curated list (from 2023) of 100 recent start-ups, used as an example dataset in the study; only 29% of the names used 'correct' spellings, with the remaining 20% using wholly new terms (neologisms) or proper names.

The categories of brand misspellings, as defined in the study (and illustrated with examples taken from it) were:

  • Compound - combining words together by (just) removing intervening spaces (e.g. 'AutoZone')
  • Lengthening - adding or repeating a letter (e.g. 'Mixx')
  • Foreign - substituting a character with one from an alternative language (e.g. 'Røde')
  • Letteronym - substituting a letter for a word or part of a word (e.g. 'La-Z-Boy')
  • Portmanteau - blending two or more other words (and removing parts of the component words), to create a new term (e.g. 'Duracell')
  • Phonetic - using a misspelling pronounced in the same way as the 'intended' term (e.g. 'Froot Loops')
  • Abridgment - shortening the term through the removal of one or more letters (e.g. 'Crumbl')
  • Alphanumeric - substituting a numeral for a word or part of a word (e.g. '4ward')
  • Leetspeak - substituting a numeral or special character for a visually similar letter (but with no modification to pronunciation) (e.g. 'E11EVEN')

Another key idea from the study is that not all types of misspelling elicit similar responses; some decrease the 'processing fluency' of readers / consumers more than others - for example, abridgments, alphanumerics, and 'leetspeak' are generally found to be harder to process than compounds and lengthenings. More predictably, fluency was also found to decrease further in cases involving higher degrees of misspelling (of a particular type). Other factors are also important, such as the proximity of an 'incorrect' character to the start of the word, which has a greater adverse effect on processing fluency; this concept is related to a similar idea familiar from previous work on mark similarity measurement[9].
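As an aside, the idea that an error near the start of a name is more disruptive lends itself to simple quantification. The sketch below (in Python) shows one possible 'position-weighted' variant of the classic Levenshtein edit distance, in which edits nearer the start of a word are penalised more heavily; the exponential weighting scheme and function names are illustrative assumptions for this article only, and do not represent the actual measure used in the mark-similarity work cited above.

    # Position-weighted edit distance: edits near the start of a word are
    # penalised more heavily, reflecting the finding that early 'incorrect'
    # characters disrupt processing fluency most. The decay-based weighting
    # is an illustrative assumption, not the measure used in reference [9].

    def position_weight(i: int, decay: float = 0.8) -> float:
        """Weight of an edit at (0-based) position i; earlier edits cost more."""
        return decay ** i

    def weighted_edit_distance(a: str, b: str) -> float:
        """Levenshtein distance with position-dependent edit costs."""
        a, b = a.lower(), b.lower()
        m, n = len(a), len(b)
        # dp[i][j] = weighted cost of transforming a[:i] into b[:j]
        dp = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            dp[i][0] = dp[i - 1][0] + position_weight(i - 1)
        for j in range(1, n + 1):
            dp[0][j] = dp[0][j - 1] + position_weight(j - 1)
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = 0.0 if a[i - 1] == b[j - 1] else position_weight(min(i, j) - 1)
                dp[i][j] = min(
                    dp[i - 1][j] + position_weight(i - 1),  # delete a[i-1]
                    dp[i][j - 1] + position_weight(j - 1),  # insert b[j-1]
                    dp[i - 1][j - 1] + sub,                 # substitute / match
                )
        return dp[m][n]

    # 'Quik' (change late in the word) scores as a smaller deviation from
    # 'quick' than 'Kwik' (changes right at the start of the word):
    print(weighted_edit_distance("quik", "quick"))  # ~0.51
    print(weighted_edit_distance("kwik", "quick"))  # ~2.31

Under a scheme of this kind, a low score corresponds to a 'minor' misspelling of the type the study suggests is most easily compensated for.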

The adverse effects of a misspelling may be counteracted by other factors which aid conceptual fluency for the brand name in question, such as similarity to other familiar terms with well-understood meanings, the use of additional visual cues in the associated brand presentation, or the alignment of the spelling with characteristics such as the owner's name or the product or business type (e.g. the use of 'Quik' for 'quick', invoking an association with the underlying sentiment (quickness) through a shortening of the word; the use of the 'oo' in 'Froot Loops', resembling the shape of the actual cereal; or the use of a term such as 'Scentimental' for a brand in a relevant product area, such as a florist). These types of enhancement are found to positively influence brand attitudes, levels of brand preference, and favourability towards word-of-mouth recommendations, and can be particularly effective if other subjective criteria are also met (such as the use of a misspelling considered to be 'fun' or 'cute').

However, there are likely also to be other factors which must be considered, such as whether the industry area of the brand in question is traditionally perceived to be associated with trust or accuracy (for example), in which case a misspelled brand name may be deemed less acceptable.

Conclusions

An understanding of the nature of, and of consumer reactions to, misspellings in brand names can be a key component of an effective marketing strategy, particularly when a new brand name is being selected.

The use of a misspelling can be a compelling solution to problems associated with pre-existing IP rights and poor domain name availability, but can have an adverse effect on brand perception. 

However, an optimal approach appears to be the selection of a 'minor' misspelling, combined with the use of other tactics to counteract its potentially negative impact (in terms of ease of word processing and customer perception). It is also worth noting that the 'degree' of misspelling can be quantified using previously explored ideas on mark similarity measurement, as illustrated in the sketch above.

Examples of successful approaches to improve the effectiveness of a candidate brand name might include the careful selection of a particular misspelling, or the use of appropriate visual branding elements, to convey other aspects of the intended brand values or message.

References

[1] J.P. Costello, J. Walker and R.W. Reczek (2023). "Choozing" the Best Spelling: Consumer Response to Unconventionally Spelled Brand Names. J. Marketing, 87 (6), pp. 889-905, https://doi.org/10.1177/00222429231162367. (Available at: https://journals.sagepub.com/doi/abs/10.1177/00222429231162367)

[2] L.W. Smith and A. Abell (2025). The Art of Misspelling: Unraveling the Diverging Effects of Misspelled Brand Names on Consumer Responses. J. Consumer Research, ucaf020, https://doi.org/10.1093/jcr/ucaf020. (Available at: https://academic.oup.com/jcr/advance-article-abstract/doi/10.1093/jcr/ucaf020/8106524)

[3] https://domainnamewire.com/2025/08/20/new-research-reveals-which-misspelled-brand-names-work-best/

[4] https://www.linkedin.com/pulse/overview-brandable-domain-name-discovery-techniques-so3ye/

[5] https://circleid.com/posts/20240911-further-explorations-in-brandable-domain-names-sensational-spellingz

[6] https://circleid.com/posts/availability-analysis-of-brandable-variant-string-domain-names

[7] https://www.iamstobbs.com/opinion/you-spelled-it-wrong-exploring-typo-domains

[8] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 7: 'Creation of deceptive URLs'

[9] https://www.linkedin.com/posts/dnbarnett2001_measuring-the-similarity-of-marks-activity-7331669662260224000-rh-R/

This article was first published on 28 August 2025 at:

https://www.iamstobbs.com/insights/wyrdbrandz-misspelling-in-brand-names-is-it-an-effective-marketing-tactic

Thursday, 14 August 2025

Further explorations in clustering - use of Google advertising tracking links

Part of the 'Patterns in Brand Monitoring: Brand Protection Data is Beautiful' series of articles[1,2,3,4]

Introduction

'Clustering' in brand protection is the process of discovering features shared in common between distinct findings (such as websites), as a means of establishing a connection between those results. In general terms, this type of analysis is beneficial as it allows for the identification of the most significant infringements (i.e. those associated with extensive networks of activity) and can provide investigative insights into the underlying entity(ies) (i.e. the owners / administrators of the content in question).   

Our previous discussion of 'clustering' analysis considered the case of e-mail addresses as a potential basis for establishing a link[5], and it is similarly possible to use other features, such as telephone numbers (though this is complicated by the wide range of formats in which such details can appear; see the sketch below) or hyperlinks to (for example) associated social-media pages. It is also worth noting that the general process of establishing data clusters is one compelling potential application for AI functionality, which is theoretically able to address issues such as interpreting data presented in a wide range of different ways and contexts[6].
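As an illustration of the formatting issue, the minimal sketch below (in Python) shows how telephone numbers might be reduced to a canonical form before being used as a clustering key; the normalisation rules are deliberately naive assumptions for demonstration purposes (real-world handling of international numbering is considerably more involved), and the numbers shown are invented.

    import re

    def normalise_phone(raw: str, default_country_code: str = "44") -> str:
        """Reduce a phone number to a canonical digit string (naive version)."""
        digits = re.sub(r"\D", "", raw)  # strip spaces, dashes, brackets, '+'
        if digits.startswith("00"):
            digits = digits[2:]          # international '00' dialling prefix
        elif digits.startswith("0"):
            # Assume a domestically formatted number
            digits = default_country_code + digits[1:]
        return digits

    # All of these variants collapse to the same clustering key:
    variants = ["+44 20 7946 0123", "020 7946 0123", "(0)20-7946-0123"]
    print({normalise_phone(v) for v in variants})  # {'442079460123'}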

In this new article, we consider the case of Google tracking links as a suitable feature for establishing connections between websites. The analysis in this study is based specifically on the Google Tag Manager system, which is used for functionality relating to website tracking and marketing / advertising, and which utilises links incorporating identity ('ID') codes unique to the account of the website owner[7,8]. Previous analysis has established that many infringers tend to utilise the same ID code across large numbers of sites under their operation, to monitor the performance of their portfolio, rather than using a unique code / Google account for each site. Accordingly, the ability to identify the same tracker code on multiple different sites provides a means of definitively establishing a connection between those sites.

The analysis consists of utilising 'scraper' functionality to identify these links in the HTML source code of websites of potential interest (i.e. for those sites making use of the Google tracking functionality), and then extracting the user-specific tracker-ID codes from them. We consider links of the general form googletagmanager[.]com/***?id=XXXXX (where '***' is an arbitrary string of characters, and 'XXXXX' is the tracker-ID code, written as an alphanumeric string).
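A minimal sketch of this extraction step is given below (in Python, using only standard-library functionality); the function name, regular expression, and URL shown are illustrative assumptions rather than a description of any specific production scraper.

    import re
    import urllib.request

    # Matches the 'id' parameter in googletagmanager.com link / script URLs;
    # the IDs are alphanumeric strings, typically prefixed 'GTM-', 'G-',
    # 'AW-' or 'UA-'.
    GTM_ID_PATTERN = re.compile(
        r"googletagmanager\.com/[^\"'\s]*?[?&]id=([A-Za-z0-9-]+)"
    )

    def extract_tracker_ids(url: str) -> set[str]:
        """Fetch a page and return the set of Google tracking IDs found in it."""
        with urllib.request.urlopen(url, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
        return set(GTM_ID_PATTERN.findall(html))

    # Example usage (hypothetical site):
    # print(extract_tracker_ids("https://example.com/"))
    # -> e.g. {'GTM-ABC123', 'G-XYZ789'}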

Furthermore, open-source databases such as that offered by publicwww[.]com make it possible to carry out wider searches for other appearances of the same code, and therefore to build a bigger picture of infringer activity.

Analysis

The analysis considers the same set of around 4,500 websites examined in the previous article; these are brand-specific domain names resolving to live web content, pertaining to a particular fashion brand.

In this new study, Google tracking links were identified on around 900 of the sites in question, and 50 of the identified tracking-code IDs were found on more than one site, thereby providing criteria for establishing clusters.

Upon deeper analysis, certain clusters turn out not to reveal any significant insights - for example, one of the tracking codes, which actually appears on 167 distinct sites, seems just to relate to a particular web-hosting service provider (whose parking page appears in association with many of the domain names in question), rather than pertaining to the underlying website owners.

However, many of the clusters do seem to reveal meaningful links, such as a group of 14 sites (the largest remaining cluster in the dataset) all featuring the same tracking code, which would not otherwise easily have been known to be linked (Figures 1 and 2). Information from publicwww[.]com shows that this same code actually appears on over 232,000 distinct websites across the wider Internet.

Figure 1: (Redacted) examples of websites from the cluster of 14 all determined to be linked by virtue of the use of the same tracking code

Figure 2: (Redacted) website source code snippet present on all sites shown in Figure 1

In order to carry out deeper dives into the data, the dataset can be processed in a range of different ways to reveal and visualise the nature of the clusters. One convenient first step is the production of an 'adjacency matrix' (Figure 3; a construction sketch is given after the figure) for the sets of sites (vertical axis) and distinct tracking codes (horizontal axis) in the dataset, in which a row/column intersection is marked with a '1' (red highlighting) if the code appears on the site in question, and with a '0' otherwise. Even from this raw data, some insights can be drawn, such as the identification of the large cluster associated with the tracking code shown third from the right in the screenshot, for which many of the row entries (corresponding to distinct associated websites) are highlighted in red.

Figure 3: Screenshot of the 'adjacency matrix' for the distinct sites and tracking codes present within any of the clusters in the dataset
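The following minimal sketch shows how an adjacency matrix of this kind might be constructed using the pandas library, assuming a simple mapping from each site to the set of tracking codes extracted from it; the site names and codes shown are invented examples.

    import pandas as pd

    # Mapping from site to the tracking codes found on it (e.g. as produced
    # by the extraction step sketched earlier); purely illustrative data.
    site_codes = {
        "site-a.example": {"GTM-AAA111"},
        "site-b.example": {"GTM-AAA111", "G-BBB222"},
        "site-c.example": {"G-BBB222"},
    }

    all_codes = sorted({code for codes in site_codes.values() for code in codes})

    # Rows = sites, columns = codes; 1 where the code appears on the site.
    adjacency = pd.DataFrame(
        [[1 if code in codes else 0 for code in all_codes]
         for codes in site_codes.values()],
        index=list(site_codes),
        columns=all_codes,
    )

    # Codes appearing on more than one site define candidate clusters.
    shared = adjacency.columns[adjacency.sum(axis=0) > 1]
    print(adjacency)
    print("Codes shared across sites:", list(shared))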

This matrix can then be used as the basis for further visualisations of the data. For example, a number of standard Python libraries[9] can be used to create visual 'networks' showing the connections within the dataset (Figures 4 and 5; a minimal sketch is given after Figure 5). These types of cluster show that the websites in question in each case (represented by the blue nodes) are all associated with one another, and could potentially be addressed in single bulk enforcement actions, thereby building efficiencies into the takedown process.

Figure 4: (Obfuscated[10]) visualisation of the cluster of 14 sites from which the examples in Figure 1 were taken (websites shown as blue nodes, tracking codes as green nodes)

Figure 5: (Obfuscated) examples of other clusters within the dataset which are of particular interest because of the presence of multiple interconnections between the sites / tracking codes in question (websites shown as blue nodes, tracking codes as green nodes)
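The sketch below illustrates how a bipartite network of this kind can be built and rendered using the NetworkX and Matplotlib libraries referenced in [9], reusing the illustrative 'adjacency' DataFrame from the previous sketch; the layout and styling choices are purely illustrative.

    import matplotlib.pyplot as plt
    import networkx as nx

    graph = nx.Graph()
    sites = list(adjacency.index)
    codes = list(adjacency.columns)
    graph.add_nodes_from(sites, kind="site")
    graph.add_nodes_from(codes, kind="code")

    # Add an edge wherever a tracking code appears on a site.
    for site in sites:
        for code in codes:
            if adjacency.loc[site, code]:
                graph.add_edge(site, code)

    # Sites in blue, tracking codes in green, as in Figures 4 and 5.
    colours = ["tab:blue" if graph.nodes[n]["kind"] == "site" else "tab:green"
               for n in graph.nodes]
    nx.draw(graph, with_labels=True, node_color=colours, font_size=8)
    plt.show()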

Conclusion

The concept of clustering is a key component of the analysis process for websites and other results identified through a programme of brand monitoring. As part of a holistic brand protection initiative, it can help identify key infringers for prioritised action and enforcement, and help identify other linked content, through which a fuller picture of the underlying entities and their associated activities can be established.

The use of Google advertising tracking codes is a compelling basis for identifying connections, as they are generally specific to a particular user, are frequently utilised across multiple different sites in the portfolio, appear to be relatively ubiquitous across web content generally, can be readily extracted from the source code of webpages, and can often be tied to additional related material through the use of insights drawn from open-source databases.

References

[1] https://www.linkedin.com/pulse/brand-protection-data-beautiful-david-barnett-c66be/

[2] https://www.linkedin.com/pulse/brand-protection-data-still-beautiful-part-1-year-domains-barnett-juwhe/

[3] https://www.linkedin.com/pulse/brand-monitoring-data-niblet-5-law-firm-scam-websites-david-barnett-ap5de/

[4] https://www.iamstobbs.com/insights/notorious-ip-addresses-and-initial-steps-towards-the-formulation-of-an-overall-threat-score-for-websites

[5] https://www.iamstobbs.com/insights/e-mail-address-extraction-from-webpages-a-quick-case-study-in-result-clustering

[6] https://circleid.com/posts/braive-new-world-part-1-brand-protection-clustering-as-a-candidate-task-for-the-application-of-ai-capabilities

[7] https://support.google.com/tagmanager/answer/6102821?hl=en

[8] https://www.analyticsmania.com/post/google-tag-manager-vs-google-analytics/

[9] The figures in this study utilise the Python libraries NetworkX (https://networkx.org/; A.A. Hagberg, D.A. Schult and P.J. Swart (2008). "Exploring network structure, dynamics, and function using NetworkX". In: Proceedings of the 7th Python in Science Conference (SciPy2008), G. Varoquaux, T. Vaught and J. Millman (Eds.) (Pasadena, CA USA), pp. 11–15.) and Matplotlib (https://matplotlib.org/). 

[10] In the visualisations of the clusters, the brand name (as it appears in the domain names) is replaced by the string '[brand]', and an encoded ('hashed') form of the tracking codes (which generally exist in the raw data in the form 'GTM-XXXXX', 'G-XXXXX', 'AW-XXXXX' or 'UA-XXXXX', where 'XXXXX' is an alphanumeric string) is shown in each case.

This article was first published on 14 August 2025 at:

https://www.iamstobbs.com/insights/further-explorations-in-clustering-use-of-google-advertising-tracking-links
