Thursday, 28 August 2025

WyrdBrandz: Misspelling in brand names - is it an effective marketing tactic?

Introduction

In the modern world of branding, consumers are becoming increasingly familiar with the use of 'wacky' misspellings (sometimes described as 'sensational spellings'), with familiar examples including Flickr, Reddit, Tumblr, and many more. 

There are many reasons why this approach to selecting a brand name can be effective: such misspellings can attract customer attention and aid memorability, and can make it easier to obtain a protective trademark and secure an available domain name (as compared to the use of a dictionary word or proper name, for example). In conflict with these considerations, however, misspellings are generally understood[1] to generate adverse brand attitudes (including questions regarding brand sincerity) and are frequently perceived as a marketing 'gimmick'.

A deep analysis of the wider world of consumer attitude to 'misspelled' brands has recently been published by researchers from the Universities of Arkansas and Tennessee[2,3]. The study finds that the negative impact associated with the use of a minor misspelling can be offset through the careful choice of a particular spelling (or other branding features) to aid with brand-name interpretation by consumers. These types of insight are key to success in the areas of brand selection and marketing and, by extension, to considerations such as domain name registration (an area which was covered in previous work, providing overviews of techniques applicable to the discovery of available unregistered 'brandable' domain names[4,5,6]).

It is also important to note that the types of misspelled names considered here are completely distinct from other cases involving the deliberate use of deceptive (brand or domain) names similar to those of established, trusted third-party brands, for the purposes of impersonation and fraud[7,8].

Definitions and a deeper dive

The study of consumer reactions to misspellings in brand names is based fundamentally on the concepts of linguistic fluency (i.e. the ease of processing written content into language) and conceptual fluency (the ease with which associated meaning is brought to mind). Central ideas include the propositions that, overall, consumers process names more effectively when they are more similar to familiar terms (such as dictionary words), and that greater processing fluency tends (in general) to lead to improved brand perception. Lower degrees of 'orthographic' similarity to familiar strings can also be counteracted (to a degree) through the use of phonetic cues (i.e. those which are apparent when the brand name is sounded out) to the intended meaning. 

A core idea within this overall framework is that the use of only minor misspellings can have a small or negligible negative impact on brand perception, which can then be more than compensated through the clever application of other marketing-related mediating factors. This approach thereby allows the brand owner to take advantage of the other attractive features of misspellings, such as greater trademark and domain name availability.

As part of the analysis presented in the recent study, the misspellings used by brands were categorised into a set of 'types'. Strikingly, these together accounted for over half of all names in a curated list (from 2023) of 100 recent start-ups, which was used as an example dataset in the study; only 29% used 'correct' spellings, and another 20% used wholly new terms (neologisms) or proper names.

The categories of brand misspellings, as defined in the study (and illustrated with examples taken from it) were:

  • Compound - combining words together by (just) removing intervening spaces (e.g. 'AutoZone')
  • Lengthening - adding or repeating a letter (e.g. 'Mixx')
  • Foreign - substituting a character with one from an alternative language (e.g. 'Røde')
  • Letteronym - substituting a letter for a word or part of a word (e.g. 'La-Z-Boy')
  • Portmanteau - blending two or more other words (and removing parts of the component words), to create a new term (e.g. 'Duracell')
  • Phonetic - using a misspelling pronounced in the same way as the ‘intended’ term (e.g. 'Froot Loops')
  • Abridgment - shortening the term through the removal of one or more letters (e.g. 'Crumbl')
  • Alphanumeric - substituting a numeral for a word or part of a word (e.g. '4ward')
  • Leetspeak - substituting a numeral or special character for a visually similar letter (but with no modification to pronunciation) (e.g. 'E11EVEN')

Another key finding from the study is that not all types of misspelling elicit similar responses; some decrease the 'processing fluency' of readers / consumers more than others. For example, abridgments, alphanumerics, and ‘leetspeak’ are generally found to be harder to process than compounds and lengthening. More predictably, fluency was also found to decrease further in cases where there are higher degrees of misspelling (of a particular type). Other factors are also important, such as the proximity of an 'incorrect' character to the start of the word, which also affects processing fluency. This concept is related to a similar idea which is familiar from previous work on mark similarity measurement[9].
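As a rough illustration of how the 'degree' of a misspelling and the position of the first 'incorrect' character might be measured, the following minimal sketch uses the standard Levenshtein edit distance. This is an assumption made for illustration only: it is not the measure used in the study, nor necessarily the approach taken in the work on mark similarity[9], and the function names are hypothetical. The intended spellings passed in ('quick', 'fruit') are supplied by hand.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def first_difference(a: str, b: str) -> int:
    """Zero-based index of the first position at which the strings differ
    (the length of the shorter string if one is a prefix of the other)."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    return min(len(a), len(b))


def misspelling_profile(brand: str, intended: str) -> dict:
    """Hypothetical summary of how far, and how early, a brand name
    departs from its intended spelling (case is ignored)."""
    brand, intended = brand.lower(), intended.lower()
    return {
        "edit_distance": edit_distance(brand, intended),
        "first_difference": first_difference(brand, intended),
    }


for brand, intended in [("Quik", "quick"), ("Froot", "fruit")]:
    print(brand, misspelling_profile(brand, intended))
```

A character-level measure of this kind captures only orthographic similarity; it says nothing about phonetic cues, which the study identifies as a separate factor.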

Counteracting the adverse effects of a misspelling may be other factors which can aid in conceptual fluency for the brand name in question, such as similarity to other familiar terms with well-understood meanings, the use of additional visual cues in the associated brand presentation, or the alignment of the spelling with characteristics such as the owner’s name, or the product or business type (e.g. a use of 'Quik' for 'quick', invoking an association with the underlying sentiment (quickness) through a shortening of the word; the use of the 'oo' for 'Froot Loops', resembling the shape of the actual cereal; or the use of a term such as 'Scentimental' for a brand in a relevant product area, such as a florist). These types of enhancement are found to positively influence brand attitudes, level of brand preference, and favourability to word-of-mouth recommendations, and can be particularly effective if other subjective criteria are also met (such as through the use of a misspelling considered to be 'fun' or 'cute'). 

However, there are likely also to be other factors which must be considered, such as whether the industry area of the brand in question is traditionally perceived to be associated with trust or accuracy (for example), in which case a misspelled brand name may be deemed less acceptable.

Conclusions

An understanding of the nature of, and of consumer reactions to, misspellings in brand names, can be a key component of an effective marketing strategy, particularly when a new brand name is being selected. 

The use of a misspelling can be a compelling solution to problems associated with pre-existing IP rights and poor domain name availability, but can have an adverse effect on brand perception. 

However, it appears to be the case that an optimum approach can be the selection of a 'minor' misspelling, combined with the use of other tactics to counteract its potentially negative impact (in terms of ease of word processing and customer perception). It is also worth noting that the 'degree' of misspelling can be quantified using previously explored ideas on mark similarity measurement.

Examples of successful approaches to improve the effectiveness of a candidate brand name might include the careful selection of a particular misspelling, or the use of appropriate visual branding elements, to convey other aspects of the intended brand values or message.

References

[1] J.P. Costello, J. Walker and R.W. Reczek (2023). "Choozing" the Best Spelling: Consumer Response to Unconventionally Spelled Brand Names. J. Marketing, 87 (6), pp. 889-905, https://doi.org/10.1177/00222429231162367. (Available at: https://journals.sagepub.com/doi/abs/10.1177/00222429231162367)

[2] L.W. Smith and A. Abell (2025). The Art of Misspelling: Unraveling the Diverging Effects of Misspelled Brand Names on Consumer Responses. J. Consumer Research, ucaf020, https://doi.org/10.1093/jcr/ucaf020. (Available at: https://academic.oup.com/jcr/advance-article-abstract/doi/10.1093/jcr/ucaf020/8106524)

[3] https://domainnamewire.com/2025/08/20/new-research-reveals-which-misspelled-brand-names-work-best/

[4] https://www.linkedin.com/pulse/overview-brandable-domain-name-discovery-techniques-so3ye/

[5] https://circleid.com/posts/20240911-further-explorations-in-brandable-domain-names-sensational-spellingz

[6] https://circleid.com/posts/availability-analysis-of-brandable-variant-string-domain-names

[7] https://www.iamstobbs.com/opinion/you-spelled-it-wrong-exploring-typo-domains

[8] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 7: 'Creation of deceptive URLs'

[9] https://www.linkedin.com/posts/dnbarnett2001_measuring-the-similarity-of-marks-activity-7331669662260224000-rh-R/

This article was first published on 28 August 2025 at:

https://www.iamstobbs.com/insights/wyrdbrandz-misspelling-in-brand-names-is-it-an-effective-marketing-tactic

Thursday, 14 August 2025

Further explorations in clustering - use of Google advertising tracking links

Part of the 'Patterns in Brand Monitoring: Brand Protection Data is Beautiful' series of articles[1,2,3,4]

Introduction

'Clustering' in brand protection is the process of discovering features shared in common between distinct findings (such as websites), as a means of establishing a connection between those results. In general terms, this type of analysis is beneficial as it allows for the identification of the most significant infringements (i.e. those associated with extensive networks of activity) and can provide investigative insights into the underlying entity(ies) (i.e. the owners / administrators of the content in question).   

Our previous discussion on 'clustering' analysis considered the case of e-mail addresses as a potential basis for establishing a link[5], and it is similarly possible to use other features, such as telephone numbers (though this is made more complicated by the wide range of formats in which the numbers can be written) or hyperlinks to (for example) associated social-media pages. It is also worth noting that the general process of establishing data clusters is one compelling potential application for AI functionality, which is theoretically able to address issues such as the interpretation of data which may be presented in a wide range of different ways and contexts[6].

In this new article, we consider the case of Google tracking links as a suitable feature for establishing connections between websites. The analysis in this study is based specifically on the Google Tag Manager system, which is used for functionality relating to website tracking and marketing / advertising, and utilises links incorporating identity ('ID') codes unique to the account of the owner of the website[7,8]. Previous analysis has established that many infringers tend to utilise the same ID code across large numbers of sites under their operation, to monitor the performance of their portfolio, rather than using a unique code / Google account for each site. Accordingly, the ability to identify the same tracker code on multiple different sites provides a means of definitively establishing a connection between these sites.

The analysis consists of utilising 'scraper' functionality to identify these links in the HTML source code of websites of potential interest (for those examples utilising the Google tracking functionality) and extracting the user-specific tracker-ID codes from them. We consider links of the general form googletagmanager[.]com/***?id=XXXXX (where '***' is an arbitrary string of characters, and 'XXXXX' is the tracker ID-code, written as an alphanumeric string).
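The extraction step described above can be sketched as follows. This is a minimal illustration, not the scraper used in the study: the regular expression, the function name and the sample HTML (including the example ID values) are all assumptions, and fetching the page source is left out.

```python
import re

# Matches links of the general form googletagmanager.com/<path>?id=<ID>
# and captures the ID. The exact pattern is an assumption for illustration.
TRACKER_LINK = re.compile(
    r"googletagmanager\.com/[^\s\"'<>?]*\?id=([A-Za-z0-9_-]+)",
    re.IGNORECASE,
)


def extract_tracker_ids(html: str) -> set[str]:
    """Return the distinct tracker-ID codes found in a page's HTML source."""
    return set(TRACKER_LINK.findall(html))


# Hypothetical page source, with made-up ID values
sample_html = """
<script async src="https://www.googletagmanager.com/gtag/js?id=G-EXAMPLE123"></script>
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-EXAMPLE"></iframe></noscript>
"""

print(sorted(extract_tracker_ids(sample_html)))
```

A pattern of this kind only finds IDs written directly into the link; an ID that is appended to the URL by inline JavaScript at load time would need to be picked up separately.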

Furthermore, open-source databases such as that offered by publicwww[.]com make it possible to carry out wider searches for other appearances of the same code, and therefore build a bigger picture of infringer activity.

Analysis

The analysis considers the same set of around 4,500 websites considered in the previous article; these are brand-specific domain names resolving to live web content, pertaining to a particular fashion brand.

In this new study, Google tracking links were identified on around 900 of the sites in question, and 50 of the identified tracking-code IDs were found on multiple (i.e. more than just one) sites, thereby providing criteria for establishing clusters. 
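Identifying the codes which appear on more than one site amounts to inverting a site-to-codes mapping. A minimal sketch follows; the function name, the site names and the code values are hypothetical and do not come from the dataset.

```python
from collections import defaultdict


def cluster_by_code(site_codes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each tracking code to the sites it appears on, keeping only
    codes found on more than one site (i.e. those defining a cluster)."""
    sites_by_code = defaultdict(set)
    for site, codes in site_codes.items():
        for code in codes:
            sites_by_code[code].add(site)
    return {code: sites for code, sites in sites_by_code.items() if len(sites) > 1}


# Hypothetical input: tracker IDs extracted from each site
example_site_codes = {
    "shop-one.example": {"GTM-AAA111"},
    "shop-two.example": {"GTM-AAA111", "G-BBB222"},
    "shop-three.example": {"AW-CCC333"},
}

print(cluster_by_code(example_site_codes))
```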

Upon deeper analysis, certain clusters turn out not to reveal any significant insights. For example, one of the tracking codes, which actually appears on 167 distinct sites, seems just to relate to a particular web-hosting service provider (whose parking page appears in association with many of the domain names in question), rather than actually pertaining to the underlying website owners.

However, many of the clusters do seem to reveal meaningful links, such as a group of 14 sites (the largest other cluster in the dataset) all featuring the same tracking code, but which would not otherwise easily have been known to be linked (Figures 1 and 2). Information from publicwww[.]com shows that this same code actually appears on over 232,000 distinct websites across the wider Internet.

Figure 1: (Redacted) examples of websites from the cluster of 14 all determined to be linked by virtue of the use of the same tracking code

Figure 2: (Redacted) website source code snippet present on all sites shown in Figure 1

In order to carry out deeper dives into the data, the dataset can be processed in a range of different ways to reveal and visualise the nature of the clusters. One convenient first step is the production of an 'adjacency matrix' (Figure 3) for the sets of sites (vertical axis) and distinct tracking codes (horizontal axis) in the dataset, in which a row/column intersection is marked with a '1' (red highlighting) if the code appears on the site in question, and '0' otherwise. Even from this raw data, some insights can be drawn, such as the identification of the large cluster associated with the tracking code shown third from the right in the screenshot, for which many of the row entries (corresponding to distinct associated websites) are highlighted in red.
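A matrix of this kind can be built directly from a mapping of sites to the tracking codes found on them. The sketch below is illustrative only, with hypothetical names and made-up data; rows are sites and columns are codes, as in the description above.

```python
def adjacency_matrix(site_codes: dict[str, set[str]]):
    """Return (sites, codes, matrix), where matrix[i][j] is 1 if
    codes[j] appears on sites[i] and 0 otherwise."""
    sites = sorted(site_codes)
    codes = sorted({code for found in site_codes.values() for code in found})
    matrix = [
        [1 if code in site_codes[site] else 0 for code in codes]
        for site in sites
    ]
    return sites, codes, matrix


# Hypothetical input data
matrix_input = {
    "b.example": {"G-2"},
    "a.example": {"GTM-1", "G-2"},
}

sites, codes, matrix = adjacency_matrix(matrix_input)
print("codes:", codes)
for site, row in zip(sites, matrix):
    print(site, row)
```

Summing a column gives the number of sites sharing that code, which is one quick way to spot the largest clusters.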

Figure 3: Screenshot of the 'adjacency matrix' for the distinct sites and tracking codes present within any of the clusters in the dataset

This matrix can then be used as the basis for creating further visualisations of the data. For example, a number of standard Python libraries[9] can be used for the creation of visual 'networks' showing the connections within the dataset (Figures 4 and 5). These types of clusters show us that the websites in question in each case (represented by the nodes in blue) are all associated with each other, and could potentially be addressed in single bulk enforcement actions, thereby building efficiencies into the takedown process.
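The groups of linked sites correspond to the connected components of a graph in which sites and tracking codes are the two kinds of node. The figures in this article were produced with NetworkX and Matplotlib; the sketch below instead uses only the standard library to find the components, and omits the drawing step. The function name and data are hypothetical.

```python
from collections import defaultdict, deque


def connected_clusters(site_codes: dict[str, set[str]]):
    """Group sites that are linked, directly or via chains of shared
    tracking codes. Returns a list of (sites, codes) pairs for every
    group containing more than one site."""
    # Undirected bipartite graph; nodes are tagged so that a site and a
    # code with the same text can never be confused.
    neighbours = defaultdict(set)
    for site, found in site_codes.items():
        for code in found:
            neighbours[("site", site)].add(("code", code))
            neighbours[("code", code)].add(("site", site))

    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first search collects one connected component
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        sites = {name for kind, name in component if kind == "site"}
        codes = {name for kind, name in component if kind == "code"}
        if len(sites) > 1:
            clusters.append((sites, codes))
    return clusters


# Hypothetical input: a and c share no code directly, but both link to b
graph_input = {
    "a.example": {"GTM-1"},
    "b.example": {"GTM-1", "G-2"},
    "c.example": {"G-2"},
    "d.example": {"AW-3"},
}

print(connected_clusters(graph_input))
```

Working with components rather than single codes means that sites joined only indirectly (here, the first and third sites) still fall into the same group, which is the situation where multiple interconnections appear in a network view.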

Figure 4: (Obfuscated[10]) visualisation of the cluster of 14 sites from which the examples in Figure 1 were taken (websites shown as blue nodes, tracking codes as green nodes)

Figure 5: (Obfuscated) examples of other clusters within the dataset which are of particular interest because of the presence of multiple interconnections between the sites / tracking codes in question (websites shown as blue nodes, tracking codes as green nodes)

Conclusion

The concept of clustering is a key component of the analysis process for websites and other results identified through a programme of brand monitoring. As part of a holistic brand protection initiative, it can help identify key infringers for prioritised action and enforcement, and help identify other linked content, through which a fuller picture of the underlying entities and their associated activities can be established.

The use of Google advertising tracking codes is a compelling basis for identifying connections, as they are generally specific to a particular user, are frequently utilised across multiple different sites in the portfolio, appear to be relatively ubiquitous across web content generally, can be readily extracted from the source code of webpages, and can often be tied to additional related material through the use of insights drawn from open-source databases.

References

[1] https://www.linkedin.com/pulse/brand-protection-data-beautiful-david-barnett-c66be/

[2] https://www.linkedin.com/pulse/brand-protection-data-still-beautiful-part-1-year-domains-barnett-juwhe/

[3] https://www.linkedin.com/pulse/brand-monitoring-data-niblet-5-law-firm-scam-websites-david-barnett-ap5de/

[4] https://www.iamstobbs.com/insights/notorious-ip-addresses-and-initial-steps-towards-the-formulation-of-an-overall-threat-score-for-websites

[5] https://www.iamstobbs.com/insights/e-mail-address-extraction-from-webpages-a-quick-case-study-in-result-clustering

[6] https://circleid.com/posts/braive-new-world-part-1-brand-protection-clustering-as-a-candidate-task-for-the-application-of-ai-capabilities

[7] https://support.google.com/tagmanager/answer/6102821?hl=en

[8] https://www.analyticsmania.com/post/google-tag-manager-vs-google-analytics/

[9] The figures in this study utilise the Python libraries NetworkX (https://networkx.org/; A.A. Hagberg, D.A. Schult and P.J. Swart (2008). "Exploring network structure, dynamics, and function using NetworkX". In: Proceedings of the 7th Python in Science Conference (SciPy2008), G. Varoquaux, T. Vaught and J. Millman (Eds.) (Pasadena, CA USA), pp. 11–15.) and Matplotlib (https://matplotlib.org/). 

[10] In the visualisations of the clusters, the brand name (as it appears in the domain names) is replaced by the string '[brand]', and an encoded ('hashed') form of the tracking codes (which generally exist in the raw data in the form 'GTM-XXXXX', 'G-XXXXX', 'AW-XXXXX' or 'UA-XXXXX', where 'XXXXX' is an alphanumeric string) is shown in each case.

This article was first published on 14 August 2025 at:

https://www.iamstobbs.com/insights/further-explorations-in-clustering-use-of-google-advertising-tracking-links

Monday, 4 August 2025

E-mail address extraction from webpages: a quick case study in result 'clustering'

Introduction

The concept of result 'clustering' - that is, the ability to establish connections between online brand monitoring findings not previously known to be linked - has been discussed previously as a key element of the analysis process in brand protection. 

It can allow the identification of key targets for further investigation or enforcement, and assist in building a fuller picture of the identity and activities of the entity(ies) behind the web-content in question, as part of an open-source intelligence (OSINT)-style investigative approach[1,2,3].

In this article, we focus specifically on the case of e-mail addresses as the data points on which clustering analysis can be based. The presented findings are derived from a process of data analysis involving the automated extraction of contact e-mail addresses from a series of webpages of potential interest, and the associated discussion shows how insights can be derived from the dataset.

Analysis

The dataset used in this case study is a set of domains of potential interest to a fashion brand, as identified through analysis of domain name zone files, which are data files containing the names of all registered domains across each TLD (top-level domain, or domain extension). The search was run using an analysis script configured to identify all domains containing the name of the brand in question, thereby simulating the process of collection of results by a full formal automated domain-monitoring service. 
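In outline, a search of this type reduces to a substring test over the list of registered names. The following minimal sketch assumes the domain names have already been read out of the zone files into a list; the function name, the brand string 'acme' and the sample names are all hypothetical, and this is not the analysis script used in the study.

```python
def brand_domains(domain_names, brand: str) -> list[str]:
    """Return the distinct domain names containing the brand string,
    normalised to lower case with any trailing root dot removed."""
    brand = brand.lower()
    matches = set()
    for name in domain_names:
        name = name.strip().rstrip(".").lower()
        if name and brand in name:
            matches.add(name)
    return sorted(matches)


# Hypothetical names as they might be read from zone-file records
zone_names = ["ACMEbags.com.", "example.com", "my-acme.shop", ""]

print(brand_domains(zone_names, "acme"))
```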

For the brand under consideration (the name of which has simply been replaced, for confidentiality, by the string '[brand]' in all examples which follow), the initial searches generated over 16,000 brand-specific domain names of potential interest. Simple analysis techniques (as discussed in previous articles) can be used to carry out an initial stage of filtering and prioritisation of these results, to identify those sites most likely to be of interest. These techniques might typically include the calculation of 'risk scores' based on characteristics of the domains themselves, or of the content of any associated websites (in cases where a live site is present)[4,5]. This initial analysis allowed the production of a focused sub-dataset of around 4,500 domains most likely to be of greatest interest to the brand owner in question, based on the presence and prominence of the brand name and associated relevance keywords in the domain name itself and/or on the associated website.

The basic step of the subsequent analysis was to inspect the (HTML) content of each of the domains from the prioritised subset and (using an automated script) extract from the page any text-string(s) matching the format of an e-mail address (where present), with a view to identifying any contact addresses cited on each of the sites, and thereby identify any commonalities or similarities in usage.
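A minimal sketch of this extraction step is shown below. The regular expression is a deliberately simple assumption rather than a full implementation of the e-mail address standards, the function name is hypothetical, and lower-casing the results is a simplification made so that the same address written in different cases is counted once.

```python
import re

# Simple pattern: local part, '@', one or more host labels, alphabetic TLD
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def extract_emails(html: str) -> set[str]:
    """Return the distinct e-mail-like strings found in page content."""
    return {match.lower() for match in EMAIL_PATTERN.findall(html)}


# Hypothetical page content
sample_page = (
    'Contact: <a href="mailto:Info@Example.com">info@example.com</a> '
    "or sales@shop.example.co.uk."
)

print(sorted(extract_emails(sample_page)))
```

A pattern this loose will also match some strings which are not e-mail addresses (image file names such as 'logo@2x.png', for example), so the output benefits from a manual or rule-based review.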

At least one e-mail address was identified in the content of just over 1,000 of the sites in question (focusing specifically on the homepages of the sites in each case). The analysis focused on those e-mail addresses in which the 'host' part of the e-mail address (i.e. the part after the '@') was different from the domain name of the particular website on which the e-mail address was identified; addresses whose host matched the website's own domain were deemed to be 'site-specific' contact details.
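The comparison between the 'host' part of an address and the domain of the site on which it appears might be sketched as follows. The function name and sample values are hypothetical, and the treatment of a leading 'www.' and of subdomains of the site's own domain as matches is an assumption rather than a rule stated in the analysis.

```python
def off_site_emails(site_domain: str, emails: set[str]) -> set[str]:
    """Keep only the addresses whose host differs from the domain of the
    site on which they were found."""
    site = site_domain.lower().removeprefix("www.")
    kept = set()
    for email in emails:
        host = email.rsplit("@", 1)[-1].lower()
        # Assumption: subdomains of the site's own domain count as a match
        if host != site and not host.endswith("." + site):
            kept.add(email)
    return kept


# Hypothetical addresses found on a hypothetical site
found = {
    "info@shop-example.com",
    "help@mail.shop-example.com",
    "owner@gmail.com",
}

print(off_site_emails("www.shop-example.com", found))
```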

The most obvious links which can be established are those cases in which the same e-mail address was found to be used on more than one distinct site in the dataset, which may otherwise not obviously have been known to be linked. 

In some of these cases, the distinct sites on which a particular e-mail address was found were themselves found to share a common SLD (second-level domain name, i.e. the part of the domain name to the left of the dot), such that it would have been relatively straightforward to establish a link even in the absence of the common e-mail address. Some such examples from the dataset (with domain names and e-mail addresses obfuscated in each case) include:

  • [brand]bag.vip and [brand]bag.store - e-mail address: camarendale9XXX[at]gmail.com
  • [brand]vix.com and [brand]vix.shop - e-mail address: ryanmi0XXX[at]gmail.com
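Checking whether two domains share an SLD, as in the examples above, can be sketched in a few lines. The helper names and sample domains are hypothetical, and the approach of taking the label immediately before the final dot is a simplification which does not handle multi-part extensions such as '.co.uk'.

```python
def sld(domain: str) -> str:
    """Return the label immediately to the left of the final dot.
    Simplification: multi-part extensions (e.g. '.co.uk') are not handled."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def share_sld(domain_a: str, domain_b: str) -> bool:
    """True if two domains differ only in their extension (same SLD)."""
    return sld(domain_a) == sld(domain_b)


# Hypothetical domain names
print(share_sld("examplebag.vip", "examplebag.store"))
print(share_sld("myexamplephotos.com", "omahaexample.com"))
```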

However, in other cases, the common e-mail address may be the only basis on which a link between the sites in question could easily be established, e.g.:

  • my[brand]photos.com and omaha[brand].com - e-mail address: whatsyour[brand][at]gmail.com (Figure 1)
  • i-[brand]lightingonline.com and [brand]malls.com - e-mail address: 2853583XXX[at]qq.com 

Figure 1: Screenshots from two sites found to be linked on the basis of the use of a common e-mail address

In other cases, the 'host' part of the common e-mail address may also reveal the identity of an additional domain name which is linked to the first two, e.g.:

  • art-[brand].com and art[brand]dz.com - e-mail address: contact[at]art[brand].com
  • XX[brand]nails.eu and XX[brand]usa.com - e-mail addresses: helpdesk[at]XX[brand]nails.com and james[at]XX[brand]nails.com
  • e-casa[brand].com and casa[brand]contract.gr - e-mail address: info[at]casa[brand].gr
  • [brand]zeitde.com and [brand]zeitde.shop - e-mail address: info[at][brand]zeit.com
  • [brand]tailorhk.com and [brand]tailors.com - e-mail address: [brand][at][brand]tailor.com
  • n-[brand].com and n-[brand].net - e-mail address: care[at]usaglobalXXX.org
  • ceramica[brand].com and ceramica[brand].it - e-mail address: info[at]gruppobarXXX.com
  • [brand]movies.com and [brand]sf.com - e-mail address: 94115adam[at]cinemaXXX.com

It may also then be possible to determine further information on the underlying entity, by carrying out further searches for other online references to the common pieces of information (i.e. OSINT research). It is, however, worth noting that some e-mail addresses appearing on multiple sites may simply relate to (say) a particular service provider which just happens to have been used by the owners of each of the websites in question, but where the sites themselves may be otherwise unrelated. One such example might be the presence of contact details pertaining to the associated domain registrar, such as (from the dataset used) filler[at]godaddy.com or support[at]goldenname.com. This point highlights the importance of reviewing individual findings for relevance and significance, before asserting the presence of a definitive link.
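One way to reduce the number of spurious links of this kind is to leave known service-provider hosts out of the grouping step. The sketch below is illustrative only: the function name and sample data are hypothetical, and the ignore-list contains just the two registrar hosts mentioned above, whereas any real list would need to be built up and reviewed manually.

```python
from collections import defaultdict

# Assumption: a hand-maintained list of hosts to ignore, seeded with the
# two registrar examples noted in the text
PROVIDER_HOSTS = {"godaddy.com", "goldenname.com"}


def cluster_by_email(site_emails: dict[str, set[str]],
                     ignored_hosts: set[str] = PROVIDER_HOSTS) -> dict[str, set[str]]:
    """Map each e-mail address to the sites on which it appears, keeping
    only addresses seen on more than one site and skipping ignored hosts."""
    sites_by_email = defaultdict(set)
    for site, emails in site_emails.items():
        for email in emails:
            email = email.lower()
            if email.rsplit("@", 1)[-1] in ignored_hosts:
                continue
            sites_by_email[email].add(site)
    return {e: s for e, s in sites_by_email.items() if len(s) > 1}


# Hypothetical extraction results
email_input = {
    "one.example": {"seller@mail.example", "filler@godaddy.com"},
    "two.example": {"Seller@mail.example", "filler@godaddy.com"},
    "three.example": {"someone.else@mail.example"},
}

print(cluster_by_email(email_input))
```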

In certain cases where an e-mail username (i.e. just the part of the e-mail address to the left of the '@') is particularly distinctive, searches based on this characteristic alone might be sufficient to establish a link.

Finally, it is also worth noting that the identity of the e-mail address provider can yield its own insights in some cases: addresses from webmail providers such as yahoo.com and outlook.com, or messaging services such as qq.com, are found to be used less frequently by larger legitimate businesses.
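A check of this kind could be expressed as a simple lookup against a list of provider hosts. In the sketch below, the function name is hypothetical and the list contains only the three hosts named above; it is illustrative rather than exhaustive, and a positive result is an indicator for review rather than a conclusion.

```python
# Illustrative list only: the three hosts named in the text
WEBMAIL_OR_MESSAGING_HOSTS = {"yahoo.com", "outlook.com", "qq.com"}


def uses_webmail_host(email: str,
                      hosts: set[str] = WEBMAIL_OR_MESSAGING_HOSTS) -> bool:
    """True if the host part of the address is in the provider list."""
    return email.rsplit("@", 1)[-1].lower() in hosts


# Hypothetical addresses
print(uses_webmail_host("someone@QQ.com"))
print(uses_webmail_host("info@brand-example.com"))
```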

Conclusion

This brief case study has highlighted the potential usefulness of e-mail addresses - features which are essentially unique to a particular entity, and which can be extracted directly from the content of a website through the use of a simple script or 'scraper' - as a means of establishing links between results. The identification of connections between findings can be a key part of the process of identifying serial infringers, or entities warranting prioritised analysis, and can serve as 'start-points' for deeper open-source investigations into entities and their associated activities.

Beyond this, insights drawn from the e-mail addresses themselves can also feed into more general algorithms used for quantifying the overall level of potential risk (e.g. non-authenticity) of a website. Characteristics such as the use of e-mail addresses from webmail providers and instant messaging services, for example, are less usually associated with mainstream corporate entities, and can be indicators of higher risk. 

References

[1] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 6: 'Result clustering'

[2] https://circleid.com/posts/braive-new-world-part-1-brand-protection-clustering-as-a-candidate-task-for-the-application-of-ai-capabilities

[3] https://www.iamstobbs.com/insights/using-clustering-and-investigation-techniques-to-connect-and-identify-scam-law-firm-websites

[4] https://circleid.com/posts/towards-a-generalised-threat-scoring-framework-for-prioritising-results-from-brand-monitoring-programmes

[5] https://www.iamstobbs.com/insights/exploring-a-domain-scoring-system-with-tricky-brands

This article was first published on 31 July 2025 at:

https://www.iamstobbs.com/insights/e-mail-address-extraction-from-webpages-a-quick-case-study-in-result-clustering


Revisiting the calculation of brand protection return-on-investment

by David Barnett, Richard Ferguson and Sheena Yonker

August 2025 saw the publication of INTA's Anticounterfeiting Committee Policy Globa...