Monday, 9 December 2024

Availability analysis of brandable variant-string domain names

by David Barnett and Matt Duchesne

Introduction

For any entity looking to launch a new company or other initiative, a primary requirement is often the selection of an appropriate brand name and the acquisition of a relevant associated domain name. In light of the increasing shortage of short, unregistered memorable names on popular domain name extensions (TLDs), many organisations are choosing to adopt novel or invented brand names and/or consider the use of alternative TLDs[1,2].

One increasingly popular approach is the use of 'sensational spellings' (i.e. 'wacky' misspellings) of dictionary terms as brand names[3]. In this study, we outline a new approach for identifying variant spellings of favoured brand terms that are available for registration as domain names (on a specific TLD of interest).

The methodology is based on the principle that a potential registrant may have a particular character string or industry-related keyword in mind as a brand-name 'template' (i.e. a 'seed' string for the search), and is interested in identifying sensationally-spelled variants which are available for registration (assuming that the exact match to the name in question is not available).

Variant domain name generation

The basic algorithm (as developed by UnregisteredGems.com) behind the generation of the variants incorporates the following elements:

  • Allowing the replacement of characters or groups of characters with alternatives which are phonetically similar (e.g. 'c' with 'k', or 's' with 'z')
  • Allowing any vowel (or, optionally, any character) to be excluded from the string
  • Allowing the replacement of repeated (double) characters with single versions
  • Allowing any character to be repeated (i.e. doubled)
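
The four elements listed above can be sketched as follows. This is a minimal illustration only, not the UnregisteredGems.com implementation: the function name, the `drop_any_char` flag and the substitution table (which contains just the two swaps given as examples above) are assumptions, and only a single edit is applied to the seed at a time.

```python
from __future__ import annotations

# Hypothetical substitution table: only the two swaps named in the text.
PHONETIC_SWAPS = {"c": ["k"], "s": ["z"]}
VOWELS = set("aeiou")


def single_edit_variants(seed: str, drop_any_char: bool = False) -> set[str]:
    """Return every string reachable from `seed` by applying one rule once."""
    variants = set()
    for i, ch in enumerate(seed):
        # Element 1: replace a character with a phonetically similar one.
        for alt in PHONETIC_SWAPS.get(ch, []):
            variants.add(seed[:i] + alt + seed[i + 1:])
        # Element 2: exclude a vowel (or, optionally, any character).
        if drop_any_char or ch in VOWELS:
            variants.add(seed[:i] + seed[i + 1:])
        # Element 3: collapse a doubled character into a single one.
        if i + 1 < len(seed) and seed[i + 1] == ch:
            variants.add(seed[:i] + seed[i + 1:])
        # Element 4: repeat (double) the character.
        variants.add(seed[:i] + ch + seed[i:])
    variants.discard(seed)  # the exact match is not a variant
    return variants


fintech_variants = single_edit_variants("fintech")
```

Applying the function repeatedly to its own output would give multi-edit variants, at the cost of a rapidly growing candidate set.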

Other elements of the algorithm are optional and can be switched on or off as required: the addition of prefixes such as 'the-' or suffixes such as '-ify', '-ize' or '-able'[4,5], and the appending of an 's' or 'z' to form ('pseudo-')plurals. It is also generally appropriate to exclude specific categories of 'non-favourable' variants, such as strings with any triple letter (three consecutive repeats), or particular combinations of characters deemed to be 'bad' (such as any three-character sequence consisting only of the characters 'c', 'k' and 'q'), assuming that no match to these combinations was present in the original 'seed' string.
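
The exclusion of 'non-favourable' variants might be expressed as a pair of regular expressions, as in the sketch below. The names are hypothetical, and applying the 'present in the seed' exemption to both patterns is one reading of the rule rather than a confirmed detail of the algorithm.

```python
import re

# Any character repeated three times in a row, e.g. 'zzz'.
TRIPLE_LETTER = re.compile(r"(.)\1\1")
# Any run of three characters drawn only from 'c', 'k' and 'q'.
BAD_CKQ_RUN = re.compile(r"[ckq]{3}")


def is_acceptable(variant: str, seed: str) -> bool:
    """Reject a variant that introduces a pattern the seed did not contain."""
    for pattern in (TRIPLE_LETTER, BAD_CKQ_RUN):
        # Assumption: a pattern already present in the seed is tolerated.
        if pattern.search(variant) and not pattern.search(seed):
            return False
    return True
```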

It is important to note that the approach is intended to be used with invented brands or dictionary terms, and not with pre-existing protected intellectual property. Domains utilising variants of trademarked terms would generally be considered to be instances of cyber- or typosquatting and, accordingly, could potentially be infringing. For this reason, certain other types of variation (such as the use of adjacent-key substitutions) have not been incorporated into the final version of the generation algorithm, since the web traffic to the resulting domains would be likely to arise primarily from mis-typed versions of known and trusted website addresses.

Filtering and sorting the output

Once the candidate variant domain names have been generated, it is then necessary to (a) determine whether each name is available for registration[6] (on the specified TLD); and (b) rank the available variants according to their desirability.
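
As a rough illustration of step (a), the sketch below sends a raw whois query for a .com name and classifies the response text. It is not the look-up code used in the study. The server name is the .com registry whois host; the 'not found' marker strings and the function names are assumptions, and (as noted in [6]) a 'not found' response is only an indication of availability.

```python
import socket

WHOIS_HOST = "whois.verisign-grs.com"  # registry whois server for .com
# Assumed wording of a 'not found' response; other servers may differ.
NOT_FOUND_MARKERS = ("no match for", "not found")


def whois_query(domain: str, host: str = WHOIS_HOST, timeout: float = 10.0) -> str:
    """Send a raw whois query (TCP port 43, CRLF-terminated) and return the reply."""
    with socket.create_connection((host, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def looks_unregistered(whois_text: str) -> bool:
    """Return True if the response contains one of the assumed markers."""
    text = whois_text.lower()
    return any(marker in text for marker in NOT_FOUND_MARKERS)
```

A call such as `looks_unregistered(whois_query("finteech.com"))` would then give a first-pass indication for a single candidate; in practice, queries should be rate-limited to respect the server's usage terms.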

The version of the ranking score utilised in this study incorporates two elements. The first element considers the closeness (i.e. degree of similarity) of the variant string to the initial ('seed') string. This is quantified using the visual (i.e. spelling) similarity metric for pairs of strings, as proposed by Barnett (2024)[7].
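
The Barnett (2024) metric itself is not reproduced here. Purely as a stand-in to show where a similarity element sits in the pipeline, the sketch below uses the ratio from Python's standard `difflib.SequenceMatcher`, scaled to the range 0 to 100; this is a different measure from the one used in the study, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def spelling_similarity(seed: str, variant: str) -> float:
    """Stand-in similarity (0 to 100); NOT the Barnett (2024) metric."""
    return 100.0 * SequenceMatcher(None, seed, variant).ratio()
```

Under this stand-in, a variant that differs from the seed by a single dropped character scores higher than one with a substitution plus an appended letter, which is the general behaviour a closeness measure is expected to show.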

The second component of the ranking score is a measure of the 'readability' of the variant string, using an algorithm developed by Domai.io. This calculation considers the degree of alternation between consonants and vowels, the overall balance between consonants and vowels, and the total length of the string (with shorter strings being favoured).
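
The Domai.io algorithm is not published in this article, so the following is only a sketch of how the three named factors (alternation, balance and length) might be combined. The equal weighting, the `max_len` cut-off and the treatment of only 'a', 'e', 'i', 'o' and 'u' as vowels are all assumptions.

```python
VOWELS = set("aeiou")


def readability(s: str, max_len: int = 15) -> float:
    """Illustrative readability score in the range 0 to 100."""
    s = s.lower()
    if not s:
        return 0.0
    is_vowel = [ch in VOWELS for ch in s]
    # Alternation: share of adjacent pairs that switch consonant <-> vowel.
    pairs = len(s) - 1
    switches = sum(a != b for a, b in zip(is_vowel, is_vowel[1:]))
    alternation = switches / pairs if pairs else 1.0
    # Balance: 1.0 when exactly half of the characters are vowels.
    vowel_share = sum(is_vowel) / len(s)
    balance = 1.0 - 2.0 * abs(vowel_share - 0.5)
    # Brevity: shorter strings are favoured (assumed linear penalty).
    brevity = max(0.0, 1.0 - len(s) / max_len)
    return 100.0 * (alternation + balance + brevity) / 3.0
```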

The final ranking ('desirability') score (normalised to sit in the range 0 to 100) is then derived from the combination of these two elements. It is also possible to incorporate more sophisticated components into the calculation, such as the Barnett (2024) aural (pronunciation) similarity metric and a phonotactic acceptability score[8,9] (i.e. the degree of resemblance to the corpus of words within a language - in this case, English). However, because of the increased computational overhead of carrying out phonetic analysis, these components have been excluded from the final version of the algorithm used in this study.
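
The article does not state how the two elements are combined, so the sketch below assumes a simple weighted mean of two inputs that are each already on a 0 to 100 scale. The second function implements the conversion given in [9], PhA = 100 × exp(-PhV/10). Both function names and the default weight are assumptions.

```python
import math


def desirability(similarity: float, readability: float, weight: float = 0.5) -> float:
    """Assumed combination: weighted mean of two 0-100 scores, clamped to 0-100."""
    score = weight * similarity + (1.0 - weight) * readability
    return max(0.0, min(100.0, score))


def phonotactic_acceptability(phv: float) -> float:
    """PhA = 100 * exp(-PhV / 10), as defined in [9]; PhV must be >= 0."""
    if phv < 0:
        raise ValueError("phonotactic violation score must be non-negative")
    return 100.0 * math.exp(-phv / 10.0)
```

A violation score of 0 maps to an acceptability of 100, and the acceptability decays towards 0 as the violation score grows.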

Case studies of example 'seed' search terms

Table 1 shows the total number of candidate variant domain names generated by the algorithm for each of the 'seed' strings considered in the study (with the exact-match domain in each case retained in the statistics shown), and the total number of these found to be available for registration in each case (as of the date of analysis), considering only .com domains.

'Seed' string   No. candidate variants   No. variants available for registration   % variants available for registration
fintech         122                      86                                        70.5%
brandio         66                       43                                        65.2%
clarity         117                      74                                        63.2%
zoomify         87                       77                                        88.5%
grubhub         66                       48                                        72.7%
appify          69                       48                                        69.6%
fizzle          84                       62                                        73.8%
gloop           36                       9                                         25.0%
tiktik          197                      164                                       83.2%
example         129                      95                                        73.6%
zinga           66                       29                                        43.9%
sportstoday     111                      103                                       92.8%

Table 1: Total numbers of candidate variant domain names, and numbers of variants available for registration (as .com domains), for each of the 'seed' strings.

For 'fintech' (for example) - i.e. considering potential names which may be of interest for a company looking to launch in the financial technology industry - the ranking scores assigned to the generated variants range from 85.2 (for fintec.com; unavailable for registration) down to 64.0 (fyntechz.com; available). As of the date of analysis, the highest-scoring unregistered (i.e. available) variant domain name was found to be finteech.com (score 82.7) (though see also Footnote #6).
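
The final selection step for 'fintech' can be illustrated with the three scored names quoted above. The data structure and function name below are hypothetical; the scores and availability flags are those reported in the text.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    domain: str
    score: float      # desirability score, 0 to 100
    available: bool   # result of the availability check


def best_available(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the highest-scoring available candidate, or None if there is none."""
    available = [c for c in candidates if c.available]
    return max(available, key=lambda c: c.score) if available else None


candidates = [
    Candidate("fintec.com", 85.2, False),
    Candidate("finteech.com", 82.7, True),
    Candidate("fyntechz.com", 64.0, True),
]
top = best_available(candidates)
```

Here `top` is the finteech.com entry, since the higher-scoring fintec.com is filtered out as unavailable.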

Conclusion and discussion

This overview shows how simple applications of ideas outlined in previous studies can be used in the identification of available domain names which may be attractive from a brandability or monetisation point of view.

Another interesting observation from a more general analysis of unregistered domains is the potential for certain variant domains to receive organic traffic despite their unregistered status. For example, if a particular existing registered domain receives valuable traffic, a close variant - such as those generated using the algorithm - may also capture significant traffic. This is particularly relevant for domains which incorporate high-interest keywords or popular phrases, and have a high percentage of available variants. In cases where these variants are subsequently registered, the traffic can potentially then be monetised through affiliate marketing or domain parking, or the domain can be held as a strategic asset for future brand development. In some cases, it may be desirable to register 'bundles' of variant domains, if the cost of registration is below the present value of the expected monetisation revenue from these domains. This idea is not explored further in this study, but the monetisation of variants is a topic worthy of further study.

Acknowledgements

The overall methodology described in this study is currently under development as part of a joint initiative between Domai.io and UnregisteredGems.com.

References

[1] https://www.iamstobbs.com/availability-of-domains-ebook

[2] 'Patterns in Brand Monitoring' by D.N. Barnett (Business Expert Press, 2025), Chapter 9: 'Domain landscape analysis' 

[3] https://circleid.com/posts/20240911-further-explorations-in-brandable-domain-names-sensational-spellingz

[4] https://circleid.com/posts/20240916-unregistered-gems-part-3-keeping-your-ize-on-the-prize

[5] https://circleid.com/posts/unregistered-gems-part-4-other-brandable-domain-name-styles

[6] In this study, domain availability is determined on the basis of a whois look-up for the domain in question returning a 'not found' message. This does not necessarily mean the domain is actually available for registration in practice; in any commercial application, it would be necessary to carry out a full formal availability check. 

[7] https://www.linkedin.com/pulse/measuring-similarity-marks-overview-suggested-ideas-david-barnett-zo7fe/

[8] https://circleid.com/posts/20240903-unregistered-gems-identifying-brandable-domain-names-using-phonotactic-analysis

[9] We propose a framework in which the phonotactic acceptability score (PhA) is calculated from the (more standard) phonotactic violation score (PhV), which takes a value anywhere from 0 to infinity, by defining PhA = 100 × exp(-PhV/10). This gives a value between 0 ('bad') and 100 ('good').

This article was first published on 8 December 2024 at:

https://circleid.com/posts/availability-analysis-of-brandable-variant-string-domain-names
