Monday, 23 September 2024

Unregistered Gems Part 4: Other brandable domain-name styles

In the fourth in this series of articles exploring how the application of appropriate tools, algorithms and analysis techniques can identify the existence of attractive, unregistered brandable domain names, I consider the prevalence of a few additional styles of name.

Any identified domain names given as specific named examples in this article have been purchased as part of a 'test set' (all available at 'cost price' at the time of purchase); any associated valuations are as given by an automated, AI-driven tool.

Domain-ly-ify-ing the search

As for the case of 'ize' domains considered in the previous article in this series[1], keywords featuring certain other suffixes have also historically proved popular for use as brandable names. Two specific examples are domain names ending in '-ly' or '-ify'. As for 'ize' domains, the name structure can produce terms appearing similar to particular categories of grammatical terms (adverbs and verbs, respectively), which can provide an attractive sense of dynamism for a potential brand[2]. Both of these suffixes have been adopted extensively for a number of years[3], with well-known brands including Bitly, Reachly, Grammarly, Seekly, Talently, Brainly, Scopely, Leafly, Calendly, Musically, Openly, and Spotify, Bizify, Healthify, Expensify, Proposify, and many others. An additional aspect to the 'ly' names is that they are particularly amenable to construction using second-level name (SLD) / top-level domain (TLD) combinations (utilising the .ly ccTLD which technically pertains to Libya, but is one of many extensions which are commonly 're-requisitioned' for other use-cases[4]). However, in this analysis, I consider only .com names.

Whilst 'ly' names (for example) may appear superficially similar to adverbs, it seems that non-dictionary neologisms are more often preferred from a brandability point of view (as borne out by the examples presented above), perhaps both because of the relative unavailability of pure dictionary terms (i.e. true adverbs) as available domain names, and because of the general apparent appeal of newly-coined terms (in some ways analogous to 'sensationally spelled' terms as discussed previously[5]). The examples presented in this study have largely been selected with this trend in mind.

The first step in identifying available names is to consider the set of registered names, through analysis of the (in this case, .com) zone file. As of the date of analysis (22-Sep-2024), there are 604,379 registered .com domains with names ending with 'ly' (comprising an alphabetical list from 000area-weekly[.]com to zzzzzfly[.]com), and 51,235 'ify' domains (from 00ssl-verify[.]com to zzzify[.]com). Unlike the case of domains of a particular length, the number of possible 'ly' or 'ify' domains is - of course - effectively unbounded (limited only by the 63-character maximum length of a domain-name label). The analysis in this study therefore focuses on available domain names where the prefix is an attractive, brandable (generic or business-related) term, and the domain name as a whole is (relatively) short.

To this end, I consider (single-word) prefixes (i.e. the portion of the SLD prior to 'ly' or 'ify') comprising either: any of the set of (34) business-related keywords[6] considered in the previous study of sensational spellings; any of the set of (999) most common English nouns[7]; any of a set of (2,766) general business-vertical-related keywords drawn from various sources[8] (including the large set of descriptors used in Google business profiles[9,10]); and (for 'ify') any of a set of (96) common English adjectives[11]. For the 'ify' domains, I also consider cases where any final vowel on the prefix-term is optionally removed.
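A minimal sketch of this candidate-generation step might look like the following. It is an illustration only, not the code used in the study: the prefix words, the tiny `registered` set (standing in for the SLDs extracted from the .com zone file) and the function names are all assumptions made for the example.

```python
VOWELS = set("aeiou")

def ly_candidates(prefixes):
    """Return SLDs of the form <prefix>ly."""
    return {p + "ly" for p in prefixes}

def ify_candidates(prefixes):
    """Return SLDs of the form <prefix>ify, also trying the prefix
    with a single final vowel removed (e.g. 'simple' -> 'simplify')."""
    out = set()
    for p in prefixes:
        out.add(p + "ify")
        if len(p) > 1 and p[-1] in VOWELS:
            out.add(p[:-1] + "ify")
    return out

def unregistered(candidates, registered_slds):
    """Keep only the candidates that are absent from the registered set."""
    return sorted(candidates - registered_slds)

# Hypothetical inputs: a real run would load the keyword lists and the zone file.
prefixes = ["brand", "simple", "spot"]
registered = {"spotify", "brandly"}

print(unregistered(ly_candidates(prefixes), registered))
print(unregistered(ify_candidates(prefixes), registered))
```

Names surviving this step are only absent from the registered set; each would still need a separate availability check and a manual review for brandability.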

The analysis yielded a set of 1,068 potentially available 'ly' domains of interest, and 2,220 'ify' domains. Amongst these, some of the most appealing and/or brandable examples included alcoholicly[.]com (<$100), amusemently[.]com ($104), competitionly[.]com ($185), introductionly[.]com ($104), petrochemically[.]com (<$100), and breakfastify[.]com ($1,326).

The domains

Domains beginning with the definite article ('the') are also traditionally popular in branding, with the term considered to have the potential to add specificity (rather than genericism) to the name, and convey a sense of gravitas. Numerous instances of sales of 'the' domains at prices in excess of $1M have been reported[12], and they may also comprise an attractive brandable alternative in light of the shortage of availability of dictionary-term (in isolation) domain names.

The set of registered .com domain names beginning with 'the' is somewhat larger than for the categories of names discussed in the previous section, with 3,264,507 registered names (from the--------------------------------------------------------line[.]com to thezzzzone[.]com, in addition to a number of examples featuring non-Latin characters) (including also examples where the term appears as a sub-string of a longer word such as 'theatre', 'there', 'thermo-', etc.).

The most obvious category of appealing names is that in which the second-level name consists of 'the' immediately followed by (just) a dictionary term. The analysis in this case focuses on the same lists of nouns and adjectives considered previously, plus a list of (45) common English single-word superlatives[13]. This first-stage analysis identified ten candidate domains (i.e. those which are absent from the zone file) of potential interest, including thedifficulty[.]com ($1,598), theco-operation[.]com (<$100), theannoyed[.]com ($1,523), thewettest[.]com (<$100), and - forming a 'pair' which could potentially be sold together at a premium, offering additional brandability options - thefattest[.]com ($1,564) and thethinnest[.]com ($1,429).

Widening out the search, to those examples where the portion of the domain name after 'the' consists of an adjective-(or superlative)-plus-noun pair, yields a much larger set of candidate names, with the set of keywords used in the analysis generating over 127,000 possible available names, including (for example) 421 of the form thebest[noun].com.
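The two stages of the 'the' search (a single dictionary term, then a modifier-plus-noun pair) can be sketched as below. The word lists and the `registered` set are hypothetical stand-ins for the keyword lists and zone-file data, and the function names are illustrative rather than taken from the study.

```python
from itertools import product

def the_single(words):
    """SLDs of the form the<word>."""
    return {"the" + w for w in words}

def the_pairs(modifiers, nouns):
    """SLDs of the form the<adjective-or-superlative><noun>."""
    return {"the" + m + n for m, n in product(modifiers, nouns)}

# Hypothetical inputs for illustration only.
nouns = ["pizza", "garage"]
modifiers = ["best", "happy"]
registered = {"thebestpizza", "thepizza"}

candidates = the_single(nouns + modifiers) | the_pairs(modifiers, nouns)
available = sorted(candidates - registered)
print(available)
```

Because the pair stage multiplies the list sizes together, the candidate set grows very quickly, which is consistent with the much larger number of names found at the second stage.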

Discussion

Once again, the analysis has shown that - despite the general lack of available, short .com domain names, and particularly those with SLDs consisting just of a single-word dictionary term - there are still significant numbers of brandable options available, for searchers with access to the appropriate tools, techniques and algorithms. It is also noteworthy that many of the examples available for registration at cost price (typically of the order of $10) are deemed to have significantly higher values, in many cases in excess of $1,000.

References

[1] https://circleid.com/posts/20240916-unregistered-gems-part-3-keeping-your-ize-on-the-prize

[2] https://www.appella.net/2017/01/how-creative-suffixes-help-brands-get-attention

[3] https://www.ceros.com/inspire/originals/ly-ify-startup-names/

[4] https://www.iamstobbs.com/opinion/a-new-tld-to-.ad-to-the-collection

[5] https://circleid.com/posts/20240911-further-explorations-in-brandable-domain-names-sensational-spellingz

[6] https://www.wordstream.com/popular-keywords

[7] https://gist.github.com/creikey/42d23d1eec6d764e8a1d9fe7e56915c6

[8] e.g. https://www.strunkmedia.com/complete-list-business-verticals-industries-sectors/

[9] https://www.reportingninja.com/blog/google-business-profile-categories

[10] https://docs.google.com/spreadsheets/d/1dz0rrQG0wq1zVPFVs_oX8zRWi2IEmNv7rz3m61hMQu8/edit

[11] https://www.stickyball.net/esl-grammar/parts-of-speech/100-common-adjectives.html

[12] https://www.linkedin.com/feed/update/urn:li:activity:7242224892815802369/

[13] https://typely.com/blogs/entry/6-how-to-form-superlative-adjectives-plus-100-common-superlatives-list/

This article was first published on 23 September 2024 at:

https://circleid.com/posts/unregistered-gems-part-4-other-brandable-domain-name-styles

Tuesday, 17 September 2024

Unregistered Gems Part 3: Keeping your -ize on the prize

The previous two articles in this series[1,2] have outlined techniques for 'mining' brandable domain names - that is, domain names of potential interest to entities looking to launch a new brand name and associated website - from the enormous dataset of unregistered names (determined via zone file analysis). The key element of the identification process is the implementation of filtering techniques to identify a (relatively) short-list of candidate names fulfilling certain criteria or comprising particular name patterns, which can then be manually reviewed for marketing and branding suitability. The previous articles have addressed two techniques which can be applied to this process; namely the use of phonotactic analysis to identify 'readable' (or wordlike-'sounding') names, and the use of algorithms to generate (alternatively spelled) variants of specific keyword- or brand terms of interest.

In this follow-up, I consider the use of combinations of these techniques to identify a special class of domain names; those where the name (strictly, the second-level name, or SLD - i.e. the part to the left of the dot) is a term ending with '-ize' (or variants). 'ize' names are popular in branding, and can convey a sense of modernity and dynamism[3]. There is also branding wisdom suggesting that names which function as a verb (or have a name structure which is amenable to this) can aid with memorability[4]. Even just amongst the set of companies offering branding and marketing-related services, we find a number of '-ize's (Localize (local-ize.ch), Merchize (merchize.com), Partnerize (partnerize.com), e-tailize (e-tailize.com), Visualize (visualize.design), Color-ize (color-ize.com), etc.).

Domain names of this type can take a number of forms, including actual dictionary terms, or 'sensationally' spelled variants - which can also apply to the suffix itself; the option to use 'ise', 'ize', 'yse' or 'yze' variants - all of which 'read' similarly - provides an additional degree of flexibility to the search process (and can also be a consideration when looking to address a target market - 'ize' spellings being more common in the US and 'ise' in the UK, for example). Other categories of brandable 'ize' names include neologisms, phrasal names (e.g. ending in 'wise' or 'size'), or other plays on words (e.g. where a word ending in '-ify' can be 'translated' into '-ifize', as a soundalike of '-ifies').
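As a small illustration of the suffix flexibility described above, the following sketch swaps the ending of a dictionary term between the four similar-sounding forms. The function name is hypothetical, and only the suffix is varied here (no other respelling of the stem is attempted).

```python
SUFFIXES = ("ize", "ise", "yze", "yse")

def suffix_variants(word):
    """Return the other three spellings of a word ending in
    'ize', 'ise', 'yze' or 'yse'; an empty list if none applies."""
    for suf in SUFFIXES:
        if word.endswith(suf):
            stem = word[: -len(suf)]
            return sorted(stem + s for s in SUFFIXES if s != suf)
    return []

print(suffix_variants("demonize"))
```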

Overall, the total numbers of unregistered names are very large, demonstrating the importance of an effective filtering approach. For example, the analysis identified 65,864 unregistered domains of length 6 characters and ending with 'ize' (or 'ise', 'yse' or 'yze'), 1.8 million 7-character instances, 47.5 million 8-character instances, and 1.235 billion 9-character instances (outside the set of registered examples, which run alphabetically from aaacruise.com to zzsunrise.com).
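As a rough consistency check on these figures, the size of the full pool of names of each length can be computed directly. The sketch assumes purely alphabetic SLDs (a-z only, which is an assumption, since digits and hyphens are also permitted in domain names), so each total is an upper bound on the corresponding unregistered count.

```python
SUFFIXES = ("ise", "ize", "yse", "yze")

def pool_size(length, suffixes=SUFFIXES, alphabet=26):
    """Number of alphabetic strings of the given length ending in any suffix."""
    return sum(alphabet ** (length - len(s)) for s in suffixes)

for n in range(6, 10):
    print(n, f"{pool_size(n):,}")
```

For 6-character names, for example, this gives 4 x 26^3 = 70,304 possible strings, against the 65,864 found to be unregistered.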

The filtering process used to identify brandable examples can include combinations of any of the techniques discussed previously, or others - including (for example) searches for domains matching certain consonant (C) and vowel (V) patterns, with (for example) 'VCVC-ize' or 'CVCVC-ize' perhaps more frequently producing credible names than other combinations. In this analysis I consider the prevalence of exact matches to, or phonetic / sensationally-spelled variants of, dictionary '-ize' words, as an illustration of the types of attractive available domains which can be identified.
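A consonant/vowel pattern filter of the kind mentioned above could be sketched as follows. The treatment of 'y' as a consonant and the function names are assumptions for the example; any non-vowel character (including digits or hyphens) is simply classed as 'C'.

```python
VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant
SUFFIXES = ("ize", "ise", "yse", "yze")

def cv_pattern(s):
    """Map each character to 'V' (vowel) or 'C' (anything else)."""
    return "".join("V" if ch in VOWELS else "C" for ch in s)

def matches(sld, stem_pattern, suffixes=SUFFIXES):
    """True if the SLD is <stem><suffix> and the stem has the given C/V pattern."""
    for suf in suffixes:
        if sld.endswith(suf) and cv_pattern(sld[: -len(suf)]) == stem_pattern:
            return True
    return False

print(matches("civilize", "CVCVC"), matches("agonise", "VCVC"), matches("brandize", "CVCVC"))
```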

As with the previous articles, a group of the domains with greatest brandable potential was identified through manual review of the filtered short-list, and registered as a 'test-set', allowing an assessment of their estimated values based on an AI tool (Table 1).

  Domain name      Value
  agonyze.com      <$100
  avianize.com*    $119
  civylize.com     <$100
  demonyse.com     $109
  dygitise.com     $1,047
  eqalize.com      $1,280
  fynalize.com     <$100
  idolyse.com      <$100
  immunyze.com†    $1,702
  minimyse.com     <$100
  utylise.com      <$100
  utylize.com      $1,344
  womanyze.com     <$100

* Exact spelling match to the dictionary term
† Approved after a review process by the Atom.com marketplace as a premium domain with a value of $2,899

Table 1: 'Test set' of brandable 'ize' domains and their values as estimated by an AI-based tool

These findings provide a further illustration of how credible brandable domains can readily be identified from the pool of unregistered names using suitable filtering and analysis tools and techniques. It is also noteworthy that, even with names as short as eight characters, there are instances of unregistered dictionary terms to be found.

In practice, the production of a short-list must be guided by the would-be brand owner, based on their preferences regarding a potential name, its structure, and their area of business (and associated potential keywords), but these analyses have shown that bespoke filtering processes can relatively straightforwardly be applied to the universe of names - through the use of effective algorithms - to identify sets of candidate names meeting specific requirements.

References

[1] https://circleid.com/posts/20240903-unregistered-gems-identifying-brandable-domain-names-using-phonotactic-analysis

[2] https://circleid.com/posts/20240911-further-explorations-in-brandable-domain-names-sensational-spellingz

[3] https://www.appella.net/2017/01/how-creative-suffixes-help-brands-get-attention

[4] https://stickybranding.com/make-your-brand-name-function-as-a-verb/

This article was first published on 16 September 2024 at:

https://circleid.com/posts/20240916-unregistered-gems-part-3-keeping-your-ize-on-the-prize

Wednesday, 11 September 2024

Further explorations in brandable domain names: Sensational spellingz

Introduction

My previous article on brandable domain names[1] - that is, available (unregistered) domain names which may be appealing to any entity looking to identify a potential name for a new brand launch - focused on the use of phonotactic (i.e. 'readability') analysis techniques to identify candidate names. In this follow-up, I consider the specific case of 'wacky' alternative spellings of familiar terms - more properly termed 'sensational spelling'[2] - as potential candidates for brandable names. It is important to note that the focus in this article is on variants of generic, business-related keywords, rather than on brand terms; the latter category actually being the much more dubious - and potentially infringing - practice of typosquatting.

Sensational spelling - itself a sub-category of novel or invented terms more generally - has become increasingly popular in branding and marketing, particularly among newer or technology-focused brands[3] (with familiar examples including Flickr, Reddit, Digg, Fiverr and Tumblr, in addition to older, more established brands such as Krispy Kreme, Weetabix, Blu-ray, Froot Loops and Playskool). The growth in appeal of these types of names is in part due to the increasing shortage of available domain names and the crowded landscape of pre-existing protected trademarks. Use of such terms can make it simpler to identify available domains for use, and can make it more straightforward to secure intellectual property protection, but this is by no means a universal rule; trademark law is a complex area involving significant subjectivity, and issues regarding potential brand confusion and unfair advantage are relevant.

Nevertheless, an appetite for sensational spellings can certainly make it more straightforward to identify candidate domain names which are available and may be of interest to a would-be brand owner. Although the domain name landscape is crowded, there remain large numbers of unregistered options (even amongst .com domains as short as 5 or 6 characters), depending on the permitted degree of variation away from the 'true' keyword of interest.

Methodology

In this study, I consider the prevalence of 'available' .com domain names (i.e. those absent from the .com zone file) whose second-level names (SLDs - i.e. the part of the domain name to the left of the dot) are sensationally spelled variants of any of a number of popular generic business-related keywords[4]. A set of 34 keywords is used in this analysis, covering a range of business areas, though in practice any given entity seeking an appropriate name would be likely to focus on keywords relating specifically to their business areas (or other favoured terms). The analysis considers only 5- and 6-character SLDs comprising variants of 4-, 5- or 6-character keywords.

A range of techniques are factored into the algorithm used to generate variant spellings, including the use of vowel removal (as per 'Flickr' and 'Tumblr'), character repeats, and the replacement of characters - or groups of characters - with others which are (for example) pronounced similarly. It is important to note that - as with any algorithm used to generate candidate brandable names - the output must be manually reviewed for suitability. This is particularly the case for sensational spellings, where some variants may not 'work' for the keyword in question. For example, a variant where a 'g' has been replaced by a 'j' will not be suitable if the original 'g' was pronounced as a 'hard' consonant (as in 'tingle'). As referenced in the previous study, it is also necessary to verify that any identified domain name is actually available for registration, rather than simply being absent from the zone file (and unavailable) for some other reason.
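The three families of transformation mentioned above can be sketched as a simple single-step variant generator. The substitution table `SUBS` is a hypothetical, deliberately tiny example (the study's actual rules are not given), and only one transformation is applied at a time.

```python
VOWELS = "aeiou"
# Hypothetical substitution table: character (group) -> similar-sounding options.
SUBS = {"c": ["k", "kq", "qk"], "k": ["x"], "s": ["z"]}

def variants(keyword, subs=SUBS, min_len=5, max_len=6):
    """Single-step respellings of a keyword, limited to SLDs of 5-6 characters."""
    out = set()
    for i, ch in enumerate(keyword):
        if ch in VOWELS:                          # 1. remove one vowel
            out.add(keyword[:i] + keyword[i + 1:])
        out.add(keyword[: i + 1] + ch + keyword[i + 1:])  # 2. repeat one character
    for old, news in subs.items():                # 3. substitute one character group
        start = keyword.find(old)
        while start != -1:
            for new in news:
                out.add(keyword[:start] + new + keyword[start + len(old):])
            start = keyword.find(old, start + 1)
    out.discard(keyword)
    return sorted(v for v in out if min_len <= len(v) <= max_len)

print(variants("logic"))
print(variants("market"))
```

The output would then be compared against the zone file and reviewed by hand, since many mechanically generated variants will not read convincingly as the original keyword.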

Analysis

In total, 958 available domain names comprising variants of any of the 34 keywords in question were identified, with the top keywords (by number of identified variants) shown in Table 1.

  Keyword    No. variants
  tech       86
  quote      75
  celeb      66
  cheap      60
  stock      53
  office     52
  income     49
  secure     39
  logic      35
  degree     31
  medic      31

Table 1: Top keywords by number of identified available 'sensationally spelled' variant domain names

It is worth noting that the exact numbers are somewhat arbitrary, being dependent on the number of substitutions deemed to be 'acceptable' for each of the characters or character-groups in question. Another point of significance is that (as mentioned previously) the sets of identified domains for any given keyword will vary markedly in 'quality' (i.e. the extent to which they comprise a convincing readable variant of the keyword in question). For example, the set of variants of 'logic' ranges from (at the subjectively 'better' end of the spectrum) logikq.com and logiqk.com, to loogec.com and loojic.com.

From the set of 958 candidate variant domains, a test group of 11 (identified through a manual analysis approach for filtering the results) were then registered in order to investigate their potential attractiveness for brandability and saleability. The assessed values of these domains, according to an AI-based domain valuation tool, are shown in Table 2. Six of the domains are valued at over $100, showing that this approach is capable of identifying credible brandable names.

  Domain name    Original keyword    Value
  inzurs.com     insure              $125
  inzurz.com     insure              <$100
  logikq.com     logic               $1,445
  logiqk.com     logic               <$100
  marxet.com     market              $1,392
  medikq.com     medic               <$100
  mediqk.com     medic               $1,366
  sckure.com     secure              <$100
  stoqck.com     stock               (no value given)
  sztem.com      system              $1,519
  zrver.com      server              $113

Table 2: Monetary values of the set of 11 registered candidate keyword-variant domain names, according to an AI-based tool

Conclusion

The use of an algorithm to generate 'sensationally spelled' variants of keywords (or other preferred brand terms) of interest is another technique (alongside the use of phonotactic acceptability calculations, as described in the previous article) for efficiently identifying domain names which may be appealing from the point of view of potential brandability, from the enormous pool of unregistered candidate names. These techniques can be combined with others, such as filtering on the basis of favoured name prefixes or suffixes, to further focus on the names of greatest interest within the dataset. However, the sets of names thus identified will invariably need to be reviewed manually, in order to determine suitability from the point of view of readability, linguistic 'feel', and branding and marketing desirability.

Of course, simply the identification (and even registration) of a name provides no guarantee that it will be easily defensible from the point of view of intellectual property protection. It is generally necessary to conduct searches to verify any pre-existence of registered rights for a proposed mark (or similar variants); furthermore, trademark decisions are inherently subjective, and there is a requirement to navigate the landscape of (avoiding) brand confusion and unfair advantage, meaning collaboration with experts in the field of intangible asset management will usually be advisable.

References

[1] https://circleid.com/posts/20240903-unregistered-gems-identifying-brandable-domain-names-using-phonotactic-analysis

[2] https://en.wikipedia.org/wiki/Sensational_spelling

[3] https://www.quora.com/What-are-some-renowned-brand-names-that-are-misspellings

[4] https://www.wordstream.com/popular-keywords

This article was first published on 11 September 2024 at:

https://circleid.com/posts/20240911-further-explorations-in-brandable-domain-names-sensational-spellingz

Tuesday, 3 September 2024

Unregistered Gems: Identifying brandable domain names using phonotactic analysis

Introduction

Conventional wisdom within the domain-sales industry states that the stock of unregistered domain names is 'running out', with limited or no availability of short, desirable domain names across popular extensions (TLDs). This presents problems for would-be brand owners looking for a brand name (and accompanying suitable website presence) to utilise for newly-launched companies, producing a push towards the selection of longer, unusual or novel terms for new brand names and/or increased adoption of new TLDs or those which were historically less popular - with the alternative generally being a requirement to purchase a pre-existing name at a premium price.

The initial statement made above is true up to a point. For .com, for example - still the most popular and trusted domain-name extension by a significant margin - there are no available unregistered (Latin) alphabetic domain names with a second-level domain (SLD) name[1] of four characters or fewer. For longer domain names, the proportion which are not currently registered increases rapidly, due to the exponential rise with domain-name length in the number of possible names (equal to 26^n, where n is the length (in characters) of the SLD). For 5-character .com domains, there are around 9 million unregistered strings (out of a 'pool' of 12 million), and for 6 characters, there are 303 million unregistered combinations (out of a possible 309 million)[2,3]. However, it is generally accepted that the vast majority of dictionary words are genuinely already registered.
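The pool sizes quoted above follow directly from the 26^n formula, as the short calculation below shows. The unregistered counts used are the exact mid-August 2024 figures given later in this article; the variable and function names are illustrative.

```python
def pool(n, alphabet=26):
    """Number of possible alphabetic strings of length n."""
    return alphabet ** n

# Names absent from the .com zone file (figures quoted later in this article).
absent = {5: 9_284_133, 6: 303_531_886}

for n, free in absent.items():
    total = pool(n)
    print(f"{n} chars: {total:,} possible, {free / total:.1%} absent from the zone file")
```

This gives pools of 11,881,376 and 308,915,776 names, of which roughly 78% and 98% respectively are absent from the zone file.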

These determinations can easily be made through analysis of domain zone files, the data files maintained by registry organisations containing comprehensive lists of all registered domains across the TLD in question. It is important to note, however, that the absence of a candidate domain from the zone file does not necessarily mean that the domain is available to register; domains can be absent from the file for other reasons (such as having been put 'on hold' status, or having no associated nameservers) and may be reserved or otherwise unavailable, so a round of 'post-processing' checks is required in order to confirm that any specific domain missing from the zone file is actually registrable.
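A minimal sketch of the zone-file comparison is given below. It assumes a zone file with one resource record per line and the owner name (e.g. 'example.com.') as the first whitespace-separated field; real zone files may differ in detail, and the sample lines and function name here are invented for illustration.

```python
def registered_slds(zone_lines, tld="com"):
    """Collect the second-level labels appearing as owner names in a zone file."""
    suffix = "." + tld
    slds = set()
    for line in zone_lines:
        line = line.strip()
        if not line or line.startswith(";"):      # skip blanks and comments
            continue
        owner = line.split()[0].rstrip(".").lower()
        if owner.endswith(suffix):
            label = owner[: -len(suffix)]
            # 'ns1.example' -> 'example': keep only the label next to the TLD
            slds.add(label.rsplit(".", 1)[-1])
    return slds

# Invented sample records, for illustration only.
sample = [
    "; comment line",
    "example.com. 172800 in ns ns1.example.net.",
    "ns1.widgets.com. 172800 in a 192.0.2.1",
]
registered = registered_slds(sample)
candidates = {"example", "widgets", "axidy"}
print(sorted(candidates - registered))  # absent from the zone file, not yet confirmed available
```

As noted above, absence from the file is only a first filter; each surviving name still needs an availability check with a registrar or registry.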

Methodology and analysis

The large set of unregistered domain names - even on .com and at lengths as low as five characters - does mean that there are still plenty of options available for the launches of new brands; the difficulty is in finding the appropriate candidate names.

The lack of availability of dictionary words as registrable domain names does mean that - aside from cases of acquisition of pre-registered domains - recent years have seen increased use of new styles of brand names, such as word combinations, neologisms containing familiar roots and fragments (so-called 'transmutations'), and 'wacky' alternative spellings of familiar words (often with one character replaced by another with a similar pronunciation, or with vowels removed)[4,5] - in addition to a wide range of abstract terms[6]. The current popularity of names of this type can be seen through the selections of names - billed as 'brandable' options - available for sale by domain brokers and on domain marketplaces (Figure 1).

Figure 1: Examples of domain names offered for sale on the Atom.com marketplace (as of 27-Aug-2024)

A further attractive aspect of adopting an 'invented' term as a brand name - particularly if the domain has never been registered before[7] - is that there is a lower likelihood of pre-existing intellectual property rights or name collisions or confusion, meaning it is likely to be more straightforward to protect and defend the new name.

One viewpoint is that there is nothing inherently 'special' about the names typically on offer for sale through such sources, apart from the fact that they have been identified as being available and have been deemed 'brandable'; potentially there may be (many?) other equally attractive options available within the unregistered 'pool', with the issue being the complexity of identifying them amongst the enormous numbers of other random character strings - i.e. how is it possible to filter down the set of candidate names into a more manageable number?

A key analysis technique involves the use of so-called phonotactics - essentially, a measure of the potential readability, or similarity to other existing words (or brand names!) present in the corpus of a language, of the candidate strings. (This analysis focuses on 'English-like' terms, but similar principles can also be applied to other languages.) In this study, the analysis is carried out just on the SLD string of each domain name. Of course, an objective determination of phonotactic 'acceptability' does not necessarily mean that a candidate domain name will be an attractive brandable option, so it is generally also necessary to conduct a subsequent manual review of the (much smaller) filtered set of names, using a more subjective assessment of branding potential (based on the intrinsic understanding and 'feel' of language and marketing that only a human reviewer can impart). Some of the basic ideas of brandability are well summarised by Nick Kolenda’s overview of the subject[8].

Phonotactics - basics

An example of a phonotactic calculator is that produced by UCI[9], though there are a number of other implementations of similar tools. The algorithm used in this analysis is the BLICK model (Hayes, 2012)[10] which, for an arbitrary string of phonemes (i.e. a candidate word), outputs a score providing a measure of the extent of phonotactic 'violation' - i.e. a lower score denoting a more credible potential name; in the words of the original study:

[The model] predicts that ket [K EH1 T] should [be] a completely perfect word of English (penalty score zero), that doit [D OY1 T] should be a somewhat peculiar word of English (score 3.094), and that nguhyee [NG AH H Y IY0] should be a pretty horrible word of English (score 12.295).

The implementation utilised in this analysis requires that each string is first converted to its phonetic representation using ARPABET syntax, in which each phonetic element is represented as a series of Latin characters (and, in some cases, a trailing digit)[11,12].

Previous brandable domain sales

As a case study, it is informative to consider a set of previous domains (all .com) sold (or on sale) as brandable examples, for which sale prices are available from a range of sources. The relationship between the sale price and phonotactic violation score for the 5-character SLD names is shown in Figure 2 (noting that, for some strings, the phonotactic calculation algorithm will fail, in which cases the score is assigned a default value of -1).

Figure 2: Relationship between sale price and phonotactic violation score for 5-character brandable domain names sold (or on sale) through a range of sources

Within this specific dataset, there is no strong relationship between cost and phonotactic score (even if the results from Novanym.com, for which all domain sales were for a constant price, are excluded). However, it is significant that all domains in the set have relatively low scores (compared with the distribution within the 'universe' of unregistered names) and with the vast majority at scores of 6 or lower.

A deep-dive into the universe of unregistered 5- and 6-character .com domain names

Based on analysis of the .com zone file carried out in mid-August 2024, there are 9,284,133 of the set of possible alphabetic 5-character names, and 303,531,886 6-character names, absent from the zone file, and potentially available for registration.

It is informative to consider the distribution of phonotactic violation scores (for the SLDs) across an unfiltered set, to gain an indication of the variations across the dataset and how these reflect the 'readability' of the corresponding names. Because the calculation algorithm is relatively slow, this analysis considers only the 5-letter names beginning with 'a' and 'b' (one vowel and one consonant). This gives a dataset of 478,369 (candidate) domain names. Of these, 8,893 (1.9%) are assigned a score of zero, but with a wide range of scores observed, up to a maximum of just under 68 (Figure 3). The five SLD strings with the highest scores (i.e. the least credible brandable candidates) are awbzp (59.43), bctko (65.17), anwjf (65.94), apgdj (67.26) and bchji (67.92).

Figure 3: Distribution of phonotactic violation scores for all unregistered 5-character alphabetic domains with names beginning with 'a' or 'b'

The above analysis suggests that the phonotactic violation score provides one reasonable basis for filtering down large datasets of candidate domain names into smaller subsets (i.e. by selecting a score threshold and then retaining domains with SLD scores below that value), which can then be reviewed for suitable brandable candidate domain names. Realistically, this criterion will need to be combined with others in order to obtain datasets of manageable size (particularly given the fact that many readable / brandable names are assigned scores of up to 15 or greater), and to drill down into subsets that meet particular conditions (e.g. contain product-related keywords or 'fragments', or where some rough guidelines on the preferred brand name are available). Examples of other possible filtering criteria might be to exclude any names containing no vowels, or those with more than two consecutive repeated characters. Of course, such criteria may not always be appropriate, depending on branding preferences, but are likely to be the sort of conditions that will 'fit well' with the use of phonotactic analysis (i.e. where brand names resembling classical readable words are being sought). This type of analysis will also likely not be appropriate for generating non-readable names (e.g. those intended to be used as acronyms / initialisms), but this is deemed to be a separate problem (since a brand owner seeking to use an acronym will likely already have an established (multi-word) name to which that acronym will apply).
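The combination of a score threshold with the simple structural criteria mentioned above might be sketched as follows. The threshold value, the decision to count 'y' as a vowel, the handling of unscored names (score -1) and the example scores are all assumptions made for illustration; the scores themselves would come from a separate phonotactic calculator.

```python
import re

VOWELS = set("aeiouy")            # assumption: 'y' counts as a vowel
TRIPLE = re.compile(r"(.)\1\1")   # three or more identical characters in a row

def passes_filters(sld, score, max_score=6.0, keep_unscored=True):
    """Apply the score threshold plus the vowel and repeated-character checks."""
    if score < 0:                 # -1 marks strings the scorer could not handle
        if not keep_unscored:
            return False
    elif score > max_score:
        return False
    if not VOWELS & set(sld):     # exclude names containing no vowels
        return False
    if TRIPLE.search(sld):        # exclude more than two consecutive repeats
        return False
    return True

# Hypothetical (sld, score) pairs.
scored = [("axidy", 0.90), ("bchji", 67.92), ("aaabc", 1.0), ("ebeya", -1), ("bcdfg", 2.0)]
print([sld for sld, s in scored if passes_filters(sld, s)])
```

Keeping unscored names by default is a design choice: it avoids silently discarding strings simply because the scorer failed on them, at the cost of a slightly larger set for manual review.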

Review

So, does this approach yield meaningful results? The short answer is that it does seem to do so. An initial set of searches within the sets of 312 million unregistered 5- and 6-character .com domain names, combined with test domain purchases, has allowed the identification of at least some domains which, when submitted to the Atom.com domain marketplace, have been deemed attractive enough from a brandability point of view that they have been assigned 'premium' status with suggested values in excess of $2,000:

  • axidy.com (phonotactic violation score: 0.90) - suggested price $2,299
  • gyble.com (0.90) - $2,299
  • kyppy.com (0.90) - $2,299
  • ebeya.com (no score assigned) - $2,399
  • byskit.com (0.12) - $2,499
  • fybric.com (0.49) - $2,499
  • qaxxy.com (0.90) - $2,499
  • duklet.com (0.93) - $2,599
  • tyckl.com (0.00) - $2,699
  • ozogy.com (no score assigned) - $2,799

This set of 10 premium domains is from an initial test-set of 48 candidates, all deemed to be attractive on the basis of a manual review of the filtered dataset, and submitted to Atom.com for consideration for sale suitability (i.e. a 'hit rate' of 21% - more than one in five). Many of the other identified domains are, however, also appealing from the point of view of potential brandability, with 25 of the group of 48 being assessed by an alternative AI-based domain valuation tool as being worth $100 or more.

Conclusions

The use of phonotactic analysis, combined with other appropriate filtering criteria and a subsequent process of manual review and assessment, does appear to provide a valid basis for identifying brandable domain names - i.e. candidates of potential interest for individuals seeking an attractive name for a new business - amongst the very large dataset of unregistered potential names; a dataset which would, by virtue of its size, not be reasonably manually reviewable without the application of these criteria. These ideas are key to the identification of attractive available names - the 'unregistered gems' within the extensive bedrock of domain-name noise - which has obvious applications in the provision of brand-name recommendations and branding consultancy.

Acknowledgements

Thanks must go to Sten Lillieström for his introduction to this subject, and his calm and measured enthusiasm in all subsequent discussions.

References

[1] i.e. the part of the domain name to the left of the dot 

[2] https://www.iamstobbs.com/availability-of-domains-ebook

[3] 'Patterns in Brand Monitoring' by D.N. Barnett, Chapter 9: 'Domain landscape analysis' [awaiting publication]

[4] https://www.nextventure.com/blog/practice-and-circumstance-on-the-market-for-names

[5] https://www.nextventure.com/blog/did-the-investment-presage-what-became-of-the-domain

[6] https://www.atom.com/blog/business-brand-name-ideas-101/

[7] https://www.linkedin.com/posts/sten-lilliestr%C3%B6m-885247179_names-domain-name-activity-7227214694032261121-JQN5

[8] https://www.kolenda.io/pdf/brand-names.pdf

[9] https://phonotactics.socsci.uci.edu/

[10] https://linguistics.ucla.edu/people/hayes/BLICK/

[11] https://github.com/Kyubyong/g2p/

[12] https://github.com/Kyubyong/g2p/issues/29

This article was first published on 3 September 2024 at:

https://circleid.com/posts/20240903-unregistered-gems-identifying-brandable-domain-names-using-phonotactic-analysis
