Wednesday, 15 January 2025

Unregistered Gems Part 6: Phonemizing strings to find brandable domains

Introduction

The UnregisteredGems.com series of articles explores a range of techniques to filter and search through the universe of unregistered domain names, in order to find examples which may be compelling candidates for entities looking to select a new brand name (and its associated domain). The previous instalment of the series[1] looked at the categorisation of candidate names according to the phonetic characteristics of their constituent consonants, using a simple one-to-one mapping between each consonant and a corresponding phonetic group.

In this study, I explore the use of a more formal phonetic representation of each string, involving its conversion to its IPA (International Phonetic Alphabet) representation[2]. This has a number of advantages over the previous approach, including the ability to properly handle differences in pronunciation of particular characters according to their context, handling of character combinations, and the ability to generalise the approach to strings of arbitrary consonant/vowel patterns and length.

Framework

As in the previous study, the strings are classified according to the phonetic categories of their constituent consonants, but with all vowel sounds just combined into a single group. This approach follows from the assertion that the consonants comprise the core 'structure' of the word, and avoids having to handle the more complex nature of vowel sounds (such as the presence of vowel diphthongs, variations in length (i.e. 'long' vs 'short' sounds), and the impact of the accent of the speaker (noting that the IPA conversion tool used is based on American English)).

The consonant sounds are divided into the following groupings, again following the framework used in the previous study, and with the phoneme symbols taking their usual IPA meanings[3,4].

  Top-level group   Group   Type   Consonant phonemes
  1 (plosive) 1A   Bilabial plosive   b, p
  1 (plosive) 1B   Alveolar plosive   d, t, ɾ
  1 (plosive) 1C   Velar plosive   ɡ, k
  2 (nasal) 2A   Bilabial nasal   m
  2 (nasal) 2B   Alveolar nasal   n
  2 (nasal) 2C   Velar nasal   ŋ
  3 (fricative) 3A   Labiodental fricative   f, v
  3 (fricative) 3B   Dental fricative   θ, ð
  3 (fricative) 3C   Alveolar fricative   s, z
  3 (fricative) 3D   Postalveolar fricative   ʃ, ʒ
  3 (fricative) 3E   Glottal fricative   h
  4 (approximant) 4A   Labial-velar approximant   w
  4 (approximant) 4B   Retroflex approximant   ɹ, r[5]
  4 (approximant) 4C   Palatal approximant   j
  5 (lateral approximant) 5A   Alveolar lateral approximant   l
  6 (affricate)[6] 6A   Postalveolar affricate   ʧ, ʤ

Table 1: Groupings assigned to individual consonant phonemes as used in the analysis

Any string can then be represented as a 'code' (the 'word type'), comprising the top-level group numbers of the consonants (and with any vowel sounds, or sequences of consecutive vowel sounds, denoted simply as a 'V'), expressed in the order in which they appear in the string.

For example, therefore, the string 'rolex' is encoded in IPA representation as 'ɹoʊlɛks' which is assigned word type 4V5V13.
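The encoding step can be illustrated with a minimal Python sketch. This is not the code used in the study: it assumes the IPA string has already been produced (e.g. by Phonemizer), and the function name word_type, the skipping of stress and length marks, and the treatment of every remaining non-consonant symbol as a vowel are all assumptions made for illustration. The groupings are those of Table 1, with the affricate digraphs merged as described in note [6].

```python
# Top-level consonant groups from Table 1 (key = group number).
# Both the IPA 'ɡ' and the ASCII 'g' are accepted for the velar plosive.
GROUPS = {
    "1": "bpdtɾɡgk",
    "2": "mnŋ",
    "3": "fvθðszʃʒh",
    "4": "wɹrj",
    "5": "l",
    "6": "ʧʤ",
}
CONSONANT_GROUP = {ch: g for g, chars in GROUPS.items() for ch in chars}

# Assumption: stress marks, length marks and spaces carry no phoneme.
IGNORED = set("ˈˌː ")


def word_type(ipa: str) -> str:
    """Return the word-type code for an IPA string."""
    # Merge the two-character affricates into single symbols (note [6]).
    ipa = ipa.replace("tʃ", "ʧ").replace("dʒ", "ʤ")
    code = []
    for ch in ipa:
        if ch in IGNORED:
            continue
        if ch in CONSONANT_GROUP:
            code.append(CONSONANT_GROUP[ch])
        elif not code or code[-1] != "V":
            # Any run of consecutive vowel symbols collapses to one 'V'.
            code.append("V")
    return "".join(code)


print(word_type("ɹoʊlɛks"))  # 4V5V13
```

Working on the IPA string rather than the raw name keeps the sketch independent of any particular phonemizer installation.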

Analysis

By analogy with the previous study, it is informative to consider again the same set of the 2,000 most popular 5-character names (measured by second-level domain name, or SLD - i.e. the part of the domain name to the left of the dot) offered for sale on the domain marketplace Atom.com[7], to determine any patterns or common word types within this dataset. By virtue of their inclusion on the marketplace, these names have independently been deemed to be attractive from a brandability point of view.

There are actually 627 distinct word patterns represented in this dataset (noting that there are 7 distinct groups into which the phonemes can be assigned (cf. 6 in the previous study), and that there is here no upper limit to the total possible length of the word 'code' representation), of which the top ten are shown in Table 2.

  Word type   No. domains
  3V13V 62
  3V3V 62
  1V13V 48
  1V3V 47
  3V1V 35
  4V3V 33
  4V13V 32
  3V35V 31
  3V23V 30
  1V1V 29

Table 2: Top ten word types represented in the dataset of 2,000 most popular 5-character (made-up, up to two syllable) names on the Atom.com domain marketplace

Accordingly, there are 62 of the 2,000 domains whose (SLD) names fit the (joint) most common word-type pattern (3V13V) represented amongst this set of popular domains, which are listed below.

Word type 3V13V:

  • vodzy         
  • vebsy         
  • hauxa         
  • fexie         
  • hixxi
  • xaxor
  • zetza
  • suxxo
  • xaxxy
  • huxxa
  • vudzi
  • sedza
  • hydso
  • vitvy
  • phexy
  • cipza
  • votvy
  • xuxxo
  • zuxxa
  • cexxi
  • zeexo
  • zogzy
  • zepvi
  • ciexa
  • soaxy
  • vapzy
  • vycci
  • fudfy
  • vybsy
  • veexy
  • foxxu
  • vodvi
  • fiexa
  • vuxxy
  • vauxa
  • fabvy
  • zotvo
  • cerxa
  • zatva
  • zepfy
  • vapzi
  • hoxor
  • serxa
  • huxey
  • vegvy
  • vuxoo
  • fotvi
  • vuxxi
  • xoxxy
  • cixxa
  • suxxa
  • vibsi
  • hooxo
  • fauxo
  • zopzy
  • zabvi
  • virxi
  • huxee
  • voixi
  • huxxo
  • zirxo
  • zopvi

Discussion

As discussed in the previous instalment, this type of analysis may allow steps towards the development of a set of 'guidelines' as to which word types (i.e. sound patterns) might constitute the most preferred names from a brandability point of view. If so, these ideas could be used as a basis for filtering large datasets to identify possible candidate names of interest. One downside to this approach is that, as with the use of phonotactic analysis[8], the framework presented here involves the conversion of each string to a phonetic representation, which is computationally relatively slow. However, unlike phonotactic analysis, this new methodology provides a basis for a more granular clustering of candidate names, and potentially (providing the preferred word types are correctly selected) may provide a more effective 'mapping' between candidate names and their potential desirability.

If (for example) we assume that word type 3V13V is a 'good' pattern for brandable names, it is informative to investigate its use as a filter. For illustration, we can consider the set of unregistered .com names of the form CVCCV ('C' = consonant, 'V' = vowel) from the original study in this series, using the subset beginning with the letter 's' (a 'group 3'-type sound) as an example. There are 6,044 such names. Of these, 567 (9.4%) are found to be of word type 3V13V[9], and it might be reasonable to assume that (at least some of) these may be candidates for brandability which are at least as credible as the names taken from Atom.com listed above. Some examples of the names in this new filtered dataset include sagsy, sedsi, sicsy, sodsy, sudci, suqsy, sybzi, sycci, sygzy, syksi and sytzo.
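As a rough sketch of the first stage of such a filter, the full set of sVCCV strings can be enumerated as below. The treatment of 'y' as a vowel is an assumption (inferred from examples such as 'sagsy' and 'sybzi'), as are the names VOWELS, CONSONANTS and candidates. The subsequent checks for registration status and the conversion to word types are not shown.

```python
from itertools import product
import string

VOWELS = "aeiouy"  # assumption: 'y' is counted as a vowel
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)


def candidates(first: str = "s"):
    """Yield every string of the form <first>VCCV."""
    for v1, c1, c2, v2 in product(VOWELS, CONSONANTS, CONSONANTS, VOWELS):
        yield first + v1 + c1 + c2 + v2


names = list(candidates())
print(len(names))          # 14400 possible strings under these assumptions
print("sagsy" in names)    # True

# Proportion of the 6,044 unregistered names reported as word type 3V13V.
print(f"{567 / 6044:.1%}")  # 9.4%
```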

References

[1] https://circleid.com/posts/unregistered-gems-part-5-using-groupings-to-find-brandable-domains

[2] The conversion is carried out using the Python module Phonemizer, as was also used in a previous study on the analysis of strings for the purposes of mark similarity quantification: 

[3] https://home.cc.umanitoba.ca/~krussll/phonetics/articulation/describing-consonants.html

[4] https://www.dyslexia-reading-well.com/44-phonemes-in-english.html

[5] Technically, the 'r' phoneme represents a (voiced) alveolar trill, but is grouped together in this analysis with the 'ɹ' sound due to the similarity/ambiguity between the two. 

[6] The Phonemizer module actually outputs these symbols each as two distinct characters ('tʃ' and 'dʒ', respectively), so they are first converted to single characters ('ʧ' and 'ʤ') wherever they appear in the IPA representations, to ensure they are treated as single phonemes ('ch' and 'dg', respectively) in the subsequent analysis. 

[7] https://www.atom.com/premium-domains-for-sale/all/length/5%20Letters

[8] https://circleid.com/posts/20240903-unregistered-gems-identifying-brandable-domain-names-using-phonotactic-analysis

[9] This is in fact the second most common word type in the dataset, after 3V11V (651 instances); there are 94 distinct word types represented in the sVCCV dataset.

This article was first published on 14 January 2025 at:

https://circleid.com/posts/unregistered-gems-part-6-phonemizing-strings-to-find-brandable-domains

Tuesday, 14 January 2025

It’s a dark whois world

Introduction

A recent study by Interisle[1,2] has highlighted the prevalence of a lack of identifying contact information in the registration records of gTLD (generic top-level domain) domain names, with the claim that almost 90% of records are devoid of such information[3]. This trend is a familiar one following the introduction of the General Data Protection Regulation (GDPR) in 2018, in response to which much of the available contact information was redacted, but is arguably just a continuation of a pattern which was anyway becoming more common; use of privacy and proxy services is attractive to many registrants desiring online anonymity, and can be of particular appeal to infringers.

The study by Interisle considers a set of 3,000 domain names and also includes a focus on attempting to identify contact details on any associated hosted websites. In this article, we consider an analysis of a broader dataset of gTLD names, but focusing just on the information in the whois records themselves (which are explicitly covered by an ICANN regulation requiring the provision of accurate contact information to the registrar[4] - even if the registrar then 'masks' this information publicly), with a view to assessing the extent and implications of 'dark' whois records within the domain landscape.

Methodology and overview

The analysis considers a sample set of 500 domain names[5] from each of the 100 largest gTLD zone files, giving a total dataset of 50,000 domains, and considers only those whois records which are available via an automated look-up (focusing specifically on the registrant name / organisation and e-mail address fields as given in the record).
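The sampling described in note [5] (every 25th domain, until 500 have been extracted) can be sketched in Python as follows. The function name sample_zone is hypothetical, and starting at the 25th entry (rather than the first) is an assumption; the sketch takes any iterable of domain names in zone-file order.

```python
from itertools import islice


def sample_zone(domains, step=25, limit=500):
    """Take every `step`-th item from `domains`, stopping after `limit` items."""
    every_nth = islice(domains, step - 1, None, step)
    return list(islice(every_nth, limit))


# A zone file of 12,500 names is exactly enough to yield 500 samples.
demo = [f"name{i}" for i in range(1, 12501)]
sample = sample_zone(demo)
print(len(sample), sample[0], sample[-1])  # 500 name25 name12500
```

Because islice is lazy, the same function can be applied to a file handle without reading a large zone file into memory.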

In the study, we look to determine the prevalence of each of a series of whois record 'categories' corresponding to the degree of privacy protection or redaction used, and mirroring the definitions used by ICANN[6]:

  • Use of a proxy service - this is where no explicit information on the 'real' registrant is given in the name or e-mail address field of the record. Proxy service providers use their own contact details in the whois record and, technically, are in each case the legal registered owner of the domain, acting as a licensor of the name to the end customer.
  • Use of a privacy service - in this case, the customer is the registered owner of the domain, and is featured in the registrant name field of the whois record, although other contact details may be absent (often replaced by forwarding e-mail addresses supplied by the service provider).
  • Redaction - this definition is taken to be where the term "redacted" explicitly appears in the whois record in place of one of the other fields normally present. In this study, redacted records are subdivided into those where a specific identifiable registrant is named, and those where this is not the case. Note that this category includes cases where an explicit contact e-mail address may also be given (which, according to some definitions, might be considered to be 'open' records).
  • 'Open' - these are cases where an explicit owner name and contact e-mail address is given. It is worth noting that this is a relatively strict definition, and excludes cases where the e-mail address is that of the underlying registrar or other service provider (taken in this analysis to be privacy-protected records).
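One possible reading of these definitions is sketched in Python below. This is an illustration only, not the classifier used in the study: the keyword lists are hypothetical (loosely modelled on the kinds of entries shown in Tables 2 and 3), the order in which the rules are applied is an assumption, and the function and parameter names are invented for the example.

```python
# Hypothetical hints, for illustration only.
PROXY_HINTS = ("proxy", "privacy", "private", "withheld", "whois")
SERVICE_LOCAL_PARTS = ("abuse", "domainabuse", "whois")


def classify(name, email, raw_record=""):
    """Assign a whois record to one of the categories described above."""
    name = (name or "").strip()
    email = (email or "").strip().lower()
    text = " ".join((name, email, raw_record)).lower()

    # No usable registrant name, or a name that looks like a service provider.
    proxy_like = (
        not name
        or name.lower() == "none"
        or any(hint in name.lower() for hint in PROXY_HINTS)
    )

    # Assumption: an explicit "redacted" anywhere in the record takes priority.
    if "redacted" in text:
        named = not proxy_like and "redacted" not in name.lower()
        return "redaction (named registrant)" if named else "redaction"
    if proxy_like:
        return "proxy"
    # A named registrant with no e-mail, or only a service-provider e-mail.
    if not email or email.split("@")[0] in SERVICE_LOCAL_PARTS:
        return "privacy"
    return "open"


print(classify("Domains By Proxy, LLC", ""))          # proxy
print(classify("Jane Doe", "abuse@name.com"))         # privacy
print(classify("Example Ltd", "REDACTED"))            # redaction (named registrant)
print(classify("Jane Doe", "jane@example.org"))       # open
```

In practice, simple keyword matching of this kind would need to be tuned against the actual record formats returned by each registrar.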

Why is this issue important? Fundamentally, the absence of personal identifying information in a domain whois record makes it more difficult for brand owners to launch enforcement actions against infringers - particularly where 'real-world' escalation routes may be required - and can therefore create an environment which is advantageous for bad actors. Although in some cases it may be possible to serve a notice requesting that a registrar reveals the underlying contact information they hold (and where provably inaccurate information can be grounds for domain suspension), levels of compliance and documentary requirements by registrars can be highly variable.

Furthermore, a dark whois landscape makes it more difficult for brand protection initiatives to be able to prioritise and cluster together domain results based on shared characteristics, making the execution of efficient bulk takedowns a more complex prospect, and increasing the difficulty in demonstrating bad faith activity by serial infringers.

Findings

Of the 50,000 domains in the dataset, only 14,908 (29.8%) have whois records which are available via automated look-up (noting that 51 of the 100 gTLDs do not return any information in response to automated queries); this subset is the dataset on which the remainder of the analysis is based. 36 of the 100 gTLDs do return whois records for at least half of the domains queried.

Overall, only 110 of the domains in the dataset (0.74%) were classified as having 'open' whois records - an extremely small proportion, but perhaps unsurprising in view of the strict definition used, and potentially best viewed as a conservative estimate. These domains are spread across fifteen different TLDs: .africa (3 domains), .agency (1), .art (1), .best (4), .bond (3), .cam (1), .christmas (7), .com (14), .company (1), .fun (5), .icu (14), .net (5), .pics (33), .tech (9) and .website (9).

The full statistics are shown in Table 1.

  Category   No. domains   %
  Proxy 9,384 62.95 %
  Privacy 524 3.51 %
  Redaction 3,377 22.65 %
  Redaction (with named registrant) 1,513 10.15 %
  'Open' 110 0.74 %
  TOTAL 14,908 100.00 %

Table 1: Numbers of domains with each category of whois record
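As a simple consistency check, the percentages in Table 1 can be recomputed from the category counts. The short Python sketch below uses only the figures given in the table; the variable names are illustrative.

```python
counts = {
    "Proxy": 9384,
    "Privacy": 524,
    "Redaction": 3377,
    "Redaction (with named registrant)": 1513,
    "'Open'": 110,
}

total = sum(counts.values())
shares = {category: f"{100 * n / total:.2f}" for category, n in counts.items()}

print(total)  # 14908
for category, share in shares.items():
    print(f"{category}: {share} %")
```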

The prevalence of use of proxy services is striking - accounting for almost two-thirds of domains in the dataset - but also shows significant variability between TLDs. In total, the samples of domains from eight of the TLDs in the dataset showed an adoption rate of proxy services of greater than 80%: .today (94.72%; N = 417), .shop (94.71%; N = 170), .christmas (93.13%; N = 495), .one (86.84%; N = 38), .cam (85.13%; N = 417), .zone (84.96%; N = 419), .media (84.90%; N = 384), .art (81.25%; N = 208) (where N is the number of domains (out of 500) in each case for which a whois record was returned by the automated look-up) (see also Appendix A).

It is also informative to consider the most commonly-used proxy service providers, and contact e-mail addresses given in privacy-protected records (Tables 2 and 3).

  Registrant name   No. domains   %
  Domains By Proxy, LLC 2,788 29.71 %
  Privacy service provided by Withheld for Privacy ehf 2,374 25.30 %
  None 1,066 11.36 %
  Super Privacy Service LTD c/o Dynadot 968 10.32 %
  Private by Design, LLC 360 3.84 %
  Whois Privacy Protection Service, Inc. 285 3.04 %
  Privacy Protect, LLC (PrivacyProtect.org) 241 2.57 %
  Contact Privacy Inc. Customer [] 214 2.28 %
  PrivacyGuardian.org llc 194 2.07 %
  See PrivacyGuardian.org 180 1.92 %

Table 2: Most common 'registrant organisation' fields given in domains using proxy services

  E-mail address   No. domains   %
  domainabuse@service.aliyun.com 188 30.37 %
  abuse@name.com 59 9.53 %
  abuse@reg.ru 41 6.62 %
  abuse@dns.business 32 5.17 %
  abuse@domains.co.za 31 5.01 %
  domainabuse@netim.net 20 3.23 %
  whois@domain-mgmt.net 10 1.62 %
  abuse@key-systems.net 10 1.62 %
  abuse@59.cn 10 1.62 %
  abuse@wdomain.com 10 1.62 %

Table 3: Most common contact e-mail addresses[7] given in privacy-protected records

Discussion

The paucity of 'real-world' contact details given in domain whois records is, in part, a construct of an environment where the appeal of anonymity is great, and is generating an online ecosystem which is advantageous for infringers and can be increasingly problematic for brand owners. This does not, of course, mean that nothing can be done from an enforcement point of view - requests for unmasking of contact details held by registrars can be successful in many cases where proof of wrongdoing is available. Even in the absence of registrant contact details, there is a range of enforcement approaches - such as hosting provider and registrar level notices - which are available. At the other end of the spectrum, for the highest priority infringements, a full formal domain dispute procedure can also serve as a means for obtaining registrant contact details.

In many cases, it may also be possible to build a picture of an infringer's activity by using a range of online and offline open-source intelligence (OSINT) investigation approaches, often using data-points taken from the website content itself, or information taken from historical whois databases, as a start point.

The introduction of schemes such as the Registration Data Request Service (RDRS) by ICANN, offering a simplified and standardised process for requesting registrant information[8], may also be a step in the right direction. It is also worth noting that the whois protocol itself, lacking many up-to-date technical attributes, is scheduled to be phased out in 2025 in favour of the more standardised Registration Data Access Protocol (RDAP), which has an improved underlying technology.

Going forward, it may transpire that the balance between demands for privacy and online protection forces a push back towards the previous environment of requiring a greater degree of accountability for website owners, and forcing a move towards more comprehensive whois databases. Adoption of mandates such as the Network and Information Security (NIS2) Directive, requiring registries and registrars to collect and provide free access to detailed ('thick' whois) information[9], may be part of this picture.

Appendix A: Numbers of domains with each category of whois record, by TLD

(N = number of domains for which a whois record was returned by the automated look-up)

  TLD   Proxy   Privacy   Redaction   Redaction (with named registrant)   Open   N
  pics 77.20 % 2.80 % 12.00 % 1.40 % 6.60 % 500
  christmas 93.13 % 1.62 % 3.43 % 0.40 % 1.41 % 495
  xyz 63.41 % 6.91 % 29.67 % 0.00 % 0.00 % 492
  africa 53.66 % 28.46 % 14.02 % 3.25 % 0.61 % 492
  com 70.19 % 5.59 % 19.67 % 1.66 % 2.90 % 483
  icu 27.67 % 19.92 % 49.48 % 0.00 % 2.94 % 477
  asia 42.00 % 0.00 % 39.78 % 18.22 % 0.00 % 450
  fun 55.01 % 12.25 % 30.29 % 1.34 % 1.11 % 449
  bond 42.76 % 2.45 % 43.88 % 10.24 % 0.67 % 449
  zone 84.96 % 0.00 % 6.68 % 8.35 % 0.00 % 419
  today 94.72 % 0.00 % 3.12 % 2.16 % 0.00 % 417
  cam 85.13 % 1.68 % 10.79 % 2.16 % 0.24 % 417
  best 48.19 % 1.69 % 46.51 % 2.65 % 0.96 % 415
  photography 58.72 % 0.00 % 30.47 % 10.81 % 0.00 % 407
  services 72.87 % 0.00 % 12.14 % 14.99 % 0.00 % 387
  solutions 77.72 % 0.00 % 11.40 % 10.88 % 0.00 % 386
  website 69.69 % 6.48 % 19.95 % 1.55 % 2.33 % 386
  media 84.90 % 0.26 % 9.11 % 5.73 % 0.00 % 384
  rocks 51.77 % 0.00 % 33.79 % 14.44 % 0.00 % 367
  academy 60.38 % 0.00 % 21.86 % 17.76 % 0.00 % 366
  global 60.11 % 0.27 % 19.67 % 19.95 % 0.00 % 366
  net 61.62 % 4.76 % 30.53 % 1.68 % 1.40 % 357
  link 71.55 % 0.00 % 20.85 % 7.61 % 0.00 % 355
  systems 51.14 % 0.28 % 23.58 % 25.00 % 0.00 % 352
  social 61.78 % 0.00 % 25.00 % 13.22 % 0.00 % 348
  care 54.33 % 0.30 % 25.37 % 20.00 % 0.00 % 335
  rest 79.39 % 0.00 % 19.39 % 1.21 % 0.00 % 330
  consulting 43.96 % 0.31 % 28.79 % 26.93 % 0.00 % 323
  llc 67.30 % 0.00 % 12.26 % 20.44 % 0.00 % 318
  digital 64.08 % 0.32 % 23.62 % 11.97 % 0.00 % 309
  wtf 70.82 % 0.00 % 18.36 % 10.82 % 0.00 % 305
  company 45.92 % 1.02 % 23.81 % 28.91 % 0.34 % 294
  games 55.48 % 0.34 % 29.45 % 14.73 % 0.00 % 292
  info 59.44 % 1.05 % 21.33 % 18.18 % 0.00 % 286
  agency 66.90 % 1.76 % 19.01 % 11.97 % 0.35 % 284
  email 38.85 % 0.00 % 30.58 % 30.58 % 0.00 % 278
  tech 52.99 % 21.37 % 19.23 % 2.56 % 3.85 % 234
  art 81.25 % 7.69 % 10.10 % 0.48 % 0.48 % 208
  shop 94.71 % 0.00 % 5.29 % 0.00 % 0.00 % 170
  org 37.95 % 0.00 % 30.12 % 31.93 % 0.00 % 166
  cloud 21.85 % 0.00 % 40.34 % 37.82 % 0.00 % 119
  wiki 69.01 % 0.00 % 8.45 % 22.54 % 0.00 % 71
  ink 22.58 % 0.00 % 27.42 % 50.00 % 0.00 % 62
  amsterdam 33.33 % 0.00 % 54.17 % 12.50 % 0.00 % 48
  one 86.84 % 0.00 % 13.16 % 0.00 % 0.00 % 38
  top 29.41 % 0.00 % 70.59 % 0.00 % 0.00 % 17
  app 50.00 % 0.00 % 0.00 % 50.00 % 0.00 % 2
  tel 0.00 % 0.00 % 50.00 % 50.00 % 0.00 % 2
  page 0.00 % 0.00 % 100.00 % 0.00 % 0.00 % 1
  autos - - - - - 0
  bayern - - - - - 0
  bet - - - - - 0
  bio - - - - - 0
  biz - - - - - 0
  blog - - - - - 0
  business - - - - - 0
  buzz - - - - - 0
  cfd - - - - - 0
  click - - - - - 0
  club - - - - - 0
  cyou - - - - - 0
  design - - - - - 0
  dev - - - - - 0
  eus - - - - - 0
  family - - - - - 0
  fyi - - - - - 0
  group - - - - - 0
  homes - - - - - 0
  ing - - - - - 0
  lat - - - - - 0
  life - - - - - 0
  live - - - - - 0
  lol - - - - - 0
  love - - - - - 0
  ltd - - - - - 0
  mobi - - - - - 0
  mom - - - - - 0
  name - - - - - 0
  network - - - - - 0
  news - - - - - 0
  nrw - - - - - 0
  online - - - - - 0
  ovh - - - - - 0
  pro - - - - - 0
  realtor - - - - - 0
  sbs - - - - - 0
  site - - - - - 0
  skin - - - - - 0
  space - - - - - 0
  store - - - - - 0
  studio - - - - - 0
  swiss - - - - - 0
  team - - - - - 0
  tokyo - - - - - 0
  vip - - - - - 0
  wang - - - - - 0
  win - - - - - 0
  work - - - - - 0
  world - - - - - 0
  zip - - - - - 0

References

[1] https://dnib.com/articles/interisle-report-examines-domain-name-contact-data-availability

[2] https://circleid.com/posts/new-data-on-domain-name-contact-availability-and-privacy

[3] Strictly, the study relates to the Registration Data Directory Services (RDDS) system(s) offered by registries and registrars for providing access to registration data, of which the familiar whois service is a subset - see https://www.icann.org/resources/pages/whois-rdds-2023-11-02-en

[4] https://www.icann.org/resources/pages/wdrp-2012-02-25-en

[5] The sample comprises every 25th domain in the order in which they appear in the zone file (generally alphabetical), until 500 have been extracted - this value was selected as all 100 of the zone files analysed contain at least 12,500 domain names

[6] https://www.icann.org/resources/pages/pp-services-2017-08-31-en

[7] Note that this may actually be the abuse contact e-mail address for the registrar; this may be the only explicit e-mail address given in the whois record in many cases.

[8] https://www.linkedin.com/posts/stobbs_rdrs-activity-7212106221485531136-Rr7B

[9] https://www.uschamber.com/technology/domain-name-data-why-its-disappearing-and-why-you-should-care

This article was first published on 14 January 2025 at:

https://www.iamstobbs.com/opinion/its-a-dark-whois-world

Thursday, 9 January 2025

Christmas and New Year brand protection trends - Part 2: New-year domain names

Introduction

Following on from our previous analysis[1] on festive trends in infringement activity, this article looks at the registration of domain names relating to the new year specifically, i.e. those gTLD domain names beginning or ending with '2025' (further to a similar study[2,3] carried out for New Year 2023).

The new year can be a popular opportunity to launch new products and services but, like any high-profile event, can be exploited by fraudsters and infringers looking to take advantage of the increased levels of interest and search-engine traffic. Accordingly, whilst many legitimate businesses may make use of '2025' domains, they equally can be utilised by bad actors for a range of infringing activities.

Analysis

As of the date of analysis (18-Dec-2024), almost 34,000 '2025'-specific domain names were identified. As with the previous study looking at domain names relating to delivery service brands, the current dataset is dominated by registrations having taken place in the last year, though starting from a higher relative baseline of pre-existing registrations (with the oldest domain in the dataset, 2025.com, having been registered as far back as 23-Aug-1998) (Figure 1).

Figure 1: Growth over time (2020 - 2024) in the numbers of domains in the current dataset

Approximately 21,000 of the domains (62% of the total) resolve to some sort of live website.

Within the dataset, we focus on those domains deemed to be of greatest potential interest, by virtue of the additional inclusion in the domain names of any of the following categories of keywords:

  • The names of any of the 20 most valuable global brands, according to the 2024 study by Kantar[4] (34 domains in total, excluding 'false positives' where the brand name appears as a sub-string of an unrelated term)
  • The names of other high-profile brands - e.g. 'iphone' (4 domains) or 'gpt' (7 domains)
  • The terms 'bitcoin' or 'crypto' (54 domains in total)
  • High-threat terms potentially related to phishing - e.g. 'login' (2 domains)
  • High-threat terms potentially related to malware distribution - e.g. 'download' (4 domains) or 'updat*' (13 domains)
  • The terms 'shop' or 'store' (86 domains in total, excluding those where the reference appears in the TLD domain name extension)
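The keyword-based prioritisation can be sketched in Python as below. This is an illustrative outline rather than the tooling used for the analysis: the category labels and function names are hypothetical, the list of the 20 most valuable brands is omitted, the '.com' extensions in the examples are placeholders, and simple sub-string matching will produce the kind of false positives noted above. Matching is applied to the SLD only, so that references appearing in the TLD are excluded.

```python
# Keywords taken from the list above; category labels are shorthand only.
KEYWORDS = {
    "high-profile brand": ("iphone", "gpt"),
    "cryptocurrency": ("bitcoin", "crypto"),
    "phishing": ("login",),
    "malware": ("download", "updat"),  # 'updat' stands in for 'updat*'
    "retail": ("shop", "store"),
}


def is_2025_name(sld: str) -> bool:
    """True if the SLD begins or ends with '2025'."""
    return sld.startswith("2025") or sld.endswith("2025")


def tag(domain: str):
    """Return matching keyword categories, or None if not a '2025' name.

    Assumes a simple 'sld.tld' domain, as is the case for gTLD names.
    """
    sld = domain.lower().split(".")[0]
    if not is_2025_name(sld):
        return None
    return [cat for cat, words in KEYWORDS.items() if any(w in sld for w in words)]


print(tag("chatgpt2025.com"))   # ['high-profile brand']
print(tag("2025tk-shop.com"))   # ['retail']
print(tag("2025.shop"))         # [] - 'shop' appears only in the TLD
print(tag("example.shop"))      # None - not a '2025' name
```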

It is noteworthy that the number of examples of domains making explicit reference to large brands in the domain name itself is relatively small, perhaps a reflection of the knowledge that many of these organisations will be proactively monitoring for these types of registrations.

However, within this focused dataset, a range of types of site content were identified. Many of the domains appear to have been registered for monetisation purposes - e.g. pages offering the domain name for sale (i.e. potential cybersquatting, in cases where a brand name is referenced), or sites featuring pay-per-click (PPC) links - but several examples of live sites featuring infringing or other content of concern were also identified (Figure 2).

Figure 2: Examples of websites of potential concern (SLD[5] name given in each case) – top to bottom:

  • Potential phishing - chatgpt2025
  • Potential cryptocurrency scam site - bitcoin2025
  • Sale of potential counterfeits:
    • Sale of products using luxury brand names - getithk2025, getithkshop2025
    • Site offering the sale of 'customised' products - nflshop2025
  • Potential trademark infringement / misdirection:
    • Third-party site displaying marketplace brand logo - 2025tk-shop
    • Re-direction to litigation website - visareferral2025, visasettlement2025, visalawsuit2025 (×2), visalitigation2025, visapartner2025
  • Gambling advertisement site - google2025

Additional patterns of activity are also apparent from the wider dataset, such as the registration of groups of domains whose names appear to be intended to denote specific dates, or other numerical patterns (e.g. 371 .com registrations where the SLD is an eight-digit string which could represent a date in UK or US format, and a batch of 63 names of the form NNN2025.xyz, all by the same Chinese registrant). The vast majority are inactive as of the time of analysis but, given the nature of the domain names, may have been registered with the intention of activation at a point in the forthcoming year. Furthermore, many appear to constitute one or more groups of coordinated registrations, showing elements (such as registration dates and use of particular registrars) in common.

Discussion

As in our previous article, the observations highlight how infringements can spike in response to real-world events of interest, such as the festive period, when online search activity and brand interest are heightened. At these times, brand owners who may be subject to related attacks are advised to be particularly mindful of the online risks, and to ramp up their attention to initiatives for the detection of harmful content and enforcement against the material in question.

References

[1] https://www.iamstobbs.com/opinion/christmas-and-new-year-brand-protection-trends-delivery-service-scams

[2] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 8: 'Trends in infringement activity'

[3] https://www.linkedin.com/pulse/four-new-case-studies-domain-registration-activity-spikes-barnett/

[4] https://www.kantar.com/campaigns/brandz/global

[5] SLD is the second-level domain, i.e. the part of the domain name to the left of the dot

This article was first published on 9 January 2025 at:

https://www.iamstobbs.com/opinion/christmas-and-new-year-brand-protection-trends-new-year-domain-names
