Thursday, 30 January 2025

January scams surrounding the fall and rise of TikTok and Trump

Introduction

January 2025 has seen two recent news stories of particular significance from a brand protection and online security point of view. The first of these surrounded the temporary ban in the US of Chinese-owned social media platform TikTok, on security grounds[1]. The second story also concerns Trump who, in the days prior to his inauguration, launched a meme-inspired official cryptocurrency (Official Trump, or $TRUMP)[2], the value of which had climbed from its initial price of $0.18 to $72 by 7am EST on the following Sunday, making the President an estimated $50 billion[3,4].

Many previous analyses have noted the links between real-world events and subsequent spikes in associated infringing activity, as a means of taking advantage of increased levels of public interest and search volumes. These two new stories - particularly in view of the amounts of revenue with which the brands in question are associated - are likely to be no exception.

Infringement landscape

There are a few obvious possible infringement 'hooks' in these cases. For TikTok, many observers[5,6] predicted a rise in scams of various types, such as those relating to purported methods for circumventing the ban (such as fraudulent VPNs) or offering fake versions of the app. In addition, with the push of would-be former TikTok customers towards the alternative platform RedNote (also known as LittleRedBook or Xiaohongshu)[7], this brand may have been, and may still be, subject to similar issues. It is also likely to be subject to an increase in infringements targeting other brands on the platform itself, in view of its increasing popularity[8]. The Trump coin is also likely to be targeted by fake versions and a range of other scams typically associated with cryptocurrencies, a common forum in which Web2/Web3 'crossover' content is manifested[9].

In this article, we consider the landscape of newly-registered ('Web2') domains relating to the TikTok/RedNote and Trump stories, as a proxy for the overall infringement landscape and as a dataset which is easily searched and analysed.

The datasets make use of (gTLD) domain-name zone-file data (as of 20-Jan-2025), considering the following searches for 'high-risk' domains:

  1. Domains containing 'tiktok' and any of the 'high-risk' keywords: 'vpn', 'download', 'login' or 'access' (216 domains)
  2. Domains containing 'rednote' at the start, or 'rednote' plus any of the 'high-risk' keywords: 'vpn', 'download', 'login' or 'access' (514 domains)
  3. Domains containing 'trump' and any of the 'high-risk' keywords: 'coin', 'meme', 'crypto', 'bitcoin' or 'fight'[10] (1,703 domains)
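The three searches above can be sketched as a simple keyword filter over a list of domain names. This is only an illustration of the matching logic, not the tooling actually used for the study: the function names, the decision to match against the SLD only, and the `.example` sample names are all assumptions made for the sketch.

```python
# Keyword rules paraphrased from the three searches listed above.
# "start_suffices" captures the 'rednote' rule, where the brand term at the
# start of the name is enough on its own (an interpretation, not a given).
RULES = {
    "tiktok": {"keywords": ("vpn", "download", "login", "access"),
               "start_suffices": False},
    "rednote": {"keywords": ("vpn", "download", "login", "access"),
                "start_suffices": True},
    "trump": {"keywords": ("coin", "meme", "crypto", "bitcoin", "fight"),
              "start_suffices": False},
}


def sld(domain):
    """Return the lower-cased label to the left of the first dot."""
    return domain.strip().lower().rstrip(".").split(".")[0]


def matched_brands(domain):
    """Return the brand terms whose 'high-risk' rule the domain satisfies."""
    name = sld(domain)
    hits = []
    for brand, rule in RULES.items():
        if brand not in name:
            continue
        if rule["start_suffices"] and name.startswith(brand):
            hits.append(brand)
        elif any(keyword in name for keyword in rule["keywords"]):
            hits.append(brand)
    return hits


def filter_zone_file(domains):
    """Group an iterable of domain names by the brand rule they match."""
    results = {brand: [] for brand in RULES}
    for domain in domains:
        for brand in matched_brands(domain):
            results[brand].append(domain)
    return results


# Hypothetical sample names (reserved .example TLD), for illustration only.
sample = ["freevpnfortiktok.example", "rednoteapk.example",
          "myrednote.example", "buy-trump-coin.example", "trumpet.example"]
high_risk = filter_zone_file(sample)
```

A plain substring match of this kind is deliberately broad, so in practice the output would still need review to weed out coincidental matches.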

Looking at the registration dates of the domains (where available via automated look-up), there is a striking spike of activity around the dates on which the associated news stories emerged (Figure 1), which is significantly disproportionate to the pre-existing 'background' levels of activity (the average daily number of equivalent registrations in Q4 2024 was 0.196 for 'tiktok', 0.054 for 'rednote' and 1.620 for 'trump'). The numbers of registrations for 'tiktok' are, somewhat surprisingly, low, but it may be that some of the domains registered in the few days prior to the analysis are not yet reflected in the zone-file data.
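The comparison between the January spike and the Q4 2024 'background' rate can be sketched as follows, assuming the registration (creation) dates have already been retrieved for each domain. The function names and the sample dates are hypothetical; the count of 18 registrations is purely illustrative, chosen to show that 18 registrations over the 92 days of Q4 would correspond to a rate of about 0.196 per day.

```python
from collections import Counter
from datetime import date


def average_daily_rate(creation_dates, start, end):
    """Average number of registrations per day in [start, end], inclusive."""
    n_days = (end - start).days + 1
    in_window = sum(1 for d in creation_dates if start <= d <= end)
    return in_window / n_days


def peak_day(creation_dates, since):
    """Return (date, count) for the busiest registration day on/after 'since'."""
    counts = Counter(d for d in creation_dates if d >= since)
    if not counts:
        return None
    return counts.most_common(1)[0]


# Illustrative data only: 18 registrations somewhere in Q4 2024,
# then a hypothetical burst in mid-January 2025.
q4_dates = [date(2024, 10, 1)] * 18
january_dates = [date(2025, 1, 18)] * 3 + [date(2025, 1, 19)] * 7
all_dates = q4_dates + january_dates

baseline = average_daily_rate(all_dates, date(2024, 10, 1), date(2024, 12, 31))
busiest_date, busiest_count = peak_day(all_dates, date(2025, 1, 1))
spike_factor = busiest_count / baseline  # how far the peak exceeds background
```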

Figure 1: Daily numbers of registrations of 'high-risk' 'tiktok', 'rednote' and 'trump' domains, since 01-Jan-2025

Nevertheless, it is clear that a range of infringements of various types are already in place. Of the 'high-risk' domains, 1,064 produce some sort of website response (97 for 'tiktok', 187 for 'rednote', 780 for 'trump'), and Figures 2 and 3 show some examples of live sites of potential concern - all registered since the start of January.

Figure 2: Examples of websites of concern associated with 'high-risk' 'tiktok' or 'rednote' domains registered since 01-Jan-2025 (SLDs[11] shown in each case) (top to bottom):

  • Potential phishing (rednote)
  • Potentially non-legitimate / malicious VPN downloads (freevpnfortiktok)
  • Potentially non-legitimate / malicious app downloads (rednoteapk, rednoteapp, rednote)
  • Sites purporting to offer other associated services - file back-ups (downloadtiktoks), sale of followers (rednotefollower)

Figure 3: Examples of websites of concern associated with 'high-risk' 'trump' domains registered since 01-Jan-2025 (SLDs shown in each case) (top to bottom):

  • Purported sale or distribution of Official Trump cryptocurrency (gettingtrumpsmemes, firsttrumpmemecoin, officialtrumpmeme, buy-trump-coin)
  • Use of Trump name in unauthorised / third-party cryptocurrency (aitrumpcoin, pepetrumpcoin, etrumpcoin, babytrumpmemes)

Discussion and Conclusion

It is evident that these two prominent stories have - predictably - triggered a spike in infringements, with the risks in these cases taking a number of forms. These include potential phishing, distribution of potentially malicious content, fraud, and unauthorised brand use and claimed affiliation.

As ever, the conclusions to be drawn from these observations are clear. At times of increased online interest and high-profile news stories, consumers are advised to remain vigilant and be aware of the scope for potential scams. Brand owners should also take extra care to proactively monitor for (and take enforcement activity against) infringements which may affect them and their customer base.

References

[1] https://www.bbc.co.uk/news/articles/cjde3p0rnjgo

[2] https://gettrumpmemes.com/

[3] https://www.axios.com/2025/01/18/trump-meme-coin-25-billion

[4] https://eu.usatoday.com/story/money/2025/01/18/trump-meme-coin-price-crypto/77802704007/

[5] https://www.linkedin.com/feed/update/urn:li:activity:7286778120236355584/

[6] https://www.linkedin.com/feed/update/urn:li:activity:7286760863796011008/

[7] https://www.bbc.co.uk/news/articles/c2475l7zpqyo

[8] https://www.worldtrademarkreview.com/article/rednote-growing-tiktok-rival-rife-brand-impersonation-rights-holders-warned

[9] https://www.iamstobbs.com/opinion/web2/web3-crossover-brand-related-crypto-infringements

[10] These keywords were selected on the basis of their association with the Trump 'Fight, Fight, Fight' meme, by which the new coin is inspired; indeed, the currency is partly held by a Trump-owned company named 'Fight Fight Fight LLC'

[11] Second-level domain names - i.e. the part of the domain name to the left of the dot

This article was first published on 30 January 2025 at:

https://www.iamstobbs.com/opinion/january-scams-surrounding-the-fall-and-rise-of-tiktok-and-trump

Tuesday, 28 January 2025

Notorious: a B.I.G. set of markets to keep an eye on for counterfeiting and piracy

Introduction

This month has seen the publication of the 2024 version of the Review of Notorious Markets for Counterfeiting and Piracy (the 'Notorious Markets List' (NML)), by the Office of the United States Trade Representative (USTR). The annual review presents a summary of key online and real-world markets found to be associated with brand infringements, specifically counterfeiting (trademark infringement) and piracy (copyright infringement)[1], thereby highlighting key platforms and areas of focus for brand protection programmes depending on the type of brand owner and the kinds of infringements to which they are subjected. This year's report[2] also includes a specific focus on illicit online pharmacies, reflecting the significant risks posed by counterfeit medicines.

Overview

One of the key components of the report is the list of online platforms which have been repeatedly found to facilitate infringing activity. The sites in question fall into a number of key categories, which are outlined below (with the individual sites assigned into 'best-fit' categories). In cases where the platforms have a focus in a specific country, this is also referenced below, using its country code in square brackets.

  • e-commerce platforms
    • The platforms highlighted in the report are those which contribute specifically to the trade in counterfeit or otherwise infringing goods, through a lack of adequate mediating policies, monitoring and tools.
    • Specific platforms:
      • General marketplace and shopping platforms - Bukalapak (bukalapak.com) [ID], DHgate (dhgate.com) [CN], Douyin Shangcheng (Douyin Mall) [CN], IndiaMART [IN], Pinduoduo (pinduoduo.com) [CN], Shopee (shopee.com) [SG], Taobao (taobao.com) [CN]
      • Classified advertisement site - Avito (avito.ru) [RU]
  • 'Bulletproof' hosting providers
    • These are Internet service providers offering web-hosting services for infringing sites, and which typically advertise their non-compliance with enforcement notices as a business model.
    • Specific providers: Amarutu, DDoS-Guard (ddos-guard.net) [RU], FlokiNET [IS / Europe], Squitter (squitter.eu) [possibly NL], Virtual Systems (vsys.host) [UA]
  • Cyberlockers
    • These act as hosting and content storage sites for copyrighted digital content, and often facilitate file-sharing through linking and streaming sites. In many cases, they are incentivised to share popular content to drive web traffic, by virtue of their reliance on advertising revenue, and may also share revenue with contributors of popular material.
    • Specific sites: 1fichier (1fichier.com) [FR], Baidu Wangpan [CN], Krakenfiles (krakenfiles.com), Libgen (libgen.rs, libgen.is, libgen.li, libgen.st, library.lol, libgen.rocks, libgen.gs, annas-archive.org, annas-archive.gs) [RU], Rapidgator (rapidgator.net, rg.to) [RU], Sci-Hub (sci-hub.se, sci-hub.ru, sci-hub.st, annas-archive.org, annas-archive.gs) [RU], Streamtape (streamtape.com) [FR]
  • Torrent websites
    • These are platforms providing access to copyrighted content which is available for download via the BitTorrent file-sharing protocol.
    • Specific sites: 1337x (1337x.to, 1337x.tw), Rutracker (rutracker.org) [RU], ThePirateBay (thepiratebay.org), Torrent Galaxy (torrentgalaxy.to, torrentglaxy.mx [sic], torrentgalaxy.buzz, tgx.rs, tgx.sb) (BitTorrent streaming), YTS / YIFY (yts.mx) [BG]
  • General piracy sites
    • These offer the sharing of copyrighted content in other general senses, relying on (for example) associated cyberlockers for content hosting.
    • Specific sites:
      • Streaming services - GenIPTV (genip.tv), MagisTV (magistv.net, oficialmagistv.com, magistv.digital, magistv.la, magisla.com, magistvoficial.com, magistv-venezuela.com) [Latin America], Vegamovies (vegamovies.in, vegamovies.boo, vegamovies.ren) [IN]
      • 'Stream-ripping' sites (circumventing the content-protection measures of other content providers) - SaveFrom (savefrom.net, savef.net, savefrom.live, savefrom.app, save-from.net, savefrom.best, save-from.biz), Y2mate (y2mate.com, yt1s.com)
      • 'Repacking' site (providing access to compressed versions of digital files) - FitGirl-Repacks (fitgirl-repacks.site) (video games)
      • General piracy - Cuevana (cuevana.biz, cuevana3.eu, cuevana3.ch, cuevana.pro, cuevana.si, cuevana4.me, cuevana3cc.me) [Latin America], HiAnime (hianime.to), Nsw2u (nsw2u.com, game-2u.com, ps4pkg.com, bigngame.com), Unknowncheats (unknowncheats.me) (offers download of game cheat codes, which may include copyrighted source code), VKontakte (VK) (vk.com) (social media) [RU], WHMCS Smarters (whmcssmarters.com, iptvsmarters.com) [IN]
  • Pirate content management systems
    • These provide libraries of infringing content, obtained through crawling and scraping of third-party data sources.
    • Specific site: 2embed (2embed.cc)

It is worth noting that many of the providers referenced in the report are geared towards digital piracy, which remains a significant issue generally for rights owners. A 2024 report by the EUIPO estimates that piracy accounts overall for about ten Internet accesses per Internet user per month on average[3], and MUSO provides an estimate of around 229 billion visits to piracy sites in 2023[4], affecting predominantly the TV (104 B), publishing (64 B), film (30 B), music (17 B) and software (15 B) industries.

However, counterfeiting also remains a major problem, utilising both online and physical channels. The World Customs Organisation's Illicit Trade Report 2023 references 98 million items comprising intellectual property violations intercepted through over 48,000 seizures (excluding US data), dominated by accessories, clothing and footwear entering into the US, Europe, Chile and Mexico[5].

Furthermore, the Notorious Markets report includes information on key geographic markets of concern, where the trade in counterfeits may be addressable through effective on-the-ground or customs initiatives, border controls, and/or appropriate legislation. Specific reference is given to locations in South America (Argentina, Brazil, Colombia, Paraguay, Peru) and the Far East (Cambodia, China, Indonesia, Malaysia, Philippines, Thailand, Vietnam), plus Canada, India, Kyrgyz Republic, Mexico, Russia, Türkiye (Turkey) and UAE.

Discussion and Conclusion

How representative is the information given in the NML? It is noteworthy that social-media platforms and many US-based marketplaces are essentially unrepresented in the report (apart from the inclusion of VK), despite the significant levels of activity observed across these channels - particularly on the various Meta platforms (Facebook, Instagram, WhatsApp, Threads), a concern echoed by other organisations such as the American Apparel & Footwear Association[6]. It is also significant that WeChat has been removed from the list (in favour of Douyin Mall, part of the ByteDance group) since the previous edition, in part due to the introduction of tools to deal with infringements. Taobao's continued appearance in the list remains (somewhat) controversial, particularly in view of the platform's increased engagement with rights holders and government, but many stakeholders continue to take the view that improvements to the site's process for reporting infringements are still required[7]. There is also a growth in commentary (particularly in China) that some elements of the list (notably the large numbers of inclusions of Chinese platforms) may be politically or commercially motivated[8,9], following similar comments the previous year[10].

In addition, commentary from the US Intellectual Property Owners Association (IPO)[11] stresses the continuing role of online marketplaces as a key channel type of interest, and mentions a number of additional platforms which are excluded from the list but continue to be of concern (AliExpress [CN], Facebook Marketplace, Noon [Middle East], Temu [CN] and - again - WeChat [CN]). There were also a number of other significant nominations for the NML (such as Shein and TikTok Shop) which failed to make the final edition[12]. It is also worth noting that the EU Intellectual Property Office (EUIPO) provides resources relating to the IP protection programmes offered by a number of marketplaces with which it collaborates[13], and none of these (with the exception of the Alibaba Group, which includes Taobao) are included in the NML; this is perhaps an indication of the power of the provision of rights protection initiatives by marketplaces.

Overall, it would be beneficial to see a greater degree of consensus on the marketplaces and other platforms - and the associated geographical focuses - which should be included in the NML going forward. This would help inform relevant policies, and give brand owners a clearer view of where the main areas of concern actually are. It is worth noting that many of the platforms referenced will need to comply with new rules covered by the Digital Services Act (DSA)[14]; we are already starting to see the commencement of proceedings by the EU Commission against platforms which are key focuses of infringing activity, such as Temu[15,16] and Shein[17,18], both of which have been designated as Very Large Online Platforms under the DSA framework.

References

[1] https://ustr.gov/about-us/policy-offices/press-office/press-releases/2025/january/ustr-releases-2024-review-notorious-markets-counterfeiting-and-piracy

[2] https://ustr.gov/sites/default/files/2024%20Review%20of%20Notorious%20Markets%20of%20Counterfeiting%20and%20Piracy%20(final).pdf

[3] https://www.euipo.europa.eu/en/publications/online-copyright-infringement-in-the-european-union-films-music-publications-software-and-tv-2017-2023

[4] https://www.muso.com/piracy-by-industry-report-2023

[5] https://www.wcoomd.org/-/media/wco/public/global/pdf/topics/enforcement-and-compliance/activities-and-programmes/illicit-trade-report/itr_2023_en.pdf

[6] https://www.fashiondive.com/news/meta-facebook-instagram-notorious-markets-counterfeit-list/729053/

[7] https://www.worldtrademarkreview.com/article/notorious-markets-list-2024-wechat-removed-and-douyin-mall-added-social-media-platforms-once-again-ignored

[8] https://www.globaltimes.cn/page/202501/1326622.shtml

[9] https://thebambooworks.com/douyin-in-wechat-out-of-latest-u-s-notorious-markets-piracy-list/

[10] https://www.globaltimes.cn/page/202401/1306451.shtml

[11] https://ipo.org/wp-content/uploads/2024/10/2024-IPO-Notorious-Markets-Comments.pdf

[12] https://www.worldtrademarkreview.com/article/associations-reiterate-calls-include-us-social-media-platforms-in-notorious-markets-list

[13] https://www.euipo.europa.eu/en/observatory/enforcement/tools/protecting-ip-rights-e-commerce-marketplaces

[14] https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/europe-fit-digital-age/digital-services-act_en

[15] https://ec.europa.eu/commission/presscorner/detail/en/ip_24_5622

[16] https://www.euronews.com/next/2024/10/31/eu-commission-probes-chinese-marketplace-temu-under-platform-rules

[17] https://digital-strategy.ec.europa.eu/en/news/commission-requests-information-online-marketplaces-temu-and-shein-compliance-digital-services-act

[18] https://www.reuters.com/business/retail-consumer/temu-shein-ordered-provide-info-eu-tech-rules-compliance-by-july-12-2024-06-28/

This article was first published on 28 January 2025 at:

https://www.iamstobbs.com/opinion/notorious-a-b.i.g.-set-of-markets-for-counterfeiting-and-piracy-to-keep-an-eye-on

Thursday, 23 January 2025

"You spelled it wrong": exploring typo domains

Introduction

A number of recent posts on social media have commented on the case of the guthib[.]com domain, a typo variant of the website of popular code-sharing platform GitHub, which resolves to a page informing visitors that they have mis-typed the presumably-intended address (Figure 1).

Figure 1: The minimalist webpage at guthib[.]com

Typo variants of popular domain names can commonly attract high volumes of traffic from mis-typed browser requests. In this case, the owner[1] of guthib[.]com fortunately appears not to have had malicious intent, but typo domains are prone to misuse by infringers, presenting a ready opportunity to launch a brand impersonation attack, or misdirect visitors to competitor or unsavoury content. Indeed, in many cases, it may be advisable for brand owners to secure common variants within their official portfolio as part of a domain registration strategy, to prevent acquisition and misuse by third parties.

In this article, we explore the patterns of use in a set of typo variants of the most popular global websites.

Analysis

Our study considers the top twenty most popular global website domain names (as of Nov-2024), according to analytics provider Similarweb[2]. In the analysis we consider all typo variants in which any pair of adjacent characters in the second-level domain (SLD) name (i.e. the part to the left of the dot) is transposed (i.e. swapped), a common form of mis-type[3].
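The generation of transposition-based variants can be sketched as below. This is a minimal illustration rather than the exact procedure used in the study: the function name is hypothetical, and the choice to skip swaps of two identical adjacent characters (which would simply reproduce the original name) is an assumption.

```python
def transposition_variants(domain):
    """Return all variants of 'domain' in which one pair of adjacent
    characters in the SLD (the part left of the first dot) is swapped."""
    sld, dot, rest = domain.lower().partition(".")
    variants = set()
    for i in range(len(sld) - 1):
        if sld[i] == sld[i + 1]:
            # Swapping identical characters gives back the original name.
            continue
        swapped = sld[:i] + sld[i + 1] + sld[i] + sld[i + 2:]
        variants.add(swapped + dot + rest)
    return sorted(variants)


# A six-character SLD with no repeated adjacent characters gives five variants.
yandex_typos = transposition_variants("yandex.ru")

# A single-character SLD has no adjacent pairs, so no variants are possible.
x_typos = transposition_variants("x.com")
```

Note that this form of typo covers only adjacent swaps, so a variant such as guthib (in which the two swapped characters are not adjacent) would not be generated from github by this function.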

This analysis thereby yields a dataset of 109 typo domains. 42 of these (39%) are registered to the owner of the corresponding official domain in question, as would be the case for a comprehensive defensive domain registration policy. The remainder are registered to third parties (59 domains, or 54%) or are unregistered (8 domains, or 7%) (Table 1).

Table 1: Ownership statistics for the 109 typo variant domain names

The 59 typo variant domains under third-party ownership present significant potential for misuse and thereby pose a concern for the brand owners in question. 32 of these domains (54%) are configured with active mail exchange (MX) records, indicating that they have been configured to be able to send and receive e-mails, and could potentially therefore be under active use for phishing (Table 2).

Table 2: MX record statistics for the 59 typo variant domain names under third-party ownership

The statistics thereby provide insights into the overall level of potential risks posed by typo domains and the approaches taken to defensive registrations for the sites in question - yandex[.]ru, for example, is particularly at risk, with all five of the transposition-based typo domains under third-party ownership and with active MX records.

As a deeper dive, it is also informative to consider the content of any websites associated with the 59 third-party domains in question. 16 (27%) currently pose little threat (resolving to blank pages, placeholder pages, error pages or no active site - though one of the blank pages generates a browser warning of dangerous content which was formerly present), but warrant ongoing monitoring. 30 of the remainder show evidence of efforts to monetise their web traffic through the placement of pay-per-click (PPC) links (28 cases) and/or offers of sale of the domain name (2 cases). The remaining 13 are the highest-threat cases, showing active use in the misdirection of visitors to third-party content[4], which may be of particular concern if it relates to fraudulent use, competitor products or services, or undesirable material (Figure 2).

Figure 2: Examples of typo variant domain names misdirecting visitors to third-party content: (top to bottom) 1. chatgtp[.]com; 2. mcirosoftonline[.]com; 3. ilnkedin[.]com; 4. ayndex[.]ru, ynadex[.]ru, yadnex[.]ru and yandxe[.]ru; 5. bnig[.]com; 6. oprnhub[.]com; 7. itktok[.]com; 8. tkitok[.]com

Discussion

Typo variant domain names can pose a significant threat for brand owners and warrant careful consideration as part of an overall brand protection strategy. At the very least, it would generally be appropriate to implement a brand monitoring capability which is able to detect such variants and (depending on the nature of any associated brand abuse and the extent of IP protection of the brand owner) to enforce against identified infringements.

However, it is also advisable to consider the inclusion of such variants within an official defensive domain portfolio. Of course, it is never possible to secure all possible variants which could feasibly be utilised by a third-party infringer, but it may certainly be worthwhile to factor in common misspellings (such as the simple character transpositions considered in this study). Such considerations are central to the construction of a domain registration and management strategy, which aims to balance cost against risk in the process of building an official portfolio[5,6,7]. Other factors and good practices - such as ensuring that unused domains are configured to re-direct to the official site, to maximise traffic and minimise customer confusion - can also be incorporated into this type of initiative.

References

[1] The domain is registered using a whois privacy service, with an original domain creation date of 04-Mar-2010. The only non-redacted historical records are from a period of time between Mar-2014 and May-2018, during which time it was registered to an Alex Sexton of NZ (apparently alexsexton[at]gmail.com).

[2] https://www.similarweb.com/top-websites/

[3] Note that the analysis therefore excludes consideration of x[.]com, where (as a single-character SLD) no transpositions are possible.

[4] Given the distinctiveness of the strings in question, it is highly likely that at least the vast majority of these cases are attempting to benefit from the renown of the websites in question, rather than legitimately using a similar string independently.

[5] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Section 9.5: 'Domain-name management policy construction'

[6] https://www.iamstobbs.com/opinion/strategies-for-constructing-a-domain-name-registration-and-management-policy

[7] https://www.worldtrademarkreview.com/global-guide/anti-counterfeiting-and-online-brand-enforcement/2022/article/creating-cost-effective-domain-name-watching-programme

This article was first published on 23 January 2025 at:

https://www.iamstobbs.com/opinion/you-spelled-it-wrong-exploring-typo-domains

Wednesday, 15 January 2025

Unregistered Gems Part 6: Phonemizing strings to find brandable domains

Introduction

The UnregisteredGems.com series of articles explores a range of techniques to filter and search through the universe of unregistered domain names, in order to find examples which may be compelling candidates for entities looking to select a new brand name (and its associated domain). The previous instalment of the series[1] looked at the categorisation of candidate names according to the phonetic characteristics of their constituent consonants, using a simple one-to-one mapping between each consonant and a corresponding phonetic group.

In this study, I explore the use of a more formal phonetic representation of each string, involving its conversion to its IPA (International Phonetic Alphabet) representation[2]. This has a number of advantages over the previous approach, including the ability to properly handle differences in pronunciation of particular characters according to their context, handling of character combinations, and the ability to generalise the approach to strings of arbitrary consonant/vowel patterns and length.

Framework

As in the previous study, the strings are classified according to the phonetic categories of their constituent consonants, but with all vowel sounds just combined into a single group. This approach follows from the assertion that the consonants comprise the core 'structure' of the word, and avoids having to handle the more complex nature of vowel sounds (such as the presence of vowel diphthongs, variations in length (i.e. 'long' vs 'short' sounds), and the impact of the accent of the speaker (noting that the IPA conversion tool used is based on American English)).

The consonant sounds are divided into the following groupings, again following the framework used in the previous study, and with the phoneme symbols taking their usual IPA meanings[3,4].

  Top-level group           Group   Type                           Consonant phonemes
  1 (plosive)               1A      Bilabial plosive               b, p
  1 (plosive)               1B      Alveolar plosive               d, t, ɾ
  1 (plosive)               1C      Velar plosive                  ɡ, k
  2 (nasal)                 2A      Bilabial nasal                 m
  2 (nasal)                 2B      Alveolar nasal                 n
  2 (nasal)                 2C      Velar nasal                    ŋ
  3 (fricative)             3A      Labiodental fricative          f, v
  3 (fricative)             3B      Dental fricative               θ, ð
  3 (fricative)             3C      Alveolar fricative             s, z
  3 (fricative)             3D      Postalveolar fricative         ʃ, ʒ
  3 (fricative)             3E      Glottal fricative              h
  4 (approximant)           4A      Labial-velar approximant       w
  4 (approximant)           4B      Retroflex approximant          ɹ, r[5]
  4 (approximant)           4C      Palatal approximant            j
  5 (lateral approximant)   5A      Alveolar lateral approximant   l
  6 (affricate)[6]          6A      Postalveolar affricate         ʧ, ʤ

Table 1: Groupings assigned to individual consonant phonemes as used in the analysis

Any string can then be represented as a 'code' (the 'word type'), comprising the top-level group numbers of the consonants (and with any vowel sounds, or sequences of consecutive vowel sounds, denoted simply as a 'V'), expressed in the order in which they appear in the string.

For example, therefore, the string 'rolex' is encoded in IPA representation as 'ɹoʊlɛks' which is assigned word type 4V5V13.
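The encoding step can be sketched as follows, taking an IPA string as input (for example, as produced by a phonemizing tool). This is an illustrative reconstruction based on Table 1 rather than the code used in the study: the function name, the treatment of every unlisted symbol as a vowel, the list of ignored stress and length marks, and the acceptance of an ASCII 'g' alongside the IPA 'ɡ' are all assumptions.

```python
# Top-level group number for each consonant phoneme, following Table 1.
CONSONANT_GROUPS = {
    "b": "1", "p": "1", "d": "1", "t": "1", "ɾ": "1", "ɡ": "1", "k": "1",
    "g": "1",  # ASCII 'g' accepted as a convenience (assumption)
    "m": "2", "n": "2", "ŋ": "2",
    "f": "3", "v": "3", "θ": "3", "ð": "3", "s": "3", "z": "3",
    "ʃ": "3", "ʒ": "3", "h": "3",
    "w": "4", "ɹ": "4", "r": "4", "j": "4",
    "l": "5",
    "ʧ": "6", "ʤ": "6",
}

# Stress marks, length marks and whitespace carry no phoneme of their own.
IGNORED = set("ˈˌːˑ \t\n")


def word_type(ipa):
    """Convert an IPA string into its 'word type' code, e.g. '4V5V13'."""
    # Merge two-character affricates into single symbols first, so that
    # each is treated as one phoneme (as described in the footnote to Table 1).
    ipa = ipa.replace("tʃ", "ʧ").replace("dʒ", "ʤ")
    code = []
    for symbol in ipa:
        if symbol in IGNORED:
            continue
        if symbol in CONSONANT_GROUPS:
            code.append(CONSONANT_GROUPS[symbol])
        elif not code or code[-1] != "V":
            # Any other symbol is assumed to be a vowel; runs of consecutive
            # vowel symbols collapse into a single 'V'.
            code.append("V")
    return "".join(code)


rolex_type = word_type("ɹoʊlɛks")
```

Because the function works on the IPA string rather than the spelling, a single written character such as 'x' correctly contributes two consonant sounds ('k' and 's') to the code.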

Analysis

By analogy with the previous study, it is informative to again consider the same set of 2,000 most popular 5-character (by second-level domain name, or SLD - i.e. the part of the domain name to the left of the dot) names offered for sale on the domain marketplace Atom.com[7] (by virtue of which inclusion they have independently been deemed to be attractive from a brandability point of view), to determine any patterns or common word types within this dataset.

There are actually 627 distinct word patterns represented in this dataset (noting that there are 7 distinct groups into which the phonemes can be assigned (cf. 6 in the previous study), and that there is here no upper limit to the total possible length of the word 'code' representation), of which the top ten are shown in Table 2.

  Word type   No. domains
  3V13V       62
  3V3V        62
  1V13V       48
  1V3V        47
  3V1V        35
  4V3V        33
  4V13V       32
  3V35V       31
  3V23V       30
  1V1V        29

Table 2: Top ten word types represented in the dataset of 2,000 most popular 5-character (made-up, up to two syllable) names on the Atom.com domain marketplace

Accordingly, there are 62 of the 2,000 domains whose (SLD) names fit the (joint) most common word-type pattern (3V13V) represented amongst this set of popular domains, which are listed below.

Word type 3V13V:

  • vodzy
  • vebsy
  • hauxa
  • fexie
  • hixxi
  • xaxor
  • zetza
  • suxxo
  • xaxxy
  • huxxa
  • vudzi
  • sedza
  • hydso
  • vitvy
  • phexy
  • cipza
  • votvy
  • xuxxo
  • zuxxa
  • cexxi
  • zeexo
  • zogzy
  • zepvi
  • ciexa
  • soaxy
  • vapzy
  • vycci
  • fudfy
  • vybsy
  • veexy
  • foxxu
  • vodvi
  • fiexa
  • vuxxy
  • vauxa
  • fabvy
  • zotvo
  • cerxa
  • zatva
  • zepfy
  • vapzi
  • hoxor
  • serxa
  • huxey
  • vegvy
  • vuxoo
  • fotvi
  • vuxxi
  • xoxxy
  • cixxa
  • suxxa
  • vibsi
  • hooxo
  • fauxo
  • zopzy
  • zabvi
  • virxi
  • huxee
  • voixi
  • huxxo
  • zirxo
  • zopvi

Discussion

As discussed in the previous instalment, this type of analysis may allow steps towards the development of a set of 'guidelines' as to which word types (i.e. sound patterns) might constitute the most preferred names from a brandability point of view. If so, these ideas could be used as a basis for filtering large datasets to identify possible candidate names of interest. One downside to this approach is that, as with the use of phonotactic analysis[8], the framework presented here involves the conversion of each string to a phonetic representation, which is computationally relatively slow. However, unlike phonotactic analysis, this new methodology provides a basis for a more granular clustering of candidate names, and potentially (providing the preferred word types are correctly selected) may provide a more effective 'mapping' between candidate names and their potential desirability.

If (for example) we assume that word type 3V13V is a 'good' pattern for brandable names, it is informative to investigate its use as a filter. For illustration, we can consider the set of unregistered .com names of the form CVCCV ('C' = consonant, 'V' = vowel) from the original study in this series, using the subset beginning with the letter 's' (a 'group 3'-type sound) as an example. There are 6,044 such names. Of these, 567 (9.4%) are found to be of word type 3V13V[9], and it might be reasonable to assume that (at least some of) these may be candidates for brandability which are at least as credible as the names taken from Atom.com listed above. Some examples of the names in this new filtered dataset include sagsy, sedsi, sicsy, sodsy, sudci, suqsy, sybzi, sycci, sygzy, syksi and sytzo.

References

[1] https://circleid.com/posts/unregistered-gems-part-5-using-groupings-to-find-brandable-domains

[2] The conversion is carried out using the Python module Phonemizer, as was also used in a previous study on the analysis of strings for the purposes of mark similarity quantification: 

[3] https://home.cc.umanitoba.ca/~krussll/phonetics/articulation/describing-consonants.html

[4] https://www.dyslexia-reading-well.com/44-phonemes-in-english.html

[5] Technically, the 'r' phoneme represents a (voiced) alveolar trill, but is grouped together in this analysis with the 'ɹ' sound due to the similarity/ambiguity between the two. 

[6] The Phonemizer module actually outputs these symbols each as two distinct characters ('tʃ' and 'dʒ', respectively), so they are first converted to single characters ('ʧ' and 'ʤ') wherever they appear in the IPA representations, to ensure they are treated as single phonemes ('ch' and 'dg', respectively) in the subsequent analysis. 

[7] https://www.atom.com/premium-domains-for-sale/all/length/5%20Letters

[8] https://circleid.com/posts/20240903-unregistered-gems-identifying-brandable-domain-names-using-phonotactic-analysis

[9] This is in fact the second most common word type in the dataset, after 3V11V (651 instances); in total, 94 distinct word types are represented in the sVCCV dataset.

This article was first published on 14 January 2025 at:

https://circleid.com/posts/unregistered-gems-part-6-phonemizing-strings-to-find-brandable-domains

Tuesday, 14 January 2025

It’s a dark whois world

Introduction

A recent study by Interisle[1,2] has highlighted the prevalence of a lack of identifying contact information in the registration records of gTLD (generic top-level domain) domain names, with the claim that almost 90% of records are devoid of such information[3]. This trend is a familiar one following the introduction of the General Data Protection Regulation (GDPR) in 2018, in response to which much of the available contact information was redacted, but is arguably just a continuation of a pattern which was anyway becoming more common; use of privacy and proxy services is attractive to many registrants desiring online anonymity, and can be of particular appeal to infringers.

The study by Interisle considers a set of 3,000 domain names and also includes a focus on attempting to identify contact details on any associated hosted websites. In this article, we consider an analysis of a broader dataset of gTLD names, but focusing just on the information in the whois records themselves (which are explicitly covered by an ICANN regulation requiring the provision of accurate contact information to the registrar[4] - even if the registrar then 'masks' this information publicly), with a view to assessing the extent and implications of 'dark' whois records within the domain landscape.

Methodology and overview

The analysis considers a sample set of 500 domain names[5] from each of the 100 largest gTLD zone files, giving a total dataset of 50,000 domains, and considers only those whois records which are available via an automated look-up (focusing specifically on the registrant name / organisation and e-mail address fields as given in the record).
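
The sampling step (every 25th name from each zone file until 500 have been collected; see note [5]) can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `sample_zone`, the placeholder domain names and the assumption that each zone file has already been reduced to an ordered list of names are all hypothetical.

```python
from itertools import islice
from typing import Iterable, List


def sample_zone(domains: Iterable[str], step: int = 25, limit: int = 500) -> List[str]:
    """Return every `step`-th domain (the 25th, 50th, ...) up to `limit` names."""
    # Starting at index step - 1 picks the items at positions step, 2 * step, ...
    every_nth = islice(domains, step - 1, None, step)
    return list(islice(every_nth, limit))


# Illustrative stand-in for an ordered zone file of exactly 12,500 names.
zone = [f"name{i:05d}.example" for i in range(1, 12_501)]
sample = sample_zone(zone)
print(len(sample), sample[0], sample[-1])
```

Starting at the 25th name (rather than the first) is an assumption, though it is consistent with the 12,500-name minimum quoted in note [5], since 25 x 500 = 12,500.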

In the study, we look to determine the prevalence of each of a series of whois record 'categories' corresponding to the degree of privacy protection or redaction used, and mirroring the definitions used by ICANN[6]:

  • Use of a proxy service - this is where no explicit reference to the 'real' registrant is given in the name or e-mail address field of the record. Proxy service providers use their own contact details in the whois record and, technically, are in each case the legal registered owner of the domain, acting as a licensor of the name to the end customer.
  • Use of a privacy service - in this case, the customer is the registered owner of the domain, and is featured in the registrant name field of the whois record, although other contact details may be absent (often replaced by forwarding e-mail addresses supplied by the service provider).
  • Redaction - this definition is taken to be where the term "redacted" explicitly appears in the whois record in place of one of the other fields normally present. In this study, redacted records are subdivided into those where a specific identifiable registrant is named, and those where this is not the case. Note that this category includes cases where an explicit contact e-mail address may also be given (which, according to some definitions, might be considered to be 'open' records).
  • 'Open' - these are cases where an explicit owner name and contact e-mail address are given. It is worth noting that this is a relatively strict definition, and excludes cases where the e-mail address is that of the underlying registrar or other service provider (taken in this analysis to be privacy-protected records).
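
As a rough illustration of how these categories might be applied to parsed records, the sketch below assigns a record to one of the categories using simple string tests. It is not the classifier used in the study: the field names (`registrant_name`, `registrant_email`), the marker lists and the order in which the tests are applied are all assumptions made for the example.

```python
from typing import Mapping, Optional

# Hypothetical marker lists, for illustration only; a real analysis would need
# curated lists of proxy providers and registrar / service e-mail addresses.
PROXY_MARKERS = ("proxy", "privacy", "private", "whois")
SERVICE_EMAIL_PREFIXES = ("abuse@", "domainabuse@", "whois@")


def classify(record: Mapping[str, Optional[str]]) -> str:
    """Assign a parsed whois record to one of the categories listed above."""
    name = (record.get("registrant_name") or "").strip()
    email = (record.get("registrant_email") or "").strip().lower()
    all_text = " ".join(v for v in record.values() if v).lower()

    # Redaction: the word "redacted" appears somewhere in the record.
    if "redacted" in all_text:
        named = bool(name) and "redacted" not in name.lower()
        return "redaction (with named registrant)" if named else "redaction"
    # Proxy: the name field is empty or holds a service provider's details.
    if not name or any(marker in name.lower() for marker in PROXY_MARKERS):
        return "proxy"
    # Privacy: a registrant is named, but the e-mail is absent or a service address.
    if not email or email.startswith(SERVICE_EMAIL_PREFIXES):
        return "privacy"
    # Open: an explicit owner name and an explicit contact e-mail address.
    return "open"


print(classify({"registrant_name": "Jane Doe", "registrant_email": "abuse@name.com"}))
```

Testing for redaction first is a design choice made for this sketch; a different precedence would move borderline records between categories.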

Why is this issue important? Fundamentally, the absence of personal identifying information in a domain whois record makes it more difficult for brand owners to launch enforcement actions against infringers - particularly where 'real-world' escalation routes may be required - and therefore creates conditions which are advantageous for bad actors. Although in some cases it may be possible to serve a notice requesting that a registrar reveal the underlying contact information it holds (and provably inaccurate information can be grounds for domain suspension), levels of compliance and documentary requirements can vary considerably between registrars.

Furthermore, a dark whois landscape makes it more difficult for brand protection initiatives to be able to prioritise and cluster together domain results based on shared characteristics, making the execution of efficient bulk takedowns a more complex prospect, and increasing the difficulty in demonstrating bad faith activity by serial infringers.

Findings

Of the 50,000 domains in the dataset, only 14,908 (29.8%) have whois records which are available via automated look-up (51 of the 100 gTLDs do not return any information in response to automated queries); these 14,908 records form the dataset on which the remainder of the analysis is based. 36 of the 100 gTLDs return whois records for at least half of the domains queried.

Overall, only 110 of the domains in the dataset (0.74%) were classified as having 'open' whois records - an extremely small proportion, but perhaps unsurprising in view of the strict definition used, and potentially best viewed as a conservative estimate. These domains are spread across fifteen different TLDs: .africa (3 domains), .agency (1), .art (1), .best (4), .bond (3), .cam (1), .christmas (7), .com (14), .company (1), .fun (5), .icu (14), .net (5), .pics (33), .tech (9) and .website (9).

The full statistics are shown in Table 1.

  Category                            No. domains         %
  Proxy                                     9,384   62.95 %
  Privacy                                     524    3.51 %
  Redaction                                 3,377   22.65 %
  Redaction (with named registrant)         1,513   10.15 %
  'Open'                                      110    0.74 %
  TOTAL                                    14,908  100.00 %

Table 1: Numbers of domains with each category of whois record
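
The percentages in Table 1 follow directly from the category counts, and can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the table; the variable names are illustrative.

```python
# Category counts as given in Table 1.
counts = {
    "Proxy": 9384,
    "Privacy": 524,
    "Redaction": 3377,
    "Redaction (with named registrant)": 1513,
    "'Open'": 110,
}

total = sum(counts.values())
# Share of each category, as a percentage of all records returned.
shares = {category: round(100 * n / total, 2) for category, n in counts.items()}
# Proportion of the 50,000 sampled domains for which a record was returned.
coverage = round(100 * total / 50_000, 1)

for category, share in shares.items():
    print(f"{category}: {share:.2f} %")
print(f"Records returned: {total} ({coverage} % of 50,000)")
```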

The prevalence of use of proxy services is striking - accounting for almost two-thirds of domains in the dataset - but also shows significant variability between TLDs. In total, the samples of domains from eight of the TLDs in the dataset showed an adoption rate of proxy services of greater than 80%: .today (94.72%; N = 417), .shop (94.71%; N = 170), .christmas (93.13%; N = 495), .one (86.84%; N = 38), .cam (85.13%; N = 417), .zone (84.96%; N = 419), .media (84.90%; N = 384) and .art (81.25%; N = 208), where N is the number of domains (out of 500) in each case for which a whois record was returned by the automated look-up (see also Appendix A).
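
The list of high-adoption TLDs can be recovered from the per-TLD figures by a simple threshold filter, sketched below for a subset of the rows in Appendix A. The function name `high_proxy_tlds` and the optional `min_n` parameter (which excludes TLDs with very few returned records, such as .one) are additions made for illustration and do not form part of the original analysis.

```python
from typing import Dict, List, Tuple

# (proxy share in %, N) for a subset of TLDs, copied from Appendix A.
proxy_share: Dict[str, Tuple[float, int]] = {
    "pics": (77.20, 500), "christmas": (93.13, 495), "com": (70.19, 483),
    "zone": (84.96, 419), "today": (94.72, 417), "cam": (85.13, 417),
    "solutions": (77.72, 386), "media": (84.90, 384), "rest": (79.39, 330),
    "art": (81.25, 208), "shop": (94.71, 170), "one": (86.84, 38),
}


def high_proxy_tlds(data: Dict[str, Tuple[float, int]],
                    threshold: float = 80.0,
                    min_n: int = 0) -> List[Tuple[str, float, int]]:
    """Return (tld, share, n) rows above `threshold`, highest share first."""
    rows = [(tld, share, n) for tld, (share, n) in data.items()
            if share > threshold and n >= min_n]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for tld, share, n in high_proxy_tlds(proxy_share):
    print(f".{tld}: {share:.2f} % (N = {n})")
```

Calling `high_proxy_tlds(proxy_share, min_n=100)` drops .one from the list, which may be a useful caution where the number of returned records is small.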

It is also informative to consider the most commonly-used proxy service providers, and contact e-mail addresses given in privacy-protected records (Tables 2 and 3).

  Registrant name                                       No. domains         %
  Domains By Proxy, LLC                                       2,788   29.71 %
  Privacy service provided by Withheld for Privacy ehf        2,374   25.30 %
  None                                                        1,066   11.36 %
  Super Privacy Service LTD c/o Dynadot                         968   10.32 %
  Private by Design, LLC                                        360    3.84 %
  Whois Privacy Protection Service, Inc.                        285    3.04 %
  Privacy Protect, LLC (PrivacyProtect.org)                     241    2.57 %
  Contact Privacy Inc. Customer []                              214    2.28 %
  PrivacyGuardian.org llc                                       194    2.07 %
  See PrivacyGuardian.org                                       180    1.92 %

Table 2: Most common 'registrant organisation' fields given in domains using proxy services

  E-mail address                  No. domains         %
  domainabuse@service.aliyun.com          188   30.37 %
  abuse@name.com                           59    9.53 %
  abuse@reg.ru                             41    6.62 %
  abuse@dns.business                       32    5.17 %
  abuse@domains.co.za                      31    5.01 %
  domainabuse@netim.net                    20    3.23 %
  whois@domain-mgmt.net                    10    1.62 %
  abuse@key-systems.net                    10    1.62 %
  abuse@59.cn                              10    1.62 %
  abuse@wdomain.com                        10    1.62 %

Table 3: Most common contact e-mail addresses[7] given in privacy-protected records
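
Rankings of the kind shown in Tables 2 and 3 amount to a frequency count over a single whois field. A minimal sketch using Python's `collections.Counter` is given below; the function name `top_values` and the short input list are illustrative only, and empty fields are counted under the label 'None' on the assumption that this is what the 'None' row in Table 2 represents.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def top_values(values: Iterable[Optional[str]], k: int = 10) -> List[Tuple[str, int, float]]:
    """Return the k most common field values as (value, count, % of all records)."""
    # Empty or missing fields are grouped under the label "None".
    counts = Counter((v or "").strip() or "None" for v in values)
    total = sum(counts.values())
    return [(value, n, round(100 * n / total, 2)) for value, n in counts.most_common(k)]


# Tiny illustrative input, not taken from the study's dataset.
fields = ["Domains By Proxy, LLC", "Domains By Proxy, LLC", None, "Private by Design, LLC"]
for value, n, pct in top_values(fields):
    print(f"{value}: {n} ({pct:.2f} %)")
```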

Discussion

The paucity of 'real-world' contact details given in domain whois records is, in part, a product of an environment in which anonymity holds great appeal, and is creating an online ecosystem which is advantageous for infringers and can be increasingly problematic for brand owners. This does not, of course, mean that nothing can be done from an enforcement point of view - requests for unmasking of contact details held by registrars can be successful in many cases where proof of wrongdoing is available. Even in the absence of registrant contact details, a range of enforcement approaches - such as hosting provider and registrar level notices - is available. At the other end of the spectrum, for the highest priority infringements, a full formal domain dispute procedure can also serve as a means for obtaining registrant contact details.

In many cases, it may also be possible to build a picture of an infringer's activity by using a range of online and offline open-source intelligence (OSINT) investigation approaches, often using data-points taken from the website content itself, or information taken from historical whois databases, as a starting point.

The introduction of schemes such as the Registration Data Request Service (RDRS) by ICANN, offering a simplified and standardised process for requesting registrant information[8], may also be a step in the right direction. It is also worth noting that the whois protocol itself, lacking many up-to-date technical attributes, is scheduled to be phased out in 2025 in favour of the more standardised Registration Data Access Protocol (RDAP), which has an improved underlying technology.
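
One practical consequence of the move to RDAP is that registration data is returned as structured JSON rather than free text. The sketch below extracts the registrant name and e-mail address from a response of this kind. It is an illustration only: the sample response is hand-written and heavily abbreviated (following the general entity / vCard layout of RDAP responses), the function name `registrant_contacts` is hypothetical, and no RDAP queries formed part of the analysis described in this article.

```python
import json
from typing import Dict, List

# Hand-written, abbreviated example in the general shape of an RDAP domain
# response; all values are invented for illustration.
SAMPLE = json.loads("""
{
  "objectClassName": "domain",
  "ldhName": "example.com",
  "entities": [
    {"objectClassName": "entity", "roles": ["registrant"],
     "vcardArray": ["vcard", [
        ["version", {}, "text", "4.0"],
        ["fn", {}, "text", "REDACTED FOR PRIVACY"],
        ["email", {}, "text", "contact@example.com"]]]},
    {"objectClassName": "entity", "roles": ["registrar"],
     "vcardArray": ["vcard", [
        ["version", {}, "text", "4.0"],
        ["fn", {}, "text", "Example Registrar, Inc."]]]}
  ]
}
""")


def registrant_contacts(rdap: dict) -> List[Dict[str, str]]:
    """Collect the name and e-mail of each entity holding the registrant role."""
    contacts = []
    for entity in rdap.get("entities", []):
        if "registrant" not in entity.get("roles", []):
            continue
        vcard = entity.get("vcardArray", ["vcard", []])
        properties = vcard[1] if len(vcard) > 1 else []
        # Each vCard property is a list: [name, parameters, type, value].
        fields = {prop[0]: prop[3] for prop in properties if len(prop) >= 4}
        contacts.append({"name": str(fields.get("fn", "")),
                         "email": str(fields.get("email", ""))})
    return contacts


print(registrant_contacts(SAMPLE))
```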

Going forward, it may transpire that the balance between demands for privacy and online protection forces a push back towards the previous environment of requiring a greater degree of accountability for website owners, and forcing a move towards more comprehensive whois databases. Adoption of mandates such as the Network and Information Security (NIS2) Directive, requiring registries and registrars to collect and provide free access to detailed ('thick' whois) information[9], may be part of this picture.

Appendix A: Numbers of domains with each category of whois record, by TLD

(N = number of domains for which a whois record was returned by the automated look-up)

  TLD Proxy Privacy Redaction Redaction (with named registrant) Open N
  pics 77.20 % 2.80 % 12.00 % 1.40 % 6.60 % 500
  christmas 93.13 % 1.62 % 3.43 % 0.40 % 1.41 % 495
  xyz 63.41 % 6.91 % 29.67 % 0.00 % 0.00 % 492
  africa 53.66 % 28.46 % 14.02 % 3.25 % 0.61 % 492
  com 70.19 % 5.59 % 19.67 % 1.66 % 2.90 % 483
  icu 27.67 % 19.92 % 49.48 % 0.00 % 2.94 % 477
  asia 42.00 % 0.00 % 39.78 % 18.22 % 0.00 % 450
  fun 55.01 % 12.25 % 30.29 % 1.34 % 1.11 % 449
  bond 42.76 % 2.45 % 43.88 % 10.24 % 0.67 % 449
  zone 84.96 % 0.00 % 6.68 % 8.35 % 0.00 % 419
  today 94.72 % 0.00 % 3.12 % 2.16 % 0.00 % 417
  cam 85.13 % 1.68 % 10.79 % 2.16 % 0.24 % 417
  best 48.19 % 1.69 % 46.51 % 2.65 % 0.96 % 415
  photography 58.72 % 0.00 % 30.47 % 10.81 % 0.00 % 407
  services 72.87 % 0.00 % 12.14 % 14.99 % 0.00 % 387
  solutions 77.72 % 0.00 % 11.40 % 10.88 % 0.00 % 386
  website 69.69 % 6.48 % 19.95 % 1.55 % 2.33 % 386
  media 84.90 % 0.26 % 9.11 % 5.73 % 0.00 % 384
  rocks 51.77 % 0.00 % 33.79 % 14.44 % 0.00 % 367
  academy 60.38 % 0.00 % 21.86 % 17.76 % 0.00 % 366
  global 60.11 % 0.27 % 19.67 % 19.95 % 0.00 % 366
  net 61.62 % 4.76 % 30.53 % 1.68 % 1.40 % 357
  link 71.55 % 0.00 % 20.85 % 7.61 % 0.00 % 355
  systems 51.14 % 0.28 % 23.58 % 25.00 % 0.00 % 352
  social 61.78 % 0.00 % 25.00 % 13.22 % 0.00 % 348
  care 54.33 % 0.30 % 25.37 % 20.00 % 0.00 % 335
  rest 79.39 % 0.00 % 19.39 % 1.21 % 0.00 % 330
  consulting 43.96 % 0.31 % 28.79 % 26.93 % 0.00 % 323
  llc 67.30 % 0.00 % 12.26 % 20.44 % 0.00 % 318
  digital 64.08 % 0.32 % 23.62 % 11.97 % 0.00 % 309
  wtf 70.82 % 0.00 % 18.36 % 10.82 % 0.00 % 305
  company 45.92 % 1.02 % 23.81 % 28.91 % 0.34 % 294
  games 55.48 % 0.34 % 29.45 % 14.73 % 0.00 % 292
  info 59.44 % 1.05 % 21.33 % 18.18 % 0.00 % 286
  agency 66.90 % 1.76 % 19.01 % 11.97 % 0.35 % 284
  email 38.85 % 0.00 % 30.58 % 30.58 % 0.00 % 278
  tech 52.99 % 21.37 % 19.23 % 2.56 % 3.85 % 234
  art 81.25 % 7.69 % 10.10 % 0.48 % 0.48 % 208
  shop 94.71 % 0.00 % 5.29 % 0.00 % 0.00 % 170
  org 37.95 % 0.00 % 30.12 % 31.93 % 0.00 % 166
  cloud 21.85 % 0.00 % 40.34 % 37.82 % 0.00 % 119
  wiki 69.01 % 0.00 % 8.45 % 22.54 % 0.00 % 71
  ink 22.58 % 0.00 % 27.42 % 50.00 % 0.00 % 62
  amsterdam 33.33 % 0.00 % 54.17 % 12.50 % 0.00 % 48
  one 86.84 % 0.00 % 13.16 % 0.00 % 0.00 % 38
  top 29.41 % 0.00 % 70.59 % 0.00 % 0.00 % 17
  app 50.00 % 0.00 % 0.00 % 50.00 % 0.00 % 2
  tel 0.00 % 0.00 % 50.00 % 50.00 % 0.00 % 2
  page 0.00 % 0.00 % 100.00 % 0.00 % 0.00 % 1
  autos - - - - - 0
  bayern - - - - - 0
  bet - - - - - 0
  bio - - - - - 0
  biz - - - - - 0
  blog - - - - - 0
  business - - - - - 0
  buzz - - - - - 0
  cfd - - - - - 0
  click - - - - - 0
  club - - - - - 0
  cyou - - - - - 0
  design - - - - - 0
  dev - - - - - 0
  eus - - - - - 0
  family - - - - - 0
  fyi - - - - - 0
  group - - - - - 0
  homes - - - - - 0
  ing - - - - - 0
  lat - - - - - 0
  life - - - - - 0
  live - - - - - 0
  lol - - - - - 0
  love - - - - - 0
  ltd - - - - - 0
  mobi - - - - - 0
  mom - - - - - 0
  name - - - - - 0
  network - - - - - 0
  news - - - - - 0
  nrw - - - - - 0
  online - - - - - 0
  ovh - - - - - 0
  pro - - - - - 0
  realtor - - - - - 0
  sbs - - - - - 0
  site - - - - - 0
  skin - - - - - 0
  space - - - - - 0
  store - - - - - 0
  studio - - - - - 0
  swiss - - - - - 0
  team - - - - - 0
  tokyo - - - - - 0
  vip - - - - - 0
  wang - - - - - 0
  win - - - - - 0
  work - - - - - 0
  world - - - - - 0
  zip - - - - - 0

References

[1] https://dnib.com/articles/interisle-report-examines-domain-name-contact-data-availability

[2] https://circleid.com/posts/new-data-on-domain-name-contact-availability-and-privacy

[3] Strictly, the study relates to the Registration Data Directory Services (RDDS) system(s) offered by registries and registrars for providing access to registration data, of which the familiar whois service is a subset - see https://www.icann.org/resources/pages/whois-rdds-2023-11-02-en

[4] https://www.icann.org/resources/pages/wdrp-2012-02-25-en

[5] The sample comprises every 25th domain in the order in which they appear in the zone file (generally alphabetical), until 500 have been extracted - this value was selected as all 100 of the zone files analysed contain at least 12,500 domain names.

[6] https://www.icann.org/resources/pages/pp-services-2017-08-31-en

[7] Note that this may actually be the abuse contact e-mail address for the registrar; this may be the only explicit e-mail address given in the whois record in many cases.

[8] https://www.linkedin.com/posts/stobbs_rdrs-activity-7212106221485531136-Rr7B

[9] https://www.uschamber.com/technology/domain-name-data-why-its-disappearing-and-why-you-should-care

This article was first published on 14 January 2025 at:

https://www.iamstobbs.com/opinion/its-a-dark-whois-world
