Friday, 25 July 2025

The commonest domain features: constructing look-up tables for use as part of a domain risk scoring system

Many previous pieces of research have focused on the desirability of a comprehensive scoring system, to be used for ranking results identified as part of a brand-protection solution, according to their potential level of threat. Such scoring systems offer the capability for identifying prioritised targets for further analysis, content tracking or enforcement actions[1, 2].

In a recent Stobbs study[3], we considered the case of a basic scoring system for domain-name results - a key category of findings because of the possibility for relatively comprehensive monitoring, the high online visibility of related infringements, the explicit nature of any associated IP abuse and the greater range of options for enforcement[4]. The algorithm presented in the initial study focused on characteristics of the domain name itself, taking account of factors such as the location and context within the domain name of the brand name of interest, the presence of relevance or non-relevance keywords, and the proportion of the domain name composed of other characters. This technique allows for an initial filtering of the list of candidate domain names of interest, and can be augmented by a second stage of filtering to take account of the content of any associated webpage, considering factors such as the number and prominence of mentions of the brand name, and the presence in the site content of relevance keywords.
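The factors listed above can be combined into a single number in many ways. The following is a minimal sketch of one possibility, not the algorithm from the study itself: the function name, the weightings and the 'acme' brand are all hypothetical, chosen only to show how brand context, keywords and the proportion of other characters might interact.

```python
def domain_name_score(domain, brand, relevance_keywords=(), non_relevance_keywords=()):
    """Return an illustrative 0-100 score for how strongly a domain name references a brand.

    All weightings below are hypothetical placeholders, not published values.
    """
    # Consider only the part of the domain before the first dot.
    name = domain.lower().split(".")[0]
    brand = brand.lower()
    position = name.find(brand)
    if position == -1:
        return 0.0

    before = name[:position]
    after = name[position + len(brand):]

    # Context: an exact match scores highest, then a brand set apart by
    # hyphens, then a brand at the start or end, then a brand buried mid-string.
    if name == brand:
        context_weight = 1.0
    elif (not before or before.endswith("-")) and (not after or after.startswith("-")):
        context_weight = 0.8
    elif not before or not after:
        context_weight = 0.6
    else:
        context_weight = 0.4

    # Keywords found in the rest of the name raise or lower the score.
    remainder = before + after
    keyword_adjustment = 0.0
    for keyword in relevance_keywords:
        if keyword in remainder:
            keyword_adjustment += 0.2
    for keyword in non_relevance_keywords:
        if keyword in remainder:
            keyword_adjustment -= 0.3

    # The smaller the share of the name taken up by the brand, the lower the score.
    brand_fraction = len(brand) / len(name)
    raw = context_weight * (0.5 + 0.5 * brand_fraction) + keyword_adjustment
    return round(100 * max(0.0, min(1.0, raw)), 1)


exact = domain_name_score("acme.com", "acme")
delimited = domain_name_score("acme-login.com", "acme", relevance_keywords=("login",))
embedded = domain_name_score("pharmacme.com", "acme")
unrelated = domain_name_score("example.com", "acme")
```

In practice, the weightings would need to be tuned against a set of results already reviewed by an analyst.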

What this initial algorithm does not encompass is any consideration of other technical or configuration factors associated with the domain, a number of which can also provide some indication of its likely level of risk. Examples of such characteristics considered in previous studies include the TLD (top-level domain, or domain extension), on the basis that some TLDs are more popular with infringers than others, due to factors such as cost, ease of registration, the presence of IP protection programmes, and the ease of enforcement[5]; and the host IP address, based on the assertion that websites hosted at (or near) IP addresses containing a large number of other 'bad' or blacklisted websites are themselves more likely to pose a risk[6].

Overall, the set of relevant potential characteristics for assessing possible risk includes the TLD, and the identity of associated domain service providers such as the registrar, hosting provider and nameserver host. These types of providers are typically associated with differing levels of 'trust', connected to factors such as compliance with enforcement requests and popularity with infringers[7]. As such, the use of a provider showing a greater degree of association with previously known 'bad' sites arguably provides an indication that any other arbitrary site associated with the same provider is - other factors being equal - more likely to be associated with a greater degree of risk.

The basic methodology for constructing a threat-score algorithm on this basis thereby involves collating a large database of known 'bad' sites and extracting the features of interest for each of them. Such sites might be identified, for example, by comparison of website templates with those used by previously identified infringing sites, or by analysis and verification (as infringing) of results identified through a brand monitoring service. This process makes it possible to create 'league tables' of the top features and providers which tend more frequently to be associated with infringing sites.

One key point to note, however, is that the mere association of large numbers of infringing sites with a particular domain characteristic does not necessarily mean that the characteristic conveys higher risk. One particular reason is that certain characteristics are simply more common generally, and would therefore be associated with larger numbers of 'bad' sites even if the rate of association (i.e. the number as a proportion of the total) with such sites was not disproportionate. As an illustration of this point, we can consider the TLD; the .com domain extension, for example, will almost always be associated with large numbers of infringements, due simply to the large total number of domains registered on this extension. Accordingly, there will normally be a requirement to 'normalise' the raw numbers, by dividing the number of observed infringements by the total number of registered domains associated with the same instance of the particular feature (i.e. in the case of TLD, the total number of registered .com domains), to generate a measure of infringement frequency or 'hit rate' associated with the instance in question. Domain characteristics with greater infringement frequencies are generally more likely to be associated with higher risk.
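The normalisation step can be illustrated with a short sketch. The totals below are the DomainTools figures for .com and .top quoted in this article; the counts of 'bad' domains are invented purely for illustration, and the function name is our own.

```python
def infringement_rates(bad_counts, total_counts):
    """Divide the number of known 'bad' domains by the total registered for each feature value."""
    rates = {}
    for feature, bad in bad_counts.items():
        total = total_counts.get(feature)
        if not total:
            # No denominator in the look-up table, so no rate can be calculated.
            continue
        rates[feature] = bad / total
    return rates


# Totals from the look-up table for TLDs (DomainTools, 08-Jul-2025).
total_by_tld = {".com": 155_728_200, ".top": 5_326_770}

# ASSUMPTION: invented counts of known infringing domains, for illustration only.
bad_by_tld = {".com": 15_000, ".top": 4_000}

rates = infringement_rates(bad_by_tld, total_by_tld)
ranked = sorted(rates, key=rates.get, reverse=True)
```

With these invented inputs, .com has far more 'bad' domains in absolute terms, but .top ranks first once the totals are taken into account.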

In order to be able to carry out this type of analysis, it is necessary to compile 'look-up tables' giving, for each feature of interest, the total number (or proportion) of registered domains associated with each possible option - i.e. lists of the possible domain TLDs, registrars, hosting providers and nameserver hosts, ranked by total (or relative) numbers. The remainder of this article considers the process of compiling these lists and is illustrated by tables of the top entries (i.e. the most commonly-appearing options within the datasets) in each case. Whilst this has clear applications in threat scoring, it can also provide general insights in its own right, in terms of showing general trends within the domain name landscape.

Individual domain features

1. TLD

The total number of domains per TLD is a relatively simple statistic to obtain, as it can be trivially extracted from analysis of domain name zone files, at least for gTLDs (generic TLDs), for which the corresponding registries publish the data files and make them publicly accessible. A more comprehensive dataset, with significant additional ccTLD (country-code TLD) coverage, is that provided by DomainTools[8], from which the top ten TLDs are shown in Table 1.

  TLD     No. domains    % of dataset
  .com    155,728,200    43.86%
  .de      17,378,724     4.89%
  .net     12,346,352     3.48%
  .cn      11,975,245     3.37%
  .org     11,226,231     3.16%
  .uk       9,752,126     2.75%
  .nl       5,973,733     1.68%
  .ru       5,795,959     1.63%
  .top      5,326,770     1.50%
  .br       4,989,115     1.41%

Table 1: The top ten TLDs by number of registered domains (N = 355,069,958) (DomainTools, 08-Jul-2025)
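Where only a raw list of domain names is available (for example, one extracted from zone files), a table of this form can be built by simple counting. The sketch below is illustrative only: the function name is our own, and it treats the final label of each name as the TLD, so that a name such as 'example.co.uk' is counted under .uk.

```python
from collections import Counter


def tld_lookup_table(domains):
    """Return (tld, count, percentage) rows, most common first."""
    counts = Counter(
        # Zone-file entries are often fully qualified, hence the trailing-dot strip.
        "." + domain.lower().rstrip(".").rsplit(".", 1)[-1]
        for domain in domains
    )
    total = sum(counts.values())
    return [(tld, n, 100 * n / total) for tld, n in counts.most_common()]


# Hypothetical input, for illustration only.
table = tld_lookup_table(["a.com", "b.com", "c.net", "D.ORG."])
```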

2. Registrars

For domain registrars, the ideal statistic would be the total number of domains under management by each registrar. One estimate of this statistic is that provided by DomainNameStat[9], although some degree of 'post-processing' is required in order to obtain a 'clean' dataset, because individual registrars are referred to in a range of variant forms (e.g. with or without '.com', 'Inc.', 'Ltd' or 'LLC'). As a further example, there are over 1,200 distinct entries for DropCatch.com in DomainNameStat's list, mostly of the form 'DropCatch.com XXX LLC', where 'XXX' is a three- or four-digit string. The 'cleansed' list consists of over 1,100 distinct entities, of which the top ten are shown in Table 2.

  Registrar                                    No. domains    % of dataset
  GoDaddy.com                                   87,123,338    26.93%
  NameCheap                                     24,389,502     7.54%
  Tucows Domains                                13,256,889     4.10%
  Squarespace Domains                           12,352,131     3.82%
  Dynadot                                        9,019,705     2.79%
  NameSilo                                       7,393,306     2.29%
  GMO Internet Group, Inc. d/b/a Onamae.com      7,268,309     2.25%
  IONOS                                          6,749,280     2.09%
  Gname.com                                      6,659,851     2.06%
  HOSTINGER operations                           6,055,416     1.87%

Table 2: The top ten registrars by total number of domains under management (N = 323,498,496) (DomainNameStat, 08-Jul-2025)
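The name 'cleansing' needed to produce a table like Table 2 can be approximated by reducing each registrar name to a canonical key before summing the counts. The rules below are a minimal sketch covering only the variations mentioned above; they are not the exact rules used for this dataset, they will not handle every real-world variant (such as 'd/b/a' forms), and the sample counts are invented.

```python
import re
from collections import Counter

# Trailing corporate suffixes, optionally preceded by a comma and followed by a full stop.
_SUFFIX = re.compile(r"[,\s]+(inc|ltd|llc|limited|corp|gmbh)\.?$", re.IGNORECASE)


def canonical_registrar(name):
    """Reduce a registrar name to a simplified key for grouping."""
    key = name.strip().lower()
    # Strip suffixes repeatedly, in case more than one is present.
    while True:
        stripped = _SUFFIX.sub("", key)
        if stripped == key:
            break
        key = stripped
    # Drop a trailing three- or four-digit label, as in 'DropCatch.com 1234 LLC'.
    key = re.sub(r"\s+\d{3,4}$", "", key)
    # Treat 'GoDaddy.com' and 'GoDaddy' as the same entity.
    key = re.sub(r"\.com$", "", key)
    return key


def merge_counts(raw_counts):
    """Sum domain counts across all name variants of the same registrar."""
    merged = Counter()
    for name, count in raw_counts.items():
        merged[canonical_registrar(name)] += count
    return merged


# ASSUMPTION: invented counts, for illustration only.
merged = merge_counts({
    "DropCatch.com 1234 LLC": 10,
    "DropCatch.com 345 LLC": 7,
    "GoDaddy.com, LLC": 100,
    "GoDaddy.com": 5,
})
```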

As a 'sanity-check', it is informative to compare these statistics with those identified through an explicit look-up process. In order to reduce the number of look-ups required, whilst still maintaining a representative sample of the overall domain universe, we consider a set of domains taken by extracting every 500th domain from each of the domain name zone files. Broadly, domains are contained within the individual zone files in alphabetical order, so this equally-spaced sample should essentially provide a 'random' representative set of domains, which should not correlate obviously with any other characteristic. The only significant bias is that the zone-file analysis will exclude ccTLD domain results.
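The equally-spaced sampling described above can be sketched in a few lines. This assumes that each zone file has already been reduced to a de-duplicated sequence of domain names (raw zone files typically contain several records per domain); the function name is our own.

```python
from itertools import islice


def sample_every_nth(domains, step=500):
    """Return an iterator over every step-th item (the 500th, 1000th, ...) of an iterable."""
    # islice consumes the input lazily, so a large zone file need not be held in memory.
    return islice(domains, step - 1, None, step)


# Illustration using integers in place of domain names.
sample = list(sample_every_nth(range(1, 2001), step=500))
```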

The sampling process described above generates a dataset of just under half a million domains (actually around 484,000), from the total set of around 350 million registered domains. Carrying out a whois look-up on each domain in the sample dataset (where information is available on an automated basis) makes it possible to extract the registrar identity in around 390,000 cases. Following a similar data 'cleansing' process to that described previously, the top ten registrars from this dataset are shown in Table 3.

  Registrar               No. domains    % of dataset
  GoDaddy.com                 116,640    29.80%
  NameCheap                    28,389     7.25%
  Squarespace Domains          17,511     4.47%
  Tucows Domains               17,007     4.34%
  Network Solutions             8,939     2.28%
  IONOS                         8,802     2.25%
  Gname.com                     8,431     2.15%
  Dynadot                       8,155     2.08%
  GMO Internet                  7,633     1.95%
  HOSTINGER operations          6,474     1.65%

Table 3: The top ten registrars by number of domains under management, based on a zone-file 'sampling' exercise (N = 391,416)

Overall, there is a good degree of similarity between these two lists (i.e. that provided by DomainNameStat and that provided by the sampled zone-file dataset), and the datasets do correlate with each other very well (correlation coefficient = 0.9923) (Figure 1).

Figure 1: Comparison of the numbers of domains under management for each registrar as given by DomainNameStat and the zone-file sampling exercise
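For reference, a correlation coefficient of this kind (assumed here to be the Pearson coefficient) can be calculated as sketched below. The example uses only the nine registrars that appear in both Table 2 and Table 3, and assumes that 'GMO Internet' in Table 3 is the same entity as the GMO entry in Table 2. Its result will therefore differ from the figure quoted above, which is based on the full datasets.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)


# (DomainNameStat count, zone-file sample count) for registrars in both top tens.
both = {
    "GoDaddy.com": (87_123_338, 116_640),
    "NameCheap": (24_389_502, 28_389),
    "Tucows Domains": (13_256_889, 17_007),
    "Squarespace Domains": (12_352_131, 17_511),
    "Dynadot": (9_019_705, 8_155),
    "GMO Internet": (7_268_309, 7_633),
    "IONOS": (6_749_280, 8_802),
    "Gname.com": (6_659_851, 8_431),
    "HOSTINGER operations": (6_055_416, 6_474),
}
r = pearson([a for a, _ in both.values()], [b for _, b in both.values()])
```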

In this case, the statistics from DomainNameStat probably constitute a better dataset for use in threat scoring analysis (not least because of the vastly increased number of data points), but the high degree of correlation with the zone-file sample does provide some confidence that the latter dataset constitutes a robust data-source for analysis in extracting alternative domain features, such as those discussed below, in cases where no definitive third-party data overviews are available.

3. Nameserver hosts

The nameserver host (defined as the domain name given as the end-section of the nameserver (NS) record for the domain in question - e.g. 'cloudflare.com' in the case of 'aaden.ns.cloudflare.com') can easily be extracted for any given domain via a simple whois look-up. The statistics given for this feature relate to the first (primary) nameserver record for each domain, based on the dataset obtained from the zone-file sampling exercise (Table 4).

  Nameserver host          No. domains    % of dataset
  domaincontrol.com             88,073    22.61%
  cloudflare.com                35,422     9.09%
  googledomains.com             16,290     4.18%
  registrar-servers.com         14,286     3.67%
  wixdns.net                    11,344     2.91%
  afternic.com                  10,278     2.64%
  dns-parking.com                9,082     2.33%
  hichina.com                    7,250     1.86%
  share-dns.com                  6,324     1.62%
  namebrightdns.com              6,251     1.60%

Table 4: The top ten nameserver hosts by number of domains, based on the zone-file sample dataset (N = 389,584)
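Extracting the nameserver host from a nameserver record, as defined above, amounts to keeping the final labels of the hostname. The sketch below keeps the last two labels by default; this is a simplifying assumption, since nameservers under multi-label suffixes such as .co.uk need three labels (or a look-up against a public suffix list) to give a meaningful result. The function name and the 'labels' parameter are our own.

```python
def nameserver_host(ns_record, labels=2):
    """Return the trailing labels of a nameserver hostname, e.g. 'cloudflare.com'."""
    # Normalise case and remove any trailing dot from a fully qualified name.
    parts = ns_record.lower().rstrip(".").split(".")
    return ".".join(parts[-labels:])


host = nameserver_host("aaden.ns.cloudflare.com")
```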

4. Hosting providers

The hosting provider for a domain is defined as the operator of the webserver associated with the (primary) IP address at which the domain is hosted. In this case, the 'top' hosting providers could be calculated on a per-IP address or a per-domain basis; however, in this analysis, the latter approach is taken (since, in general, different IP addresses will be associated with differing numbers of hosted domains, so a per-domain approach provides a more representative overview), again using the sampled zone file dataset (Table 5).

  Hosting provider     No. domains    % of dataset
  Amazon                   115,151    37.22%
  Cloudflare[10]            33,945    10.97%
  Squarespace               13,534     4.37%
  Namecheap                 11,604     3.75%
  Google                     8,894     2.87%
  Shopify                    8,116     2.62%
  GoDaddy.com                4,918     1.59%
  Unified Layer              4,604     1.49%
  PSINet                     4,398     1.42%
  Newfold Digital            4,298     1.39%

Table 5: The top ten hosting providers by number of hosted domains, based on the zone-file sample dataset (N = 309,409)
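The difference between the per-IP address and per-domain approaches can be seen in a small sketch. The records below are entirely hypothetical (the IP addresses are taken from ranges reserved for documentation), and the function name is our own.

```python
from collections import Counter


def provider_counts(records):
    """Count hosting providers per domain and per distinct IP address.

    records: iterable of (domain, ip_address, provider) tuples.
    """
    per_domain = Counter()
    provider_by_ip = {}
    for _domain, ip_address, provider in records:
        per_domain[provider] += 1
        # Each IP address is counted once, however many domains it hosts.
        provider_by_ip[ip_address] = provider
    per_ip = Counter(provider_by_ip.values())
    return per_domain, per_ip


# ASSUMPTION: invented records, for illustration only.
records = [
    ("a.example", "192.0.2.1", "HostA"),
    ("b.example", "192.0.2.1", "HostA"),
    ("c.example", "192.0.2.1", "HostA"),
    ("d.example", "198.51.100.7", "HostB"),
    ("e.example", "198.51.100.8", "HostB"),
]
per_domain, per_ip = provider_counts(records)
```

Here 'HostA' leads on a per-domain basis but trails on a per-IP basis, because all three of its domains share one address.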

Conclusion

Whilst the statistics presented in this article provide some insights regarding the sets of top domain service providers in their own right, the most obvious application is (using the full datasets in each case, rather than just the top-tens shown in this overview) as 'look-up' tables, for the purposes of normalisation of statistics of those features most commonly associated with infringing or otherwise 'bad' sites, as part of an overall threat-scoring approach. A fuller formulation of such an approach - which is key to identifying priority targets from (potentially very large) sets of brand-monitoring results - will also require a dataset of known 'bad' sites, which should itself be as large as possible so as to provide the most meaningful statistics. Ultimately, it is likely that other domain characteristics (such as registrant characteristics, SSL providers, etc.), in addition to other features such as the presence of MX records, web traffic, etc., will also feed into the construction of an overall comprehensive algorithm.

References

[1] https://circleid.com/posts/towards-a-generalised-threat-scoring-framework-for-prioritising-results-from-brand-monitoring-programmes

[2] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 5: 'Prioritisation criteria for specific types of content'

[3] https://www.iamstobbs.com/insights/exploring-a-domain-scoring-system-with-tricky-brands

[4] https://www.worldtrademarkreview.com/global-guide/anti-counterfeiting-and-online-brand-enforcement/2022/article/creating-cost-effective-domain-name-watching-programme

[5] https://circleid.com/posts/20230117-the-highest-threat-tlds-part-2

[6] https://www.iamstobbs.com/insights/notorious-ip-addresses-and-initial-steps-towards-the-formulation-of-an-overall-threat-score-for-websites

[7] https://circleid.com/posts/notorious-hosting-providers-an-overview-of-the-highest-threat-hosts-from-ip-address-blacklist-analysis

[8] https://research.domaintools.com/statistics/tld-counts/

[9] https://domainnamestat.com/statistics/registrar/others

[10] It is worth noting that Cloudflare offers 'pass-through' services, such that many websites simply utilising Cloudflare services will be associated with Cloudflare as the listed hosting provider. In such cases, the 'true' hosting provider can generally be determined only by contacting Cloudflare directly. 

This article was first published on 25 July 2025 at:

https://circleid.com/posts/the-commonest-domain-features-constructing-look-up-tables-for-use-as-part-of-a-domain-risk-scoring-system

Thursday, 24 July 2025

you/talk/fast/med: another forthcoming batch of new gTLDs

Following our previous discussion[1] of a new batch of domain-name extensions to be launched as part of the ongoing first phase of the new-gTLD programme, we present some follow-up comments relating to the next set of TLDs to be released; namely, .fast, .talk, .you, and .med. 

.fast, .talk and .you, all offered through Amazon Registry, are set to enter their sunrise periods on 26 August, before going into general availability most probably in October. The .med extension is to relaunch on an unrestricted basis with a pre-registration phase from 31 August, and open registrations from 2 September[2].

As with any new TLDs, the launches offer an opportunity for brand owners to review their domain registration policies, with a view potentially to registering relevant names for brand-related use, making defensive registrations or, at the very least, proactively monitoring the landscape and/or considering blocking mechanisms in order to defend against third-party abuse.

Given the nature of the four new extensions, there is potential for a wide range of use-cases. Some of the TLDs in particular could be relevant to specific business areas (particularly for .talk, which may be appropriate for communications service providers or for content relating to marketing or reviews; and .med, which has potential in medical or pharmaceutical applications - an area of specific risk for counterfeit products - or businesses with connections to the Mediterranean). This set of TLDs generally also offers options for utilisation as part of a tagline, or to convey specific brand messaging. 

As in our previous study on new TLD launches, we consider the current landscape of similar domain names, as a proxy for the sorts of activity which may manifest themselves following the launches of the new extensions. Specifically, we consider the set of domain names ending with the respective terms.

As of the start of July 2025, zone-file analysis reveals over 252k domains with names ending with 'you', 153k with 'med', 61k with 'talk' and 58k with 'fast' (Figure 1). Table 1 shows the top five (pre-existing) TLDs represented in each of these four datasets.

Figure 1: Total numbers of (pre-existing) domains ending with 'you', 'med', 'talk' and 'fast' (split by (legacy) TLD)

Table 1: Numbers of domains associated with each of the top five (legacy) TLDs in each of the current sets of domains ending with 'you', 'med', 'talk' and 'fast'

Some trends are immediately apparent, such as the popularity of those terms most directly relevant to the English language ('you', 'talk' and 'fast') with the UK market in particular (as evidenced by the extensive use of .co.uk), and the potential relevance of 'fast' to e-commerce (given the frequency of use of the .shop extension).

In some cases, the data is complicated by the presence of 'false positives' (which obscure any insights relating to the likely future use of the more explicitly relevant new-gTLD extensions), particularly for 'med' (which frequently occurs as a sub-string of other terms such as 'armed', 'formed', 'groomed', 'teamed' and a wide range of others, plus names such as 'muhammed') and 'fast' ('breakfast'). 

Having removed any obvious false positives, some insights can be gained by looking at the types of terms which most frequently appear immediately prior to the terms in question, within the set of domain names. In this analysis, we consider the strings of different lengths preceding the keywords under consideration, to assess any obvious patterns of usage.
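One way of carrying out this kind of count is sketched below. It is a minimal illustration rather than the exact method used for this analysis: the function name, the word list and the false-positive endings are all assumptions, and the input is taken to be a list of names with the TLD already removed.

```python
from collections import Counter


def preceding_word_counts(names, term, word_length, vocabulary, false_positive_endings=()):
    """Count dictionary words of a given length appearing immediately before a term."""
    counts = Counter()
    for name in names:
        name = name.lower()
        if not name.endswith(term):
            continue
        # Skip names where the term is only part of a longer, unrelated word.
        if any(name.endswith(ending) for ending in false_positive_endings):
            continue
        prefix = name[: -len(term)]
        word = prefix[-word_length:]
        if len(word) == word_length and word in vocabulary:
            counts[word] += 1
    return counts


# ASSUMPTION: invented names and word list, for illustration only.
you_counts = preceding_word_counts(
    ["thankyou", "iloveyou", "loveyou", "madeforyou"],
    term="you", word_length=4, vocabulary={"love", "with"},
)
fast_counts = preceding_word_counts(
    ["breakfast", "housefast"],
    term="fast", word_length=5, vocabulary={"house", "break"},
    false_positive_endings=("breakfast",),
)
```

Note that this sketch counts a word whenever it directly precedes the term, so 'iloveyou' and 'loveyou' both contribute to 'love'.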

The following lists show the most common English language words[3] present in these datasets, for each of the terms considered:

Domains ending with 'you':

  • 9-character words preceding 'you':
    • 'beautiful' (256 instances) (i.e. domains end with 'beautifulyou')
    • 'healthier' (151)
  • 8-character words preceding 'you':
    • 'welcomes' (180)
  • 7-character words preceding 'you':
    • 'healthy' (317)
    • 'without' (144)
  • 6-character words preceding 'you':
    • 'better' (532)
    • 'within' (447)
    • 'around' (331)
  • 5-character words preceding 'you':
    • 'loves' (1,427)
    • 'thank' (1,294)
    • 'about' (1,015)
    • 'moves' (443)
    • 'round' (348)
    • 'comes' (276)
    • 'found' (246)
    • 'bless' (157)
    • 'helps' (141)
    • 'power' (122)
    • 'teach' (116)
    • 'happy' (116)
  • 4-character words preceding 'you':
    • 'with' (3,615)
    • 'near' (2,521)
    • 'love' (2,111)
    • 'like' (984)
    • 'help' (765)
    • 'best' (377)
    • 'meet' (358)
    • 'find' (276)
    • 'miss' (237)
    • 'move' (220)
    • 'plus' (190)
    • 'real' (187)
    • 'from' (185)
    • 'need' (177)
    • 'told' (170)
    • 'know' (168)
    • 'true' (164)
    • 'dare' (163)
    • 'hear' (159)
    • 'want' (153)
  • 3-character words preceding 'you':
    • 'for' (40,991)
    • 'and' (4,496)
    • 'are' (1,257)

Domains ending with 'med':

  • 11-character words preceding 'med':
    • 'integrative' (123)
  • 10-character words preceding 'med':
    • 'functional' (171)
  • 9-character words preceding 'med':
    • 'concierge' (73)
    • 'lifestyle' (72)
    • 'precision' (60)
  • 8-character words preceding 'med':
    • 'internal' (238)
    • 'wellness' (42)
  • 7-character words preceding 'med':
    • 'natural' (101)
  • 6-character words preceding 'med':
    • 'sports' (1,220)
    • 'family' (675)
    • 'health' (273)
    • 'global' (129)
    • 'beauty' (110)
    • 'dental' (86)
    • 'mobile' (74)
    • 'travel' (63)
    • 'pharma' (62)
    • 'physio' (58)
    • 'techno' (57)
    • 'cardio' (47)
    • 'future' (41)
    • 'social' (39)
    • 'gastro' (39)
  • 5-character words preceding 'med':
    • 'sport' (177)
    • 'sleep' (97)
    • 'laser' (84)
    • 'smart' (80)
    • 'ortho' (79)

Domains ending with 'talk':

  • 10-character words preceding 'talk':
    • 'realestate' (76)
    • 'webhosting' (34)
  • 9-character words preceding 'talk':
    • 'marketing' (40)
  • 8-character words preceding 'talk':
    • 'straight' (272)
    • 'business' (118)
    • 'football' (64)
  • 7-character words preceding 'talk':
    • 'bitcoin' (40)
    • 'teacher' (39)
    • 'toolbox' (37)
    • 'fashion' (33)
    • 'english' (28)
  • 6-character words preceding 'talk':
    • 'sports' (428)
    • 'coffee' (171)
    • 'health' (166)
    • 'pillow' (143)
    • 'travel' (98)
    • 'street' (67)
    • 'beauty' (63)
    • 'people' (61)
    • 'social' (50)
    • 'family' (48)
  • 5-character words preceding 'talk':
    • 'small' (330)
    • 'table' (319)
    • 'could' (288)
    • 'money' (233)
    • 'trash' (114)
    • 'cross' (98)
    • 'power' (90)
  • 4-character words preceding 'talk':
    • 'tech' (842)
    • 'real' (517)
    • 'lets' (453)
    • 'talk' (270)
    • 'shop' (255)
    • 'body' (242)
    • 'news' (205)
    • 'girl' (180)
    • 'self' (150)

Domains ending with 'fast':

  • 9-character words preceding 'fast':
    • 'insurance' (59)
    • 'solutions' (24)
    • 'marketing' (23)
    • 'followers' (16)
    • 'customers' (15)
  • 8-character words preceding 'fast':
    • 'property' (98)
    • 'business' (67)
    • 'mortgage' (28)
    • 'websites' (20)
    • 'anything' (19)
    • 'approved' (18)
    • 'delivery' (17)
    • 'patients' (15)
  • 7-character words preceding 'fast':
    • 'funding' (54)
    • 'capital' (47)
    • 'clients' (36)
    • 'nowhere' (35)
    • 'blazing' (34)
    • 'service' (33)
    • 'english' (27)
    • 'finance' (27)
    • 'website' (26)
    • 'results' (26)
    • 'digital' (26)
    • 'connect' (21)
    • 'closure' (21)
    • 'forward' (20)
    • 'tickets' (19)
    • 'healthy' (19)
    • 'freedom' (19)
  • 6-character words preceding 'fast':
    • 'houses' (382)
    • 'weight' (136)
    • 'online' (94)
    • 'health' (31)
    • 'crypto' (29)
    • 'repair' (28)
    • 'quotes' (28)
    • 'better' (28)
    • 'travel' (23)
    • 'pounds' (22)
    • 'ticket' (21)
    • 'shirts' (21)
    • 'strong' (21)
    • 'design' (21)
  • 5-character words preceding 'fast':
    • 'house' (715)
    • 'super' (291)
    • 'homes' (238)
    • 'loans' (131)
    • 'parts' (92)
    • 'trade' (74)
    • 'learn' (63)
    • 'stand' (62)
    • 'ultra' (61)
    • 'think' (60)
    • 'smart' (58)
    • 'funds' (51)
  • 4-character words preceding 'fast':
    • 'home' (477)
    • 'cash' (273)
    • 'sold' (166)
    • 'food' (126)
    • 'grow' (98)
    • 'tech' (78)
    • 'shop' (76)
    • 'care' (73)
    • 'ship' (68)
    • 'very' (67)
    • 'read' (67)
    • 'help' (67)
    • 'easy' (67)
  • 3-character words preceding 'fast':
    • 'use' (727)
    • 'and' (426)
    • 'old' (321)
    • 'are' (146)
    • 'buy' (143)
    • 'car' (131)
    • 'pay' (118)

This type of analysis may help to inform strategic considerations by brand owners on possible use-cases for the new gTLDs when they launch, and specifically whether there are any good fits for product types, slogans and potential wider marketing initiatives. Similarly, however, the same characteristics of these new domain extensions can make them attractive to infringers, highlighting the importance of a proactive approach to monitoring and enforcement as the landscape continues to develop.

References

[1] https://www.iamstobbs.com/insights/free-hot-spot-an-exploration-of-three-new-gtld-launches

[2] https://iptwins.com/2025/06/19/new-gtlds-med-talk-you-fast-set-to-launch-in-august-september/

[3] Neglecting any expletives, or words which seem to provide no potential for a phrase making grammatical sense

This article was first published on 24 July 2025 at:

https://www.iamstobbs.com/insights/you-talk-fast-med-another-forthcoming-batch-of-new-gtlds

Thursday, 10 July 2025

(Literally) Everything's £1 – The Poundland domain landscape

With the news that UK-based discount retailer Poundland has been sold to US investment company Gordon Brothers for a 'nominal sum' of less than its eponymous £1, amid 'challenging trading conditions'[1,2,3], we take a look at the domain-name landscape for the brand, following similar previous analyses for other troubled companies[4,5,6,7,8].

Consideration of the set of registered brand-specific domains is of key importance for any incumbent or incoming brand owner, for a number of reasons. Primary considerations might typically include assessing whether there is enough strength in the set of defensive registrations, determining if the portfolio could be downsized by lapsing low-priority and/or high-cost and obscure domain names in order to save on renewal costs, and assessing whether web traffic is optimised by ensuring that all inactive domains re-direct to the official transactional website[9].

Brand owners should generally also analyse the landscape of third-party domains for any indications of fraud, brand infringement or traffic misdirection. This type of consideration can be particularly pertinent at times of high-profile news stories - such as this particular development with Poundland - when bad actors are often all too keen to take advantage of heightened public interest to launch their own scams associated with the brand.

In the case of Poundland, analysis of domain zone-file data[10] showed that, as of 13-Jun-2025 (i.e. one day after the break of the news story), there were 120 registered domains with names containing the brand. Whilst this is a relatively modestly-sized landscape, it certainly still warrants a deeper dive to determine any associated trends and patterns.

Whilst the company's official primary domain (poundland.co.uk) has limited available registrant information (as is usual for .co.uk domains due to data redaction following the introduction of GDPR), it is possible to identify other associated characteristics, such as registrar and MX record hosting provider, to confirm its official status. These details can then be cross-referenced to identify other official domains in the portfolio. Additionally, the associated (also official) .com domain (poundland.com), which can be seen to re-direct to the .co.uk version of the site, does have a somewhat richer associated dataset.
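The cross-referencing step can be sketched as a simple comparison of each candidate domain's characteristics against those of a known official domain. Everything in the example below is hypothetical (the function name, the field names, the domains and the provider values), and a match should be treated as an indication of official status rather than as proof.

```python
def matches_official_profile(record, reference, keys=("registrar", "mx_host")):
    """Return True if a record shares every listed characteristic with the reference."""
    return all(
        record.get(key) is not None and record.get(key) == reference.get(key)
        for key in keys
    )


# ASSUMPTION: invented characteristics of a known official domain.
reference = {"registrar": "Registrar A", "mx_host": "mail.example.net"}

# ASSUMPTION: invented candidate domains and look-up results.
candidates = {
    "brand-shop.example": {"registrar": "Registrar A", "mx_host": "mail.example.net"},
    "brand-deals.example": {"registrar": "Registrar B", "mx_host": None},
    "brand-news.example": {"registrar": "Registrar A", "mx_host": "mx.other.example"},
}
likely_official = sorted(
    domain for domain, record in candidates.items()
    if matches_official_profile(record, reference)
)
```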

On this basis, at least 46 of the 120 brand-specific domains can be seen definitively to be under Poundland's official ownership. Only 11 of these display official content, with the remainder found to be non-resolving, displaying error pages or blank pages, leaving some room for further portfolio configuration optimisation.

This leaves 74 potential third-party domains to be assessed for potential threats. Of these, 32 produce some sort of live website response, and 40 are configured with active MX (mail exchange) records, indicating the ability to send and receive e-mails - providing a potential risk of phishing activity and/or other types of brand impersonation from these domains.

Amongst the domains resolving to live content, a range of examples hosting various types of content of potential concern were identified. Some examples, all of which are worthy of consideration for enforcement action, are shown in Figure 1.

Figure 1: Examples of websites featuring content of potential concern, associated with Poundland-specific domain names:

  • (a) e-commerce and utilisation of official branding (lovepoundland[.]store and moban-poundland[.]site)
  • (b) e-commerce and use of same brand name (poundlandshop[.]store)
  • (c) e-commerce - re-direction to external third-party sites (i. poundlandfabric[.]com - re-directs to poundametre[.]com; ii. onlinepoundland[.]co[.]uk and onlinepoundland[.]com - both re-direct to mxwholesale[.]co[.]uk)  
  • (d) misdirection to third-party content (poundlandol[.]shop)
  • (e) log-in page with official branding (poundlandreporting[.]co[.]uk) (possibly official)

Other examples include pages displaying pay-per-click (PPC) links, or domains being offered for sale, highlighting the intention of taking advantage of the renown of the brand to monetise the web traffic being driven to the sites in question. 

Some additional data 'clusters' are also apparent, such as a batch of three .shop (dot-shop) names referencing 'poundlandheart', all registered with privacy-protected whois records, through the same registrar on the same day. 

One further domain (poundlandharlow[.]com) was found to be registered simply to 'Poundland' (rather than the more usual 'Poundland Limited' used for other official domains), and through a non-official registrar. It may represent a 'semi-official' domain, perhaps registered by an individual store franchisee, which also highlights some requirement for portfolio consolidation - an additional point which the new brand owners would be well advised to address.

Given the range of relevant findings from just a small pool of domains for Poundland, we also strongly recommend that other brand owners remain vigilant in their brand protection endeavours. In the eyes of infringers, any brand-related news is good news, as it generally results in increased levels of public interest and volumes of search traffic. At such times, bad actors will find opportunities to take full advantage, and brand owners will generally find that, in those moments, good preparation and a robust brand protection strategy will pay off.

References

[1] https://www.ft.com/content/31c6338d-74c8-4c71-ad20-337beade4c71

[2] https://www.theguardian.com/business/2025/jun/12/poundland-sold-for-1-with-dozens-of-store-closures-expected

[3] https://www.bbc.co.uk/news/articles/c36594lr29ko

[4] https://www.iamstobbs.com/opinion/wilko-a-target-for-scams-following-administration

[5] https://www.iamstobbs.com/opinion/high-steaks-game-hawksmoors-ipo-and-its-domains

[6] https://www.iamstobbs.com/opinion/ip-and-digital-due-diligence-constructing-a-domain-policy-that-matches-brand-owner-requirements

[7] https://www.iamstobbs.com/opinion/no-party-ip-associated-with-the-fallen-tupperware-brand

[8] https://www.iamstobbs.com/insights/alas-smiths-an-exploration-of-wh-smiths-domains-following-their-store-closures

[9] https://www.iamstobbs.com/opinion/strategies-for-constructing-a-domain-name-registration-and-management-policy

[10] The analysis includes direct interrogation of raw domain-name zone files where available, generally thereby giving comprehensive coverage across all gTLDs, and is augmented by the use of additional datasets for ccTLD results, to gain maximum possible (though not completely comprehensive) coverage in these cases.

This article was first published on 10 July 2025 at:

https://www.iamstobbs.com/insights/literally-everythings-ps1-the-poundland-domain-landscape

Thursday, 3 July 2025

Exploring a domain scoring system with 'tricky' brands

by David Barnett and Frankie Cheung

EXECUTIVE SUMMARY

A key objective in brand monitoring applications is the ability to rank findings in order of importance, or potential threat level, with a view to identifying priority targets for further analysis, content tracking, or enforcement. This can be particularly important in the case of monitoring for domains containing brand names which may be short or common words in their own right, and/or which frequently appear as sub-strings of other unrelated terms.

Our new study illustrates how a relatively simple 'domain risk scoring' approach, analysing just the domain name itself and incorporating 'weightings' dependent on the context within the domain name where the brand reference appears, and the presence of relevance and non-relevance keywords, can be used to effectively rank domains identified through broad searches. As an extension to this idea, the scoring formulation could also take account of other inherent characteristics of the domain, such as TLD, MX record, or registrant, registrar or hosting-provider characteristics.

Furthermore, by combining this domain risk scoring approach with a 'content risk score' formulation, comprising an analysis of the content of any associated webpage, it is possible to carry out a deeper dive into the set of ranked results, to identify live content of potential interest, to serve as priority targets for further analysis, content tracking, or enforcement.

This article was first published on 3 July 2025 at:

https://www.iamstobbs.com/insights/exploring-a-domain-scoring-system-with-tricky-brands

* * * * *

WHITE PAPER

Introduction

A key objective in brand monitoring applications is the ability to rank findings in order of importance, or potential threat level, with a view to identifying priority targets for further analysis, content tracking, or enforcement[1]. This can be particularly important in the case of monitoring for domains containing brand names which may be short or common words in their own right, and/or which frequently appear as sub-strings of other unrelated terms. A requirement for effective prioritisation arises from the fact that, for these types of 'tricky' (from a monitoring point of view) brand names, searches often generate large numbers of results - many of which are non-related 'false positives' - and it is often difficult to find the results of interest amongst the 'noise'.

For domain monitoring specifically, it is generally necessary to be able to apply an effective filtering and sorting approach even in the absence of any live site content - so as to be able to identify examples which may be 'weaponised' at a later date, which may be in use for other purposes such as for their e-mail functionality, or which may be candidates for acquisition or dispute. In these cases, the analysis therefore needs to take account of inherent features of the domain name itself, rather than necessarily considering the content of any associated webpage.

In this paper, we consider the cases of the following selection of short/common brand names (sometimes referred to as 'generic' terms - though not in the trademark-related sense of the word) (all of which use the .com domain featuring an exact match to their brand name as their primary website domain), taken from the list of the top-50 most valuable brands in 2024, as provided by Interbrand[2]:

  • Apple (#1, brand value: $488.9B)
  • IBM (#19, brand value: $37.3B)
  • SAP (#20, brand value: $36.8B)
  • Visa (#32, brand value: $21.1B)
  • UPS (#35, brand value: $20.0B)
  • Intel (#37, brand value: $19.7B)
  • GE ('General Electric') (#47, brand value: $17.1B)
  • AXA (#48, brand value: $16.8B)

For simplicity, the study is based (just) on searches for gTLD (i.e. generic top-level domains, such as .com, .net, etc.) domains containing the brand names of interest, for which comprehensive datasets are available through the analysis of domain-name zone files. 

Analysis

The scale of the landscape

Table 1 shows the total raw numbers of domain results returned in response to a search for each of the brand names in question.

Brand-name string | No. gTLD domains
apple | 84,556
ibm | 25,812
sap | 298,759
visa | 81,433
ups | 202,648
intel | 144,323
ge | 10,174,156
axa | 71,306

Table 1: Numbers of gTLD domains containing the names of each of the brands under consideration

Shown below, for each of the brands, is a sample of the domains returned in the raw data (actually each 5,000th, 10,000th, 25,000th, 50,000th or 1,000,000th result - depending on the numbers of results returned - when sorted into alphabetical order). These examples are intended to give an indication of the types of results picked up by the searches, the extent to which the vast majority of these names reference the brand name in an unrelated context, and the corresponding importance of employing an effective filtering and scoring process to prioritise the results and identify the significant findings.

apple:

  • 0000apple[.]com
  • apple-company[.]com
  • applelens[.]app
  • appleshears[.]com
  • applewaysuzuki[.]com
  • dapplevalleyfarm[.]com
  • kappler[.]group
  • pineapplepods[.]com
  • thehalfeatenapplecompany[.]com

ibm:

  • 001lisn9itt6q5db7uc3ibms2273h9ha[.]shop
  • aribm78ifopp3r5k0k9ffk3dt5v241v9[.]org
  • hibmw[.]com
  • ibmtivoli[.]com
  • om13g2l2rlg8ibmsvf82hcj2coiu8pco[.]com
  • vetoj10th2ibmcu9j2kr774uo89kk7l8[.]store

sap:

  • 000webhosapp[.]com
  • chapaexpresstrainsapa[.]com
  • hesapliarsa[.]online
  • myhsapps[.]com
  • sapia-ai[.]com
  • supersapphirewins[.]com

visa:

  • 007ukvisas[.]com
  • childvisas[.]com
  • expeditevisavietnam[.]org
  • invisalign-nuernberg[.]info
  • nohasslevisaonline[.]com
  • swedenvisa-palestinianterritory[.]com
  • visabahis717[.]com
  • visamastersindia[.]com
  • winwinvisa[.]com

ups:

  • 003oijaviqr4a39nubups221f8nav1lr[.]com
  • funeralstartups[.]com
  • p707nllm9pg5igjdf2h1rh581ups0d7p[.]net
  • tmallups[.]com
  • www-trackingshipment-ups[.]com

intel:

  • 007intel[.]com
  • customsintel[.]com
  • intelibud[.]com
  • intelligentbusinessoperations[.]com
  • intelspect[.]com
  • saintelizabethcalgary[.]com

ge (due to the size of the dataset, showing only examples from the set of .com results, for simplicity):

  • 0-100agency[.]com
  • brridgewaybentech[.]com
  • eventgeneratorsandcooling[.]com
  • getgeniusmindai[.]com
  • klargehtdas[.]com
  • numberonepage[.]com
  • significantsurgery[.]com
  • vo44digms6age13m2nob75e8743cldqr[.]com

axa:

  • 00axax[.]com
  • axarn[.]com
  • energietaxatie[.]com
  • laxallstars[.]net
  • mydaxa[.]com
  • relaxationexpert[.]com
  • taxandglobal[.]com
  • xaxasp10[.]xyz

Domain scoring

In order to filter and prioritise the results, we propose as a first step the use of a 'domain risk score', based just on characteristics of the domain name itself, and intended to provide a measure of the degree of relevance of the brand name in question. Note that, in more comprehensive scoring systems, it may be appropriate to consider additional domain features which can provide an overall indication of the potential level of risk, such as the TLD (top-level domain, or domain-name extension), presence of any MX (mail exchange) record, or registrant, registrar or hosting-provider characteristics, but these are not considered in this study.

The proposed basic algorithm incorporates a number of components to the final calculated domain risk score, as follows:

  • A weighting dependent on where, within the domain name, the brand reference appears, from the following options (from greatest to least significance):
    • Instances where the SLD (the second-level domain name, or the part of the name to the left of the dot) consists of the brand name only
    • Instances where the brand name appears at the start of the domain name
    • Instances where the brand name appears at the end of the domain name
    • Instances where the brand name appears elsewhere within the domain name
  • A greater weighting for instances where the brand reference is 'hyphen-separated' from the rest of the domain name (e.g. apple-abc.com would be deemed to be more brand-relevant than appleabc.com, as there is less scope for confusion with cases where the brand name can appear as a sub-string of other terms)
  • An optional greater weighting for domain names containing a more highly-distinctive variant of the basic brand name
  • Additional score increments for each reference to any of a pre-determined set of 'relevance keywords' (which can relate to the industry area of the brand in question, or to specific issue types of interest - e.g. phishing-related keywords) (i.e. 'positive filtering'); these keywords can also be assigned into 'tiers', with higher-relevance keywords being assigned larger scores 
  • A negative score increment for any reference to a known non-relevant 'false positive' (e.g. for 'axa', we may choose to explicitly downweight any domain containing the term 'relaxation') (i.e. 'negative filtering')
  • An additional score component reflecting the proportion of the domain name (in terms of the number of characters) consisting only of the brand name or any of the relevance keywords (with numerical digits disregarded in this calculation), the rationale being that a domain is more likely to be interesting if it consists only of the brand name plus relevance keywords

Examples of these sorts of keywords (and as also used in the analysis which follows) are shown in Table 2. 

Brand name | Relevance keywords ('tier 1') | Relevance keywords ('tier 2') | Known 'false positives'
apple | iphone, ipad, airpod, mac, watch, vision | shop, store, login, verif, secur, auth | grapple, pineapple
ibm | business, cloud, storage, analy, network, secur, software | - | -
sap | business, cloud, tech, software, enterprise, system, data | - | sapien, sapporo, whatsapp
visa | credit, payment, contactless, commerc | login, verif, secur, auth | immigrat, travel, citizen, asylum, passport, student, invisalign, envisage, televisa, visable
ups | deliver, track, ship, logistic, courier, parcel, packag | login, verif, secur, auth | pop(-)ups, start(-)ups, catch(-)ups, check(-)ups, grown(-)ups, hook(-)ups, set(-)ups, touch(-)ups, clean(-)ups, upscale, upside, upstate, upshot, upsanddowns, groups
intel | core, xeon, business, process, system, device, driver, network, software | - | intelligen, inteligen, intellect
ge | general(-)electric, aerospace, healthcare, vernova, tech | - | -
axa | insur, quot, claim, business, health, multicar, breakdown, bank, banq, fund, financ | login, verif, secur, auth | relaxation, taxation, taxadv, taxacc, laxative

Table 2: Groups of keywords used in the scoring algorithm for each of the brands
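
As a rough illustration of how the components listed above might fit together, the following is a minimal Python sketch. The paper does not publish its weights, so every numeric constant here is a hypothetical placeholder, as are the function names; the treatment of hyphens in the 'proportion' component is also an assumption. The scores it produces are therefore not expected to match those reported in the tables that follow; only the relative ordering behaviour is intended to be indicative.

```python
import re

# All weights are hypothetical placeholders; the study does not publish its values.
POSITION_WEIGHTS = {"exact": 500, "start": 100, "end": 75, "elsewhere": 25}
HYPHEN_BONUS = 50
TIER_SCORES = {1: 40, 2: 20}
FALSE_POSITIVE_PENALTY = 100
COVERAGE_WEIGHT = 100

# Keyword groups for 'axa', taken from Table 2.
AXA_TIER1 = ["insur", "quot", "claim", "business", "health", "multicar",
             "breakdown", "bank", "banq", "fund", "financ"]
AXA_TIER2 = ["login", "verif", "secur", "auth"]
AXA_FALSE_POSITIVES = ["relaxation", "taxation", "taxadv", "taxacc", "laxative"]


def _sld(domain: str) -> str:
    """Return the lower-cased first label, accepting defanged '[.]' notation."""
    return domain.lower().replace("[.]", ".").split(".")[0]


def _coverage(sld: str, terms) -> float:
    """Fraction of the letters in the SLD covered by the brand or a keyword.

    Digits and hyphens are excluded from the count (hyphens by assumption).
    """
    covered = [False] * len(sld)
    for term in terms:
        for match in re.finditer(re.escape(term), sld):
            for i in range(match.start(), match.end()):
                covered[i] = True
    letters = [i for i, ch in enumerate(sld) if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(covered[i] for i in letters) / len(letters)


def domain_risk_score(domain, brand, tier1=(), tier2=(), false_positives=()):
    """Score a domain name from its string alone (illustrative weights only)."""
    sld = _sld(domain)
    if brand not in sld:
        return 0.0
    if sld == brand:
        position = "exact"
    elif sld.startswith(brand):
        position = "start"
    elif sld.endswith(brand):
        position = "end"
    else:
        position = "elsewhere"
    score = float(POSITION_WEIGHTS[position])
    # Hyphen-separated brand reference, e.g. apple-abc rather than appleabc.
    if position != "exact" and brand in sld.split("-"):
        score += HYPHEN_BONUS
    # One increment per distinct relevance keyword present ('positive filtering').
    hits1 = [kw for kw in tier1 if kw in sld]
    hits2 = [kw for kw in tier2 if kw in sld]
    score += TIER_SCORES[1] * len(hits1) + TIER_SCORES[2] * len(hits2)
    # A single penalty if any known false positive is present ('negative filtering').
    if any(fp in sld for fp in false_positives):
        score -= FALSE_POSITIVE_PENALTY
    score += COVERAGE_WEIGHT * _coverage(sld, [brand, *hits1, *hits2])
    return score


relevant = domain_risk_score("axa-banque-finance[.]com", "axa",
                             AXA_TIER1, AXA_TIER2, AXA_FALSE_POSITIVES)
noise = domain_risk_score("relaxationexpert[.]com", "axa",
                          AXA_TIER1, AXA_TIER2, AXA_FALSE_POSITIVES)
print(relevant > noise)
```

Keeping the weights in named constants makes it straightforward to tune them per brand, or to add further tiers, without touching the scoring logic.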

Following the analysis, the top-scored (i.e. potentially most relevant) domains for each of the brands are shown in Tables 3a-h (excluding, for the purposes of illustration, any examples where the SLD is an exact match to the brand name, as these are in any case easily identified and will always be worthy of review). Please note also that, in a live service, any domains under official ownership would likely be excluded on the basis of the use of a whitelist or analysis of registrant / registrar information (not carried out in this study).

Domain name | Domain risk score
applemacipadipodstore[.]com | 637
applemacipodipadstore[.]com | 637
apple-iphone-ipad-ipod[.]com | 611
apple-store-iphone[.]com | 603
apple-watch-store[.]com | 601
apple-watch-store[.]online | 601
apple-ipad-shop[.]com | 598
appleiphoneipad[.]com | 575
apple-macbook-shop[.]com | 558
apple-loginsecure[.]com | 551

Table 3a: Top ten results by domain risk score for 'apple'

Domain name | Domain risk score
ibm-business-analytics[.]com | 620
cloudsecurity-ibm[.]com | 603
ibmbusinesscloud[.]com | 575
ibmcloudsoftware[.]com | 575
ibmcloudstorage[.]com | 575
ibmcloudsecurity[.]com | 538
ibmsmartbusinesscloud[.]biz | 527
ibmsmartbusinesscloud[.]com | 527
ibmsmartbusinesscloud[.]info | 527
ibmsmartbusinesscloud[.]net | 527
ibmsmartbusinesscloud[.]org | 527

Table 3b: Top ten results by domain risk score for 'ibm'

Domain name | Domain risk score
business-data-cloud-sap[.]com | 774
sapbusinessdatacloud[.]com | 725
sapbusiness1cloud[.]com | 575
sapbusinesscloud[.]com | 575
sapenterprisecloud[.]com | 575
sapbusinessonesoftware[.]com | 548
sapbusinessonesoftware[.]info | 548
sapbusinessonesoftware[.]net | 548
sapbusinessonesoftware[.]org | 548
sapbusinessonecloud[.]com | 543
sapbusinessonecloud[.]net | 543

Table 3c: Top ten results by domain risk score for 'sap'

Domain name | Domain risk score
visasecurepayment[.]com | 513
visa-payment[.]com | 508
visa-credit[.]com | 507
visa-credit[.]net | 507
visa-credits[.]com | 492
unsecured-visa-credit-cards[.]net | 486
payment-visa[.]com | 483
securvisapayment[.]com | 475
unsecured-visa-credit-card-applications[.]com | 452
visa-secure[.]com | 439
visa-secure[.]net | 439
visa-verify[.]com | 439

Table 3d: Top ten results by domain risk score for 'visa'

Domain name | Domain risk score
track-package-rescheduled-delivery-ups[.]com | 711
ups-parceltrack[.]org | 662
ups-delivery-parcel[.]com | 643
ups-packagedelivery[.]com | 643
ups-parceltracking[.]com | 631
deliveryparcel-ups[.]com | 628
trackpackage-ups[.]com | 625
ups-deliverytrack-mt[.]com | 625
ups-parcell-tracker[.]com | 622
ups-parcel-tracking[.]com | 622

Table 3e: Top ten results by domain risk score for 'ups'

Domain name | Domain risk score
intelsoftwarenetwork[.]com | 575
intellcoresystems[.]com | 551
intellicore-network[.]info | 543
intellicorenetworks[.]com | 543
intellicoresystems[.]com | 542
intel-business[.]com | 511
intel-software[.]com | 511
intel-network[.]com | 510
intel-system[.]com | 508
intel-core[.]com | 505
intel-core[.]net | 505
intel-core[.]vip | 505

Table 3f: Top ten results by domain risk score for 'intel'

Domain name | Domain risk score
ge-healthcaretech[.]com | 663
ge-healthcaretechinc[.]com | 635
ge-healthcaretechnology[.]com | 614
ge-healthcaretechnologies[.]com | 603
ge-healthcaretechnologiesinc[.]net | 589
gehealthcaretech[.]com | 575
gentechhealthcare[.]com | 563
gehealthcaretechinc[.]com | 543
geltechealthcare[.]com | 525
gentechealthcare[.]com | 525

Table 3g: Top ten results by domain risk score for 'ge' (noting that only one example of a result for each unique SLD is shown, due to the large numbers of repeated SLDs in the overall dataset)

Domain name | Domain risk score
axa-banque-finance[.]com | 619
axa-health-insurance-slovakia[.]online | 572
axafinancebank[.]com | 561
axafinancialbank[.]com | 538
axainsurancebreakdown[.]com | 537
axabusinessinsurance[.]biz | 535
axabusinessinsurance[.]com | 535
axabusinessinsurance[.]info | 535
axabusinessinsurance[.]mobi | 535
axabusinessinsurance[.]net | 535

Table 3h: Top ten results by domain risk score for 'axa'

The examples show that the algorithm performs well in terms of separating out the relevant examples from the large numbers of other results in the datasets.

Extensions to the approach

i. Use of domain name (SLD) entropy

In some cases, particularly for the shortest brand names, the datasets may include instances of long, pseudo-random domain names (such as several of the examples shown above for 'ibm'). These types of domains are often associated with automated registrations intended for fraudulent use[3], but will not, in general, be associated with the brand whose name may be contained within them, and should ideally be disregarded (or downweighted) in the types of scoring algorithms described in this paper.

However, the analysis shows that the basic scoring algorithm outlined in this study often does not effectively distinguish between domain names of this type and other 'better' brand matches (i.e. more relevant results). For example, for 'ibm', all of the following examples are assigned a domain risk score of 125:

  • i03204i8ua9n7sle6sdrm81mri0cibm9[.]net
  • i0f29td98etcibm9gkc29v4v9j39p5qm[.]top
  • i0lf99g2t8u92p7ibmlj4tvav849jp1n[.]tel
  • i216r5835dfoush9k1iibm4vpd669dka[.]top
  • i2ai773hvhan7l9001its1r8ibm84cav[.]site
  • i5u5127lfb56iibmj4bfa4c0m03mjt4f[.]motorcycles
  • i66t9t7vau8of667ibmlho120ab32bbv[.]online
  • i6crr5n3uqmmsmm5it7874uj099ibm87[.]com
  • i6t27emh03o11cfm6oa0r73l2ibmeki4[.]com
  • i76kcibmcu3310epn6lagpp292ivj114[.]top
  • i967pv1vn4outp103ibm7673diirjp3c[.]top
  • i98fmfibmcnjnbg2999s402pgem2258s[.]top
  • ia0n95j263iibmvue4s8v6lhll753a7s[.]com
  • ibmclassroom[.]com
  • ibmclienteng[.]com
  • ibmcognitive[.]com
  • ibmcognitive[.]org
  • ibmcomputers[.]asia
  • ibmcomputers[.]com
  • ibmcomputing[.]com
  • ibmcomputing[.]info
  • ibmconfigure[.]com
  • ibmcontracts[.]com
  • ibmcorporate[.]com

This mix of result types is due to the wide range of factors contributing to the final overall calculated score, including the fact that many of the long, random domain names consist of large numbers of digits, meaning that once these are disregarded, the 'ibm' string accounts for a significant proportion of the remainder of the domain name.

One possible way to account for the differences between these types of domain name would be to make use of the concept of domain name (SLD) entropy; essentially, a measure of the length and randomness of the domain name. The categorisation can be achieved by applying a 'correction' to the calculated domain risk score, by reducing it by a factor which is dependent on the domain name entropy (and, in the proposed methodology, applying this only to domains with entropy values above a certain threshold, since some of the visually-relevant domain names are found to have 'mid-range' entropy values).

As a case study, we can consider the dataset of 1,504 'ibm' domains in total which are assigned a (raw) domain risk score of 125. The entropy values of these domains sit in a range between 1.4591 (mibmim[.]com) and 4.6350 (fhibmd96pt2or8745a2cltjj1gu4373e[.]com), with (by inspection) most of the 'random' domain names found to have entropy values above around 3.5 (which can be termed the entropy 'threshold', Hth). As such, a suitable reduction factor (R) for the domain risk score can be defined in terms of the domain entropy (H) as:

         R = exp(H) / exp(Hth)     (for H > Hth)
         R = 1     (otherwise)

such that the adjusted final domain risk score (Dadj) can be defined in terms of the 'raw' score (D) as:

         Dadj = D / R

The form of this reduction factor function is as shown in Figure 1. 

Figure 1: One possible formulation of a domain risk score reduction factor (R) to be used to 'down-score' high entropy (H) domains

This correction results in a 'down-scoring' of 642 of the 1,504 domains. As an illustration, Table 4 gives a selection of those domains whose final scores have been reduced as a result of the entropy-based correction (actually alphabetically the first domain assigned to each adjusted score value), showing that the correction does, as intended, preferentially affect the 'random' domain names.

Domain name | Adjusted domain risk score (Dadj)
slmibm8epk1u84[.]com | 122
ibmpower4saphana[.]com | 116
ibmathsworld[.]com | 115
4659sib4645muss5msgf5buribm8e1u6[.]top | 103
shibmaro323429fjcnrin43rncnr43rvnfuiru448484848484[.]com | 94
97bj94io2ibm42fppgqi7n274f73fsji[.]how | 92
647d75i7co7mj7b0l7vmmqr4ibmd06qu[.]net | 90
kidmi5b71tibm7b0ff560iuq1c5ir477[.]pro | 89
ibmknaj5mcimebc3iaqchinml5l3h6ve[.]top | 88
413b3ibmlu6n9iq4qa4441cancjm96ap[.]com | 87
br74cgrf32bbsgr3rsc7s6ofs94nqibm[.]com | 86
v0q7bbtnb0atnqj68l0au0age1a7bibm[.]com | 85

Table 4: Examples of domains whose risk scores have been reduced by the entropy-based correction factor
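
The entropy-based correction can be sketched in a few lines of Python. This assumes that the entropy in question is the Shannon entropy (in bits per character) of the character distribution of the SLD, and that the threshold Hth is exactly 3.5 (the text says only 'around 3.5'); the function names are illustrative. Under these assumptions the sketch gives an entropy of 1.4591 for 'mibmim', in line with the value quoted above, and adjusted scores of 122, 116 and 115 for the first three rows of Table 4 when starting from a raw score of 125.

```python
import math
from collections import Counter

H_THRESHOLD = 3.5  # "around 3.5" in the text; the exact value is an assumption


def sld_entropy(sld: str) -> float:
    """Shannon entropy (bits per character) of the characters in the SLD."""
    if not sld:
        return 0.0
    n = len(sld)
    return -sum((c / n) * math.log2(c / n) for c in Counter(sld).values())


def reduction_factor(h: float, h_th: float = H_THRESHOLD) -> float:
    """R = exp(H) / exp(Hth) above the threshold, and 1 otherwise."""
    return math.exp(h - h_th) if h > h_th else 1.0


def adjusted_score(raw_score: float, sld: str, h_th: float = H_THRESHOLD) -> float:
    """Dadj = D / R, where R depends on the entropy of the SLD."""
    return raw_score / reduction_factor(sld_entropy(sld), h_th)


for name in ("mibmim", "slmibm8epk1u84", "ibmpower4saphana", "ibmathsworld"):
    print(name, round(sld_entropy(name), 4), round(adjusted_score(125, name)))
```

Because the factor is exponential in the excess entropy, names only slightly above the threshold lose just a few points, while long pseudo-random strings are pushed well down the ranking.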

ii. Content risk scoring

As an extension to the above ideas, it is also possible to calculate a second score, based on an analysis of the content of any associated webpage (if present), as an alternative or secondary means of sorting the results (working on the basis that, other factors being equal, a domain will be of greater concern if it is associated with live, brand-related content). 

To this end, we can formulate a 'content risk score', which itself is composed of two constituent components:

  • A 'brand content score', reflecting the number and prominence of mentions of the brand name on the page
  • An additional metric reflecting the numbers of unique relevance keywords mentioned at least once anywhere in the page content (to take account of the fact that, for common / 'generic' brand terms, the brand name could be mentioned in contexts unrelated to the brand in question, but the presence of relevance keywords will indicate that the subject matter of the page is relevant to the brand in question). 
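
One way the two components might be combined is sketched below in Python, using only the standard library. The paper does not specify how 'prominence' is measured or how the components are weighted, so the choice of page zones (title, first-level heading, everything else) and all of the numeric weights are hypothetical, as are the function and class names.

```python
from html.parser import HTMLParser

# Hypothetical weights: mentions in more prominent zones count for more.
PROMINENCE = {"title": 100, "h1": 50, "body": 10}
KEYWORD_WEIGHT = 25  # hypothetical score per unique relevance keyword


class _ZoneCollector(HTMLParser):
    """Collect lower-cased page text, split into 'title', 'h1' and 'body' zones.

    Anything outside <title> and <h1> is treated as body text, including
    script and style content, which a fuller implementation would skip.
    """

    def __init__(self):
        super().__init__()
        self.stack = []
        self.text = {"title": [], "h1": [], "body": []}

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        zone = self.stack[-1] if self.stack else "body"
        self.text[zone].append(data.lower())


def content_risk_score(html: str, brand: str, keywords) -> int:
    """Brand content score plus a count-based score for unique keywords."""
    collector = _ZoneCollector()
    collector.feed(html)
    zones = {zone: " ".join(parts) for zone, parts in collector.text.items()}
    # Component 1: number and prominence of brand mentions.
    brand_score = sum(PROMINENCE[zone] * text.count(brand.lower())
                      for zone, text in zones.items())
    # Component 2: unique relevance keywords appearing at least once anywhere.
    full_text = " ".join(zones.values())
    unique_hits = sum(1 for kw in keywords if kw.lower() in full_text)
    return brand_score + KEYWORD_WEIGHT * unique_hits


page = ("<html><head><title>Apple Watch Store</title></head>"
        "<body><h1>Buy Apple Watch</h1><p>Apple iPhone and iPad deals.</p></body></html>")
print(content_risk_score(page, "apple", ["iphone", "ipad", "watch", "mac"]))
```

Counting each relevance keyword only once keeps a page from inflating its score simply by repeating a single term.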

As an illustration, we can calculate the content risk scores for sets of the domains assigned the highest domain risk scores for each of the brands in question, as a means of identifying live content of interest (e.g. potential infringements). 

As an example, Table 5 shows the website details for the examples achieving the highest content risk scores (i.e. potentially the most relevant websites) out of a set of those results for 'apple' which themselves receive the highest domain risk scores (>300) (i.e. potentially the most relevant domain names).

Domain name | Domain risk score | Website page title | Content risk score
applewatchjournal.net | 343 | "Apple Watch Journal - Apple Watch (アップルウォッチ)の総合情報サイト。Apple Watchの基本的な使い方やWatchアプリの情報、最新ニュースを紹介します!" | 4,640
applelivingstore.com | 300 | "Apple Living Store – Vente des iphones neufs et occasions" | 4,150
appleministore.com | 318 | "Shop the Latest Apple Products iPhones; MacBooks; iPads & More" | 4,020
applewatchcast.com | 368 | "The Apple WatchCast Podcast - A podcast dedicated to the Apple Watch" | 2,700
applewatchrepairz.com | 343 | "Get Professional Apple Watch Repair Services | Fast & Affordable" | 2,300
apple-mac.support | 503 | "Apple Spezialist im Rheinland | Mac Support für Kunden in Köln, Bonn, Düsseldorf und Aachen | KLEUTGENS.IT" | 2,226
apple.watch | 500 | "Apple Watch - Apple" | 2,150
apple-wholesale-stores.com | 366 | "Apple Wholesale Store - Buy Apple Products at the Best Price" | 2,145

Table 5: Website details for the examples achieving the highest content risk scores for Apple
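
The two-stage prioritisation described here (shortlist by domain risk score, then rank by content risk score) can be sketched as follows, using a handful of rows from Table 5 as input. The names are illustrative, and the domain-score cut-off is treated as inclusive at 300 purely as an assumption, since one of the rows in Table 5 has a score of exactly 300.

```python
from typing import NamedTuple


class Result(NamedTuple):
    domain: str
    domain_score: int
    content_score: int


# A subset of the rows from Table 5.
RESULTS = [
    Result("apple.watch", 500, 2150),
    Result("applewatchcast.com", 368, 2700),
    Result("applewatchjournal.net", 343, 4640),
    Result("apple-mac.support", 503, 2226),
    Result("applelivingstore.com", 300, 4150),
]


def prioritise(results, min_domain_score=300):
    """Keep results at or above the domain-score cut-off, ranked by content score."""
    shortlisted = [r for r in results if r.domain_score >= min_domain_score]
    return sorted(shortlisted, key=lambda r: r.content_score, reverse=True)


for result in prioritise(RESULTS):
    print(result.domain, result.domain_score, result.content_score)
```

Raising `min_domain_score` narrows the shortlist to the most brand-specific names before any page content is considered.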

On this basis, Figure 2 shows one example of an identified live website of interest (i.e. brand-related content / potential brand infringement) for each of the brands under consideration.

Figure 2: Examples of an identified live website of interest for each of the brands under consideration: apple-wholesale-stores[.]com, ibmisecurity[.]com, sap-system[.]com, paymentvisanet[.]com, ups17track[.]com, intel-processor[.]com, gevernovatechtraining[.]com, axainsurancebali[.]com

Conclusion

The studies presented in this paper have illustrated how a relatively simple 'domain risk scoring' approach can be used to effectively rank domains identified through broad searches, so as to identify names of particular interest, even in cases where the brand name used as the basis of the search may be a very short or common term.

As an extension to this idea, the scoring formulation could also take account of other inherent characteristics of the domain, such as TLD, MX record, or registrant, registrar or hosting-provider characteristics, many of which can themselves be assigned into 'tiers' of potential threat level, and scored accordingly.

Finally, by combining this domain risk scoring approach with a 'content risk score' formulation, it is possible to carry out a deeper dive into the set of ranked results, to identify live content of potential interest, to serve as priority targets for further analysis, content tracking, or enforcement.

References

[1] https://circleid.com/posts/towards-a-generalised-threat-scoring-framework-for-prioritising-results-from-brand-monitoring-programmes

[2] https://interbrand.com/best-global-brands/

[3] https://circleid.com/posts/20230703-an-overview-of-the-concept-and-use-of-domain-name-entropy

This article was first published as a white paper on 3 July 2025 at:

https://www.iamstobbs.com/uploads/general/Exploring-a-domain-scoring-system-with-tricky-brands-e-book.pdf

