Thursday, 10 July 2025

(Literally) Everything's £1 – The Poundland domain landscape

With the news that UK-based discount retailer Poundland has been sold to US investment company Gordon Brothers for a 'nominal sum' of less than its eponymous £1, amid 'challenging trading conditions'[1,2,3], we take a look at the domain-name landscape for the brand, following similar previous analyses for other troubled companies[4,5,6,7,8].

Consideration of the set of registered brand-specific domains is of key importance for any incumbent or incoming brand owner, for a number of reasons. Primary considerations might typically include assessing whether there is enough strength in the set of defensive registrations, determining if the portfolio could be downsized by lapsing low-priority and/or high-cost and obscure domain names in order to save on renewal costs, and assessing whether web traffic is optimised by ensuring that all inactive domains re-direct to the official transactional website[9].

Brand owners should generally also analyse the landscape of third-party domains for any indications of fraud, brand infringement or traffic misdirection. This type of consideration can be particularly pertinent at times of high-profile news stories - such as this particular development with Poundland - when bad actors are often all too keen to take advantage of heightened public interest to launch their own scams associated with the brand.

In the case of Poundland, analysis of domain zone-file data[10] showed that, as of 13-Jun-2025 (i.e. one day after the break of the news story), there were 120 registered domains with names containing the brand. Whilst this is a relatively modestly-sized landscape, it certainly still warrants a deeper dive to determine any associated trends and patterns.

Whilst the company's official primary domain (poundland.co.uk) has limited available registrant information (as is usual for .co.uk domains due to data redaction following the introduction of GDPR), it is possible to identify other associated characteristics, such as registrar and MX record hosting provider, to confirm its official status. These details can then be cross-referenced to identify other official domains in the portfolio. Additionally, the associated (also official) .com domain (poundland.com), which can be seen to re-direct to the .co.uk version of the site, does have a somewhat richer associated dataset.
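The cross-referencing step described above can be sketched as follows. This is a minimal illustration only: the `DomainRecord` structure, the function names, and all registrar and MX-provider values are hypothetical placeholders, not data from the analysis. It also assumes that a match on both registrar and MX host is treated as a sufficient signal, which a real review would confirm manually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRecord:
    name: str
    registrar: str    # registrar name, e.g. from whois/RDAP (placeholder values below)
    mx_provider: str  # organisation hosting the MX records, "" if none configured


def matches_reference(record, reference):
    """True if the record shares both registrar and MX host with the reference domain."""
    return (
        record.registrar == reference.registrar
        and record.mx_provider != ""
        and record.mx_provider == reference.mx_provider
    )


def split_portfolio(records, reference):
    """Separate likely-official domains from those needing further review."""
    likely_official = [r.name for r in records if matches_reference(r, reference)]
    to_review = [r.name for r in records if not matches_reference(r, reference)]
    return likely_official, to_review


# Placeholder data only: these names and providers are invented for illustration.
reference = DomainRecord("brand.example", "Registrar A", "MailHost X")
records = [
    DomainRecord("brand-shop.example", "Registrar A", "MailHost X"),
    DomainRecord("brand-deals.example", "Registrar B", "MailHost Y"),
    DomainRecord("brandoutlet.example", "Registrar A", ""),
]
likely_official, to_review = split_portfolio(records, reference)
print(likely_official)
print(to_review)
```

A non-match here means only 'needs review', not 'third-party': an official domain with no MX record configured would fall into the second list.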

On this basis, at least 46 of the 120 brand-specific domains can be seen definitively to be under Poundland's official ownership. Only 11 of these display official content, with the remainder found to be non-resolving, displaying error pages or blank pages, leaving some room for further portfolio configuration optimisation.

This leaves 74 potential third-party domains to be assessed for potential threats. Of these, 32 produce some sort of live website response, and 40 are configured with active MX (mail exchange) records, indicating the ability to send and receive e-mails - providing a potential risk of phishing activity and/or other types of brand impersonation from these domains.

Amongst the domains resolving to live content, a range of examples hosting various types of content of potential concern were identified. Some examples, all of which are worthy of consideration for enforcement action, are shown in Figure 1.

Figure 1: Examples of websites featuring content of potential concern, associated with Poundland-specific domain names:

  • (a) e-commerce and utilisation of official branding (lovepoundland[.]store and moban-poundland[.]site)
  • (b) e-commerce and use of same brand name (poundlandshop[.]store)
  • (c) e-commerce - re-direction to external third-party sites (i. poundlandfabric[.]com - re-directs to poundametre[.]com; ii. onlinepoundland[.]co[.]uk and onlinepoundland[.]com - both re-direct to mxwholesale[.]co[.]uk)  
  • (d) misdirection to third-party content (poundlandol[.]shop)
  • (e) log-in page with official branding (poundlandreporting[.]co[.]uk) (possibly official)

Other examples include pages displaying pay-per-click (PPC) links, or domains being offered for sale, highlighting the intention of taking advantage of the renown of the brand to monetise the web traffic being driven to the sites in question. 

Some additional data 'clusters' are also apparent, such as a batch of three .shop (dot-shop) names referencing 'poundlandheart', all registered with privacy-protected whois records, through the same registrar on the same day. 

One further domain (poundlandharlow[.]com) was found to be registered simply to 'Poundland' (rather than the more usual 'Poundland Limited' used for other official domains) and with a non-official registrar. It may represent a 'semi-official' domain, perhaps registered by an individual store franchisee, which also highlights some requirement for portfolio consolidation - an additional point which the new brand owners would be well advised to address.

Given the range of relevant findings from just a small pool of domains for Poundland, we also strongly recommend that other brand owners remain vigilant in their brand protection endeavours. In the eyes of infringers, any brand-related news is good news, as it generally results in increased levels of public interest and volumes of search traffic. At such times, bad actors will find opportunities to take full advantage, and brand owners will generally find that, in those moments, good preparation and a robust brand protection strategy will pay off.

References

[1] https://www.ft.com/content/31c6338d-74c8-4c71-ad20-337beade4c71

[2] https://www.theguardian.com/business/2025/jun/12/poundland-sold-for-1-with-dozens-of-store-closures-expected

[3] https://www.bbc.co.uk/news/articles/c36594lr29ko

[4] https://www.iamstobbs.com/opinion/wilko-a-target-for-scams-following-administration

[5] https://www.iamstobbs.com/opinion/high-steaks-game-hawksmoors-ipo-and-its-domains

[6] https://www.iamstobbs.com/opinion/ip-and-digital-due-diligence-constructing-a-domain-policy-that-matches-brand-owner-requirements

[7] https://www.iamstobbs.com/opinion/no-party-ip-associated-with-the-fallen-tupperware-brand

[8] https://www.iamstobbs.com/insights/alas-smiths-an-exploration-of-wh-smiths-domains-following-their-store-closures

[9] https://www.iamstobbs.com/opinion/strategies-for-constructing-a-domain-name-registration-and-management-policy

[10] The analysis includes direct interrogation of raw domain-name zone files where available, generally thereby giving comprehensive coverage across all gTLDs, and is augmented by the use of additional datasets for ccTLD results, to gain maximum possible (though not completely comprehensive) coverage in these cases.

This article was first published on 10 July 2025 at:

https://www.iamstobbs.com/insights/literally-everythings-ps1-the-poundland-domain-landscape

Thursday, 3 July 2025

Exploring a domain scoring system with 'tricky' brands

by David Barnett and Frankie Cheung

EXECUTIVE SUMMARY

A very significant objective in brand monitoring applications is the ability to rank findings in order of importance, or potential threat level, with a view to identifying priority targets for further analysis, content tracking, or enforcement. This can be particularly important in the case of monitoring for domains containing brand names which may be short or common words in their own right, and/or which frequently appear as sub-strings of other unrelated terms.

Our new study illustrates how a relatively simple 'domain risk scoring' approach, analysing just the domain name itself and incorporating 'weightings' dependent on the context within the domain name where the brand reference appears, and the presence of relevance and non-relevance keywords, can be used to effectively rank domains identified through broad searches. As an extension to this idea, the scoring formulation could also take account of other inherent characteristics of the domain, such as TLD, MX record, or registrant, registrar or hosting-provider characteristics.

Furthermore, by combining this domain risk scoring approach with a 'content risk score' formulation, comprising an analysis of the content of any associated webpage, it is possible to carry out a deeper dive into the set of ranked results, to identify live content of potential interest, to serve as priority targets for further analysis, content tracking, or enforcement.

This article was first published on 3 July 2025 at:

https://www.iamstobbs.com/insights/exploring-a-domain-scoring-system-with-tricky-brands

* * * * *

WHITE PAPER

Introduction

A very significant objective in brand monitoring applications is the ability to rank findings in order of importance, or potential threat level, with a view to identifying priority targets for further analysis, content tracking, or enforcement[1]. This can be particularly important in the case of monitoring for domains containing brand names which may be short or common words in their own right, and/or which frequently appear as sub-strings of other unrelated terms. A requirement for effective prioritisation arises from the fact that, for these types of 'tricky' (from a monitoring point of view) brand names, searches often generate large numbers of results - many of which are non-related 'false positives' - and it is often difficult to find the results of interest amongst the 'noise'.

For domain monitoring specifically, it is generally necessary to be able to apply an effective filtering and sorting approach even in the absence of any live site content - so as to be able to identify examples which may be 'weaponised' at a later date, which may be in use for other purposes such as for their e-mail functionality, or which may be candidates for acquisition or dispute. In these cases, the analysis therefore needs to take account of inherent features of the domain name itself, rather than necessarily considering the content of any associated webpage.

In this paper, we consider the following selection of short/common brand names (sometimes referred to as 'generic' terms - though not in the trademark-related sense of the word), all of which use the .com domain featuring an exact match to their brand name as their primary website domain. They are taken from the list of top-50 most valuable brands in 2024, as provided by Interbrand[2]:

  • Apple (#1, brand value: $488.9B)
  • IBM (#19, brand value: $37.3B)
  • SAP (#20, brand value: $36.8B)
  • Visa (#32, brand value: $21.1B)
  • UPS (#35, brand value: $20.0B)
  • Intel (#37, brand value: $19.7B)
  • GE ('General Electric') (#47, brand value: $17.1B)
  • AXA (#48, brand value: $16.8B)

For simplicity, the study is based (just) on searches for gTLD (i.e. generic top-level domains, such as .com, .net, etc.) domains containing the brand names of interest, for which comprehensive datasets are available through the analysis of domain-name zone files. 

Analysis

The scale of the landscape

Table 1 shows the total raw numbers of domain results returned in response to a search for each of the brand names in question.

| Brand-name string | No. gTLD domains |
|---|---|
| apple | 84,556 |
| ibm | 25,812 |
| sap | 298,759 |
| visa | 81,433 |
| ups | 202,648 |
| intel | 144,323 |
| ge | 10,174,156 |
| axa | 71,306 |

Table 1: Numbers of gTLD domains containing the names of each of the brands under consideration

Shown below, for each of the brands, is a sample of the domains returned in the raw data (actually each 5,000th, 10,000th, 25,000th, 50,000th or 1,000,000th result - depending on the numbers of results returned - when sorted into alphabetical order). These examples are intended to give an indication of the types of results picked up by the searches, the extent to which the vast majority of these names reference the brand name in an unrelated context, and the corresponding importance of employing an effective filtering and scoring process to prioritise the results and identify the significant findings.
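The sampling described above (every Nth result from the alphabetically sorted list) could be sketched as below. The function name and the miniature dataset are hypothetical. Starting from the first entry is an assumption, chosen because the samples that follow each begin with a name from the very top of the alphabetical ordering.

```python
def sample_every_nth(domains, step):
    """Return the first domain and every `step`-th one after it, in alphabetical order."""
    ordered = sorted(domains)
    return ordered[::step]


# Hypothetical miniature dataset, for illustration only.
domains = [
    "pineapplepods.example",
    "apple-company.example",
    "kappler.example",
    "0000apple.example",
    "applelens.example",
    "dapplevalleyfarm.example",
]
print(sample_every_nth(domains, 2))
```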

apple:

  • 0000apple[.]com
  • apple-company[.]com
  • applelens[.]app
  • appleshears[.]com
  • applewaysuzuki[.]com
  • dapplevalleyfarm[.]com
  • kappler[.]group
  • pineapplepods[.]com
  • thehalfeatenapplecompany[.]com

ibm:

  • 001lisn9itt6q5db7uc3ibms2273h9ha[.]shop
  • aribm78ifopp3r5k0k9ffk3dt5v241v9[.]org
  • hibmw[.]com
  • ibmtivoli[.]com
  • om13g2l2rlg8ibmsvf82hcj2coiu8pco[.]com
  • vetoj10th2ibmcu9j2kr774uo89kk7l8[.]store

sap:

  • 000webhosapp[.]com
  • chapaexpresstrainsapa[.]com
  • hesapliarsa[.]online
  • myhsapps[.]com
  • sapia-ai[.]com
  • supersapphirewins[.]com

visa:

  • 007ukvisas[.]com
  • childvisas[.]com
  • expeditevisavietnam[.]org
  • invisalign-nuernberg[.]info
  • nohasslevisaonline[.]com
  • swedenvisa-palestinianterritory[.]com
  • visabahis717[.]com
  • visamastersindia[.]com
  • winwinvisa[.]com

ups:

  • 003oijaviqr4a39nubups221f8nav1lr[.]com
  • funeralstartups[.]com
  • p707nllm9pg5igjdf2h1rh581ups0d7p[.]net
  • tmallups[.]com
  • www-trackingshipment-ups[.]com

intel:

  • 007intel[.]com
  • customsintel[.]com
  • intelibud[.]com
  • intelligentbusinessoperations[.]com
  • intelspect[.]com
  • saintelizabethcalgary[.]com

ge (due to the size of the dataset, showing only examples from the set of .com results, for simplicity):

  • 0-100agency[.]com
  • brridgewaybentech[.]com
  • eventgeneratorsandcooling[.]com
  • getgeniusmindai[.]com
  • klargehtdas[.]com
  • numberonepage[.]com
  • significantsurgery[.]com
  • vo44digms6age13m2nob75e8743cldqr[.]com

axa:

  • 00axax[.]com
  • axarn[.]com
  • energietaxatie[.]com
  • laxallstars[.]net
  • mydaxa[.]com
  • relaxationexpert[.]com
  • taxandglobal[.]com
  • xaxasp10[.]xyz

Domain scoring

In order to filter and prioritise the results, we propose as a first step the use of a 'domain risk score', based just on characteristics of the domain name itself, and intended to provide a measure of the degree of relevance of the brand name in question. Note that, in more comprehensive scoring systems, it may be appropriate to consider additional domain features which can provide an overall indication of the potential level of risk, such as the TLD (top-level domain, or domain-name extension), presence of any MX (mail exchange) record, or registrant, registrar or hosting-provider characteristics, but these are not considered in this study.

The proposed basic algorithm incorporates a number of components to the final calculated domain risk score, as follows:

  • A weighting dependent on where, within the domain name, the brand reference appears, from the following options (from greatest to least significance):
    • Instances where the SLD (the second-level domain name, or the part of the name to the left of the dot) consists of the brand name only
    • Instances where the brand name appears at the start of the domain name
    • Instances where the brand name appears at the end of the domain name
    • Instances where the brand name appears elsewhere within the domain name
  • A greater weighting for instances where the brand reference is 'hyphen-separated' from the rest of the domain name (e.g. apple-abc.com would be deemed to be more brand-relevant than appleabc.com, as there is less scope for confusion with cases where the brand name can appear as a sub-string of other terms)
  • An optional greater weighting for domain names containing a more highly-distinctive variant of the basic brand name
  • Additional score increments for each reference to any of a pre-determined set of 'relevance keywords' (which can relate to the industry area of the brand in question, or to specific issue types of interest - e.g. phishing-related keywords) (i.e. 'positive filtering'); these keywords can also be assigned into 'tiers', with higher-relevance keywords being assigned larger scores 
  • A negative score increment for any reference to a known non-relevant 'false positive' (e.g. for 'axa', we may choose to explicitly downweight any domain containing the term 'relaxation') (i.e. 'negative filtering')
  • An additional score component reflecting the proportion of the domain name (in terms of the number of characters) consisting only of the brand name or any of the relevance keywords (with numerical digits disregarded), with the rationale being that a domain is more likely to be interesting if it consists only of the brand name plus relevant keywords

Examples of these sorts of keywords (and as also used in the analysis which follows) are shown in Table 2. 

| Brand name | Relevance keywords ('tier 1') | Relevance keywords ('tier 2') | Known 'false positives' |
|---|---|---|---|
| apple | iphone, ipad, airpod, mac, watch, vision | shop, store, login, verif, secur, auth | grapple, pineapple |
| ibm | business, cloud, storage, analy, network, secur, software | | |
| sap | business, cloud, tech, software, enterprise, system, data | | sapien, sapporo, whatsapp |
| visa | credit, payment, contactless, commerc | login, verif, secur, auth | immigrat, travel, citizen, asylum, passport, student, invisalign, envisage, televisa, visable |
| ups | deliver, track, ship, logistic, courier, parcel, packag | login, verif, secur, auth | pop(-)ups, start(-)ups, catch(-)ups, check(-)ups, grown(-)ups, hook(-)ups, set(-)ups, touch(-)ups, clean(-)ups, upscale, upside, upstate, upshot, upsanddowns, groups |
| intel | core, xeon, business, process, system, device, driver, network, software | | intelligen, inteligen, intellect |
| ge | general(-)electric, aerospace, healthcare, vernova, tech | | |
| axa | insur, quot, claim, business, health, multicar, breakdown, bank, banq, fund, financ | login, verif, secur, auth | relaxation, taxation, taxadv, taxacc, laxative |

Table 2: Groups of keywords used in the scoring algorithm for each of the brands
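The score components listed above, combined with keyword groups of the kind shown in Table 2, could be sketched as follows. This is a minimal illustration rather than the algorithm used in the study: every numerical weight is a hypothetical placeholder (the actual values are not stated here), so the output will not reproduce the scores in the tables that follow. The optional weighting for distinctive brand variants is omitted, and ignoring hyphens (as well as digits) in the proportion calculation is an additional assumption.

```python
import re

# Illustrative weights only: placeholders, not the values used in the study.
POSITION_WEIGHTS = {"exact": 400, "start": 200, "end": 150, "elsewhere": 100}
HYPHEN_BONUS = 100
TIER_SCORES = {1: 50, 2: 25}
FALSE_POSITIVE_PENALTY = 200
COVERAGE_WEIGHT = 100


def brand_position(sld, brand):
    """Classify where the brand string sits within the SLD."""
    if sld == brand:
        return "exact"
    if sld.startswith(brand):
        return "start"
    if sld.endswith(brand):
        return "end"
    return "elsewhere"


def coverage(sld, terms):
    """Fraction of the letters in the SLD covered by the terms (digits and hyphens ignored)."""
    letters = {i for i, ch in enumerate(sld) if ch.isalpha()}
    if not letters:
        return 0.0
    covered = set()
    for term in terms:
        for match in re.finditer(re.escape(term), sld):
            covered.update(range(match.start(), match.end()))
    return len(covered & letters) / len(letters)


def domain_risk_score(domain, brand, tier1=(), tier2=(), false_positives=()):
    sld = domain.lower().split(".")[0]
    if brand not in sld:
        return 0.0
    score = float(POSITION_WEIGHTS[brand_position(sld, brand)])
    # Bonus when the brand stands alone as a hyphen-delimited token.
    if sld != brand and brand in sld.split("-"):
        score += HYPHEN_BONUS
    matched_terms = [brand]
    # 'Positive filtering': one increment per occurrence of each relevance keyword.
    for tier, keywords in ((1, tier1), (2, tier2)):
        for keyword in keywords:
            hits = sld.count(keyword)
            if hits:
                score += TIER_SCORES[tier] * hits
                matched_terms.append(keyword)
    # 'Negative filtering': penalise known false-positive strings.
    for term in false_positives:
        if term in sld:
            score -= FALSE_POSITIVE_PENALTY
    score += COVERAGE_WEIGHT * coverage(sld, matched_terms)
    return score


# Keyword groups for 'apple' as listed in Table 2.
APPLE = dict(
    brand="apple",
    tier1=("iphone", "ipad", "airpod", "mac", "watch", "vision"),
    tier2=("shop", "store", "login", "verif", "secur", "auth"),
    false_positives=("grapple", "pineapple"),
)
for name in ("apple-watch-store.com", "appleshears.com", "pineapplepods.com"):
    print(name, round(domain_risk_score(name, **APPLE)))
```

With these placeholder weights, the hyphen-separated, keyword-rich name ranks well above the unrelated 'appleshears' and 'pineapplepods' examples, which is the qualitative behaviour the approach is aiming for.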

Following the analysis, the top-scored (i.e. potentially most relevant) domains for each of the brands are shown in Tables 3 a - h (excluding, for the purposes of illustration, any examples where the SLD is an exact match to the brand name, as these are anyway easily identified and will always be worthy of review). Please note also that, in a live service, any domains under official ownership would likely be excluded on the basis of the use of a whitelist or analysis of registrant / registrar information (not carried out in this study).

| Domain name | Domain risk score |
|---|---|
| applemacipadipodstore[.]com | 637 |
| applemacipodipadstore[.]com | 637 |
| apple-iphone-ipad-ipod[.]com | 611 |
| apple-store-iphone[.]com | 603 |
| apple-watch-store[.]com | 601 |
| apple-watch-store[.]online | 601 |
| apple-ipad-shop[.]com | 598 |
| appleiphoneipad[.]com | 575 |
| apple-macbook-shop[.]com | 558 |
| apple-loginsecure[.]com | 551 |

Table 3a: Top ten results by domain risk score for 'apple'

| Domain name | Domain risk score |
|---|---|
| ibm-business-analytics[.]com | 620 |
| cloudsecurity-ibm[.]com | 603 |
| ibmbusinesscloud[.]com | 575 |
| ibmcloudsoftware[.]com | 575 |
| ibmcloudstorage[.]com | 575 |
| ibmcloudsecurity[.]com | 538 |
| ibmsmartbusinesscloud[.]biz | 527 |
| ibmsmartbusinesscloud[.]com | 527 |
| ibmsmartbusinesscloud[.]info | 527 |
| ibmsmartbusinesscloud[.]net | 527 |
| ibmsmartbusinesscloud[.]org | 527 |

Table 3b: Top ten results by domain risk score for 'ibm'

| Domain name | Domain risk score |
|---|---|
| business-data-cloud-sap[.]com | 774 |
| sapbusinessdatacloud[.]com | 725 |
| sapbusiness1cloud[.]com | 575 |
| sapbusinesscloud[.]com | 575 |
| sapenterprisecloud[.]com | 575 |
| sapbusinessonesoftware[.]com | 548 |
| sapbusinessonesoftware[.]info | 548 |
| sapbusinessonesoftware[.]net | 548 |
| sapbusinessonesoftware[.]org | 548 |
| sapbusinessonecloud[.]com | 543 |
| sapbusinessonecloud[.]net | 543 |

Table 3c: Top ten results by domain risk score for 'sap'

| Domain name | Domain risk score |
|---|---|
| visasecurepayment[.]com | 513 |
| visa-payment[.]com | 508 |
| visa-credit[.]com | 507 |
| visa-credit[.]net | 507 |
| visa-credits[.]com | 492 |
| unsecured-visa-credit-cards[.]net | 486 |
| payment-visa[.]com | 483 |
| securvisapayment[.]com | 475 |
| unsecured-visa-credit-card-applications[.]com | 452 |
| visa-secure[.]com | 439 |
| visa-secure[.]net | 439 |
| visa-verify[.]com | 439 |

Table 3d: Top ten results by domain risk score for 'visa'

| Domain name | Domain risk score |
|---|---|
| track-package-rescheduled-delivery-ups[.]com | 711 |
| ups-parceltrack[.]org | 662 |
| ups-delivery-parcel[.]com | 643 |
| ups-packagedelivery[.]com | 643 |
| ups-parceltracking[.]com | 631 |
| deliveryparcel-ups[.]com | 628 |
| trackpackage-ups[.]com | 625 |
| ups-deliverytrack-mt[.]com | 625 |
| ups-parcell-tracker[.]com | 622 |
| ups-parcel-tracking[.]com | 622 |

Table 3e: Top ten results by domain risk score for 'ups'

| Domain name | Domain risk score |
|---|---|
| intelsoftwarenetwork[.]com | 575 |
| intellcoresystems[.]com | 551 |
| intellicore-network[.]info | 543 |
| intellicorenetworks[.]com | 543 |
| intellicoresystems[.]com | 542 |
| intel-business[.]com | 511 |
| intel-software[.]com | 511 |
| intel-network[.]com | 510 |
| intel-system[.]com | 508 |
| intel-core[.]com | 505 |
| intel-core[.]net | 505 |
| intel-core[.]vip | 505 |

Table 3f: Top ten results by domain risk score for 'intel'

| Domain name | Domain risk score |
|---|---|
| ge-healthcaretech[.]com | 663 |
| ge-healthcaretechinc[.]com | 635 |
| ge-healthcaretechnology[.]com | 614 |
| ge-healthcaretechnologies[.]com | 603 |
| ge-healthcaretechnologiesinc[.]net | 589 |
| gehealthcaretech[.]com | 575 |
| gentechhealthcare[.]com | 563 |
| gehealthcaretechinc[.]com | 543 |
| geltechealthcare[.]com | 525 |
| gentechealthcare[.]com | 525 |

Table 3g: Top ten results by domain risk score for 'ge' (noting that only one example of a result for each unique SLD is shown, due to the large numbers of repeated SLDs in the overall dataset)

| Domain name | Domain risk score |
|---|---|
| axa-banque-finance[.]com | 619 |
| axa-health-insurance-slovakia[.]online | 572 |
| axafinancebank[.]com | 561 |
| axafinancialbank[.]com | 538 |
| axainsurancebreakdown[.]com | 537 |
| axabusinessinsurance[.]biz | 535 |
| axabusinessinsurance[.]com | 535 |
| axabusinessinsurance[.]info | 535 |
| axabusinessinsurance[.]mobi | 535 |
| axabusinessinsurance[.]net | 535 |

Table 3h: Top ten results by domain risk score for 'axa'

The examples show that the algorithm performs well in terms of separating out the relevant examples from the large numbers of other results in the datasets.

Extensions to the approach

i. Use of domain name (SLD) entropy

In some cases, particularly for the shortest brand names, the datasets may include instances of long, pseudo-random domain names (such as several of the examples shown above for 'ibm'). These types of domains are often associated with automated registrations intended for fraudulent use[3], but will not, in general, be associated with the brand whose name may be contained within them, and should ideally be disregarded (or downweighted) in the types of scoring algorithms described in this paper.

However, the analysis shows that the basic scoring algorithm outlined in this study often does not effectively distinguish between domain names of this type and other 'better' brand matches (i.e. more relevant results). For example, for 'ibm', all of the following examples are assigned a domain risk score of 125:

  • i03204i8ua9n7sle6sdrm81mri0cibm9[.]net
  • i0f29td98etcibm9gkc29v4v9j39p5qm[.]top
  • i0lf99g2t8u92p7ibmlj4tvav849jp1n[.]tel
  • i216r5835dfoush9k1iibm4vpd669dka[.]top
  • i2ai773hvhan7l9001its1r8ibm84cav[.]site
  • i5u5127lfb56iibmj4bfa4c0m03mjt4f[.]motorcycles
  • i66t9t7vau8of667ibmlho120ab32bbv[.]online
  • i6crr5n3uqmmsmm5it7874uj099ibm87[.]com
  • i6t27emh03o11cfm6oa0r73l2ibmeki4[.]com
  • i76kcibmcu3310epn6lagpp292ivj114[.]top
  • i967pv1vn4outp103ibm7673diirjp3c[.]top
  • i98fmfibmcnjnbg2999s402pgem2258s[.]top
  • ia0n95j263iibmvue4s8v6lhll753a7s[.]com
  • ibmclassroom[.]com
  • ibmclienteng[.]com
  • ibmcognitive[.]com
  • ibmcognitive[.]org
  • ibmcomputers[.]asia
  • ibmcomputers[.]com
  • ibmcomputing[.]com
  • ibmcomputing[.]info
  • ibmconfigure[.]com
  • ibmcontracts[.]com
  • ibmcorporate[.]com

This mix of result types is due to the wide range of factors contributing to the final overall calculated score, including the fact that many of the long, random domain names consist of large numbers of digits, meaning that once these are disregarded, the 'ibm' string accounts for a significant proportion of the remainder of the domain name.

One possible way to account for the differences between these types of domain name would be to make use of the concept of domain name (SLD) entropy; essentially, a measure of the length and randomness of the domain name. The categorisation can be achieved by applying a 'correction' to the calculated domain risk score, by reducing it by a factor which is dependent on the domain name entropy (and, in the proposed methodology, applying this only to domains with entropy values above a certain threshold, since some of the visually-relevant domain names are found to have 'mid-range' entropy values).

As a case study, we can consider the dataset of 1,504 'ibm' domains in total which are assigned a (raw) domain risk score of 125. The entropy values of these domains sit in a range between 1.4591 (mibmim[.]com) and 4.6350 (fhibmd96pt2or8745a2cltjj1gu4373e[.]com), with (by inspection) most of the 'random' domain names found to have entropy values above around 3.5 (which can be termed the entropy 'threshold', Hth). As such, a suitable reduction factor (R) for the domain risk score can be defined in terms of the domain entropy (H) as:

         R = exp(H) / exp(Hth)     (for H > Hth)
         R = 1     (otherwise)

such that the adjusted final domain risk score (Dadj) can be defined in terms of the 'raw' score (D) as:

         Dadj = D / R

The form of this reduction factor function is as shown in Figure 1. 

Figure 1: One possible formulation of a domain risk score reduction factor (R) to be used to 'down-score' high entropy (H) domains

This correction results in a 'down-scoring' of 642 of the 1,504 domains. As an illustration, Table 4 gives a selection of those domains whose final scores have been reduced as a result of the entropy-based correction (actually alphabetically the first domain assigned to each adjusted score value), showing that the correction does, as intended, preferentially affect the 'random' domain names.

| Domain name | Adjusted domain risk score (Dadj) |
|---|---|
| slmibm8epk1u84[.]com | 122 |
| ibmpower4saphana[.]com | 116 |
| ibmathsworld[.]com | 115 |
| 4659sib4645muss5msgf5buribm8e1u6[.]top | 103 |
| shibmaro323429fjcnrin43rncnr43rvnfuiru448484848484[.]com | 94 |
| 97bj94io2ibm42fppgqi7n274f73fsji[.]how | 92 |
| 647d75i7co7mj7b0l7vmmqr4ibmd06qu[.]net | 90 |
| kidmi5b71tibm7b0ff560iuq1c5ir477[.]pro | 89 |
| ibmknaj5mcimebc3iaqchinml5l3h6ve[.]top | 88 |
| 413b3ibmlu6n9iq4qa4441cancjm96ap[.]com | 87 |
| br74cgrf32bbsgr3rsc7s6ofs94nqibm[.]com | 86 |
| v0q7bbtnb0atnqj68l0au0age1a7bibm[.]com | 85 |

Table 4: Examples of domains whose risk scores have been reduced by the entropy-based correction factor
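The entropy-based correction can be sketched in a few lines. The sketch assumes that the entropy is the Shannon entropy (in bits) of the character distribution of the SLD, which is not stated explicitly above but is consistent with the value of 1.4591 quoted for mibmim[.]com. The function names are illustrative. With Hth = 3.5 and a raw score of 125, this formulation gives rounded adjusted scores of 122, 116 and 115 for the first three domains in Table 4.

```python
import math
from collections import Counter


def sld_entropy(sld):
    """Shannon entropy, in bits per character, of the characters in the SLD."""
    if not sld:
        return 0.0
    total = len(sld)
    return -sum((n / total) * math.log2(n / total) for n in Counter(sld).values())


def reduction_factor(entropy, threshold=3.5):
    """R = exp(H) / exp(Hth) above the threshold, and 1 otherwise."""
    return math.exp(entropy - threshold) if entropy > threshold else 1.0


def adjusted_score(raw_score, domain, threshold=3.5):
    """Dadj = D / R, with the entropy computed on the SLD only."""
    sld = domain.lower().split(".")[0]
    return raw_score / reduction_factor(sld_entropy(sld), threshold)


for name in ("mibmim.com", "slmibm8epk1u84.com", "ibmpower4saphana.com", "ibmathsworld.com"):
    sld = name.split(".")[0]
    print(name, round(sld_entropy(sld), 4), round(adjusted_score(125, name)))
```

Because R grows exponentially with H, a name only slightly above the threshold (such as ibmathsworld) loses a few points, whereas the long pseudo-random names are pushed much further down the ranking.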

ii. Content risk scoring

As an extension to the above ideas, it is also possible to calculate a second score, based on an analysis of the content of any associated webpage (if present), as an alternative or secondary means of sorting the results (working on the basis that, other factors being equal, a domain will be of greater concern if it is associated with live, brand-related content). 

To this end, we can formulate a 'content risk score', which itself is composed of two constituent components:

  • A 'brand content score', reflecting the number and prominence of mentions of the brand name on the page
  • An additional metric reflecting the numbers of unique relevance keywords mentioned at least once anywhere in the page content (to take account of the fact that, for common / 'generic' brand terms, the brand name could be mentioned in contexts unrelated to the brand in question, but the presence of relevance keywords will indicate that the subject matter of the page is relevant to the brand in question). 
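The two components above could be combined along the following lines. This is a sketch under stated assumptions, not the formulation used in the study: 'prominence' is approximated by weighting mentions in the page title more heavily than those in the body, brand mentions are matched as whole words, and all three weights are hypothetical placeholders.

```python
import re

# Hypothetical weights, chosen only to illustrate the structure of the score.
TITLE_WEIGHT = 10
BODY_WEIGHT = 1
KEYWORD_WEIGHT = 20


def brand_content_score(brand, title, body):
    """Count whole-word brand mentions, weighting the title above the body."""
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    return TITLE_WEIGHT * len(pattern.findall(title)) + BODY_WEIGHT * len(pattern.findall(body))


def keyword_score(keywords, title, body):
    """Score the number of unique relevance keywords appearing anywhere on the page."""
    text = f"{title} {body}".lower()
    unique_hits = {keyword for keyword in keywords if keyword in text}
    return KEYWORD_WEIGHT * len(unique_hits)


def content_risk_score(brand, keywords, title, body):
    return brand_content_score(brand, title, body) + keyword_score(keywords, title, body)


# Invented page content, for illustration only.
keywords = ("iphone", "ipad", "watch", "store", "shop")
relevant = content_risk_score(
    "apple", keywords,
    "Apple Watch Store - Buy Apple Products",
    "Shop the latest Apple Watch and iPhone deals.",
)
unrelated = content_risk_score(
    "apple", keywords,
    "Pineapple recipes",
    "How to cut a pineapple.",
)
print(relevant, unrelated)
```

Counting each relevance keyword only once keeps a page from inflating its score simply by repeating a single term, which matches the 'unique keywords' wording of the second component.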

As an illustration, we can calculate the content risk scores for sets of the domains assigned the highest domain risk scores for each of the brands in question, as a means of identifying live content of interest (e.g. potential infringements). 

As an example, Table 5 shows the website details for the examples achieving the highest content risk scores (i.e. potentially the most relevant websites) out of a set of those results for 'apple' which themselves receive the highest domain risk scores (>300) (i.e. potentially the most relevant domain names).

| Domain name | Domain risk score | Website page title | Content risk score |
|---|---|---|---|
| applewatchjournal.net | 343 | Apple Watch Journal - Apple Watch(アップルウォッチ)の総合情報サイト。Apple Watchの基本的な使い方やWatchアプリの情報、最新ニュースを紹介します! | 4,640 |
| applelivingstore.com | 300 | Apple Living Store – Vente des iphones neufs et occasions | 4,150 |
| appleministore.com | 318 | Shop the Latest Apple Products iPhones; MacBooks; iPads & More | 4,020 |
| applewatchcast.com | 368 | The Apple WatchCast Podcast - A podcast dedicated to the Apple Watch | 2,700 |
| applewatchrepairz.com | 343 | Get Professional Apple Watch Repair Services \| Fast & Affordable | 2,300 |
| apple-mac.support | 503 | Apple Spezialist im Rheinland \| Mac Support für Kunden in Köln, Bonn, Düsseldorf und Aachen \| KLEUTGENS.IT | 2,226 |
| apple.watch | 500 | Apple Watch - Apple | 2,150 |
| apple-wholesale-stores.com | 366 | Apple Wholesale Store - Buy Apple Products at the Best Price | 2,145 |

Table 5: Website details for the examples achieving the highest content risk scores for Apple

On this basis, Figure 2 shows one example of an identified live website of interest (i.e. brand-related content / potential brand infringement) for each of the brands under consideration.

Figure 2: Examples of an identified live website of interest for each of the brands under consideration: apple-wholesale-stores[.]com, ibmisecurity[.]com, sap-system[.]com, paymentvisanet[.]com, ups17track[.]com, intel-processor[.]com, gevernovatechtraining[.]com, axainsurancebali[.]com

Conclusion

The studies presented in this paper have illustrated how a relatively simple 'domain risk scoring' approach can be used to effectively rank domains identified through broad searches, so as to identify names of particular interest, even in cases where the brand name used as the basis of the search may be a very short or common term.

In extensions to this idea, it would be possible to extend the scoring formulation to take account of other inherent characteristics of the domain, such as TLD, MX record, or registrant, registrar or hosting-provider characteristics, many of which can themselves be assigned into 'tiers' of potential threat level, and scored accordingly.

Finally, by combining this domain risk scoring approach with a 'content risk score' formulation, it is possible to carry out a deeper dive into the set of ranked results, to identify live content of potential interest, to serve as priority targets for further analysis, content tracking, or enforcement.

References

[1] https://circleid.com/posts/towards-a-generalised-threat-scoring-framework-for-prioritising-results-from-brand-monitoring-programmes

[2] https://interbrand.com/best-global-brands/

[3] https://circleid.com/posts/20230703-an-overview-of-the-concept-and-use-of-domain-name-entropy

This article was first published as a white paper on 3 July 2024 at:

https://www.iamstobbs.com/uploads/general/Exploring-a-domain-scoring-system-with-tricky-brands-e-book.pdf


Tuesday, 1 July 2025

Yupoo: Image-sharing as a gateway to counterfeit sales

Chinese image-sharing platform Yupoo (yupoo.com) has once again been highlighted in an investigation by World Trademark Review[1] as a worrying source of links to listings offering the sale of counterfeit products for a wide range of brands. As such, it is certainly a site to keep on the radar of any brand owner utilising a brand-protection programme geared towards the identification of infringing product listings. 

The investigation describes the platform as the 'go-to shopfront for major counterfeiters in China'. The primary method of such use appears to be the posting by counterfeiters of galleries of images of the items on offer, accompanied by captions linking to external marketplace listings (such as the Chinese site, Weidian) or providing contact details (often via WeChat or WhatsApp) for reaching out to the seller directly. The practice appears to be gaining popularity as a result of recently observed increases in compliance by other platforms, in response to takedown requests against infringing goods listed directly on them, together with other proactive enforcement measures.

In many cases - particularly where the images link to listings on platforms such as AliExpress or DHgate - the destination pages can be found to show generic listings with multiple product variations available, with instructions on how to purchase the actual counterfeit item (through the selection of appropriate options in the marketplace listing) having been provided on the referring Yupoo page. As such, this practice - known as the use of 'hidden links'[2] - appears to be fairly prevalent on the Yupoo platform.

Following these trends, Yupoo has added a disclaimer to its pages, absolving itself of responsibility for uploaded content. Although there is no well-established reporting function for infringements on the platform, there is a contact address (1400439992[at]qq.com) which can be used by brand-protection practitioners for directly reporting infringements, as long as certain pieces of relevant documentation (such as colour scans of a trademark certificate, and power-of-attorney documentation, according to the WTR investigation) are supplied.

How might this infringing content appear in practice on the platform? As a simple proxy for the approach taken by a formal monitoring service, performing a Google 'site' query for results on the Yupoo site, which also contain the name of a target brand of interest, typically returns a number of relevant user profile pages of the form [username].x.yupoo.com. One typical example, containing subsections for a number of luxury brands, is shown in Figure 1.

Figure 1: Example of a Yupoo user profile page containing images potentially associated with the sale of counterfeit goods for a number of luxury brands

In this case, the 'contact' tab contains a range of contact details, including several WhatsApp contact numbers, Facebook and Instagram profiles, an e-mail address, and a link to a standalone e-commerce site hosted on szwego[.]com - a fairly rich dataset which could also form the basis of an OSINT-style investigation to establish links with other entities involved in the sale of infringing items.

In another example, the Yupoo profile page gives a link to an external website (copyaaa[.]ru), which itself re-directs to a standalone e-commerce site - yupoo-dhgate[.]ru - hosted on a specially-registered domain whose name makes reference to the Yupoo brand (in addition to comprising an infringement against the name of the DHgate marketplace) (Figure 2).

Figure 2: The standalone e-commerce website hosted at yupoo-dhgate[.]ru

Might this be common practice amongst the sellers creating their shopfronts on the popular Yupoo site? A search of domain-name zone-file data shows that, as of the start of June 2025, there are 852 (gTLD) domains with names containing 'yupoo'. Many of these domains similarly also feature the names of other marketplaces (e.g. AliExpress, AllChinaBuy, Tmall) or other potentially targeted brands (e.g. Adidas, Air Max, Apple, Balenciaga, Breitling, Burberry, Bvlgari, Cartier, Dior, Dr Martens, Dyson, Fendi, Givenchy, Gucci, Hermes, Jordan, Louboutin, Moncler, Supreme, Vilebrequin, Zanotti). One additional point of particular note in the dataset is the popularity of use of the .fashion domain-name extension (TLD), accounting for 480 of the cases. Overall, 515 of the domains produce some sort of live website response, and a number of explicit 'clusters' of associated sites are apparent within the set - such as 348 sites with page titles ending with the phrase 'gucci bags watches nike clothing'.
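As a minimal sketch of the kind of zone-file filtering described above, the following Python snippet selects domain names containing a keyword and counts them by TLD. The function names and the sample domain names are hypothetical placeholders, and the real analysis may well have used different tooling.

```python
from collections import Counter

def brand_domains(domains, keyword):
    """Return the (lower-cased) domain names which contain the keyword."""
    keyword = keyword.lower()
    return [d.lower() for d in domains if keyword in d.lower()]

def tld_counts(domains):
    """Count domains by their final label (the TLD)."""
    return Counter(d.rsplit(".", 1)[-1] for d in domains)

# Hypothetical sample standing in for a zone-file extract
zone_sample = [
    "aaa-yupoo.fashion",
    "bbb-YUPOO.fashion",
    "yupoo-ccc.com",
    "unrelated.com",
]
matches = brand_domains(zone_sample, "yupoo")
by_tld = tld_counts(matches)
```

Here `by_tld` would show which extensions dominate the matched set, analogous to the .fashion observation above.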

Many of the domains in the wider dataset resolve to similar 'shopfront-style' pages, re-direct to pages providing contact details (frequently via WhatsApp profile pages), or resolve to other standalone e-commerce sites in their own right. Some of these also infringe other e-commerce brands, e.g. by duplicating the 'look-and-feel' of marketplace platform websites – or the Yupoo platform itself. In some of the latter cases (which include examples such as bestyupoo[.]com, jerseyyupoo[.]com, jersey-yupoo[.]com, soccer-jersey-yupoo[.]com, yupoo[.]sale, yupoo[.]site, and yupooclothes[.]com), similarities in elements of the HTML source code of the sites indicate possible re-use of common templates in the website construction.

Some examples of the wider set of infringements are shown in Figure 3. In certain examples, additional features commonly associated with the sale of counterfeit items - such as the use of obfuscation of the names of the brands being targeted (as a means of evading both detection and enforcement on any associated external platforms) - are also observed (Figure 4).

Figure 3: e-commerce or 'shopfront' websites hosted on Yupoo-specific domain names (ali-yupoo[.]com, jersey-yupoo[.]com, yupooshoes[.]net, shoeyupoo[.]com, yupooreplica[.]com (re-directs to dhgatechina[.]ru), 8billion-yupoo[.]com (and other examples re-directing to the same content; yupoo[.]qiqis[.]top))

Figure 4: Example of a website hosted on a Yupoo-specific domain name and featuring product listings with obfuscated brand names

References

[1] https://www.worldtrademarkreview.com/article/yupoo-remains-home-counterfeit-shop-fronts-chinese-image-platform-backs-away-legal-liability

[2] https://circleid.com/posts/20220510-breaking-the-rules-on-counterfeit-sales-the-use-of-hidden-links

This article was first published on 1 July 2024 at:

https://www.iamstobbs.com/insights/yupoo-image-sharing-as-a-gateway-to-counterfeit-sales

Monday, 23 June 2025

'Notorious IP Addresses' and initial steps towards the formulation of an overall threat score for websites

Part of the 'Patterns in Brand Monitoring: Brand Protection Data is Beautiful' series of articles[1,2,3]

EXECUTIVE SUMMARY

The ability to rank results according to the level of threat they pose is a key component of many brand protection services, offering the ability to identify priority targets for further analysis, content tracking or enforcement.

Metrics providing the capability to rank results in this way are often based on a range of website characteristics, including webpage content and technical configuration features of the associated domain name. 

This study considers the case of website hosting characteristics, with a specific focus on the IP address at which the website is hosted. The IP address - and, by extension, the associated hosting service provider - can be an important factor to consider, as hosting providers can vary in their level of attractiveness to infringers, based on a range of factors such as their compliance with takedown requests.

The analysis presented in this case utilises data from an IP address 'blacklist', compiled using insights from any identified association of the address in question with content found to be infringing, such as use for spamming or malware distribution. The construction of one possible formulation of a threat-score component based on the host IP-address is then presented, calculated using the proximity of the IP address in question to other addresses explicitly included in the blacklist. The algorithm is based on the subdivision of IP-address space into 'netblocks', across which patterns in the frequency of infringing content are also considered.

This article was first published on 19 June 2024 at:

https://www.iamstobbs.com/insights/notorious-ip-addresses-and-initial-steps-towards-the-formulation-of-an-overall-threat-score-for-websites

* * * * *

WHITE PAPER

Introduction

The identification of those website characteristics which are disproportionately associated with infringing or illicit activity is a key element of the process of threat quantification for brand-protection findings. Quantifying the level of (current or potential future) risk of an identified domain name or website has a number of benefits, including the ability to identify priority targets for further analysis, content tracking or enforcement, amongst potentially large datasets[4,5]. The same datasets can also provide insights into 'clusters' of associated findings[6], and into the likelihood of enforcement success in any particular case.

Two such characteristics are the registrar (the organisation through which the associated domain name was registered) and the hosting provider (the organisation supplying the physical infrastructure - i.e. a webserver - on which a site is hosted) of any given website in question. There are many possible reasons why specific registrars and hosting providers may be disproportionately popular with infringers, including differences in their inherent level of cooperativeness ('compliance') to notifications of IP infringements, their speed of response[7], geographic region(s) of operation, and so on. 

In the case of registrars, various research organisations collate information on those providers which are found to be more commonly associated with infringing activity of a range of types. The most meaningful datasets are those in which the numbers of infringements are expressed as a proportion of the total number of domains registered, to give an overall 'trust' or 'reputation' score for each registrar, rather than just considering the raw numbers of infringing sites (since this will skew the data towards those registrars which are simply more popular generally). One such dataset is that provided by Spamhaus[8], which (as of 29-Jan-2025) gives the top five 'low-trust' registrars (by quantitative 'bad reputation score') as 'Ultahost, Inc.', 'Domain International Services Limited', 'nicenic.net (ZhuHai NaiSiNiKe Information Technology)', '香港翼优有限公司' ('Hong Kong Wingyou Co., Ltd.') and 'Dnsgulf Pte. Ltd.'. Note that this list does not necessarily imply that the registrars in question are non-compliant with enforcement notices, although it has been noted that the frequent or repeated association of a registrar with infringing activity is often an indication of non-compliance[9]. Examples of non-compliant registrars are also discussed in forums between infringers looking for providers to use for their content[10].

Moreover, many brand protection service providers will have collated (in many cases, quantifiable) information on the compliance of individual registrars, based on their previous enforcement experience. This allows for the construction of a risk 'score' for each registrar, which can serve as an input into algorithms for quantifying the overall level of potential threat of any associated website.

Similar comments are also true of hosting providers. Indeed, some providers explicitly bill themselves as 'bulletproof' - implying a lack of compliance with enforcement notices - as a means of attracting business from providers of illicit content (Figure 1).

Figure 1: Examples of websites of self-proclaimed ‘bulletproof’ hosting providers (and/or registrars)

Other websites also exist to serve as resources for content producers looking for recommendations of non-compliant providers (Figure 2).

Figure 2: Website offering recommendations of 'bulletproof' hosting providers

As with 'high-risk' registrars, a number of resources are also available where information on infringing hosting providers is collated, such as the information provided again by Spamhaus[11].

In this study, we aim to collate information from a related dataset; namely a 'blacklist' of IP addresses which has been compiled based on reports of associated infringing activity of a variety of types, and from a range of sources. This analysis aims to identify any trends and patterns in the groups of high-risk IP addresses[12] - and, by extension, the hosting providers with which they are associated - as a means of establishing additional datasets which could be used as data inputs into algorithms for assessing overall website potential risk level (i.e. if a website is hosted on a high-risk IP address, it is potentially more likely to be associated with illicit activity). 

Analysis

The dataset used in this case is the IP address blacklist provided by Myip.ms[13], containing around 169,000 listings (0.0039% of the total possible IP-space)[14] as of 29-Jan-2025. The (IPv4) addresses are of the format xx.xx.xx.xx, where each 'xx' is a number between 0 and 255. In this study, we use the terminology 'netblock' to refer to a group of IP addresses with the same initial elements; a group of addresses of the form A.xx.xx.xx (with fixed 'A') would be a 'first-level netblock', A.B.xx.xx a 'second-level netblock' and A.B.C.xx a 'third-level'.
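The netblock terminology can be illustrated with a short Python sketch. The helper name `netblock` is an assumption introduced here for illustration; the standard-library `ipaddress` module is used only to validate the input.

```python
import ipaddress

def netblock(ip: str, level: int) -> str:
    """Return the netblock label for an IPv4 address at the given level (1 to 3)."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # IPv4Address raises a ValueError for malformed addresses
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(octets[:level] + ["xx"] * (4 - level))

# Share of the full IPv4 space (2^32 addresses) covered by ~169,000 listings
share_percent = 169_000 / 2**32 * 100
```

For example, `netblock("159.138.153.7", 2)` gives `159.138.xx.xx`, and `share_percent` comes out at roughly 0.0039, consistent with the proportion quoted above.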

The most obvious initial stage of analysis would simply be to consider the hosting provider and country associated with each of the IP addresses in the dataset. This 'granular' approach in some ways provides more meaningful information than any insights gained by grouping together the individual IP addresses into their respective netblocks, not least because there is not necessarily any reason to believe that all addresses in a particular netblock are associated with each other, or with a common hosting provider (although it is often the case that major providers may control entire netblocks). Nevertheless, a netblock-based analysis can provide some useful insights.

The most obvious observation is that the blacklisted IP addresses are not distributed evenly across IP-space; Figure 3 shows the total number of such addresses within each first-level netblock.

 

Figure 3: Total number of blacklisted IP addresses within each first-level netblock

The 10.xx.xx.xx, 11.xx.xx.xx and 127.xx.xx.xx blocks, and all blocks from 224.xx.xx.xx onwards, do not contain any blacklisted addresses. The majority of these have special uses, however, such as the 127 netblock, which is reserved for (internal) loopback addresses[15], and the 10 netblock, reserved for private networks[16].

Next, we consider the IP address 'universe' grouped into second-level netblocks (A.B.xx.xx), of which there are 65,536 (i.e. 256²) in total. Using this framework, it is possible to determine how many blacklisted IP addresses appear in each block (which may provide valuable insights, working on the principle that those blocks more highly populated with blacklisted addresses could, all other factors being equal, be deemed 'higher risk' for any arbitrary associated other websites). This dataset is presented graphically in Figure 4.

Figure 4: Number of blacklisted IP addresses (out of a possible maximum of 65,536) in each second-level netblock - first-level address component ('A' in A.B.xx.xx) (from 0 to 255) shown across the horizontal axis; second-level address component ('B' in A.B.xx.xx) (from 0 to 255) shown down the vertical axis

The next associated insight is the identification of those individual netblocks which are associated with the greatest numbers of infringements (i.e. the greatest numbers of blacklisted addresses) - i.e. the brightest 'hotspots' in the figure - of which the top ten are shown in Table 1.

Netblock    No. blacklisted addresses
114.119.xx.xx 2,353
159.138.xx.xx 1,606
104.21.xx.xx 1,253
172.67.xx.xx 986
47.251.xx.xx 882
17.241.xx.xx 670
183.130.xx.xx 658
54.36.xx.xx 604
3.145.xx.xx 507
116.2.xx.xx 496

Table 1: Top ten 'high-risk' (second-level) netblocks, by the numbers of blacklisted IP addresses (out of a possible maximum of 65,536)
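The per-netblock counts underlying Figure 4 and Table 1 could be produced along the following lines. This is a sketch only: the function names are assumptions, and the sample addresses are hypothetical rather than taken from the blacklist.

```python
from collections import Counter

def second_level_counts(addresses):
    """Count distinct addresses per second-level netblock, keyed by (A, B)."""
    counts = Counter()
    for ip in set(addresses):  # de-duplicate so each address counts once
        a, b = ip.split(".")[:2]
        counts[(int(a), int(b))] += 1
    return counts

def top_netblocks(counts, n=10):
    """Return the n most populated netblocks as (label, count) pairs."""
    return [(f"{a}.{b}.xx.xx", total) for (a, b), total in counts.most_common(n)]

# Hypothetical sample addresses (the last entry is a deliberate duplicate)
sample = ["114.119.1.1", "114.119.1.2", "114.119.200.9",
          "159.138.3.4", "159.138.3.4"]
counts = second_level_counts(sample)
ranking = top_netblocks(counts, n=2)
```

The same `counts` mapping can be laid out on a 256 × 256 grid (A across, B down) to give a visualisation of the kind shown in Figure 4.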

In addition to the individual 'hotspot' netblocks, a number of vertical 'stripes' are present in the visualisation, indicating groups of adjacent netblocks, all (or many) of which are associated with unusually high levels of infringements (and also more strongly suggesting meaningful links between them). Examples include the first-level netblocks 45.xx.xx.xx (3,279 blacklisted addresses out of a possible 16.7 million (i.e. 256³)), 103.xx.xx.xx (4,210 addresses), 185.xx.xx.xx (4,867 addresses) (red arrows in Figure 5), and the groups of second-level blocks 35.159.xx.xx to 35.243.xx.xx (962 addresses out of a possible 5.5 million), 54.144.xx.xx to 54.246.xx.xx (1,492 / 6.8 million), and 91.190.xx.xx to 91.247.xx.xx (1,026 / 3.8 million) (blue arrows in Figure 5).

Figure 5: Version of Figure 4, but with arrows highlighting 'clusters' of adjacent netblocks all (or many) of which contain high numbers of blacklisted IP addresses
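The 'possible' totals quoted for the groups of second-level blocks follow from simple arithmetic: each second-level netblock holds 256² addresses. A minimal sketch (the function name `range_capacity` is an assumption) is:

```python
def range_capacity(b_start: int, b_end: int) -> int:
    """Number of addresses in the second-level blocks A.b_start to A.b_end inclusive."""
    blocks = b_end - b_start + 1
    return blocks * 256**2

# Ranges and blacklisted counts as quoted in the text
ranges = {
    "35.159-35.243": (159, 243, 962),
    "54.144-54.246": (144, 246, 1_492),
    "91.190-91.247": (190, 247, 1_026),
}
rates = {
    label: blacklisted / range_capacity(start, end)
    for label, (start, end, blacklisted) in ranges.items()
}
```

The capacities work out at approximately 5.6, 6.8 and 3.8 million addresses respectively, broadly in line with the figures given above, and `rates` expresses each blacklisted count as a fraction of its range.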

Note that it would also be possible to carry out a similar analysis looking at the third-level netblocks, in which the equivalent of Figure 4 would be a visualisation as a 3D cube. Although a graphical analysis is somewhat more cumbersome, it is a relatively simple matter to identify the highest-risk netblocks (by the number of blacklisted IP addresses - out of a possible maximum of 256 - contained within them), in a way analogous to Table 1. This analysis is shown in Table 2, for all third-level netblocks in which at least half the IP addresses are blacklisted.

Netblock    No. blacklisted addresses
54.36.148.xx 256
195.154.122.xx 255
95.108.213.xx 254
213.180.203.xx 253
87.250.224.xx 252
110.52.235.xx 252
17.241.219.xx 226
17.241.75.xx 225
17.241.227.xx 219
5.255.231.xx 209
113.123.0.xx 200
52.167.144.xx 195
54.36.150.xx 192
20.171.206.xx 179
117.45.252.xx 175
95.163.255.xx 160
185.220.101.xx 159
195.154.123.xx 146
159.138.152.xx 142
13.66.139.xx 141
52.233.106.xx 136
159.138.128.xx 133
159.138.156.xx 132
159.138.157.xx 132
64.124.8.xx 130
159.138.154.xx 129
159.138.155.xx 129
159.138.153.xx 128

Table 2: Top 'high-risk' (third-level) netblocks, by the numbers of blacklisted IP addresses (out of a possible maximum of 256)
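A selection of this kind (third-level netblocks in which at least half of the 256 addresses are blacklisted) might be sketched as follows. The function name and the synthetic input are assumptions for illustration only.

```python
from collections import Counter

def dense_third_level_blocks(addresses, min_fraction=0.5):
    """Return (label, count) pairs for A.B.C.xx blocks meeting the threshold."""
    # Key each distinct address by its first three octets
    counts = Counter(ip.rsplit(".", 1)[0] for ip in set(addresses))
    threshold = min_fraction * 256
    return [(f"{prefix}.xx", n) for prefix, n in counts.most_common()
            if n >= threshold]

# Synthetic input: one block exactly half-populated, one sparsely populated
synthetic = [f"159.138.153.{i}" for i in range(128)]
synthetic += [f"1.2.3.{i}" for i in range(10)]
dense = dense_third_level_blocks(synthetic)
```

With this input only the half-populated block is returned, mirroring the cut-off of 128 addresses at the bottom of Table 2.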

From this data, we can start to see the possible basis of a threat scoring algorithm for arbitrary websites. A website hosted on an IP address which is actually blacklisted is highly likely to be of concern; however, one hosted in one of the netblocks featured in Table 2 (for example) will still warrant careful analysis (i.e. being assigned a 'secondary' level of concern), even if it is hosted on one of the specific IP addresses within the block which is not explicitly blacklisted.

The next stage of analysis is to consider the hosting provider and country of location associated with each of the blacklisted addresses, in order to determine which providers and countries appear most commonly in the dataset and might therefore be deemed 'highest risk'. This information is generally readily available via an IP address 'whois' look-up in each case.

From this dataset, some patterns are immediately apparent. For example, the 'high-risk' addresses between 35.159.xx.xx and 35.243.xx.xx are all associated with either Amazon Technologies Inc. or Google LLC as hosting provider, and the 54.144.xx.xx to 54.246.xx.xx set is also under the management of Amazon Technologies Inc.

As a simple way of post-processing the data (so as to extract a 'clean' version of the name of the hosting provider in each case, and to most efficiently collect together - at a high level - IP addresses pertaining to what is actually the same provider), the name of the hosting provider as given by the whois look-up in each case is truncated at the first instance of a comma - so that, for example,  'GoDaddy.com' and 'GoDaddy.com, LLC' are both treated as the same entity. This yields a set of 8,757 distinct entities.
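The truncation step described above can be sketched in a few lines of Python. The function name `clean_provider` is an assumption, and the list of raw names is a hypothetical sample of whois output.

```python
from collections import Counter

def clean_provider(name: str) -> str:
    """Truncate a provider name at the first comma and trim whitespace."""
    return name.split(",", 1)[0].strip()

# Hypothetical raw whois values, one per blacklisted IP address
raw_names = ["GoDaddy.com", "GoDaddy.com, LLC", "Cloudflare, Inc.", "Google LLC"]
provider_counts = Counter(clean_provider(n) for n in raw_names)
```

Note that this simple rule only merges names which differ after a comma; variants such as 'GoDaddy' and 'GoDaddy.com' would still be counted separately and would need further cleansing.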

It is worth pointing out that the whois look-ups required specifically to identify the hosting providers of the IP addresses in question (noting that the original IP address blacklist dataset itself also gives country information) failed in 51,696 cases, which may cause the statistics to be 'skewed' somewhat, if the failures are disproportionately associated with particular providers or geographic regions.

From the available data, Tables 3 and 4 show the top (i.e. 'highest risk') hosting providers and countries most commonly associated with the IP addresses in the blacklist.

  Hosting provider    No. blacklisted IP addresses
  Amazon Technologies Inc. 14,030
  CHINANET jiangsu province network 7,285
  Cloudflare 3,317
  Microsoft Corporation 2,817
  Amazon.com 2,796
  Huawei-Cloud-SG 2,526
  DigitalOcean 2,329
  HostPapa 2,157
  Alibaba Cloud LLC 1,971
  CHINANET SHANDONG PROVINCE NETWORK 1,869
  CHINANET Jiangxi province network 1,619
  Google LLC 1,584
  Huawei HongKong Clouds 1,538
  CHINANET Anhui province network 1,382
  CHINANET Guangdong province network 1,222
  PSINet 1,206
  PT TELKOM INDONESIA 1,070
  CHINANET-ZJ Zhongxin node network 873
  CHINANET henan province network 821
  Apple Inc. 756

Table 3: Top (i.e. 'highest risk') hosting providers represented in the IP address blacklist (where data available)

  Host country    No. blacklisted IP addresses
  US (USA) 53,373
  CN (China) 27,189
  RU (Russia) 9,669
  SG (Singapore) 6,099
  DE (Germany) 4,734
  ID (Indonesia) 4,264
  BR (Brazil) 4,125
  GB (UK) 3,607
  IN (India) 3,557
  FR (France) 2,450
  VN (Vietnam) 2,140
  UA (Ukraine) 2,065
  PL (Poland) 2,061
  BD (Bangladesh) 2,012
  CA (Canada) 1,989
  TH (Thailand) 1,886
  NL (Netherlands) 1,604
  RO (Romania) 1,558
  RS (Serbia) 1,468
  ZA (South Africa) 1,407

Table 4: Top (i.e. 'highest risk') host countries represented in the IP address blacklist (where data available)

In order to take a more granular view, it is possible to convert each IP address to a city-level location (and an associated latitude / longitude reference) through a process called 'geolocation', for which a number of standard tools are available[17]. From this analysis, we can also extract the top 'high risk' city locations for hosting blacklisted content (Table 5). 

  Host city    No. blacklisted IP addresses
  Shanghai, CN 19,981
  Columbus, US 7,061
  Ashburn, US 6,088
  Singapore, SG 4,270
  San Francisco, US 3,553
  Moscow, RU 3,287
  Hong Kong, HK 3,077
  Los Angeles, US 3,063
  Jiaxing, CN 2,714
  Frankfurt am Main, DE 2,497
  San Jose, US 2,478
  Seattle, US 2,127
  New York City, US 1,822
  Amsterdam, NL 1,687
  Buffalo, US 1,674
  London, GB 1,669
  Jakarta, ID 1,571
  Paris, FR 1,431
  Tokyo, JP 1,318
  Dallas, US 1,297

Table 5: Top (i.e. 'highest risk') city host locations represented in the IP address blacklist (where data available)

Following on from the above, it is also possible to construct a 'heat map' to visualise the host locations of the blacklisted IP addresses (essentially, aggregating together the geolocation information into grid squares, and shading them according to the number of blacklisted IP addresses within each square). This visualisation is shown in Figures 6 and 7 (where each grid square covers 1° of latitude / longitude).

Figure 6: Global heat map showing the host locations of the blacklisted IP addresses (shading denotes the number of addresses hosted within each grid square)

Figure 7: Detailed views of Figure 6 - top to bottom: Americas; Europe and Middle East; Asia
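The aggregation into 1° grid squares might be implemented as in the following sketch. The function names are assumptions, and the coordinates are approximate, hypothetical geolocation results rather than values from the dataset.

```python
import math
from collections import Counter

def grid_square(lat: float, lon: float) -> tuple:
    """Return the 1-degree grid square (south-west corner) containing a point."""
    return (math.floor(lat), math.floor(lon))

def heat_map_counts(points):
    """Count the number of points falling within each grid square."""
    return Counter(grid_square(lat, lon) for lat, lon in points)

# Hypothetical geolocated addresses as (latitude, longitude) pairs
points = [(31.23, 121.47), (31.90, 121.10), (39.04, -77.49)]
squares = heat_map_counts(points)
```

The resulting counts per square can then be passed to any plotting tool to shade the map; `math.floor` is used rather than truncation so that negative coordinates fall into the correct square.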

Whilst the numbers presented in this study are meaningful in their own right (in terms of reflecting where (and with whom) the blacklisted IP addresses are hosted - i.e. the 'dark spots' on the heat map in Figures 6 and 7), they do reflect both the locations of the infringements and the locations where content is most commonly hosted generally. For example, if a particular hosting provider is generally very commonly used, it might not be unreasonable to expect that provider also to be associated with high volumes of infringements (even if the extent of abuse is not disproportionate). For a future piece of analysis, it may be instructive to compare the extent to which locations and hosting providers are associated with high levels of threat (i.e. numbers of blacklisted IP addresses) with the overall numbers of IP addresses associated with those same locations and hosting providers (e.g. the total numbers of IP addresses under their management), so as to get a more meaningful measure of rate of association with infringing activity (i.e. a 'reputation' score).
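A normalised measure of this kind could be as simple as the ratio of blacklisted addresses to total addresses under management. In the sketch below, the function name and both sets of figures are entirely hypothetical; no real totals are implied for any provider.

```python
def abuse_rate(blacklisted: int, total_managed: int) -> float:
    """Fraction of a provider's managed addresses which appear in the blacklist."""
    if total_managed <= 0:
        raise ValueError("total_managed must be positive")
    return blacklisted / total_managed

# Hypothetical figures: a large provider and a small provider
large_rate = abuse_rate(blacklisted=14_000, total_managed=10_000_000)
small_rate = abuse_rate(blacklisted=500, total_managed=50_000)
```

In this illustration the smaller provider has far fewer blacklisted addresses in absolute terms but a higher rate, which is the distinction a 'reputation' score is intended to capture.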

Discussion: Steps towards a threat-scoring framework

The main application of this type of analysis is the determination of factors which are most commonly associated with infringing websites. Once these databases are in place, they can be used as inputs into overall algorithms to quantify the likely level of threat which may be posed by an arbitrary (perhaps newly-identified) website, even in cases where no live website content is yet present (in the cases of characteristics such as registrars and hosting providers and locations, which are inherent to the technical infrastructure of the domain name in question).

Looking at the case of hosting IP address as an example, it may also be appropriate to assign IP addresses, and IP address ranges, into threat 'tiers' (with associated threat-score components) based on the 'closeness' of their association with known infringing content. A host IP address which is actually blacklisted is likely to be associated with the highest level of potential threat, followed by a non-blacklisted IP address within a netblock which itself contains high numbers of blacklisted addresses. Lower tiers of threat may be appropriate for IP addresses in higher-level netblocks which are generally found to be associated with higher-than-average rates of abuse (such as those covered by the vertical 'stripes' in Figures 4 and 5). 

A fuller formulation of a threat-scoring framework along these lines may also be a topic for future research, but it is instructive to test an initial prototype version based on the characteristics (high-risk IP addresses, hosting providers and registrars[18]) discussed in this study. For this analysis, we consider a sample set of arbitrary domain names registered on a particular day, based on zone-file analysis[19].

For this dataset of around 11,000 domain names, whois look-ups were run to determine the host IP address, the associated hosting provider and the registrar in each case. For each of these three characteristics, a threat-score component (nominally between 0 and 100) was calculated (based on comparison with the datasets outlined in this study, pertaining to the frequency of association of each of these characteristics with infringing content) for each domain in question. Details of the methodology are given in Appendix A.

These components were then aggregated together to yield an overall potential threat score for the domain; the simplest implementation of the threat score is that given by simply adding the three components together. In this case, this yields 398 jointly top-scored domains, all with a score of 171, all of which are hosted on an IP address which is explicitly blacklisted (score component = 100), with the dominant remaining component of the score being a contribution of 70, caused by the fact that the sites in question are hosted with Amazon Technologies Inc., which appears extensively in the IP address blacklist. However, the score for this provider is probably artificially high, appearing as an artefact of the fact that Amazon is a very popular hosting provider generally, and highlighting the requirement for some kind of normalisation according to the total number of websites / IP addresses under management.

The use of a high-threat registrar (according to the Spamhaus list) is probably a better indication of potential infringing activity than either of the other two domain characteristics being considered, so it may be appropriate to increase (by some factor) the weight of the contribution of the registrar score to the overall threat score. In so doing, we gain (apparently) a much more meaningful assessment of the level of potential threat posed by the domains, as verified in many cases by an inspection of site content (where present), or a simple analysis of the types of keywords present in the domain names (suggesting that, even where no live site is yet present, several of the most highly-scored names are likely to have been registered for use in conjunction with the types of content which are frequently of concern, making them worthy of future monitoring). The most highly-scored domain registrations by this weighted threat score are shown in Table 6 (noting that some of these may, of course, actually be legitimate).

* 'NiceNIC' = NiceNIC International Group Co., Limited

Table 6: Top-ranked domains in the dataset by potential threat score
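The aggregation of the three components, with and without an increased registrar weight, can be sketched as below. The function name, the weight of 3 and the component values for the two example domains are all hypothetical; the study does not state the weighting factor actually used.

```python
def threat_score(ip_component: float, provider_component: float,
                 registrar_component: float, registrar_weight: float = 1.0) -> float:
    """Sum the three components, optionally up-weighting the registrar component."""
    return ip_component + provider_component + registrar_weight * registrar_component

# Hypothetical domains as (IP, hosting provider, registrar) components
domain_a = (100.0, 70.15, 1.0)   # blacklisted IP, popular host, low-risk registrar
domain_b = (50.0, 10.0, 90.0)    # risky netblock, minor host, high-risk registrar

plain = (threat_score(*domain_a), threat_score(*domain_b))
weighted = (threat_score(*domain_a, registrar_weight=3.0),
            threat_score(*domain_b, registrar_weight=3.0))
```

Under the plain sum the first domain ranks higher, whereas with the registrar weight applied the second domain overtakes it, illustrating how the weighting changes the ordering of results.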

Indeed, of the top twenty domains with the greatest potential threat scores (shown in Table 6), several feature characteristics of particular concern:

  • Two (eflowtollsystem[.]com and kraken2trfqodidvlh4aa337cpzfrhdlfldhve5nf7ujhnmwr7instad[.]com*) generate browser warning pages advising of 'dangerous' content
  • Some are blocked from viewing in certain geographic locations
  • Some resolve or re-direct to apparently innocuous content, but which may also be a means of 'masking' infringing content, which might only be visible at certain times or from certain locations (i.e. 'geoblocking')[20]
  • Some pertain to content which is commonly associated with scams or other types of abuse, such as blockchain technology or cryptocurrency (e.g. claim-pinlink[.]com - re-directs to claims-realios[.]net/main*, proposai-soniclabs[.]com*, resasfinance[.]com*)
  • Others are soliciting for the input of personal details and may be impersonating trusted brands (e.g. 1298245[.]com*)
  • Of the domains which do not resolve to live content (or where the content is not visible as of the date of analysis), several have domain names which are highly suggestive of suspicious or fraudulent use (e.g. unlock-e-trade[.]com, netbotrade[.]com, contactlloydsonline[.]com, secure-coinb[.]com)

Those examples marked with an asterisk are shown in Figure 8.

Figure 8: Examples of live site content of potential concern hosted on domains listed in Table 6.

In cases where this type of threat-scoring approach is applied to sets of domain registrations pertaining to a specific brand (or other issue of interest), the ranking is likely to offer an efficient way of determining which of the names in the dataset are most worthy of initial prioritised analysis or enforcement.

A final point to note is that insights regarding the geographical focuses of infringing activity, as presented in this study, can also help inform wider policies on intellectual property protection, such as identifying key territories in which additional trade mark protection would be advisable.

Appendix A: Methodology for calculating the prototype threat score components

i. Score component based on host IP address / third-level netblock

If the host IP address is explicitly one of the blacklisted addresses, it is automatically assigned a score of 100. If this is not the case, but if the IP address appears in the same third-level netblock as at least one blacklisted address, the score component is calculated as the ratio between the number of blacklisted addresses within the netblock, and 256 (i.e. the total number of possible addresses in the block), multiplied by 100. 

For example, if a domain was found to be hosted in a non-blacklisted IP address in the 159.138.153.xx netblock (which contains 128 blacklisted addresses in total), the threat score component is calculated as (128 / 256) × 100 = 50. 
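The calculation above can be sketched in Python as follows. The function names and the representation of the blacklist as a set of IPv4 address strings are assumptions made for illustration; an address whose netblock contains no blacklisted entries is given a score of 0.

```python
import ipaddress
from collections import Counter


def third_level_netblock(ip):
    """Return the first three octets of an IPv4 address, e.g. '159.138.153'."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    return ".".join(str(addr).split(".")[:3])


def count_blacklisted_per_netblock(blacklist):
    """Count the blacklisted addresses falling within each third-level netblock."""
    return Counter(third_level_netblock(ip) for ip in blacklist)


def ip_score(ip, blacklist, netblock_counts):
    """Score component (i): 100 if blacklisted, else (count / 256) * 100."""
    if str(ipaddress.IPv4Address(ip)) in blacklist:
        return 100.0
    # Counter returns 0 for netblocks with no blacklisted addresses
    return netblock_counts[third_level_netblock(ip)] / 256 * 100
```

Pre-computing the per-netblock counts once means that each subsequent look-up is a simple dictionary access, which matters when scoring many thousands of domains.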

ii. Score component based on hosting provider

The score component assigned to each hosting provider is based on the frequency of association of each provider with blacklisted IP addresses contained within the dataset utilised in this study. The individual providers thereby fall into a range between 0 and 14,030 blacklisted addresses (the maximum being for Amazon Technologies Inc.). The score component assigned to a website associated with any given hosting provider is calculated as the ratio between the number of blacklisted addresses for that provider and (arbitrarily) 20,000, multiplied by 100 (giving a final value between 0 and 70.15).

Note that the score as defined is therefore unnormalised relative to the total number of IP addresses under management for that hosting provider.

N.B. It is generally necessary to apply an element of data 'cleansing' before carrying out the matching of hosting provider names (and also to aggregate together entries in the blacklist, as necessary), as the same provider may be referenced differently by distinct whois look-ups - e.g. GoDaddy may variably be referenced as 'GoDaddy', 'GoDaddy.com', 'GoDaddy.com, LLC', etc.
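A minimal sketch of the cleansing, aggregation and scoring steps is given below. The normalisation rules (lower-casing, then stripping a trailing corporate suffix and a trailing '.com') are illustrative assumptions only; in practice the rules would need to be tuned to the variants actually encountered in the whois data. The function names are likewise hypothetical.

```python
import re

DENOMINATOR = 20_000  # the (arbitrary) divisor used for this component

# Trailing corporate suffixes to strip (illustrative, not exhaustive)
_SUFFIX = re.compile(r"[,\s]+(llc|inc|ltd|limited|gmbh)\.?$")


def canonical_provider(name):
    """Reduce variant spellings of a provider name to a single key."""
    key = name.strip().lower()
    key = _SUFFIX.sub("", key)         # 'godaddy.com, llc' -> 'godaddy.com'
    key = re.sub(r"\.com$", "", key)   # 'godaddy.com' -> 'godaddy'
    return key


def aggregate_counts(raw_counts):
    """Sum the blacklisted-address counts across variant provider names."""
    totals = {}
    for name, count in raw_counts.items():
        key = canonical_provider(name)
        totals[key] = totals.get(key, 0) + count
    return totals


def provider_score(name, totals):
    """Score component (ii): (blacklisted count * 100) / 20,000."""
    return totals.get(canonical_provider(name), 0) * 100 / DENOMINATOR
```

With these rules, 'GoDaddy', 'GoDaddy.com' and 'GoDaddy.com, LLC' all map to the same key, and a provider with 14,030 blacklisted addresses receives a score of approximately 70.15.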

iii. Score component based on registrar

The score component associated with the domain registrar is simply based on the dataset provided by Spamhaus, as referenced in the Introduction section of this study (which itself already incorporates an element of 'normalisation', based on the total numbers of domains under management).

The registrars in the Spamhaus database are assigned scores which sit in a range from 0 to 7.6. Wherever a registrar for a domain in the analysed dataset appears in the Spamhaus list, the associated threat-score component is calculated simply as the Spamhaus score multiplied by ten (to give a score in the range from 0 to 76, for the dataset provided as of the date of analysis).

N.B. (1) As for the hosting providers, it is generally necessary to apply an element of data 'cleansing' before matching the registrar given by a whois look-up against the contents of the Spamhaus list (rather than simply carrying out a straight look-up), since the same registrar may be referenced differently across the lists (e.g. 'CSL Computer Service Langenbach GmbH d/b/a joker.com' is referenced by Spamhaus as 'Joker (CSL Computer Service)').

N.B. (2) In cases where the same registrar appears more than once in the Spamhaus list with a variant name, but with different scores (e.g. 'Turkticaret.net Yazilim Hizmetleri Sanayi ve Ticaret A.S.' (0.0355) and 'Turkticaret.net Yazılım Hizmetleri Sanayi ve Ticaret A.Ş.' (0.0595)), the score used in this analysis is taken simply as the mean of the relevant Spamhaus scores (i.e. 0.0475 in the above case).  
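The registrar component, including the averaging of variant entries, might be sketched as follows. The folding rules (removal of diacritics, mapping of the Turkish dotless 'ı' to 'i', and case-folding), the optional manually maintained `aliases` mapping, and the score of 0 for registrars absent from the list are all assumptions made for this illustration.

```python
import unicodedata
from statistics import mean


def fold_name(name):
    """Fold a registrar name to a plain, case-insensitive key."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # The dotless 'ı' (U+0131) has no Unicode decomposition, so map it explicitly
    return stripped.replace("\u0131", "i").casefold().strip()


def build_registrar_table(spamhaus_rows):
    """Map each folded name to the mean of the scores of its variant entries."""
    grouped = {}
    for name, score in spamhaus_rows:
        grouped.setdefault(fold_name(name), []).append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


def registrar_score(whois_name, table, aliases=None):
    """Score component (iii): Spamhaus score x 10 (ASSUMED 0 if not listed)."""
    key = fold_name(whois_name)
    if aliases:
        # Hypothetical hand-built mapping from whois names to Spamhaus names
        key = fold_name(aliases.get(whois_name, whois_name))
    return table.get(key, 0.0) * 10
```

For the two Turkticaret.net entries above, both names fold to the same key, so the table holds a single mean score of approximately 0.0475. A case such as Joker, where the two names differ structurally rather than typographically, would need an explicit entry in `aliases`.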

References

[1] https://www.linkedin.com/pulse/brand-protection-data-beautiful-david-barnett-c66be/

[2] https://www.linkedin.com/pulse/brand-protection-data-still-beautiful-part-1-year-domains-barnett-juwhe/

[3] https://www.linkedin.com/pulse/brand-monitoring-data-niblet-5-law-firm-scam-websites-david-barnett-ap5de/

[4] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 3: 'Brand content scoring'

[5] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 5: 'Prioritization criteria for specific types of content'

[6] 'Patterns in Brand Monitoring' (D.N. Barnett, Business Expert Press, 2025), Chapter 6: 'Result clustering'

[7] https://brandsec.com.au/phishing-malicious-domain-names/

[8] https://www.spamhaus.org/reputation-statistics/registrars/domains/

[9] https://bfore.ai/navigating-domain-takedowns-with-non-cooperative-registrars/

[10] e.g. https://www.blackhatworld.com/seo/question-looking-for-bulletproof-domain-registrar.1412558/

[11] https://www.spamhaus.org/resource-hub/bulletproof-hosting/bulletproof-hosting-theres-a-new-kid-in-town/

[12] This study uses the terminology of 'notorious' IP addresses, in reference to the USTR 'Notorious Markets List', which is published annually to reflect those high-risk platforms most commonly associated with facilitating counterfeiting and piracy - see https://ustr.gov/about-us/policy-offices/press-office/press-releases/2025/january/ustr-releases-2024-review-notorious-markets-counterfeiting-and-piracy

[13] https://myip.ms/browse/blacklist/Blacklist_IP_Blacklist_IP_Addresses_Live_Database_Real-time

[14] Note that the analysis focuses only on the 'old format' (IPv4) IP addresses (of the form xx.xx.xx.xx, where each 'xx' is a number between 0 and 255) in the blacklist; this type of analysis is likely to become much more complex as IP address usage transitions to the IPv6 format (yyyy:yyyy:yyyy:yyyy:yyyy:yyyy:yyyy:yyyy, where each 'yyyy' is a four-digit hexadecimal (base-16) number) in the future.

[15] https://www.cronj.com/blog/localhost-127001-a-special-address/

[16] https://en.wikipedia.org/wiki/List_of_assigned_/8_IPv4_address_blocks

[17] In this study, we utilise the Python-based library tool 'IPinfo' (https://pypi.org/project/ipinfo/), which references the dataset available from IPinfo.io. In order to limit the number of geolocation look-ups required in this study, we perform a query only for one IP address in any range where (a) the second-level netblock, (b) the hosting provider name, and (c) the hosting provider country are all the same (i.e. for point (a), where the first- and second-level IP address components are the same). The latitude and longitude of the physical location of all other IP addresses in the range sharing these characteristics is then assumed to be identical.

[18] This approach allows us to incorporate more information than would be available by (say) just considering the host IP address as a means of identifying the associated hosting provider - this is appropriate given (for example) the fact that, just because an IP address under the management of a particular provider may be blacklisted, it does not necessarily follow that all of that provider's addresses will be higher risk.

[19] The dataset is taken from one day's worth of registrations (117,456) of .com domain names - a TLD for which registration information is generally readily available - as provided by the zonefiles.io website on 01-Feb-2025, relating to the previous day's registrations. The sample analysed in this study consists of every tenth domain name (when sorted into alphabetical order), yielding a dataset of 11,745 domains. Analysis of site content was carried out on 03-Feb-2025.

[20] https://circleid.com/posts/20220531-do-you-see-what-i-see-geotargeting-in-brand-infringements

This article was first published as a white paper on 19 June 2024 at:

https://www.iamstobbs.com/uploads/general/Notorious-IP-addresses-e-book.pdf
