A central component of the analysis of results identified through a brand monitoring programme is an assessment of the level of threat associated with each finding. Domain names specifically have a number of associated characteristics which can be used to quantify their potential threat, and much previous work has focused on the frequency of association of these characteristics with malicious content. This type of analysis can serve as a basis for the construction of algorithms to quantify the likely level of potential threat associated with any arbitrary identified website[1]. Threat scoring - as a method for prioritising findings - has a range of uses, including the identification of priority targets for further analysis, content tracking, or enforcement.
One such key feature is the domain-name extension (the top-level domain, or TLD - which includes examples such as .com or .xyz), with differing TLDs having wildly varying rates of popularity with infringers and other bad actors, due to a range of factors including registration costs, existence of IP protection programmes, and ease of enforcement.
One previous study on the subject[2] compiled overall threat scores for a group of highly affected TLDs, based on an aggregation of data from other sources, including Spamhaus[3], Netcraft[4] and Palo Alto Networks[5], each of which encompassed insights relating to differing aspects of infringing behaviour.
The release of Spamhaus' latest Domain Reputation Update report[6] - relating to classification of domains as malicious or suspicious based on a range of features, including association with spam, phishing, malware, ransomware and other fraudulent activities - provides an updated view of the highest-threat TLDs covering many relevant areas of infringement, and offers useful insights towards the construction of threat-scoring algorithms.
In considering the main insights, we focus primarily on the subset of Spamhaus' data concerning the rates of infringement within each TLD (i.e. the numbers of malicious domains as a proportion of the total number of domains across the TLD), rather than the absolute numbers, as this offers a more meaningful input into any potential threat-scoring metric.
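As a minimal sketch of the rate calculation described above (malicious domains as a proportion of all domains in the TLD), the following Python ranks TLDs by that proportion. The `TldCounts` class, the function names and the sample figures are all hypothetical and are not taken from the Spamhaus report.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TldCounts:
    tld: str
    malicious: int  # domains flagged as malicious or suspicious
    total: int      # all domains in the TLD's zone file


def infringement_rate(counts: TldCounts) -> float:
    """Malicious domains as a proportion of all domains in the TLD."""
    if counts.total <= 0:
        raise ValueError(f"no domains recorded for {counts.tld}")
    return counts.malicious / counts.total


def rank_by_rate(rows: list[TldCounts]) -> list[tuple[str, float]]:
    """Order TLDs from highest to lowest infringement rate."""
    return sorted(
        ((row.tld, infringement_rate(row)) for row in rows),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Placeholder figures only; these are NOT values from the Spamhaus report.
sample = [
    TldCounts(".example-a", malicious=9_000, total=12_000),
    TldCounts(".example-b", malicious=50_000, total=5_000_000),
]
ranking = rank_by_rate(sample)
```

In this made-up sample, the smaller TLD ranks first despite having far fewer malicious domains in absolute terms, which is the reason for preferring rates as an input to a scoring metric.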
The main points to take away from the study are as follows:
- There are four TLDs (.xin, .qpon, .locker[7], and .lgbt) for which more than half of the total set of domains in the zone file are marked as malicious - with the top example (.xin, popular with a Chinese audience, and usually translated as the word for 'new'[8]) having over 82% of its domains marked as malicious - and 11 TLDs where more than one-in-three of the domains are malicious. All of the top twenty TLDs (by proportion of malicious domains) are mid-size extensions, each with total numbers of domains in a range between 11,000 and 182,000.
- A significant proportion of the domains marked as malicious have been found to be associated with Chinese gambling sites. Previous research has also suggested that these types of sites may operate in conjunction with other types of infringement, for example by serving material which is an alternative to the 'primary' content displayed to certain users, and which may be visible only in certain locations (i.e. geoblocked[9]) or at certain times or days.
- Eight of the top 20 highest-threat TLDs in the Spamhaus dataset are associated with a single registry, BinkyMoon LLC, whose business model involves the offer of highly competitive prices for domains, a tactic which can often drive high levels of abuse.
- A large number of the .xin domains - including many of the malicious examples - have names beginning with 'com-', a strategy noted in a recent Stobbs study[10] as being one way to create compelling deceptive infringements. The vast majority of these are registered through Dominet (HK) Limited, the registrar which rebranded from Alibaba.com Singapore E-Commerce Private Limited, following the issuing of a compliance notice by ICANN in March 2024[11].
- Amongst the set of malicious domains, use of brand-related terms appears to be decreasing - perhaps, in part, due to their relative ease of detection and enforcement through brand-protection programmes - in favour of more generic, industry- or subject-related terms.
As a follow-up piece of analysis, even a direct visual inspection of the raw domain data in the zone files of the highest-threat TLDs (as given in the Spamhaus study) reveals some immediately apparent trends. First, the domains across the TLDs in question appear to include disproportionately high numbers of numeric domains[12] (i.e. where the SLD, or second-level domain name - the part to the left of the dot - contains digits only), perhaps indicating the popularity of such domains for infringing use. Second, the .xin file does indeed appear to contain large numbers of 'com-' examples, together with 'us-' domains, which may have a similar potential use-case (i.e. constructing URLs resembling legitimate domains on the .us domain extension).
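The two checks described above (digit-only SLDs, and SLDs beginning with 'com-' or 'us-') can be sketched in Python as follows, together with a count of numeric names by SLD length. This is an illustration under stated assumptions rather than the method used in the analysis: the function names are hypothetical, the input is assumed to be a plain list of domain names with a single-label TLD, and the sample names are invented.

```python
from __future__ import annotations

from collections import Counter

DECEPTIVE_PREFIXES = ("com-", "us-")  # the two prefixes noted in the text


def split_sld(domain: str) -> tuple[str, str]:
    """Split 'name.tld' into (sld, tld); assumes a single-label TLD."""
    name = domain.strip().lower().rstrip(".")
    rest, sep, tld = name.rpartition(".")
    if not sep or not rest:
        raise ValueError(f"not a second-level domain: {domain!r}")
    # Discard any subdomain labels, keeping the label next to the TLD.
    return rest.rpartition(".")[2], tld


def is_numeric(sld: str) -> bool:
    """True when the SLD consists of ASCII digits only."""
    return sld.isascii() and sld.isdigit()


def deceptive_prefix(sld: str) -> str | None:
    """Return the matching prefix ('com-' or 'us-'), or None."""
    for prefix in DECEPTIVE_PREFIXES:
        if sld.startswith(prefix) and len(sld) > len(prefix):
            return prefix
    return None


def numeric_length_histogram(domains: list[str]) -> Counter:
    """Count numeric SLDs by their length in characters."""
    slds = (split_sld(domain)[0] for domain in domains)
    return Counter(len(sld) for sld in slds if is_numeric(sld))


# Invented names for illustration; not entries from any real zone file.
domains = [
    "12345.loan", "888888.locker", "67890.xin",
    "com-toll1234.xin", "us-etc12.xin", "example.xin",
]
histogram = numeric_length_histogram(domains)
prefixed = [d for d in domains if deceptive_prefix(split_sld(d)[0])]
```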
Results from a more detailed quantitative analysis of these and other relevant points, carried out using the full zone-file data, are outlined in Table 1 and Figure 1.
Table 1: Top level statistics for the domains in Spamhaus' top ten highest-risk TLDs
Figure 1: Numbers of numeric domain names, by SLD length, for each of Spamhaus' top ten highest-risk TLDs
Some of the main insights from the overall analysis are as follows:
- Across the top-ten highest-risk TLDs (from Spamhaus' data), numeric domain names account for a significant proportion of the total (53% of all domains across the full set of ten zone files). For five of the TLDs, numeric domains account for more than 70% of the total. The highest proportions are seen for .loan (87.66% numeric domain names in total) and .locker (84.56%). Of the numeric domain names, the vast majority are 4, 5 or 6 characters in length (4.1%, 70.7% and 23.9% of the total, respectively).
- There are no obvious patterns in domain name entropy (a mathematical measure of the length and 'randomness' of a domain name) and, despite the fact that a significant proportion of the domains under consideration are (by definition) malicious, this is not reflected by a prevalence of particularly high entropy names[13] (as might typically be associated with automated registrations)[14]. Of the ten TLDs considered, the highest mean entropy was seen for the domains on .xin.
- A prevalence of (potentially deceptive) 'com-' and 'us-' domains was seen only on .xin. For many of these domains, the portion of the SLD after 'com-' consisted of what appeared to be essentially random characters, but some specific use-patterns were identified, such as groups of domains with SLDs featuring specific keywords. These have a range of potential fraudulent use-cases, such as the construction of deceptive URLs resembling official sites for making payments (e.g. for road tolls) or accessing other financial or technical information.
Example groups included:
- Toll-related domains - e.g. names of the form: com-highroadXXX, com-roadXXXXXX, com-tollXXXX or com-tollbillXXX
- (Other) billing- or payment-related domains - e.g. com-lnvoiceXX [sic], and mis-spellings of com-payment and com-statementXX
- Other classes of keywords: e.g. mis-spellings of com-lucky, com-passX, com-serviceXXX, com-shtmlXXXXX, com-ticketXXXX, com-updateXXX, and us-etcXX (perhaps in reference to the cryptocurrency Ethereum Classic)
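The keyword groups listed above could be picked out of a list of SLDs with a simple pattern match, sketched below in Python. The group labels, the `keyword_group` function and the assumption that the keyword immediately follows the 'com-' or 'us-' prefix are illustrative choices, not the method used in the analysis. Only the spellings given in the text are matched, so the mis-spelt variants mentioned would need additional entries or fuzzy matching.

```python
from __future__ import annotations

import re

# Keywords as they appear in the example groups; labels are hypothetical.
KEYWORD_GROUPS = {
    "toll": ("tollbill", "toll", "highroad", "road"),
    "billing": ("lnvoice", "payment", "statement"),
    "other": ("lucky", "pass", "service", "shtml", "ticket", "update", "etc"),
}

# An SLD of the form 'com-<rest>' or 'us-<rest>'.
PREFIXED_SLD = re.compile(r"^(?P<prefix>com|us)-(?P<rest>[a-z0-9-]+)$")


def keyword_group(sld: str) -> str | None:
    """Return the keyword group for a prefixed SLD.

    Returns None when the SLD has no 'com-'/'us-' prefix, and
    'unclassified' when it has the prefix but no listed keyword.
    """
    match = PREFIXED_SLD.match(sld.lower())
    if match is None:
        return None
    rest = match.group("rest")
    for group, keywords in KEYWORD_GROUPS.items():
        if rest.startswith(keywords):  # str.startswith accepts a tuple
            return group
    return "unclassified"
```

For instance, an invented name such as `com-tollbill123` would fall into the "toll" group, while a prefixed name followed by random characters would be returned as "unclassified".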
Overall, these types of insights into (in this case) the highest-threat TLDs can greatly aid in the construction of metrics for prioritising brand monitoring results, and can thereby build efficiencies into the analysis and enforcement processes. The TLD of a webpage (or other online finding) is just one relevant characteristic, so this type of analysis would generally need to be combined with findings from other studies, in the construction of any overall threat-scoring algorithm.
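As a hedged sketch of how the characteristics discussed in this article might be gathered as inputs to such an algorithm, the Python below extracts a small feature set for a single domain. It deliberately stops short of computing a score, since any weighting would need to come from the underlying data. All names here are hypothetical, the TLD rate lookup is assumed to be supplied by the user, and the entropy function is the standard Shannon entropy in bits per character, which may differ from the entropy measure used in the analysis above.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass

PREFIXES = ("com-", "us-")  # prefixes highlighted in the .xin analysis


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character (0.0 for an empty string)."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


@dataclass(frozen=True)
class DomainFeatures:
    sld: str
    tld: str
    tld_rate: float | None       # proportion of malicious domains, if known
    is_numeric: bool             # SLD consists of digits only
    sld_length: int
    has_deceptive_prefix: bool   # SLD starts with 'com-' or 'us-'
    entropy_bits_per_char: float


def extract_features(domain: str, tld_rates: dict[str, float]) -> DomainFeatures:
    """Collect per-domain characteristics; assumes input of the form 'sld.tld'."""
    name = domain.strip().lower().rstrip(".")
    sld, sep, tld = name.rpartition(".")
    if not sep or not sld:
        raise ValueError(f"not a second-level domain: {domain!r}")
    return DomainFeatures(
        sld=sld,
        tld=tld,
        tld_rate=tld_rates.get(tld),
        is_numeric=sld.isascii() and sld.isdigit(),
        sld_length=len(sld),
        has_deceptive_prefix=sld.startswith(PREFIXES),
        entropy_bits_per_char=shannon_entropy(sld),
    )


# Placeholder rate and invented domain; not values from the Spamhaus data.
rates = {"example": 0.5}
features = extract_features("12345.example", rates)
```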
References
[2] https://circleid.com/posts/20230117-the-highest-threat-tlds-part-2
[3] https://www.spamhaus.org/statistics/tlds/
[4] https://trends.netcraft.com/cybercrime/tlds
[5] https://unit42.paloaltonetworks.com/top-level-domains-cybercrime/
[6] https://www.spamhaus.org/resource-hub/domain-reputation/domain-reputation-update-oct-2024-mar-2025/
[7] https://www.iamstobbs.com/opinion/some-more-new-domains-in-the-.locker
[8] https://tld-list.com/tld/xin
[9] https://circleid.com/posts/20220531-do-you-see-what-i-see-geotargeting-in-brand-infringements
[11] https://www.icann.org/uploads/compliance_notice/attachment/1221/hedlund-to-chu-27mar24.pdf
[12] https://www.iamstobbs.com/opinion/the-universe-of-numeric-domain-names
[13] https://www.iamstobbs.com/opinion/un-.zip-ping-and-un-.box-ing-the-risks-associated-with-new-tlds
[14] https://circleid.com/posts/20230703-an-overview-of-the-concept-and-use-of-domain-name-entropy
This article was first published on 5 June 2025 at:
https://www.iamstobbs.com/insights/an-updated-view-of-bad-tlds