Monday, 11 November 2024

Phishing trends 2024 - and a look at some new data for domain threat quantification

Overview

This year's annual phishing report by Internet technology consultants Interisle[1] has provided a number of key insights into the current state of the phishing landscape[2]. Phishing - that is, the use of websites to impersonate a brand or other trusted entity with a view to stealing personal details, financial information or funds, often as a 'gateway' for subsequent further cybercrime - continues to be a popular model for online criminals. This is a significant concern for brand owners and consumers alike. Some of the most significant findings from the report are:

  • The number of phishing attacks continues to see year-on-year growth, having increased by around 50,000 to 1.9 million incidents, with an estimated financial loss of $12.5 billion. The top three most targeted brands were Facebook, Gazprom (a Russian energy corporation), and the United States Postal Service (USPS).
  • 1 million unique domain names were utilised in the identified set of phishing attacks, though with a decrease in the use of domain names containing (an exact match to) the name of the targeted brand, probably due in part to the ease of detection of such names. The majority of attacks take place on maliciously registered domains, rather than on compromised sites.
  • Other styles of attack have seen increases, notably subdomain-based attacks (i.e. those where the phishing-site URL takes the form [brand-string].[domain].TLD, using e.g. blogspot.com, duckdns.org or weebly.com), which account for nearly one-quarter of all cases, attacks where the brand string appears elsewhere in the URL, and the use of the InterPlanetary File System (IPFS) - a Web3 P2P-based technology - to host phishing content (most usually through a Web2-based 'gateway' provider such as dweb.link or ipfs.io).
  • New-gTLD extensions continue to be popular for domains used for phishing (42% of cases), primarily due to the low-cost and ease of registration (i.e. fewer verification checks). ccTLDs have seen a drop in fraudulent usage – to a significant extent, as a result of the exit of Freenom (the former provider of domains on the .tk, .ml, .ga, .cf and .gq extensions) from the registrar business[3], following the termination of their ICANN agreement[4] in response to reports of extensive criminal domain use. Overall, the most common TLDs used for phishing sites in the analysis period were .com, .top, .xyz, .cn, and .info, though when normalised to reflect the numbers of phishing sites as a proportion of the total domains across the extension in question, the highest-risk TLDs were found to be .lol, .bond, .support, .top, and .sbs.
  • Bulk / automated registration of domains has increased in popularity as a methodology used by phishers, accounting for over one-quarter of all phishing-related domains. These most usually make use of strings of random characters or random combinations of dictionary words. The largest set of such domains used for a coordinated set of attacks was a group of over 17,000 domains generally consisting of eight-letter (second-level domain name, or SLD) random strings, such as gzraxywl.lol and htcjkpzb.lol.
  • The set of gTLD registrars most frequently associated with domains used for phishing content continues to be dominated by retail-grade providers, with the top five found to be NameSilo, GoDaddy, GMO d/b/a Onamae, PublicDomainRegistry, and NameCheap. When the figures are normalised by the total numbers of domains under management, the five most frequently abused registrars are found to be NiceNIC, URL Solutions, Aceville, WebNic, and OwnRegistrar. The first of these has seen exceptional levels of abuse, with 45% of their gTLD portfolio reported for phishing.
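The normalisation mentioned above for TLDs and registrars amounts to dividing the number of phishing domains by the total number of domains on the extension (or under management). The following is a minimal Python sketch of that idea only; the function name and the counts are placeholders invented for illustration and are not figures from the Interisle report.

```python
def phishing_rate_per_10k(phishing_domains: int, total_domains: int) -> float:
    """Phishing domains per 10,000 registered domains."""
    if total_domains <= 0:
        raise ValueError("total_domains must be positive")
    return 10_000 * phishing_domains / total_domains

# Placeholder counts for illustration only (NOT taken from the report):
# name -> (phishing domains, total registered domains)
counts = {
    "tld-a": (50_000, 10_000_000),
    "tld-b": (4_000, 100_000),
    "tld-c": (900, 30_000),
}

# Ranking by raw volume and ranking by normalised rate can differ markedly
by_raw = sorted(counts, key=lambda k: counts[k][0], reverse=True)
by_rate = sorted(counts, key=lambda k: phishing_rate_per_10k(*counts[k]), reverse=True)
```

With these placeholder numbers, the extension with the largest raw count ranks last once the figures are normalised, which is the same effect that separates the 'most common' and 'highest-risk' lists above.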

A new basis for quantifying domain-name threat?

The use of fixed-length random strings for phishing domains (as mentioned above in the case of the .lol examples) raises the possibility for a new methodology for identifying such strings, clustering together related findings, and providing an additional input into general algorithms for quantifying the potential level of threat posed by registered domain names[5].

Previous studies[6,7] have explored the use of a metric known as domain name entropy - essentially, a measure of the number and variability of characters within the domain-name string - as an indicator of automated registrations. However, although this idea may be useful in cases where the registration scripts generate very long domain names, it is much less effective for the shorter names described here. This is because a string such as 'gzraxywl' will have an identical entropy value to any other string consisting of eight distinct characters, including dictionary words (as may correspond to other / legitimate registrations). Instead, it may be preferable to make use of phonotactic analysis. This concept has previously been explored in the context of identifying unregistered domains which may be attractive from a 'brandability' point of view[8]. In that case, strings producing a low 'phonotactic violation' score[9] (i.e. those which are most readable or 'word-like') are preferred. Conversely, however, when identifying the (pseudo-)random strings generated by automated registration scripts, those producing the highest scores may be the most likely candidates.
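The limitation of entropy for short strings can be illustrated with a small Python sketch. It assumes that the metric is the Shannon entropy of the character distribution within the string, which is a common definition but may differ in detail from the one used in the cited studies; the comparison word 'computer' is chosen here simply because it also has eight distinct letters.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per character) of the character distribution in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

random_like = shannon_entropy("gzraxywl")  # eight distinct letters
word_like = shannon_entropy("computer")    # also eight distinct letters
```

Both strings give a value of log2(8) = 3 bits per character, so under this definition entropy alone cannot separate the random-looking string from the dictionary word.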

As an example, I consider the set of 8-character alphabetic .lol domains (i.e. the dataset including the examples referenced previously). As of October 2024, there are 78,446 such domains. The distribution of phonotactic scores across this dataset is shown in Figure 1.

Figure 1: Distribution of phonotactic violation scores across the set of 8-character alphabetic .lol domains

These scores range up to a value of 73.06 (tlbtwxil.lol), with the remainder of the top five found to be mslpjbpw.lol (66.20), rfmtgliz.lol (66.13), pzvuznnj.lol (64.73), nzktgzhv.lol (64.57) (noting that 7,275 domains in the dataset do not generate a valid score - shown as the bar at a value of -1 in Figure 1 - many of which will also be random or pseudo-random strings).
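The selection and ranking step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method actually used: the scores are taken as given (the scoring tool itself is not reproduced here), a value of -1 is treated as 'no valid score' as in Figure 1, and the function name and the last two rows of the sample data are hypothetical.

```python
import re

# Matches a second-level name of exactly eight ASCII letters under .lol
EIGHT_LETTER_LOL = re.compile(r"^[a-z]{8}\.lol$")

def top_by_score(scores: dict[str, float], n: int) -> list[tuple[str, float]]:
    """Return the n highest-scoring 8-letter .lol domains, skipping invalid (-1) scores."""
    valid = [
        (d, s) for d, s in scores.items()
        if EIGHT_LETTER_LOL.match(d.lower()) and s >= 0
    ]
    return sorted(valid, key=lambda item: item[1], reverse=True)[:n]

# Scores quoted in the text, plus two hypothetical rows that the filter drops
scores = {
    "rfmtgliz.lol": 66.13,
    "tlbtwxil.lol": 73.06,
    "nzktgzhv.lol": 64.57,
    "mslpjbpw.lol": 66.20,
    "pzvuznnj.lol": 64.73,
    "abcdefgh.lol": -1,    # hypothetical: no valid score
    "example1.lol": 10.0,  # hypothetical: contains a digit, so not alphabetic
}
top = top_by_score(scores, 3)
```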

Considering the top 1,000 domains (all of which achieve scores greater than 33 and do comprise strings which appear visually random), 983 are privacy-protected domains registered through GMO Internet Group Inc. d/b/a Onamae.com (one of the high-threat registrars referenced above) with alidns.com nameservers, and all were registered between 19-Mar-2024 and 08-Aug-2024. Within this set, there are some even more obvious (sub-)clusters, with 50 domains all registered on 23-Apr, 55 on 22-May, 558 on 18-Jul, 52 on 31-Jul, and 261 on 08-Aug (Figure 2). It seems highly likely that these groups do indeed represent coordinated registration events by one or more specific entities. The group of 08-Aug registrations do not, as of the date of analysis (27-Oct), generally resolve to any live site content, but it is not uncommon for phishing sites to be used for just a short period of time before being deactivated.
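Grouping of this kind can be sketched in a few lines of Python. The record layout, the field names and the sample rows below are hypothetical placeholders (they are not drawn from the dataset); the sketch simply counts domains per registrar and registration date and keeps the groups that reach a chosen size.

```python
from collections import Counter
from datetime import date

def registration_clusters(records: list[dict], min_size: int = 2) -> dict:
    """Count domains per (registrar, registration date); keep groups of min_size or more."""
    counts = Counter((r["registrar"], r["registered"]) for r in records)
    return {key: n for key, n in counts.items() if n >= min_size}

# Hypothetical whois-style records, for illustration only
records = [
    {"domain": "name1.example", "registrar": "Registrar A", "registered": date(2024, 7, 18)},
    {"domain": "name2.example", "registrar": "Registrar A", "registered": date(2024, 7, 18)},
    {"domain": "name3.example", "registrar": "Registrar A", "registered": date(2024, 7, 18)},
    {"domain": "name4.example", "registrar": "Registrar A", "registered": date(2024, 8, 8)},
    {"domain": "name5.example", "registrar": "Registrar B", "registered": date(2024, 7, 18)},
]
clusters = registration_clusters(records)
```

In practice, further attributes such as nameservers or privacy-protection status could be added to the grouping key to tighten the clusters.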

Figure 2: Distribution of registration dates for the top 1000 8-character alphabetic .lol domains (by phonotactic violation score), by registrar

Summary and key points

The statistics highlight the significant continuing scale of phishing activity, and the importance of proactive programmes of monitoring and enforcement by brand owners. The apparent evolution in methodology by infringers, away from a basis of the use of branded domain names, shows that monitoring needs to encompass not only domain monitoring (covering exact matches and brand-name variants) but must also address general Internet content and make use of additional data sources (such as spam traps, webserver log monitoring and customer abuse reports). This is especially true given the mix of TLDs utilised in phishing domains, some of which may not have zone-file data readily available.

Analysis of the TLDs which are popular with infringers also serves other purposes, including:

  1. Helping to inform domain registration policies for brand owners[10], as part of an initiative to secure key brand terms across high-risk extensions as defensive registrations, in order to prevent them being registered and utilised by fraudsters.

  2. Informing the construction of algorithms to assess the likely future level of threat which may be posed by newly identified domain registrations[11]. Similar comments are also true regarding intelligence on those registrars which are most commonly associated with abusive registrations, especially in view of the 2024 amendments to registrars' obligations to implement more robust Domain Name System (DNS) abuse mitigation, including the suspension of domains and disabling of phishing websites[12,13].

  3. Enhancing algorithms (via the use of phonotactic analysis techniques) for quantifying domain threat and clustering together related results - which can itself help to lend efficiency to the overall takedown process.

References

[1] https://interisle.net/insights/phishing-landscape-2024-an-annual-study-of-the-scope-and-distribution-of-phishing

[2] https://www.linkedin.com/pulse/phishing-2024-what-domain-owners-brands-need-know-forum-adr-mxx3c/

[3] https://web.archive.org/web/20240213203456/https://www.freenom.com/en/freenom_pressstatement_02122024_v0100.pdf

[4] https://www.icann.org/uploads/compliance_notice/attachment/1219/hedlund-to-zuubier-9nov23.pdf

[5] see also 'Patterns in Brand Monitoring' by D.N. Barnett (Business Expert Press, 2025), Chapter 5: 'Prioritisation criteria for specific types of content'

[6] https://www.linkedin.com/pulse/investigating-use-domain-name-entropy-clustering-results-barnett/

[7] https://circleid.com/posts/20230703-an-overview-of-the-concept-and-use-of-domain-name-entropy

[8] https://circleid.com/posts/20240903-unregistered-gems-identifying-brandable-domain-names-using-phonotactic-analysis

[9] https://linguistics.ucla.edu/people/hayes/BLICK/

[10] https://www.iamstobbs.com/opinion/strategies-for-constructing-a-domain-name-registration-and-management-policy

[11] https://circleid.com/posts/20230117-the-highest-threat-tlds-part-2

[12] https://www.icann.org/resources/pages/global-amendment-2024-en

[13] see also 'Patterns in Brand Monitoring' by D.N. Barnett (Business Expert Press, 2025), Chapter 1: 'Overview of online brand protection'

This article was first published on 11 November 2024 at:

https://www.iamstobbs.com/opinion/phishing-trends-2024-and-a-look-at-some-new-data-for-domain-threat-quantification

Thursday, 7 November 2024

"It’s beginning to look a lot like...": Domain patterns in the approach to the holiday shopping season 2024

Introduction 

As we approach the start of this year's holiday shopping season, dominated by the Chinese-focused Singles' Day (11/11) and the western Black Friday and Cyber Monday events (this year on 29-Nov and 02-Dec) - but also including platform-specific promotions such as Amazon Prime Day(s) and the general ramp-up in spending towards December - we revisit last year's analysis[1] of the registration of related domain names.

The holiday shopping period provides an opportunity for brand owners and infringers alike, to take advantage of increased levels of online spend and related searches to drive consumers to their own content. As part of this initiative, many will register specific domain names related to the events in question, and in this study, we consider the landscape of such domains.  

Landscape data overview and deep-dive 

As of the date of analysis (11-Oct-2024), zone-file searches revealed the existence of 6,667 active registered domains with names containing 'black(-)friday', 'cyber(-)monday' or 'singles(-)day' (hereafter referred to as 'holiday shopping' domains). The analysis focuses only on gTLD domains, which are likely to be most relevant to the landscape of potential infringements (in view of their typically lower cost and lower levels of registration restrictions). Of the 6,667 domains, 519 were disregarded from further analysis on the basis of being registered via enterprise-level corporate registrars and thereby most likely to represent legitimate brand promotions, leaving a set of 6,148.
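The name-matching criterion can be expressed as a simple regular expression, sketched below in Python. The pattern and function name are illustrative assumptions rather than the actual search used, and the `.example` domains are placeholders.

```python
import re

# 'black(-)friday', 'cyber(-)monday' or 'singles(-)day', each with an optional hyphen
HOLIDAY_PATTERN = re.compile(r"black-?friday|cyber-?monday|singles-?day")

def is_holiday_shopping_domain(domain: str) -> bool:
    """True if the domain name contains one of the holiday shopping terms."""
    return HOLIDAY_PATTERN.search(domain.lower()) is not None

matches = [
    d for d in (
        "best-black-friday-deals.example",
        "CyberMonday.example",
        "singles-day-sale.example",
        "friday-offers.example",
    )
    if is_holiday_shopping_domain(d)
]
```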

Within the remainder of the dataset[2], a range of domain ages are represented, with the oldest registrations dating back to 2001. However, the striking annual cycle of activity noted in the previous study continues to be apparent, with the vast majority of the domains registered in the latter half of each year, in the run-up to the season in question (Figure 1). The apparent drop-off and smaller size of the 2024 peak is likely to be an artefact of the analysis having been carried out early in October, so that the final data point represents a (significantly) incomplete month; the numbers for the previous months do actually show a year-on-year increase (08-2023 = 33; 08-2024 = 36; 09-2023 = 91; 09-2024 = 98).

Figure 1: Numbers of active holiday shopping domains (as of 11-Oct-2024), by original month of registration 

Considering specific indicators of the likely nature of activity of the domains within the dataset, we find that:

  • 1,887 of the set of 6,148 (i.e. 31%) return some sort of live website response
  • 1,031 (i.e. 55% of the live sites) include at least one high-risk keyword or other term ('login', 'shop', 'store', 'discount', 'replica', or 'cheap') indicating that the primary focus of the website is (potentially non-legitimate) e-commerce  
  • 2,461 (i.e. 40%) feature active MX records - indicating that the domain has been configured to be able to send and receive e-mails - meaning that, even in the absence of a live website, the domain may be associated with phishing or other types of e-mail-based scams 
  • Considering the domain extensions within the dataset (i.e. the top-level domains, or TLDs), six of the top ten are new gTLD extensions, which have previously been noted as disproportionately being associated with infringing use (#2 .site (387 domains), #4 .shop (208), #5 .today (153), #6 .online (148), #8 .click (90), #10 .xyz (73)).  
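The keyword indicator in the list above can be sketched as a simple check in Python. The keyword list is taken from the text; the assumption that the check is applied to the text of the live page (rather than, say, the URL), and the function name, are illustrative only.

```python
# Keywords as listed in the text above
HIGH_RISK_KEYWORDS = ("login", "shop", "store", "discount", "replica", "cheap")

def high_risk_terms(page_text: str) -> list[str]:
    """Return the high-risk keywords found in the page text (case-insensitive substring match)."""
    text = page_text.lower()
    return [kw for kw in HIGH_RISK_KEYWORDS if kw in text]

found = high_risk_terms("Cheap REPLICA watches - Shop now")
```

A plain substring match of this kind will also fire on words such as 'restore', so a production check would probably need word-boundary handling or further context.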

Conducting a deeper dive into the dataset, it appears that a smaller proportion of the sites are directly targeting specific individual brands (either through the inclusion of brand terms in the domain names themselves, or of brand references in the site content) than in previous years, although some such examples were identified (Figure 2).

Figure 2: Examples of holiday shopping domains resolving to apparently infringing websites targeting specific brands 

Much more common are examples of generic e-commerce sites, in some cases targeting multiple different brands ('multi-brand' sites) (Figure 3), examples of sites giving general shopping or product information, linking to specific marketplaces (presumably as part of affiliate promotions), or referencing potentially unofficial coupon or voucher codes.  

Figure 3: Examples of holiday shopping domains resolving to multi-brand e-commerce sites

A striking new development this year - perhaps a reflection of the current economic landscape - is the large number of websites using the holiday period to promote their own 'payday-loan'-style offerings (Figure 4).

Figure 4: Examples of holiday shopping domains resolving to websites offering 'payday loans'

Summary and key points 

Overall, this data review shows that there continues to be a significant amount of illegitimate online activity targeting consumers and, in many cases, abusing trusted household brands. At times of increased numbers of infringements, it becomes all the more important for brand owners to monitor the landscape and conduct proactive programmes of takedowns against egregious findings, as part of a comprehensive brand protection initiative. Of course, domain registrations are only part of the picture; as the boundaries between online channels become increasingly blurred, monitoring initiatives must also take account of a range of platforms, including e-commerce marketplaces (including the increasingly large numbers of product- and region-specific examples), social media, mobile apps, and other general Internet content. This will help brands to protect consumers from infringement types such as counterfeiting and phishing, including examples making use of trending techniques such as hidden links[3]. Brand protection teams may wish to bear such issues in mind when deciding where and how much resource to allocate this coming holiday shopping season.

References

[1] https://www.iamstobbs.com/opinion/web-dot-coms-but-once-a-year-holiday-shopping-activity-part-1-black-friday-domains

[2] Considering those where domain registration dates are available via an automated whois look-up

[3] https://circleid.com/posts/20220510-breaking-the-rules-on-counterfeit-sales-the-use-of-hidden-links

This article was first published on 7 November 2024 at:

https://www.iamstobbs.com/opinion/its-beginning-to-look-a-lot-like-domain-patterns-in-the-approach-to-the-holiday-shopping-season-2024
