Monday, 11 November 2024

Phishing trends 2024 - and a look at some new data for domain threat quantification

Overview

This year's annual phishing report by Internet technology consultants Interisle[1] has provided a number of key insights into the current state of the phishing landscape[2]. Phishing - that is, the use of websites to impersonate a brand or other trusted entity with a view to stealing personal details, financial information or funds, often as a 'gateway' for further cybercrime - continues to be a popular model for online criminals. This is a significant concern for brand owners and consumers alike. Of the main findings from the report, some of the most significant are:

  • The number of phishing attacks continues to see year-on-year growth, having increased by around 50,000 to 1.9 million incidents, with an estimated financial loss of $12.5 billion. The top three most targeted brands were Facebook, Gazprom (a Russian energy corporation), and the United States Postal Service (USPS).
  • 1 million unique domain names were utilised in the identified set of phishing attacks, though with a decrease in popularity in the use of domain names containing (an exact match to) the name of the targeted brand, in part probably due to the ease in detection of such names. The majority of attacks take place on specifically maliciously registered domains, rather than on compromised sites.
  • Other styles of attack have increased, notably subdomain-based attacks, where the phishing-site URL takes the form [brand-string].[domain].TLD (using, for example, blogspot.com, duckdns.org or weebly.com) and which account for nearly one-quarter of all cases, and attacks where the brand string appears elsewhere in the URL. The use of the InterPlanetary File System (IPFS), a Web3 P2P-based technology, to host phishing content has also grown (most usually through a Web2-based 'gateway' provider such as dweb.link or ipfs.io).
  • New-gTLD extensions continue to be popular for domains used for phishing (42% of cases), primarily due to the low-cost and ease of registration (i.e. fewer verification checks). ccTLDs have seen a drop in fraudulent usage – to a significant extent, as a result of the exit of Freenom (the former provider of domains on the .tk, .ml, .ga, .cf and .gq extensions) from the registrar business[3], following the termination of their ICANN agreement[4] in response to reports of extensive criminal domain use. Overall, the most common TLDs used for phishing sites in the analysis period were .com, .top, .xyz, .cn, and .info, though when normalised to reflect the numbers of phishing sites as a proportion of the total domains across the extension in question, the highest-risk TLDs were found to be .lol, .bond, .support, .top, and .sbs.
  • Bulk / automated registration of domains has increased in popularity as a methodology used by phishers, accounting for over one-quarter of all phishing-related domains. These most usually make use of strings of random characters or random combinations of dictionary words. The largest set of such domains used for a coordinated set of attacks was a group of over 17,000 domains generally consisting of eight-letter (second-level domain name, or SLD) random strings, such as gzraxywl.lol and htcjkpzb.lol.
  • The set of gTLD registrars most frequently associated with domains used for phishing content continues to be dominated by retail-grade providers, with the top five found to be NameSilo, GoDaddy, GMO d/b/a Onamae, PublicDomainRegistry, and NameCheap. Normalising the figures by the total numbers of domains under management, the top five most frequently abused are found to be NiceNIC, URL Solutions, Aceville, WebNic, and OwnRegistrar. The first of these has seen exceptional levels of abuse, with 45% of its gTLD portfolio reported for phishing.
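
The normalisation referred to in the TLD and registrar findings above (phishing domains as a proportion of total domains under management) can be sketched as follows. This is a minimal illustration only: the function name `rank_by_abuse_rate`, the keys and all counts are hypothetical placeholders, not figures from the Interisle report.

```python
def rank_by_abuse_rate(phishing_counts, total_domains):
    """Rank keys (TLDs or registrars) by phishing domains per domain under management."""
    rates = {
        key: phishing_counts[key] / total_domains[key]
        for key in phishing_counts
        if total_domains.get(key, 0) > 0  # skip keys with no known portfolio size
    }
    # Highest proportion of abused domains first
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical placeholder counts, NOT taken from the report
phishing = {"tld-a": 5000, "tld-b": 300}
totals = {"tld-a": 1_000_000, "tld-b": 10_000}

ranking = rank_by_abuse_rate(phishing, totals)
for key, rate in ranking:
    print(f"{key}: {rate:.2%}")
```

In this made-up example, the extension with far fewer phishing domains in absolute terms ranks first once portfolio size is taken into account, which is the effect that separates the 'most common' list from the 'highest-risk' list.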

A new basis for quantifying domain-name threat?

The use of fixed-length random strings for phishing domains (as mentioned above in the case of the .lol examples) raises the possibility for a new methodology for identifying such strings, clustering together related findings, and providing an additional input into general algorithms for quantifying the potential level of threat posed by registered domain names[5].

Previous studies[6,7] have explored the use of a metric known as domain name entropy - essentially, a measure of the number and variability of characters within the domain-name string - as an indicator of automated registrations. However, although this idea may be useful in cases where the registration scripts generate very long domain names, it is much less effective for the shorter names described here. This is because a string such as 'gzraxywl' will have an identical entropy value to any other string consisting of eight distinct characters, including dictionary words (as may correspond to other / legitimate registrations). Instead, it may be preferable to make use of phonotactic analysis. This concept has previously been explored in the context of identifying unregistered domains which may be attractive from a 'brandability' point of view[8]. In that case, strings producing a low 'phonotactic violation' score[9] (i.e. those which are most readable or 'word-like') are preferred. Conversely, however, when identifying the (pseudo-)random strings generated by automated registration scripts, those producing the highest scores may be the most likely candidates.
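
The limitation of entropy for short strings can be illustrated with a short sketch. It assumes that 'entropy' means the Shannon entropy of the character frequencies within the string (the cited studies may use a slightly different formulation), and the comparison word 'keyboard' is simply an arbitrary dictionary word with eight distinct letters.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy (bits per character) of the character distribution in s."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


random_like = shannon_entropy("gzraxywl")  # example string from the text
word_like = shannon_entropy("keyboard")    # dictionary word, eight distinct letters

print(random_like, word_like)  # 3.0 3.0
```

Both strings contain eight distinct characters, each occurring once, so the measure cannot separate them. A repeated character (as in 'password') lowers the value slightly, but that reflects letter repetition and not pronounceability.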

As an example, I consider the set of 8-character alphabetic .lol domains (i.e. the dataset including the examples referenced previously). As of October 2024, there are 78,446 such domains. The distribution of phonotactic scores across this dataset is shown in Figure 1.

Figure 1: Distribution of phonotactic violation scores across the set of 8-character alphabetic .lol domains

These scores range up to a value of 73.06 (tlbtwxil.lol), with the remainder of the top five found to be mslpjbpw.lol (66.20), rfmtgliz.lol (66.13), pzvuznnj.lol (64.73), and nzktgzhv.lol (64.57). Note that 7,275 domains in the dataset do not generate a valid score (shown as the bar at a value of -1 in Figure 1); many of these will also be random or pseudo-random strings.
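
The selection step described above (restricting to 8-character alphabetic .lol names, setting aside unscored strings and ranking the rest) might look something like the sketch below. It assumes the phonotactic violation scores have already been computed by an external tool and does not reproduce the scoring itself. The function name `top_scoring` and the last two dictionary entries are hypothetical, while the other five scores are those quoted in the text.

```python
import re

EIGHT_LETTER_LOL = re.compile(r"[a-z]{8}\.lol")
NO_SCORE = -1  # sentinel for strings that produce no valid score (as in Figure 1)


def top_scoring(scores, n):
    """Return the n highest-scoring 8-letter .lol domains, skipping unscored ones."""
    valid = [
        (domain, score)
        for domain, score in scores.items()
        if EIGHT_LETTER_LOL.fullmatch(domain) and score != NO_SCORE
    ]
    return sorted(valid, key=lambda item: item[1], reverse=True)[:n]


scores = {
    "pzvuznnj.lol": 64.73,
    "tlbtwxil.lol": 73.06,
    "nzktgzhv.lol": 64.57,
    "mslpjbpw.lol": 66.20,
    "rfmtgliz.lol": 66.13,
    "qqqqqqqq.lol": NO_SCORE,  # hypothetical entry: no valid score
    "short.lol": 10.0,         # hypothetical entry: not eight letters
}

top = top_scoring(scores, 3)
```

Unscored strings are excluded from the ranking here, not discarded; as noted above, many of them may also be random strings and could merit separate review.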

Considering the top 1,000 domains (all of which achieve scores greater than 33 and do comprise strings which appear visually random), 983 are privacy-protected domains registered through GMO Internet Group Inc. d/b/a Onamae.com (one of the high-threat registrars referenced above) with alidns.com nameservers, and all were registered between 19-Mar-2024 and 08-Aug-2024. Within this set, there are some even more obvious (sub-)clusters, with 50 domains all registered on 23-Apr, 55 on 22-May, 558 on 18-Jul, 52 on 31-Jul, and 261 on 08-Aug (Figure 2). It seems highly likely that these groups do indeed represent coordinated registration events by one or more specific entities. The group of 08-Aug registrations do not, as of the date of analysis (27-Oct), generally resolve to any live site content, but it is not uncommon for phishing sites to be used for just a short period of time before being deactivated.

Figure 2: Distribution of registration dates for the top 1000 8-character alphabetic .lol domains (by phonotactic violation score), by registrar
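
The grouping by registrar and registration date that underlies this kind of cluster analysis can be sketched as below. This is an illustration under stated assumptions, not the method used to produce Figure 2: the record layout, the function name `registration_clusters`, the registrar labels and the sample records are all hypothetical, and real input would need to come from a registration-data source such as WHOIS or RDAP lookups.

```python
from collections import Counter
from datetime import date


def registration_clusters(records, min_size):
    """Count domains per (registrar, registration date); keep groups of at least min_size."""
    counts = Counter((r["registrar"], r["created"]) for r in records)
    return {key: n for key, n in counts.items() if n >= min_size}


# Hypothetical records for illustration only
records = [
    {"domain": "aaaaaaaa.lol", "registrar": "Registrar A", "created": date(2024, 7, 18)},
    {"domain": "bbbbbbbb.lol", "registrar": "Registrar A", "created": date(2024, 7, 18)},
    {"domain": "cccccccc.lol", "registrar": "Registrar A", "created": date(2024, 7, 18)},
    {"domain": "dddddddd.lol", "registrar": "Registrar A", "created": date(2024, 7, 19)},
    {"domain": "eeeeeeee.lol", "registrar": "Registrar B", "created": date(2024, 7, 18)},
]

clusters = registration_clusters(records, min_size=2)
```

Further attributes, such as nameserver host or privacy-protection status, could be added to the grouping key in the same way to tighten the clusters.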

Summary and key points

The statistics highlight the continuing scale of phishing activity, and the importance of proactive programmes of monitoring and enforcement by brand owners. The apparent evolution in methodology by infringers, away from the use of branded domain names, shows that monitoring must encompass not only domain monitoring (covering exact matches and brand-name variants) but also general Internet content, and must make use of additional data sources (such as spam traps, webserver log monitoring and customer abuse reports). This is especially true given the mix of TLDs utilised in phishing domains, some of which may not have zone-file data readily available.

Analysis of the TLDs which are popular with infringers also serves other purposes, including:

  1. Helping to inform domain registration policies for brand owners[10], as part of an initiative to secure key brand terms across high-risk extensions as defensive registrations, in order to prevent them being registered and utilised by fraudsters.

  2. Informing the construction of algorithms to assess the likely future level of threat which may be posed by newly identified domain registrations[11]. Similar comments are also true regarding intelligence on those registrars which are most commonly associated with abusive registrations, especially in view of the 2024 amendments to registrars' obligations to implement more robust Domain Name System (DNS) abuse mitigation, including the suspension of domains and disabling of phishing websites[12,13].

  3. Enhancing algorithms (via the use of phonotactic analysis techniques) for quantifying domain threat and clustering together related results - which can itself help to lend efficiency to the overall takedown process.

References

[1] https://interisle.net/insights/phishing-landscape-2024-an-annual-study-of-the-scope-and-distribution-of-phishing

[2] https://www.linkedin.com/pulse/phishing-2024-what-domain-owners-brands-need-know-forum-adr-mxx3c/

[3] https://web.archive.org/web/20240213203456/https://www.freenom.com/en/freenom_pressstatement_02122024_v0100.pdf

[4] https://www.icann.org/uploads/compliance_notice/attachment/1219/hedlund-to-zuubier-9nov23.pdf

[5] see also 'Patterns in Brand Monitoring' by D.N. Barnett (Business Expert Press, 2025), Chapter 5: 'Prioritisation criteria for specific types of content'

[6] https://www.linkedin.com/pulse/investigating-use-domain-name-entropy-clustering-results-barnett/

[7] https://circleid.com/posts/20230703-an-overview-of-the-concept-and-use-of-domain-name-entropy

[8] https://circleid.com/posts/20240903-unregistered-gems-identifying-brandable-domain-names-using-phonotactic-analysis

[9] https://linguistics.ucla.edu/people/hayes/BLICK/

[10] https://www.iamstobbs.com/opinion/strategies-for-constructing-a-domain-name-registration-and-management-policy

[11] https://circleid.com/posts/20230117-the-highest-threat-tlds-part-2

[12] https://www.icann.org/resources/pages/global-amendment-2024-en

[13] see also 'Patterns in Brand Monitoring' by D.N. Barnett (Business Expert Press, 2025), Chapter 1: 'Overview of online brand protection'

This article was first published on 11 November 2024 at:

https://www.iamstobbs.com/opinion/phishing-trends-2024-and-a-look-at-some-new-data-for-domain-threat-quantification
