Tuesday, 7 February 2023

Exploring the domain of hostname-based infringements

Introduction

As noted in numerous previous studies, one of the main objectives in the construction of a deceptive infringement (such as a phishing site) may be the use of a URL which appears similar to that of the official site being targeted. 

One way in which this can be achieved is by constructing a hostname (consisting of a subdomain and domain name combination) which is identical (apart from an additional dot) to that of the genuine brand site. Active use of this technique has been observed in numerous cases. For example, for the fictitious banking brand bankbrand.com, an infringer might use a URL such as ba.nkbrand.com to target the bank's customers with a phishing attack. To put this type of attack into practice, the infringer needs to register a domain name which is a truncated form of the official domain name (in the above case, nkbrand.com), and then configure the required subdomain (in this case, 'ba') to construct the full hostname.

Study methodology

In order to investigate the scale of this practice being used for fraud and other brand infringements, I consider hostname-based variations of each of the top 50 most popular brand websites on the Internet[1] (see Appendix). For example, for the domain google.com, I investigate whether any live content exists at any of the following hostname-based variations:

  • g.oogle.com
  • go.ogle.com
  • goo.gle.com
  • goog.le.com
  • googl.e.com
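
The list above can be reproduced with a short Python sketch. It assumes the variants are formed by placing a single dot at each internal position of the second-level name; the function name hostname_variants and the choice to pass the name and its extension separately are illustrative, not part of the original study.

```python
from __future__ import annotations


def hostname_variants(sld: str, suffix: str) -> list[tuple[str, str]]:
    """Return (hostname, truncated_domain) pairs for every dot position inside sld.

    sld is the brand's second-level name (e.g. "google") and suffix is the
    domain extension (e.g. "com"). Both parameter names are assumptions
    made for this illustration.
    """
    variants = []
    for i in range(1, len(sld)):
        subdomain, truncated = sld[:i], sld[i:]
        # The infringer would need to register this shorter domain name.
        truncated_domain = f"{truncated}.{suffix}"
        variants.append((f"{subdomain}.{truncated_domain}", truncated_domain))
    return variants


for hostname, truncated_domain in hostname_variants("google", "com"):
    print(f"{hostname}  (requires registration of {truncated_domain})")
```

Each pair couples the candidate hostname with the truncated domain name that would have to be registered to create it, which is the distinction drawn in the paragraph that follows.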

This approach (i.e. checking the subdomain specifically) is more robust than simply checking whether the truncated versions of the domain names (e.g. oogle.com, ogle.com, etc.) have been registered, since some of these (particularly the shortest domain names) may be in use by unrelated third parties. 

Findings

Of the 262 candidate URLs[2] (i.e. the hostname-based variants of the top 50 domain names), 89 (34%) have active A records (indicating that they point at a live IP address) and 37 (14%) have active MX records (indicating that they have been configured to be able to send and receive e-mails), as shown in Figure 1. Significantly (where whois information is available), only six (2.6%) of the 233[3] truncated domain-name variants are registered to the brand owner who could be targeted using an associated hostname infringement. 

Figure 1: Breakdown of URLs by presence of A and MX records
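
As a rough illustration of how the A-record check might be automated, the sketch below uses Python's standard-library call socket.gethostbyname. This consults the system resolver and follows CNAMEs, so it approximates a direct A-record query rather than reproducing one exactly. The function name has_a_record and its resolve parameter are hypothetical and are not taken from the study. MX lookups are not covered, since they require a dedicated DNS library.

```python
import socket
from typing import Callable


def has_a_record(hostname: str,
                 resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if hostname resolves to an IPv4 address.

    resolve is injectable (an assumption of this sketch) so that the check
    can be exercised without network access.
    """
    try:
        resolve(hostname)
    except OSError:
        # socket.gaierror, raised on a failed lookup, is a subclass of OSError.
        return False
    return True


# Example call (performs a live DNS lookup when run):
#   has_a_record("goo.gle.com")
```

Passing a stub resolver makes it possible to test the surrounding logic offline before running it against the full candidate list.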

Of the 89 URLs with active A records, a range of content types were observed, including:

  • Live third-party content - Pages where the URL resolves or re-directs to content unrelated to the brand in question (i.e. traffic misdirection)
  • PPC - Sites monetised through the inclusion of pay-per-click links
  • Domain-for-sale pages - Pages where the domain name is explicitly being offered for sale

A breakdown of the numbers is shown in Figure 2.

Figure 2: Breakdown of URLs with active A records by content type

It is worth noting that some of the instances of URLs resolving to live content may arise through the use of wildcard DNS records[4] (i.e. where the domain has been configured such that any arbitrary subdomain will resolve, rather than the specific subdomain having been explicitly configured). However, any URL pointing to a live IP address raises the potential for fraudulent or infringing use. At the time of analysis, none of the 262 URLs resolved to live phishing sites targeting the brand in question; however, it has been previously noted that in many cases, sites are left in a dormant state - in some cases, for an extended period of time - before being weaponised[5,6]. Consequently, many of the sites resolving to parking, holding or inactive pages may be worthy of monitoring for future changes in content. Furthermore, some of the identified instances of URLs resolving to third-party content may be of particular concern to the brand owner, if they misdirect web-users to competitor content or provide an undesirable brand association. Some examples include:

  • Hostname-based variant of google[.]com - Resolves to a page promoting a VPN product
  • Hostname-based variant of yandex[.]com - Re-directs to a flight-sales website
  • Hostname-based variant of xvideos[.]com - Resolves to a third-party adult website
  • Hostname-based variant of pornhub[.]com - Re-directs to a third-party adult website
  • Hostname-based variant of linkedin[.]com - Resolves to a gambling-site portal page
  • Hostname-based variant of ebay[.]com - Re-directs to the Google website

Additionally, the frequency of PPC pages within the dataset indicates how popular it is among infringers to monetise domains while they are dormant. Furthermore, the fact that many of these examples display content unrelated to the brand in question may also suggest that they have been configured to attract web traffic arising from mistyped browser requests, rather than being intended as explicitly deceptive variants of the brand domain name in question.
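
The wildcard DNS caveat noted earlier can be probed with a simple heuristic: if randomly generated subdomains of a truncated domain also resolve, the specific infringing subdomain was probably not configured explicitly. The sketch below is one possible approach under that assumption; the function name looks_like_wildcard, the probes count and the injectable resolve parameter are all hypothetical.

```python
import secrets
import socket
from typing import Callable


def looks_like_wildcard(domain: str,
                        resolve: Callable[[str], str] = socket.gethostbyname,
                        probes: int = 2) -> bool:
    """Return True if every randomly generated subdomain of domain resolves.

    A True result suggests (but does not prove) that a wildcard DNS record
    is in place.
    """
    for _ in range(probes):
        # 16 random hex characters: very unlikely to be explicitly configured.
        label = secrets.token_hex(8)
        try:
            resolve(f"{label}.{domain}")
        except OSError:
            # A failed lookup means arbitrary subdomains do not all resolve.
            return False
    return True


# Example call (performs live DNS lookups when run):
#   looks_like_wildcard("gle.com")
```

A positive result only lowers the likelihood that the hostname was set up deliberately; as the text notes, any hostname pointing at a live IP address still carries a potential for abuse.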

As a final observation, Figure 3 compares the date of registration with the length (in characters) of the second-level domain (SLD) name, i.e. the portion of the domain name prior to the TLD, or domain extension. The comparison covers each of the 233 potentially infringing domain names in the dataset (where these are registered and have whois information available).

Figure 3: Comparison of date of registration with length of the SLD name, for the domains comprising truncated versions (with the right-hand end retained) of the top 50 most popular domain names

The dataset shows that the domains in question have been registered over an extended period, between 1986 and 2022. The shorter domain names - i.e. those which are more likely to have been used for unrelated third-party or generic use - tend to comprise the oldest registrations. However, many of the domains with longer SLD string lengths - i.e. those less likely to be associated with 'accidental' brand collisions, and more likely to have been registered specifically to create hostname-based infringements - tend to have been registered over the last few years, highlighting a potential growth in popularity over time of this particular attack vector.
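
For reference, the SLD length used in this comparison can be computed as in the sketch below. It assumes a small, hand-maintained set of two-label extensions (such as co.jp) rather than a full public suffix list; the function name sld_length and the multi_label_suffixes parameter are illustrative only.

```python
from __future__ import annotations


def sld_length(domain: str,
               multi_label_suffixes: frozenset[str] = frozenset({"co.jp", "ne.jp"})) -> int:
    """Return the number of characters in the SLD of a registrable domain name.

    multi_label_suffixes lists extensions made up of two labels; this short
    default set is an assumption and is far from exhaustive.
    """
    labels = domain.lower().split(".")
    if len(labels) < 2:
        raise ValueError(f"not a registrable domain name: {domain!r}")
    if len(labels) >= 3 and ".".join(labels[-2:]) in multi_label_suffixes:
        # e.g. "ahoo.co.jp": the SLD is the label before "co.jp".
        return len(labels[-3])
    return len(labels[-2])


for name in ("e.com", "oogle.com", "nkbrand.com"):
    print(name, sld_length(name))
```

Pairing this value with each domain's whois registration date gives the two quantities being compared in Figure 3.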

Summary and recommendations

The proportion of hostname-based infringements resolving to live content, or configured with active A and/or MX records - combined with previous observations of the use of this type of infringement as a phishing attack vector - highlights the scale of this infringement type as a potential source of concern. Consequently, brand owners may wish to consider proactively registering or acquiring domain names comprising truncated versions (where the right-hand end is retained) of their core domain name, to prevent registration and abuse by a third party. In cases where acquisition is not possible, it may be advisable to monitor the hostname-based infringements for future changes in content and, if and when active infringing content is detected, to launch a timely enforcement action for the takedown of the material.

Appendix

Top 50 most popular websites according to Similarweb (October 2022).

Rank   Website   Category
1   google[.]com   Computers Electronics and Technology → Search Engines
2   youtube[.]com   Arts & Entertainment → Streaming & Online TV
3   facebook[.]com   Computers Electronics and Technology → Social Media Networks
4   twitter[.]com   Computers Electronics and Technology → Social Media Networks
5   instagram[.]com   Computers Electronics and Technology → Social Media Networks
6   baidu[.]com   Computers Electronics and Technology → Search Engines
7   wikipedia[.]org   Reference Materials → Dictionaries and Encyclopedias
8   yandex[.]ru   Computers Electronics and Technology → Search Engines
9   yahoo[.]com   News & Media Publishers
10   xvideos[.]com   Adult
11   whatsapp[.]com   Computers Electronics and Technology → Social Media Networks
12   pornhub[.]com   Adult
13   amazon[.]com   eCommerce & Shopping → Marketplace
14   xnxx[.]com   Adult
15   yahoo[.]co[.]jp   News & Media Publishers
16   live[.]com   Computers Electronics and Technology → Email
17   netflix[.]com   Arts & Entertainment → Streaming & Online TV
18   docomo[.]ne[.]jp   Computers Electronics and Technology → Telecommunications
19   tiktok[.]com   Computers Electronics and Technology → Social Media Networks
20   reddit[.]com   Computers Electronics and Technology → Social Media Networks
21   office[.]com   Computers Electronics and Technology → Programming and Developer Software
22   linkedin[.]com   Computers Electronics and Technology → Social Media Networks
23   dzen[.]ru   Community and Society → Faith and Beliefs
24   vk[.]com   Computers Electronics and Technology → Social Media Networks
25   xhamster[.]com   Adult
26   samsung[.]com   Computers Electronics and Technology → Consumer Electronics
27   turbopages[.]org   News & Media Publishers
28   mail[.]ru   Computers Electronics and Technology → Email
29   bing[.]com   Computers Electronics and Technology → Search Engines
30   naver[.]com   News & Media Publishers
31   microsoftonline[.]com   Computers Electronics and Technology → Programming and Developer Software
32   twitch[.]tv   Games → Video Games Consoles and Accessories
33   discord[.]com   Computers Electronics and Technology → Social Media Networks
34   bilibili[.]com   Arts & Entertainment → Animation and Comics
35   pinterest[.]com   Computers Electronics and Technology → Social Media Networks
36   zoom[.]us   Computers Electronics and Technology → Other Computers Electronics and Tech.
37   weather[.]com   Science and Education → Weather
38   qq[.]com   News & Media Publishers
39   microsoft[.]com   Computers Electronics and Technology → Programming and Developer Software
40   globo[.]com   News & Media Publishers
41   roblox[.]com   Games → Video Games Consoles and Accessories
42   duckduckgo[.]com   Computers Electronics and Technology → Search Engines
43   news[.]yahoo[.]co[.]jp   News & Media Publishers
44   quora[.]com   Reference Materials → Dictionaries and Encyclopedias
45   msn[.]com   News & Media Publishers
46   realsrv[.]com   Adult
47   fandom[.]com   Arts & Entertainment → Other Arts and Entertainment
48   ebay[.]com   eCommerce & Shopping → Marketplace
49   aajtak[.]in   News & Media Publishers
50   ok[.]ru   Computers Electronics and Technology → Social Media Networks

References

[1] https://www.similarweb.com/top-websites/ (data correct for October 2022) 

[2] All observations correct as of 11-Nov-2022

[3] Excluding duplicates

[4] https://en.wikipedia.org/wiki/Wildcard_DNS_record

[5] https://www.cscdbs.com/en/resources-news/impact-of-covid-on-internet-security/

[6] https://unit42.paloaltonetworks.com/strategically-aged-domain-detection/

This article was first published on 7 February 2023 at:

https://www.linkedin.com/pulse/exploring-domain-hostname-based-infringements-david-barnett/
