Monday, 3 July 2023

An overview of the concept and use of domain-name entropy

Introduction

In this article, I present an overview of a series of 'proof-of-concept' studies looking at the application of domain-name entropy, both as a means of clustering together related domain registrations and as an input into potential metrics for determining the level of threat likely to be posed by a domain.

In our previous studies, we utilised the mathematical concept of Shannon entropy[1], providing a measure of the amount of information stored in a string of characters (or, equivalently, the number of bits required to optimally encode the string). The idea was applied to the second-level domain name (SLD) part of each domain (i.e. the portion of the domain name before the dot - such as 'google' in 'google.com'), and broadly means that short domain names, or those with large numbers of repeated characters, will have low entropy values, whereas longer domain names, or those with large numbers of distinct characters, will have higher entropy.
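The calculation described above can be expressed in a few lines of Python - a minimal sketch of the standard Shannon formula applied to an SLD string (the function name is illustrative):

```python
from collections import Counter
from math import log2

def shannon_entropy(sld: str) -> float:
    """Shannon entropy (bits per character) of a domain's SLD string."""
    counts = Counter(sld)
    n = len(sld)
    # H = -sum(p * log2(p)) over the distinct characters in the string
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(round(shannon_entropy("google"), 3))        # 1.918 - short, with repeated characters
print(round(shannon_entropy("xk9q2vbl7zpm"), 3))  # 3.585 - long, all-distinct 'random' string
```

Note that a string of a single repeated character (e.g. 'aaaa') scores zero, while a string of n distinct characters scores log2(n) - the maximum possible for its length.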

The background to this analysis is the fact that domains registered for egregious purposes (such as spamming, malware distribution, or botnet creation) may be more likely to be registered in bulk by bad actors using automated algorithms[2]. This typically results in the generation of long, nonsensical (i.e. high-entropy) domain names, which have the added benefit of not containing brand-related keywords and are therefore typically harder to detect using classic brand-monitoring techniques. The idea is that domains registered by a particular infringer for a specific campaign are all likely to be generated using the same algorithm, and may therefore have similar or identical entropy values.

Overview of previous studies

In our initial proof of concept[3], we considered the set of all domains registered on a particular day - a sample of around 205,000 domains. An additional advantage of considering a set of domains with a common registration date is that it raises the possibility of capturing one or more groups of automated bulk registrations (which are typically all registered at the same time).

Within the dataset, a range of domain entropy values was present, from a minimum of 0.000 to a maximum of 4.700, with 92.3% of the dataset having values below 3.500 (see Figure 1). The top 1,000 highest-entropy domains (i.e. the top 0.49%) had entropy values in excess of 3.823, and accounted for the majority of examples which appeared visually to feature 'random' SLD strings. Within this high-entropy subset, a number of additional characteristics indicated that many may have been registered for nefarious purposes, including the prominent use of consumer-grade registrars and privacy-protection services, and the presence of active MX records on 27.5% of these new registrations - indicating that the domains had been configured to be able to send and receive e-mails, and could therefore potentially be associated with phishing activity.

Figure 1: Cumulative proportion of domains with entropy less than the value shown on the horizontal axis, from the dataset in the initial proof-of-concept study

Indeed, at least one apparent 'cluster' of suspicious registrations was found within the dataset: a group of 125 .buzz ('dot-buzz') domains, all with an identical high entropy value (3.907), registered via a common registrar and associated with groups of similar IP addresses. At the time of analysis, many of the domains resolved to Chinese-language, gambling-related websites, likely representing either an affiliate revenue-generation scheme, or 'dummy' content serving to 'mask' higher-threat content which may only have been visible in specific geographic regions, or which may have been planned for subsequent upload.
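A simplified sketch of this clustering idea - grouping domains whose SLDs share an identical (rounded) entropy value - might look as follows (the domain names and function names here are hypothetical, for illustration only):

```python
from collections import Counter, defaultdict
from math import log2

def sld_entropy(domain: str) -> float:
    """Shannon entropy of the SLD (the portion before the first dot)."""
    sld = domain.split(".")[0]
    counts = Counter(sld)
    n = len(sld)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def cluster_by_entropy(domains):
    """Group domains sharing an identical entropy value (to 3 d.p.)."""
    clusters = defaultdict(list)
    for d in domains:
        clusters[round(sld_entropy(d), 3)].append(d)
    # Keep only groups of more than one domain - candidate clusters
    return {h: ds for h, ds in clusters.items() if len(ds) > 1}
```

In practice, entropy would be only one of several shared characteristics (registrar, IP range, registration timestamp) used to confirm a cluster.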

In a follow-up study[4], I considered a month's worth of registrations of domains with names containing any of the top ten most valuable brands in 2022. Similarly, the high entropy domain names within this dataset included groups of apparently related, coordinated 'clusters' of domains, several of which appeared intended for fraudulent use and were consistent with registration via automated generation algorithms. For example, seven of the top eight domains in the dataset (by entropy values) had similar names of the form 'google-site-verificationXXXXXX.com' (or .net) (where 'XXXXXX' was a long string of apparently random characters), and a series of groups of 'microsoft' examples was identified, including keywords such as 'cloudworkflow', 'netsuites' and 'cloudroam'.

Comparison with other work

Other studies taking similar approaches to the analysis of domain entropy reach similar conclusions. For example, an analysis outlined in a blog post by Tiberium[5] states that an entropy threshold of >3.1 (as an indicator of potential concern) correctly classifies 80% of NCSC malicious domains, and incorrectly classifies only 8% of the top 1,000 most popular (legitimate!) domains overall (cf. Table 1).

Domain name      Entropy value
google.com       1.918
youtube.com      2.522
facebook.com     2.750
twitter.com      2.128
instagram.com    2.948
baidu.com        2.322
wikipedia.org    2.642
yandex.ru        2.585
yahoo.com        1.922
whatsapp.com     2.500

Table 1: Entropy values of the SLDs of the top ten most popular websites according to Similarweb[6]

Additionally, an article published by Splunk[7], looking at the entropy values of fully qualified domain names (i.e. also including subdomain names), states that high-entropy examples are consistent with the use of domain generation algorithms, and may be indicative of association with malware (e.g. in 'beaconing') and other web exploits. Comparable approaches and conclusions can also be found in a range of other studies[8,9,10], with some finding improvements in the reliability of threat determination through the use of alternative measures such as relative entropy - essentially, a comparison against the character distribution observed in a dataset of known legitimate domains, so as to provide a better measure of the randomness arising from automated algorithmic registrations[11].
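The relative-entropy measure mentioned above can be sketched as a Kullback-Leibler divergence against a baseline character distribution. In this illustrative sketch, the baseline is estimated from a small hypothetical list of legitimate SLDs, with simple smoothing so that characters unseen in the baseline still receive a non-zero probability (the list, alphabet, and smoothing constant are all assumptions for illustration):

```python
from collections import Counter
from math import log2

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"

def char_distribution(slds, eps=0.5):
    """Smoothed character-frequency distribution over a set of SLDs."""
    counts = Counter("".join(slds))
    total = sum(counts.values()) + eps * len(ALPHABET)
    return {ch: (counts[ch] + eps) / total for ch in ALPHABET}

def relative_entropy(sld: str, baseline: dict) -> float:
    """KL divergence of the SLD's character distribution from the baseline.
    Assumes all characters of sld appear in the baseline's alphabet."""
    counts = Counter(sld)
    n = len(sld)
    return sum((c / n) * log2((c / n) / baseline[ch]) for ch, c in counts.items())

legit = ["google", "youtube", "facebook", "wikipedia", "yahoo"]
baseline = char_distribution(legit)
# Algorithmically generated-looking names diverge more from the baseline
assert relative_entropy("x7qz9kfw", baseline) > relative_entropy("google", baseline)
```

The attraction of this measure is that a long but natural-language SLD scores low (its letter frequencies resemble the baseline), whereas a random string of the same length scores high.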

Conclusions

Domain-name entropy analysis has applications in at least two key areas of brand protection. The first of these is the ability to 'cluster' together related infringements, which has a number of benefits, including the ability to identify serial infringers and instances of bad-faith activity, for targeted and effective bulk enforcement actions. The second key area is as an input into algorithms to quantify the likely level of threat which may be posed by an online feature such as a new domain registration. Threat determination is essential in allowing prioritisation of results for analysis, enforcement, or content-change tracking.

All other factors being equal, there is some indication that high-threat domains - particularly those associated with automated registrations by domain-name generation algorithms - may have a tendency to sit at the higher-entropy end of the spectrum (and, furthermore, that domain names generated using a particular algorithm may be likely to have similar entropy values). This statement runs alongside the assertion that legitimate domains may (in general) be more likely to have lower entropy values, particularly where there is a desire for legitimate businesses to utilise strongly branded, short, memorable web addresses - as can be seen in many of the globally most popular websites.

References

[1] https://arxiv.org/ftp/arxiv/papers/1405/1405.2061.pdf

[2] https://interisle.net/sub/CriminalDomainAbuse.pdf

[3] https://www.linkedin.com/pulse/investigating-use-domain-name-entropy-clustering-results-barnett/

[4] https://www.linkedin.com/pulse/entropy-analysis-registered-domain-names-relating-top-david-barnett/

[5] https://www.tiberium.io/blog/chapter-2-classifying-domains-through-string-entropy/

[6] https://www.similarweb.com/top-websites/

[7] https://www.splunk.com/en_us/blog/security/random-words-on-entropy-and-dns.html

[8] https://hurricanelabs.com/blog/dns-entropy-hunting-and-you/

[9] https://www.logpoint.com/en/blog/embracing-randomness-to-detect-threats-through-entropy/

[10] https://suleman-qutb.medium.com/use-of-shannon-entropy-estimation-for-dga-detection-9ded275795ca

[11] https://redcanary.com/blog/threat-hunting-entropy/

This article was first published on 3 July 2023 at:

https://circleid.com/posts/20230703-an-overview-of-the-concept-and-use-of-domain-name-entropy

Thursday, 25 May 2023

The 'Millennium Problems' in Brand Protection

As the brand protection industry approaches a quarter of a century in age, following the founding of pioneers Envisional[1] and MarkMonitor[2] in 1999, I present an overview of some of the main outstanding issues which are frequently unaddressed or are generally only partially solved by brand protection service providers. I term these the 'Millennium Problems' in reference to the set of unsolved mathematical problems published in 2000 by the Clay Mathematics Institute[3], and for which significant prizes were offered for solutions. Like their mathematical counterparts, the unsolved problems in brand protection will present significant benefits for any service providers able to develop and offer comprehensive solutions.

Brand protection basics

In their most basic sense, brand protection solutions generally consist of two components: monitoring (or, strictly, detection) of brand-related content on the Internet, and enforcement action to achieve the removal of infringing material. Monitoring is most usually carried out using technological solutions intended to identify relevant material on the Internet, across a range of relevant channels, typically using a combination of methodologies, namely: (i) Internet metasearching (i.e. the submission of relevant query terms to search engines) and web crawling; (ii) analysis of domain-name zone files (see Problem 2), to identify domains with names including brand-related terms (or variants); (iii) direct monitoring / searching on known sites of interest (see Problem 1); and (iv) other techniques, such as the use of spam traps and webserver logs, as used in phishing detection technologies[4]. Many service providers will also make use of automated analysis tools, which can inspect the content of the identified webpages, and categorise and prioritise these results accordingly.

The 'Millennium Problems'

1. Social media monitoring

Whilst monitoring of content across social media platforms is a well-established element of many brand-protection service providers' product suites, it frequently remains extremely difficult to achieve anything approaching a comprehensive level of coverage. There are a number of reasons why this is the case. In general, social media content is most usually addressed using the 'direct site searching' approach (that is, using the search functionality typically in-built to the platforms themselves as a means of returning results), though some providers also have access to direct data feeds from the platforms (e.g. through an API). In general, a variety of types of content may be of interest, including brand references in usernames (e.g. associated with fake profiles), and the content of postings (e.g. associated with fraud, the sale of counterfeits, the spread of malware, brand disparagement, etc.) and elsewhere (including imagery, sponsored advertisements, and so on).

The main difficulty with the 'direct search' approach is that results presented to a user are often limited (sometimes significantly) unless the user is logged in to the social media platform. This can be circumvented by configuring a brand-protection monitoring tool to present itself to the platform as if it is a real user (with a registered account, handle (username) and password), or simply through the use of manual searches. Both of these approaches typically require the use of 'dummy' accounts and may be in contravention of the terms and conditions of the platforms themselves.

Other technological issues may also be problematic. Many social media platforms return results on an 'infinite scroll' basis (where additional results are continually added to the webpage as the user continues to scroll down through them), often with no indication of the total numbers of results which may be present, and many platforms also have specific access requirements, such as functionality only to be accessed via a mobile app (see Problem 7). Similarly, monitoring can be further complicated by sites where content is protected via the requirement to enter a CAPTCHA code, for example. It is also typically the case that the exact results returned to a user will be highly personalised, and dependent on their browsing history, interests, location, and personal demographic.

Some of these issues can be addressed through the development of partner relationships by brand-protection service providers with the platforms themselves. However, even in cases where the platforms are amenable to this approach, some of the above technological issues may remain difficult to address.

2. Comprehensive ccTLD monitoring

Another of the core elements of many brand protection service offerings is often a domain monitoring capability; that is, the ability to identify domains whose names include the name of the brand being infringed (and/or other relevant keywords). As a special subset of general Internet content, branded domain names are often of particular interest by virtue of their greater visibility (e.g. higher ranking in search-engine results) and the more explicit nature of the IP abuse (and an associated greater range of enforcement options)[5]. Branded domain names have been noted in many previous studies as being popular with bad actors in the creation of infringing content of a variety of types, including phishing sites[6], sites offering the sale of counterfeits, and sites claiming false affiliation or including disparaging content.

The primary source of data for domain monitoring is usually the analysis of zone files, which are data files published by the registry organisations responsible for overseeing the infrastructure of each individual TLD (top-level domain, or domain extension - such as .com), and which contain a list of all existing registered domains across that extension. By comparing the content of a zone file with that from the previous day, it is possible to identify new domain registrations (as well as dropped, or lapsed, domains) and filter this list for those examples containing a brand name or keyword of interest. Domain monitoring solutions can (and, in general, should) also make use of zone-file analysis to allow identification of the full pre-existing 'landscape' of registered domain names of interest, across the TLDs in question, at the commencement of monitoring (so-called 'baseline' analysis). The most sophisticated domain monitoring solutions can also automatically check for variations of the brand strings (such as typos), which are frequently used by infringers to construct deliberately deceptive domain names[7,8].
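The day-on-day zone-file comparison described above amounts to simple set arithmetic once each file has been reduced to a set of domain names - a minimal sketch (the keyword and domain names in the usage example are illustrative):

```python
def new_registrations(yesterday: set[str], today: set[str]) -> set[str]:
    """Domains present in today's zone file but not yesterday's."""
    return today - yesterday

def dropped_domains(yesterday: set[str], today: set[str]) -> set[str]:
    """Domains present yesterday but absent today (lapsed registrations)."""
    return yesterday - today

def brand_matches(domains: set[str], keywords: list[str]) -> set[str]:
    """Filter for SLDs containing any brand keyword (or variant) of interest."""
    return {d for d in domains if any(k in d.split(".")[0] for k in keywords)}
```

A real implementation would additionally generate typo variants of each keyword before filtering, and run the same matching over the full baseline set at the commencement of monitoring.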

Zone files are generally available for most gTLDs (generic, or global, TLDs such as .com, .net, etc.) plus the new gTLDs which have been launched in the period since 2012[9], but are often not published (or may not be comprehensive) by the registry organisations responsible for other TLDs, particularly the country-specific examples (ccTLDs). For this reason, detection of relevant domains across ccTLD extensions is typically incomplete, and a number of techniques are typically used to fill in the gaps. These might include parallel look-ups (checks for domains with the same second-level domain name - i.e. the part of the domain name to the left of the dot - as examples identified through zone-file analysis), exact-match queries (regular searches for the existence of domains with second-level domain name strings of particular relevance, such as a brand name), and Internet metasearching. However, each of these approaches has its own limitations and, even when all are taken together, there can remain domain names of potential concern which are not detected through any of these methods. The next generation of domain monitoring solutions will need to address these shortcomings better, potentially through the use of improved algorithms to 'guess' candidate domain names for checking, and/or the use of more comprehensive indexes of Internet content. Additionally, the building of specific relationships with country registries - potentially combined with regulatory changes regarding the availability of zone files - may also be relevant.

3. Third-party subdomain monitoring

The subdomain is the section of a URL prior to the domain name, from which it is separated by a dot (e.g. 'translate' in 'translate.google.com'). The owner of a domain name can create whatever subdomains they wish, and can point these URLs to associated web content (via the configuration of DNS settings). Accordingly, subdomains can be used to create brand-related URLs, and can be associated with many of the same types of infringements as domain names themselves[10]. Subdomain-based abuse can also be particularly attractive to infringers, both because it avoids the requirement to register a brand-specific domain name[11] (which bad actors know can easily be detected by brand owners employing domain-monitoring services) and because there can be a low cost associated with the creation of the URL, particularly where a service provider allowing the free registration of personalised subdomains (such as blogspot.com) is used.

Consequently, the ability to monitor generally for brand references in the subdomain name of arbitrary URLs can be of great value. Note that this is distinct from the (relatively much simpler) problem of monitoring the existence and content of subdomains of official domains under the ownership of the brand owner ('internal' subdomain monitoring), since all of the relevant information is contained in the DNS configuration files held by the brand owner's domain-name management service provider.

By contrast, the identification of brand-related subdomains on third-party ('external') domain names is much more difficult. In many cases, this is achieved purely using Internet metasearching techniques (i.e. finding only content which is indexed by search engines in response to brand-specific query terms). Whilst this does mimic the search techniques used by general Internet users (and thereby identifies the 'highest-visibility' content), it will in general not find all potentially threatening content (e.g. URLs to which traffic is driven through other means, such as links in spam e-mails). This problem can be mitigated to some degree through the use of other techniques, such as passive DNS analysis or certificate transparency (CT) analysis, or via explicit queries for the existence of specific subdomain names of interest. However, these techniques require prior identification of the specific domains to be monitored; generalised identification of brand-related subdomains remains a much harder problem to solve.
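Where a feed of certificate subject names is available (e.g. from CT log monitoring), extracting candidate brand-related subdomains on third-party domains reduces to a filtering step. The sketch below makes simplifying assumptions - hostnames are treated as label lists, the registrable domain is approximated as the last two labels, and all names, the brand string, and the function name are illustrative:

```python
def brand_subdomains(cert_names: list[str], brand: str, own_domains: set[str]) -> list[str]:
    """Hostnames from CT data whose subdomain labels mention the brand,
    excluding the brand owner's own ('internal') domains."""
    hits = []
    for name in cert_names:
        labels = name.lower().lstrip("*.").split(".")
        if len(labels) < 3:
            continue  # no subdomain present
        domain = ".".join(labels[-2:])  # crude registrable-domain approximation
        if domain in own_domains:
            continue  # internal subdomain - handled separately
        if any(brand in label for label in labels[:-2]):
            hits.append(name)
    return hits
```

A production version would need a proper public-suffix lookup (the two-label approximation fails for extensions such as .co.uk), and typo-variant matching on the brand string.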

4. Circumventing site blocking and geoblocking

Site blocking and geoblocking are two long-established problems in brand monitoring. The former arises when a monitored site becomes aware of repeated search queries from a particular source, and restricts access to the site from the IP address in question. A site owner may choose to do this for a number of reasons, including protection of website performance (e.g. in preventing DDoS attacks), or for compliance with their own terms and conditions (e.g. where they state that information is not to be collected for commercial purposes, such as by brand-protection service providers). Geoblocking (or geotargeting) is a related issue, whereby the visible content of a website may vary depending on the geographical location of the visitor. Again, this may be implemented by a site owner for a range of reasons, including the tailoring of content to a local audience, search-engine optimisation, security, or legal compliance[12]. However, geoblocking can also be employed by infringers as a means of evading detection, and can also present difficulties in enforcement, where it may be necessary to demonstrate exactly what content is visible from a specific remote location.

The solutions to these issues, from a brand-protection point of view, are relatively simple in principle, generally involving the use of proxies (standalone external machines serving as intermediate 'hops' through which search queries from a brand-protection service provider are routed, so as to 'mask' the originating IP address) in a range of remote locations, and/or (particularly for site blocking) the building of relationships with the sites being monitored, so that the monitoring service provider can gain permission for collecting the data. However, in practice this requires a great deal of investment in building the required infrastructure (such as hosting and maintaining the necessary proxies, and configuring the monitoring software to communicate with them) and establishing the necessary relationships. Furthermore, the construction of appropriate user interfaces to visualise and interpret the relevant information (such as the ability to compare the content of a particular website across a range of different user (i.e. proxy) locations, in cases where geoblocking or geotargeting may be an issue) can also be a complex prospect.

5. Clustering and open-source intelligence analysis

The subject areas of clustering and open-source intelligence (OSINT) are generally of greatest relevance for entity investigations, i.e. the process of using Internet searches to build a portfolio of information relating to an identified individual or website of interest. Such information can be used for a range of purposes, including background for on-the-ground investigations or goods seizures, or for legal cases, but can also be useful background for enforcement actions (e.g. in identifying clusters of related infringements for efficient bulk takedowns in a single action).

A number of technological solutions exist for visualising the links between related entities, on the basis of common shared characteristics (such as e-mail addresses, telephone numbers, and web-hosting information such as IP addresses) - i.e. 'clustering' - but it is often the case that the characteristics themselves require identification through manual analysis. A great deal of additional efficiency can be built into the process, however, through the use of monitoring and analysis tools which can identify and extract this information automatically. This is relatively straightforward in cases where the data can be extracted in a consistent manner (e.g. performing an IP-address look-up for any identified website of interest), and/or where the information is contained in a known location on a webpage with a fixed, pre-defined format (e.g. the 'contact details' section of a social-media profile page), such that a web scraper can be configured to pull out the content. It is a considerably more difficult enterprise to extract such information from general webpages where the structure of each page is not known in advance. In these cases, the approach generally needs to be based on monitoring tools which are able to extract text strings with the general format of (say) an e-mail address or telephone number, typically followed by an element of post-processing to 'clean' and standardise the data. The next generation of clustering tools is likely to make extensive use of artificial intelligence to do this, and then to draw out insights from the clusters thus produced.
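Extraction of contact-style strings from unstructured pages, as described above, is typically pattern-based - a minimal sketch using regular expressions (the patterns are deliberately simplified and would need considerable hardening for production use):

```python
import re

# Simplified illustrative patterns - real-world matching needs to handle
# obfuscation (e.g. 'name [at] domain'), international formats, etc.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def extract_contacts(text: str) -> dict:
    """Pull candidate e-mail addresses and phone numbers from free text,
    normalising phone numbers by stripping spaces and punctuation."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [re.sub(r"[\s().-]", "", p) for p in PHONE_RE.findall(text)],
    }
```

The normalisation step (stripping spaces and punctuation from phone numbers) is what allows the same contact detail, formatted differently across pages, to act as a shared clustering characteristic.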

6. Dark Web monitoring

Dark Web content is the general name given to online material for which there are special access requirements; however in the context of online brand monitoring, it is usually taken to refer to content which is only accessible via the Tor network (a decentralised network involving the use of encrypted communications, and connections via multiple hops between Tor servers (proxies) - also known as relays or nodes). The Tor network - which is accessed using specially enabled browsers - can be used to view regular ('surface web') Internet content (and is one option open to users for whom anonymity is important), but is more usually used to access websites with the .onion extension, i.e. those which are only accessible from within the network[13].

The Tor network of .onion websites includes a range of different content types, but is notorious for illegal and infringing content and, as such, can be a key area of interest for brand monitoring. However, many brand protection service providers offer only limited capabilities in this area. This is for a number of different reasons. One significant factor is that the Dark Web is essentially unregulated, frequently with no available links to 'real-world' contact details, and extremely limited enforcement options against infringing content. However, even in cases where takedown is not possible, intelligence on the content can be extremely valuable - one example may be on 'carder' websites, on which stolen financial credentials are traded; if (say) a financial services company can determine that the details for a particular credit card or bank account are being offered for sale, this provides the opportunity for the account to be 'locked' or deactivated.

It can also be extremely difficult to configure monitoring software to search the Dark Web. Whilst it is technically relatively straightforward to configure systems to be Tor-enabled (although connections are typically rather slow), there are generally no robust indexes of Dark Web content (such as the search engines and zone files used to search surface-web content), not least because the .onion addresses for any given website - which usually consist of long, random alphanumeric strings - are generally short-lived and change over time. A number of Dark Web search engines do exist, together with ad-hoc indexes of Dark Web content posted by users on sites such as Pastebin, but the information on these sources typically becomes out-of-date rather quickly.

The nature of the content on the Dark Web also means that security concerns can be an issue for brand-protection service providers wishing to build their capabilities in this area.

7. Mobile-based technologies

As Internet engagement has continued to grow over recent years, an increasing proportion of Internet use is conducted over mobile devices[14,15], using a wide ecosystem of mobile apps. Many platforms are now almost exclusively mobile-based, often with little or no corresponding web presence - popular examples might include the WeChat / Weixin platforms, public groups on messaging services such as WhatsApp, and e-commerce platforms such as Pinduoduo. Many brand-protection service providers use legacy monitoring technologies which were designed specifically for analysing HTML content on the regular Internet and are often poorly equipped to address mobile technologies. In some cases, the work-around is to make use of standalone mobile devices or emulators - on which significant proportions of the monitoring is conducted manually - and there typically remains significant work to be done in order to fully integrate the relevant technologies into core monitoring capabilities.

8. Addressing the Web3 landscape

Web3 (also known as 'Web 3.0') is a general term referring to decentralised content on the Internet, with a particular focus on blockchain technologies. Blockchains are publicly accessible digital ledgers in which transactions are recorded, and form the basis of many digital currencies (or 'cryptocurrencies') (such as Bitcoin), in addition to a number of other applications, such as supply-chain control by brand owners. From a brand-protection viewpoint, the main related areas of interest are typically NFTs and blockchain domains[16].

NFTs (non-fungible tokens) are digital files whose ownership is recorded on a blockchain. They are most commonly associated with graphics files (such as artworks and branded imagery) or other types of digital content (such as audio or music files). However, brand owners are increasingly incorporating NFTs into their business models, including areas such as the production and trade of virtual branded items (e.g. items to be worn by avatars in virtual-reality environments within the 'metaverse', the name given to a generalised connected environment of 3D virtual worlds). Consequently, unofficial branded NFTs can be a source of concern for brand owners.

Blockchain domains - which are recorded (together with their ownership details) on a blockchain, rather than using traditional registrars and web hosting - have a number of similarities to 'classic' domain names, and can be utilised in a number of ways. The most common uses are the creation of decentralised websites on peer-to-peer (P2P) platforms, to be accessed via specially-enabled browsers, or as addresses for sending and receiving cryptocurrency. However, the blockchain domain ecosystem is essentially unregulated, and nothing analogous to domain-name zone files is available. The system is further complicated by the fact that the infrastructure allows for the possibility of domain-name 'clashes' - i.e. the potential for the same name to exist independently on distinct blockchains. As with traditional domain names, blockchain domains with brand-specific names can be a threat to brand owners, and a potential source of confusion for customers.

Both NFTs and blockchain domains can be traded on NFT marketplaces (such as OpenSea), and the monitoring of these sites is typically the primary source of intelligence for those brand-protection service providers offering capabilities in this area. For blockchain domains particularly, this approach is less than satisfactory, and offers nothing approaching the comprehensive coverage available for regular gTLD domain names via zone-file analysis. Some additional information on the existence of registered blockchain domains is typically available through direct searches within databases provided by blockchain domain registrars and nameserver providers; however, the problem of more comprehensive detection is much more difficult to solve, potentially requiring analysis of the content of the individual blockchains directly.

Another difficulty to be overcome in service offerings relating to NFTs and blockchain domains is the issue of enforcement against infringing content. In some cases, enforcement can be carried out through the submission of a DMCA (Digital Millennium Copyright Act) notice, and some NFT marketplaces have specific takedown procedures for content which infringes protected IP. However, in many cases, this simply involves the item being 'delisted' from the marketplace in question. In the future, we may see a move towards more rigorous enforcement, potentially involving forced transfers of ownership. Part of the problem is that the legal issues surrounding NFTs and blockchain domains are, in many cases, still not well-defined and are rapidly evolving, complicated by factors such as the fact that ownership of an NFT does not necessarily grant ownership of copyright for the embedded content.

Beyond #8: Other emerging technologies

As new Internet technologies continue to emerge and develop, they will bring with them new risks for brand owners and associated challenges for brand-protection service providers, who will need to continue to observe and innovate in order to stay ahead of the curve.

At any given time, it is unclear where the next area of concern will come from. Currently, there is a great deal of buzz and speculation about artificial intelligence (AI) technologies and chatbots such as ChatGPT, but it is less obvious how these may affect brand-protection considerations. In this context, I am referring to content associated with, or produced by, AI applications. (Conversely, however, it seems highly likely that AI capabilities will be increasingly built into technologies used to facilitate the brand-protection process - i.e. tools to assist with monitoring, prioritisation, clustering and enforcement.)

Users are able to communicate with AI technologies such as ChatGPT via natural language, and the systems construct their responses based on the information with which they have been 'trained'. This means that the information available from a chatbot is only as good as its training data (in the case of ChatGPT, essentially large volumes of Internet content[17,18]), and should be treated with at least as much caution as the old "I'm Feeling Lucky" button on Google, where the user is presented with a single response (not necessarily the most reliable one!) to any given query. This point is all the more valid given the ability of chatbots to extrapolate and provide responses based on incomplete information. The upshot is that chatbots pose the risk of providing information about (say) a company or brand which is misleading or otherwise damaging to corporate reputation. However, since responses are generated dynamically in response to queries (rather than being 'fixed', as in the content of an HTML webpage), it is not clear how these issues might be addressed from a brand-protection point of view. Further complications surround issues such as the ownership of rights to content produced by AI technologies[19].

Where chatbots may be of particular concern from a brand-protection and cybersecurity point of view is in their ability to rapidly create content of a wide variety of types, in a range of different styles - including the ability to write and debug computer code. What this may mean is that the entry barrier for infringers wishing to create compelling phishing e-mails[20] or write malicious programs ('malware')[21] may be significantly lowered. The likelihood is - at least for the first generations of AI technologies - that AI will not so much change the types of attack which are possible, but rather the ease with which they can be executed[22].

Another issue surrounds use-cases in which AI systems are 'trained' with confidential corporate information as part of the process of creation of company materials (such as marketing releases). These scenarios raise the possibility of the information being accessed by third parties, either directly via hacking, or via content included in the responses provided to other users, depending on the ways in which information is 'shared' within the infrastructure of the AI technology itself[23].

References

[1] https://www.cst.cam.ac.uk/ring/halloffame

[2] https://www.markmonitor.com/download/ds/MarkMonitor-Corporate-Overview.pdf

[3] https://www.claymath.org/millennium-problems

[4] https://www.linkedin.com/pulse/assessing-mediating-digital-risk-landscape-brand-david-barnett/

[5] https://www.worldtrademarkreview.com/global-guide/anti-counterfeiting-and-online-brand-enforcement/2022/article/creating-cost-effective-domain-name-watching-programme

[6] https://www.cscdbs.com/blog/branded-domains-are-the-focal-point-of-many-phishing-attacks/

[7] https://www.cscdbs.com/en/resources-news/threatening-domains-targeting-top-brands/

[8] https://www.linkedin.com/pulse/hyphenated-domain-infringements-david-barnett/

[9] https://newgtlds.icann.org/en/about/program

[10] https://www.cscdbs.com/blog/the-world-of-the-subdomain/

[11] https://www.linkedin.com/pulse/exploring-domain-hostname-based-infringements-david-barnett/

[12] https://www.cscdbs.com/blog/do-you-see-what-i-see-geotargeting-in-brand-infringements/

[13] 'Brand Protection in the Online World: A Comprehensive Guide' by David Barnett (2016). Chapter 11: ''Deep' and 'Dark' Web'

[14] https://www.statista.com/statistics/617136/digital-population-worldwide/

[15] https://www.linkedin.com/pulse/holistic-brand-fraud-cyber-protection-using-domain-threat-barnett/

[16] https://www.linkedin.com/pulse/rise-nft-david-barnett

[17] https://www.sciencefocus.com/future-technology/gpt-3/

[18] https://techcrunch.com/2023/03/23/openai-connects-chatgpt-to-the-internet/

[19] https://intellectual-property-helpdesk.ec.europa.eu/news-events/news/intellectual-property-chatgpt-2023-02-20_en

[20] https://securityboulevard.com/2023/01/what-does-chat-gpt-imply-for-brand-impersonation-qa-with-dr-salvatore-stolfo/

[21] https://www.digitaltrends.com/computing/chatgpt-created-malware/

[22] https://venturebeat.com/security/security-risks-evolve-with-release-of-gpt-4/

[23] https://blogs.blackberry.com/en/2023/04/is-chatgpt-safe-for-organizations-to-use

This article was first published on 25 May 2023 at:

https://circleid.com/posts/20230525-the-millennium-problems-in-brand-protection

Friday, 3 March 2023

Developing a methodology for benchmarking marketplace brand infringements

Introduction

One of the primary aims of a brand-protection programme is typically the ability to determine the extent of brand infringements on e-commerce marketplaces - and ideally, to benchmark this metric against comparable competitor brands. In this article, I discuss a simple initial methodology for quantifying this characteristic, based on the price points of the items in the listings returned in response to a brand-specific search (with a low price point typically indicating that a listing may be of interest).

The methodology considers the first page of results returned on any given marketplace, in response to a relevant search, and attempts to quantify the proportion of infringing listings within this dataset - a concept which is familiar from other areas in which metrics for measuring infringements or brand-protection effectiveness are required (e.g. where one aim of a brand-protection programme might be to 'clean up' the first page of results, so that only legitimate products or sellers are returned, and no infringing products are present). 

On marketplaces, listings of potential interest can typically fall into a range of categories, including counterfeit goods, trademark infringements, compatible items, 'grey-market' trade (i.e. legitimate goods sold outside approved channels), legitimate second-hand goods, and so on. Attempting to quantify the overall level of infringements based purely on price point will always therefore have shortcomings, and it may also be necessary to apply some degree of 'filtering' in order to obtain meaningful results and compare like with like. Of course, in practice, any definitive determination of infringement type will always require more detailed manual analysis for each listing (potentially also combined with other factors such as test purchases). However, in this article, I consider a high-level approach which may be at least partially automatable.

Since a simple search for just a brand name will be likely to return a mix of product types (with an associated range of prices, even for the legitimate items), I take the approach of considering one or more specific products for each brand, each of which will have a single, well-defined price for the legitimate item.

Exploring a test case

In this investigation, I consider the iPhone 14 Pro - an example of a relatively new, high-desirability product of a type which is typically prone to counterfeits and other infringement issues. I consider the listings returned on the first page of results of a specific marketplace on 01-Mar-2023, approximately six months after the initial official release of the product[1].

Since one of the aims of the analysis is to be able to benchmark against other brands and products, it will be beneficial to compare the marketplace listing prices against the actual price of the genuine product, rather than just considering the absolute numbers. This can be achieved by expressing the price per item in the marketplace listing as a proportion of the genuine item list price ($999 in the case of the iPhone 14 Pro[2]) - a measure I refer to as 'relative price'.
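As a minimal worked example in Python (the $999 list price and the $50.00 listing price are taken from the text and Table 1; the function name is illustrative):

```python
def relative_price(listing_price, list_price):
    # Listing price expressed as a proportion of the genuine item's
    # list price ($999 for the iPhone 14 Pro, per the text).
    return listing_price / list_price

print(f"{relative_price(50.00, 999):.3f}")  # 0.050 (first listing in Table 1)
```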

Where appropriate, it may also be necessary to apply a product-type filter to the results provided by the marketplace - for example, a simple search for 'iPhone 14 Pro' may return a mix of product types (including phones, accessories, and so on); however, setting the product type filter to 'mobile phones' specifically will (in theory) only return listings for phones themselves, so the spread of prices across the listings will be more reflective of the types of infringements present across the dataset of mobile-phone results. 

Even then, a low price point (say) is, in itself, not necessarily indicative that a listing represents a counterfeit product. Other types of listings (such as legitimate second-hand products and trademark infringements - e.g. where a brand name is used in the listing title so as to attract search traffic, but the listing itself is for a third-party branded product) may also be associated with low prices. The first possibility can be mitigated to some degree by considering only specific marketplaces where the extent of second-hand trade is limited (e.g. B2B marketplaces)[3]; conversely, separating out (say) counterfeits from other infringement types is much more difficult using only a largely-automated, price-based approach. However, the argument can be made that all such listings (with low price points) are likely to be infringing in some way - all we are therefore looking to do is quantify the overall size of this general infringement landscape.

As an illustration of the results, shown below (Table 1) is an overview of the top ten results returned in response to a search for 'iPhone 14 Pro' on the marketplace in question, with the product filter set to 'mobile phones'.

| Title | Price per item (min. listed) ($) | Min. order quantity | Quantity (max. listed) | Brand name | Relative price |
|---|---|---|---|---|---|
| Wholesale mobile phone Original Smart 5G Mobile Cell Phones for iphone 11 128GB | 50.00 | 2 | 300 | for Apple | 0.050 |
| New Arrival Original Brand Phone 11pro max 12 mini Waterproof Face Recognition 256gb 512gb 1TB Game Mobile Phone for iPhone 13 | 399.00 | 2 | 50 | original | 0.399 |
| Smartphone mobile iphone 11ProMax 256gb 5g usa spec original no scratches body low price for wholesale 6.5inch screen game phone | 497.00 | 1 | 1 | Other | 0.497 |
| **Hot Selling PHONE 14 PRO MAX 12GB+512GB 6.7 Inch full Display Android 10.0 Mobile Phone I13 PRO MAX Cell Phone Smartphone** | 27.72 | 1 | 99,999 | Android Smartphone | 0.028 |
| **Low price wholesale smartphone 14 Pro Max 8GB+256GB 7.3in 8core 4G LET global Edition smartphone** | 72.00 | 1 | 1,000 | W&O | 0.072 |
| **New Global I 14 Pro Max Cell Phone 7.3 Inch Big Screen 5G Smartphone 16GB + 1TB Global Unlock Dual SIM Android Mobile Phone** | 95.00 | 1 | 10,000 | Other | 0.095 |
| Free shipping phone I13 pro max 8GB+256GB 6.7 Inch full Display Android 10.0 Mobile Phone PHONE13 PRO MAX Cell Phone Smartphone | 66.00 | 1 | 99,999 | Smartphone S22 | 0.066 |
| i13 Pro cash on delivery mobile phone 8+16MP New Original Unlocked Smartphone 6.8" Display OEM | 66.00 | 1 | 99,999 | Android Smartphone | 0.066 |
| **High Quality i 14 Pro Max 5G 6.8 Inch Original Mobile Phone 16GB+1TB Large Memory Smart Phone Beauty Camera Gaming Cellphone** | 32.00 | 1 | 20,000 | Other | 0.032 |
| **High Quality i 14 Pro Max 5G 6.8 Inch Original Mobile Phone 16GB+1TB Large Memory Smart Phone Beauty Camera Gaming Cellphone** | 55.10 | 1 | 20,000 | Other | 0.055 |

Table 1: Details of top ten listings returned in response to a search for 'iPhone 14 Pro', with the product filter set to 'mobile phones'

Notes:

  • The 'price per item' is given as the lowest price referenced in the listing, in cases where the unit price may vary dependent on the quantity offered.
  • The 'quantity (max. listed)' value is given as either the maximum quantity stated as being available, or the maximum quantity for which a unit price is specified in the listing (whichever is greater).

From the total set of (48) listings returned on the first page of results, a number of other observations are particularly noteworthy:

  1. None of the listings contains what would normally be described as a counterfeit product: none shows Apple branding in the product image, and none cites the product brand name as 'Apple'. (The exception is the first listing, in which the product is stated as being 'for Apple' - a technique commonly used by sellers either to describe compatible products, although this would be largely non-sensical for a smartphone listing, or to circumvent enforcement efforts.) Some listings do, however, give the brand as 'original' or 'OEM', and many of the listings in the dataset would constitute trademark infringements, with brand names given in several cases as 'other', 'Android smartphone', or a brand name referring specifically to the seller in question.

  2. Several of the returned listings appear not to be infringing the iPhone 14 Pro product in any way, as the marketplace seems to also return a number of listings referring only to one or more of the individual keywords in the search phrase. Only a subset of the results (those referring explicitly to '14pro' or 'phone14' (both with or without spaces) - shown in bold text in Table 1) are likely to be directly infringing. Accordingly, when carrying out the price-point analysis, it will be beneficial to apply some filtering in order to exclude all except these listings. 

  3. It is also informative that a number of the relevant listings do make reference to 'i 14 pro' or 'I14 pro' - presumably as a way of avoiding directly infringing the iPhone brand name, and potentially also aiming to circumvent detection. Use of brand variations of this type is popular with infringers.

  4. Amongst the listings, a range of maximum quantities (per listing) was observed, from 1 to 10,000,000.

  5. All but one of the listings are from sellers based in China, with a significant number operating out of the manufacturing centre of Shenzhen - a trend which has frequently been observed for sellers of infringing products.
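The relevance filter described in points (2) and (3) can be sketched as a simple pattern match. A rough Python illustration follows; the exact pattern is an assumption for demonstration, not the author's production rule:

```python
import re

# Keep only listings whose titles reference '14pro' or 'phone14' style
# variants (with or without spaces, any case), per point (2) above.
RELEVANT = re.compile(r'(phone\s*14|14\s*pro)', re.IGNORECASE)

def is_relevant(title: str) -> bool:
    return bool(RELEVANT.search(title))
```

Note that this also catches the 'i 14 pro' / 'I14 pro' brand variations highlighted in point (3).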

For the 48 listings, the distribution of relative price per item is as shown in Figure 1.

Figure 1: Distribution of relative price per item, for the full set of 48 listings returned on the first page of results in response to a search for 'iPhone 14 Pro', with the product filter set to 'mobile phones'

The results show strikingly that the listings are dominated by products at a very low price point, with the vast majority of items at 10% of list price or lower (i.e. ≤ 0.10 relative price). 

It is instructive to consider some examples of the listings with the lowest price points (excluding the non-'14pro' and non-'phone14' results, as discussed in point (2) above), to analyse the types of infringement present. Of the five listings with the lowest relative prices, for example, all offer high maximum quantities of items (between 10,000 and 99,999 per listing), and all appear to represent trademark infringements (or potentially to be involved in the supply chain for counterfeit products) (Figure 2).

Figure 2: Examples of listings with very low price points, both offering 'customized logo' and 'customized packaging' for bulk orders

For developing this methodology further, it is advantageous to express the number of listings in each relative price 'bin' as a proportion of the total, rather than as an absolute number. This has a couple of advantages, specifically:

  • It allows for easier comparison across different marketplaces, where the number of results returned by page may differ.
  • It allows filtering of results to remove any 'false positives' (as discussed in point (2) above).

It also simplifies the calculations if the bins are a consistent width throughout (in this case, 0.02).

This therefore gives the results for the iPhone 14 Pro search for the marketplace in question in the format shown in Figure 3 below (in which the 20 non-relevant listings have been excluded).

Figure 3: Distribution of relative price per item, for the set of listings returned on the first page of results in response to a search for 'iPhone 14 Pro', with the product filter set to 'mobile phones', and with non-relevant / non-infringing listings excluded

In order to carry out the benchmarking across different brands or marketplaces, it is also useful to construct a single metric (or value) which provides a measure of the distribution of relative price points across the set of listings. Essentially, we would like this number to represent the proportion of the 'area under the graph' at the low-price-point end of the relative price distribution chart. 

This can be achieved by summing up the heights of the individual columns, but weighting more heavily (i.e. applying a larger multiplying factor(s) to the heights of) the columns at the low-price end. The weightings can be selected in a number of different ways; one possible methodology is to calculate a weighting which is inversely proportional to the relative price value at the mid-point of the bin in question (such that, for example, the height of the column for the bin associated with a price-point range of 0.00 to 0.02 - i.e. with a mid-point of 0.01 - is weighted by a factor of (1/0.01), or 100).

This methodology allows us to calculate a single price-point metric (P)[4], whose value increases according to the proportion of the listings in the sample associated with lower price points (and therefore provides a measure of the potential scale of the infringement landscape). In the case where all listings have a relative price of 1.00 (i.e. potentially just a set of legitimate product listings), the value of P will be 1[5]. In this case, for the distribution of iPhone 14 Pro listing price-points shown in Figure 3, the value of P is 19.180.
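The binning-and-weighting calculation behind P can be sketched as follows - a minimal Python illustration, assuming 0.02-wide bins and the inverse-mid-point weighting described above:

```python
from collections import Counter

def price_point_metric(relative_prices, bin_width=0.02):
    # Price-point metric P = sum_i (1/m_i) * l_i, where m_i is the
    # relative price at the mid-point of the i-th bin and l_i is the
    # proportion of listings falling in that bin (per note [4]).
    n = len(relative_prices)
    counts = Counter(int(p // bin_width) for p in relative_prices)
    return sum((c / n) / ((i + 0.5) * bin_width) for i, c in counts.items())
```

For example, with every listing at a relative price of 1.00, all fall into the bin with mid-point 0.99, giving P ≈ 1.01 - i.e. approximately 1, as per note [5]. The 19.180 and 21.643 values quoted in this article would be obtained by feeding in the filtered relative-price lists for each product.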

The approach thereby allows us to benchmark the product against other products, brands, or marketplaces. For example, considering the same marketplace, but looking instead at the comparable Galaxy S23 Ultra product (RRP = $1199.99)[6], and similarly applying filtering to remove non-relevant listings, we obtain the price distribution as shown in Figure 4.

Figure 4: Distribution of relative price per item, for the set of listings returned on the first page of results in response to a search for 'Galaxy S23 Ultra', with the product filter set to 'smart phones', and with non-relevant / non-infringing listings excluded

In this case, the price-point metric value (P) is 21.643, indicating a greater proportion of listings at the lowest price points than for the iPhone product on the same marketplace, and potentially therefore a larger infringement landscape. This is consistent with what we can subjectively see in Figure 4, with a greater peak in the lowest occupied relative-price bin (between values of 0.02 and 0.04).

Conclusion

Whilst deliberately simplistic, the methodology discussed above does provide a basic measure of the proportion of listings in a set of marketplace results which show low price points and, by extension, a measure of the potential scale of the infringement landscape. Of course, this assertion is only valid if we accept a low price point as a proxy for a listing being of interest, but previous analysis has certainly shown that it is at least one such valid indicator (as also borne out by the examples presented in this article).

In practice, calculation of the price-point metric could be automatable, based on collection of marketplace data using monitoring tools, combined with (a) scraping technology to automatically extract the price information and (b) filtering technology to remove false positives in the results.

By applying and expanding these ideas, it would be possible to carry out cross-brand and cross-marketplace benchmarking, and potentially to track trends in the infringement landscape over time (e.g. in conjunction with an active brand-protection programme of monitoring and enforcement).

Acknowledgements

Thanks must go to Angharad Baber, Irene Oh, Agnes Czolnowska and David Franklin for their feedback and input into this article.

References

[1] https://www.apple.com/newsroom/2022/09/apple-introduces-iphone-14-and-iphone-14-plus/

[2] https://www.apple.com/iphone/

[3] Similarly, data will be less likely to be meaningful on (for example) auction-based marketplaces, particularly in the early stages of auctions when the price point is likely to be low by definition.

[4] Formally, P = Σ_i [ (1/m_i) × ℓ_i ], where m_i is the relative price at the mid-point of the i-th bin, and ℓ_i is the proportion of listings in that bin.

[5] Approximately(!), depending on where the price-bin boundaries are selected to be.

[6] https://www.samsung.com/us/smartphones/galaxy-s23-ultra/buy/galaxy-s23-ultra-256gb-t-mobile-sm-s918uzgaxau/

This article was first published on 3 March 2023 at:

https://www.linkedin.com/pulse/developing-methodology-benchmarking-marketplace-brand-david-barnett/

Thursday, 9 February 2023

Calculation of return on investment for brand-protection programmes: Thoughts towards a new paradigm

Pre-existing ideas

Numerous previous studies have considered methodologies for calculating the return on investment (ROI) of brand-protection programmes which incorporate components of monitoring and enforcement. These ideas can be important both to justify the spend on a programme in the first place, and to assess its impact once established. Correspondingly, 'classic' ROI calculations can be categorised into two main types: the first (known as 'a priori' calculations) considers the probable infringement landscape in advance of the implementation of a brand-protection programme; the second aims to quantify the impact of the actions taken as part of an active enforcement initiative[1]. It is the latter category with which we are primarily concerned in this article.

To a very high level, many ROI calculation methodologies use a formulation along the lines of:

R = C × E

where R, the ROI (within a given timeframe) (i.e. the benefit of the brand-protection programme, to be offset against the associated spend) is equal to the product of C, the 'cost' of a pre-existing infringement being active, and E, the number of infringements removed through enforcement as part of the brand-protection programme (in the same timeframe).

Very many assumptions are typically required in order to estimate these figures. In some methodologies, the assumed 'cost' associated with a live infringement may be reflective of an estimate of its direct financial impact (e.g. the typical loss from a phishing incident); in others it may be calculated as the proportion of lost revenue which is reclaimable following deactivation of the infringement (i.e. the 'cost' in the above formulation essentially reflecting the pre-enforcement impact of not yet having taken the infringement down). In these types of approaches, it is very rare that these figures can be measured directly and therefore a number of assumptions (or 'proxies' for the data) are required. In cases of domain acquisition, for example, it may be appropriate to make use of figures such as web traffic when quantifying impact; for marketplace listings, it is typically necessary to consider factors such as price and quantity of items in the listings removed. In both cases, the methodology needs to consider assumed conversion rates (i.e. the proportion of customers who can be 'monetised' by the brand owner - e.g. those who will make a legitimate purchase once the source of infringements is removed)[2,3]. Even this part of the process is far from simple; complications include factors such as: 

  1. The conversion rate will be (strongly) dependent on the nature and price of the item (e.g. it will be much lower for (say) an obvious counterfeit, such as an item passing off as a high-end luxury brand but with a very low price point)[4].

  2. The conversion rate for customers knowingly navigating to an official brand website will potentially differ from that for Internet users intending to visit a third-party standalone e-commerce site (if we are considering the case where this domain may subsequently have been acquired by the brand owner and its traffic re-directed to their official site) - this consideration involves taking account of a principle sometimes referred to as the 'substitution effect'[5].
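Setting those complications aside, the basic formulation can be sketched numerically. In this Python illustration, the 'cost' C is estimated from traffic, an assumed conversion rate, and an assumed average order value; all input figures are invented for the example:

```python
def infringement_cost(monthly_visits, conversion_rate, avg_order_value):
    # Estimated monthly 'cost' C of a single live infringement: diverted
    # traffic x assumed conversion rate x assumed average order value.
    # All inputs here are illustrative proxies, not measured figures.
    return monthly_visits * conversion_rate * avg_order_value

def monthly_roi(cost_per_infringement, enforcements):
    # The classic formulation R = C x E from the text.
    return cost_per_infringement * enforcements

C = infringement_cost(1_000, 0.02, 50.0)  # 1,000 visits x 2% x $50 = 1000.0
R = monthly_roi(C, 30)                    # 30 takedowns -> 30000.0
```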

Alternative proxies for the above figures may also need to be utilised, depending on the web channel under consideration (e.g. where absolute estimates of web traffic are not available or appropriate). For example, on social media, the 'exposure' or 'reach' of content can be estimated using numbers of 'likes' or followers; for mobile apps, the number of downloads may be relevant; for file sharing, it may be appropriate to consider the number of individuals accessing the content (e.g. 'seeds' and 'leechers' for BitTorrent). 

Numerous other approaches can also be taken. The ultimate objective when estimating the 'value' of a website is the identification of a direct measure of the revenue it generates (e.g. via direct sales of products, for an e-commerce site). In practice, this information is almost never publicly available, though it is sometimes possible to make estimations via shipping or logistics information available through third-party databases. Some methodologies will utilise web-analytics tools to estimate value based on factors such as advertising spend by the site owner, or will analyse outgoing site traffic (e.g. to payment-service-provider platforms) to estimate customer volume and/or conversion rates[6].

It has also previously been noted that sometimes determination of ROI can reflect more qualitative goals (i.e. the statements of 'what success looks like' for a brand-protection programme). For example, a brand owner may consider a programme 'successful' once there are no infringing results returned on the first page of search-engine results, or in pages of search results on a range of key marketplace sites, in response to brand-specific queries. Similarly, the 'ownership of the buy button' (i.e. being the first vendor listed for a particular product on an e-commerce marketplace site) might be a key aim.

The success of a brand-protection initiative can also be judged based on other (again, more quantitative) metrics which may only be available to the brand owner themselves (as opposed to, say, a brand-protection service-provider partner). These might include factors such as increases in the numbers of visitors to physical stores, or in volumes of traffic to official websites (as might be directly measurable using the brand owner's webserver log information).

Beyond this, wholly different methodologies can also be applied. Some will take account of 'intangible' factors such as brand value[7], considering the spend on brand protection to be a business cost necessary to lower the risk of damage to the brand. This type of approach is also not straightforward - higher levels of abuse can be considered an indicator that the brand is a desirable one, which can actually be reflective of greater brand value. Other factors, such as new product launches, can also affect the visibility of the brand and its likelihood of being targeted, all of which can serve to further complicate the landscape. 

However, in this article, we will primarily consider the simpler approaches discussed in previous work, and look at how they can potentially be modified to better account for the overall impact of a brand-protection programme.

Variations over time in the infringement landscape

Part 1: Single-brand analysis

In this section, we consider an extremely simplified model looking at changes in the infringement landscape over time for a brand, considering in the first instance the example of a newly-launched brand. In this case, the growth in the number of infringements over time might look something like that shown in Figure 1.

Figure 1: Mock-up of the changing infringement landscape over time for a newly-launched brand

The above framework is formulated using a timeframe expressed in numbers of months for convenience, though the timescales observed in practice may vary hugely. There is also a deliberate choice to avoid stating any quantitative numbers for the volumes of infringements, as these will also be dependent on any number of different factors - one brand may see tens or hundreds of infringements; others may see many thousands or more. Beyond these points, the construction of the above trend lines is based on the following scenario:

  • Following the launch of the brand (in month 1), there is a ramp-up in the monthly number of new infringements ('N') appearing online, up to a constant level.
  • There is also a (slower) ramp-up in the rate of infringements disappearing naturally from the Internet ('natural removal', 'R') even in the absence of any enforcement activity. This will arise through a combination of factors, including: content which is deactivated by the infringer following a period of use; domains expiring after their registration period; older content gradually dropping down search-engine rankings (and potentially therefore eventually ceasing to have any damaging impact), and so on.
  • There is a resulting growth in the cumulative number of active online infringements ('I'), caused by the difference between the monthly values of 'N' and 'R'. 
  • Finally, it seems reasonable to assume in most cases that 'I' will eventually reach a steady state, rather than continuing to grow indefinitely. This implies that 'R' will eventually 'catch up' with 'N' (possibly in part due to the fact that 'N' may also drop off slightly over time, after an initial peak in infringement activity). 

Of course, in practice the exact balance between the above numbers will be dependent on an enormous range of factors, including considerations such as the type of Internet channel. For example, marketplace listings will typically have a shorter 'lifetime' than domain registrations (affecting, for example, the rate at which 'R' catches up with 'N').
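The mock-up in Figure 1 can be reproduced with a toy simulation. In this Python sketch, the ramp rates and monthly volumes are invented, in keeping with the article's deliberately unit-free framing:

```python
def simulate_landscape(months, new, removed):
    # Cumulative active infringements: I_t = max(0, I_{t-1} + N_t - R_t),
    # with the monthly new-infringement rate N and natural-removal rate R
    # supplied as functions of the month index.
    I, history = 0, []
    for t in range(1, months + 1):
        I = max(0, I + new(t) - removed(t))
        history.append(I)
    return history

# Invented ramp rates: N reaches a constant 10/month by month 3; R ramps
# up more slowly and eventually catches N, so I plateaus at a steady state.
N = lambda t: min(10, 4 * t)
R = lambda t: min(10, 2 * (t - 1))

print(simulate_landscape(8, N, R))  # [4, 10, 16, 20, 22, 22, 22, 22]
```

The plateau in the final months mirrors the steady state described above, in which 'R' has caught up with 'N'.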

Let us now consider the case where a brand-protection programme, incorporating the introduction of enforcement actions for the removal of infringing content, is added into the picture (say, after the landscape has reached steady state in month 12) (Figure 2).

Figure 2: Mock-up of the changing infringement landscape over time, with an enforcement programme introduced in month 12

In this case, we use the following formulation:

  • In month 12, the enforcement programme is introduced, which incorporates a particular level of resource sufficient to action a certain maximum number of takedowns each month. This number will of course need to be greater than the rate at which new infringements appear, if the programme is to be successful. 
  • Following the introduction of the enforcement programme, the rate of natural removal ('R') of infringements will quickly drop off to zero (essentially, the infringements are being removed via enforcement quicker than the rate at which they would otherwise naturally disappear).
  • As enforcement progresses, the cumulative number of infringements drops off from its pre-existing level, until we reach a steady state (the 'whackamole' phase[8]) where the monthly number of enforcements ('E') simply needs to 'keep up' with the rate at which new infringements appear ('N'). In other words, each month a certain number of new infringements appear and these are all removed through the actions of the enforcement programme. (N.B. Equivalently, at this point we could express the 'cumulative number of infringements' ('I') as zero, depending on the point in the month at which we carry out the calculation (i.e. whether pre- or post-enforcement).)

In reality, the situation is likely to be far less straightforward, with a number of additional factors complicating the picture, including (but not limited to) the facts that:

  • The types of infringements actioned over time may change (potentially starting with higher-impact or easier takedowns).
  • Monitoring will inevitably start to uncover lower visibility and/or lower severity infringements once the initial high-visibility, high-impact infringements have been taken down.
  • The rate of appearance of online infringements may change in response to the enforcement programme (e.g. infringers turning their attention to easier targets).
  • The infringers may change their tactics in response to the enforcement programme (e.g. describing goods in different ways) - accordingly, both the monitoring approach and the enforcement methodologies may need to evolve in order to account for this.

Nevertheless, the above very simplistic picture does reflect some of the top-level trends typically seen in a brand-protection programme, with an initial period of 'cleaning up' the pre-existing backlog of infringements followed by a steady-state period of lower required activity, just keeping pace with new infringements as they appear. 

This being the case, we can look to this model to draw insights into how our classic ROI calculation methods could be augmented to provide a fuller picture. Many of the traditional approaches to monthly ROI calculation make use only of the total monthly numbers of enforcements carried out ('E'). Although the drop-off in the numbers of pre-existing infringements is reflected in the ROI calculations associated with the enforcements carried out during the 'ramp-down' phase itself, it is usually not reflected in the ongoing calculations during the subsequent 'whackamole' phase. In reality, it may be preferable to make use of the difference between the ongoing number of infringements ('X') and that observed at the start of the programme ('Y'), if we are to fully assess the impact of the brand-protection programme. In other words, rather than using the number 'X' as the basis of our monthly ROI calculation, it might instead be better to use 'Y – X'. This difference provides a measure of the value of the ongoing brand-protection programme - essentially, reflecting the difference in the ongoing number of infringements (with the associated 'cost' of them being live) compared with that which would have been observed if the programme were not in place. In practice, determination of these numbers will require the brand-protection initiative to incorporate a comprehensive programme of monitoring (as well as enforcement) throughout, incorporating a full landscape 'audit' at the outset.

Part 2: Benchmarking and the use of controls

To further complicate the situation, what the above approach fails to consider is any changes to the infringement landscape which would have occurred if the brand-protection programme were not being carried out. This is known as the 'attribution' problem in the physical sciences. Of course, once enforcement starts being carried out, we lose the ability to see what would have happened to the numbers of infringements if they were not being actively taken down. It is well established that external factors can significantly change the infringement landscape; for example, numerous previous studies show that real-world events can drive spikes in resulting infringement activity[9].

One way in which this problem can be addressed is via comparison with another 'control' brand of a similar type, operating in a similar industry area, but for which brand-protection activity is not being carried out. In practice, a brand owner can never be completely sure what any given competitor is doing, so a more realistic scenario is the analysis of a group of industry peers, across which the infringement trends over time can be averaged to create a 'benchmark'. Of course, this requires active monitoring across all these brands, and so may be far from straightforward.

In this case, we may end up with a scenario such as that shown in Figure 3, where the control or benchmark brand (ideally an average of the data collected across multiple third-party brands) - which we have to assume reflects external drivers of infringement trends in the absence of enforcement initiatives - shows a change in the infringement landscape since the start of the programme for the brand being protected.

Figure 3: Mock-up of the changing infringement landscape over time, with an enforcement programme introduced for the customer brand in month 12, and compared against a (pre-existing, established) benchmark brand(s)

In the above example, the control brand shows a ramp-up in infringements during the period of the brand-protection programme, perhaps driven by an external event of some sort. Additionally, by using a benchmark comprising data from across numerous brands, we reduce the likelihood that the change is driven by some characteristic specific to one brand (such as a new product launch) and increase the likelihood that the change is representative of the industry landscape in general. 

In this case we can assume that, in the absence of a brand-protection programme, the infringement landscape for the customer brand would have increased by the same proportion as that seen for the benchmark brand(s). Therefore, instead of our ROI calculation being a function (' f ') of 'Y – X' (written as 'ROI = f [Y – X]'), we can say that (where 'A' and 'B' are the benchmark's infringement levels at the start of the programme and at the point of calculation, respectively):

ROI = f [ ( (B/A) × Y ) – X ]

Essentially, we are saying that, had the brand-protection programme not been in place, we might have expected the 'background' level of infringements for the customer brand also to have increased by a factor of ('B/A') by the end of the monitoring period, and so the benefit of the programme is in reducing it from this value to the value observed ('X').

Of course, the same approach can also be used if the benchmark shows a decrease in infringements across the monitoring period.
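The benchmark-adjusted calculation can be sketched as a simple function. All of the figures below are invented for illustration; 'Y' and 'X' are the customer brand's infringement counts at the start of the programme and now, while 'A' and 'B' are the benchmark's counts at the same two points in time:

```python
# Sketch of the benchmark-adjusted reduction described above.
# The starting level Y is scaled by the benchmark's growth factor B/A
# to estimate what the landscape would have looked like without the
# programme, before subtracting the observed level X.

def adjusted_reduction(Y, X, A, B):
    """Estimated number of infringements avoided: (B/A) * Y - X."""
    return (B / A) * Y - X

# Illustrative figures: benchmark grew by 25% (A=400 -> B=500) while
# the protected brand fell from Y=1000 to X=200 infringements.
print(adjusted_reduction(Y=1000, X=200, A=400, B=500))  # 1050.0
```

Note that if the benchmark is flat (B = A), the formula reduces to the simpler 'Y – X' measure discussed earlier; a benchmark that decreases over the monitoring period (B < A) is handled by exactly the same arithmetic.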

Discussion

The calculation of ROI for brand protection is fiendishly complicated, and no single approach will be applicable in all cases. In any selected methodology, it is necessary to make use of a wide range of assumptions and proxies for the data to which we would ideally like to have access. Nevertheless, there are some general industry-accepted standards for these calculations, many of which utilise metrics around ongoing levels of enforcement activity. In this article, we have considered some approaches which could be taken to modify these methodologies towards a new framework of ideas, involving the following two fundamental changes:

  • Considering the difference between the ongoing levels of enforcement (as a measure of the ongoing level of infringement activity), and those seen at the outset of the programme, as a measure of the overall impact of the brand-protection programme (rather than just considering the ongoing levels of enforcement in their own right).
  • Considering the use of one or (ideally) more benchmark brands, to separate out the observed change in infringement levels (for the customer brand) arising from the enforcement activity, from other background or landscape changes applicable to the industry vertical in general.

Even then, there are still other factors to consider: the customer brand may also have experienced company-specific issues (such as product launches, or changes in sales channels or target markets) which themselves could have driven changes in the number of infringements, even in the absence of an enforcement programme or industry-wide developments. All of this can further complicate the calculations to be carried out.

Additionally, I anticipate that the general philosophy behind ROI calculations may need to evolve further to reflect other issues more directly tied to cybersecurity, as the importance of this area becomes more widely appreciated. A former colleague of mine recently asked in a LinkedIn posting[10]:

""So what's the cost?" is a frequent question I hear. Rather than thinking about the budget required, brands need to consider the financial and reputational costs of repairing the damage when they are impacted."

The key point here is thinking about proactive rather than reactive measures. This issue is particularly relevant when it comes to domain security, where a range of products are available to allow corporations to secure their domains from external attack vectors which can be highly damaging (from both financial and reputational points of view)[11]. The matter is of even greater urgency in a landscape where we still see significant proportions of the world's top companies failing to adequately protect themselves[12].

The expected financial loss ('L') per year due to (say) cybersecurity issues (an 'attack') is given[13] by:

L = p_att × C_att

where p_att is the probability of an attack occurring during the year, and C_att is the financial cost (the 'damage') resulting from the attack. From this, we can say that, if the probability of an attack can be reduced (from p_att(without_security) to p_att(with_security)) through the implementation of domain security measures, the saving ('S') to the organisation can be written as:

S = ( p_att(without_security) – p_att(with_security) ) × C_att

Whilst easy to formulate, this can be much harder to quantify. However, a recent study showed that 88% of organisations were subject to some form of DNS attack in 2021, with each attack costing the enterprise an average of almost $1 million[14]. If, then, the risk of an attack can be (conservatively) reduced from (say) 10% to 1% through the introduction of security measures, this equates to an equivalent annual saving to the company of the order of $90k. If the cost of implementing the security measures is less than this value, the return on investment will be positive. If we also factor in the implications for access to - and cost of - cyberinsurance cover, the importance of domain security products and services becomes ever clearer.
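The worked example above follows directly from the expected-saving formula. The probabilities and cost figure below are the article's own illustrative numbers, not measured values:

```python
# Expected annual saving from security measures:
# S = ( p_att(without_security) - p_att(with_security) ) x C_att

def annual_saving(p_without, p_with, cost):
    """Expected annual saving from reducing attack probability."""
    return (p_without - p_with) * cost

# Illustrative figures from the text: attack probability reduced from
# 10% to 1%, with an average attack cost of $1 million.
saving = annual_saving(p_without=0.10, p_with=0.01, cost=1_000_000)
print(f"${saving:,.0f}")  # prints $90,000
```

If this expected saving exceeds the annual cost of the security measures, the return on investment is positive, as noted above.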

Acknowledgements

Thanks must go to Angharad Baber, Mark Barrett and David Riley for their feedback and input into this article.

References

[1] https://www.worldtrademarkreview.com/anti-counterfeiting/return-investment-proving-protection-pays

[2] https://www.worldtrademarkreview.com/global-guide/anti-counterfeiting-and-online-brand-enforcement/2022/article/creating-cost-effective-domain-name-watching-programme

[3] https://www.cscdbs.com/blog/four-steps-to-an-effective-brand-protection-program/

[4] https://circleid.com/posts/20220726-calculating-the-return-on-investment-of-online-brand-protection-projects

[5] 'Digital Brand Protection: Investigating Brand Piracy and Intellectual Property Abuse' by Steven Ustel (2019). Chapter 17: 'Accounting and Accountability'

[6] 'Digital Brand Protection: Investigating Brand Piracy and Intellectual Property Abuse' by Steven Ustel (2019). Chapter 9: 'Pivots'

[7] https://www.cscdbs.com/blog/brand-abuse-and-ip-infringements/

[8] By 'whackamole' in this context, I am referring to a consistent state in which infringements are reactively taken down as quickly as they appear (rather than implying a random or disordered approach).

[9] https://www.linkedin.com/pulse/four-new-case-studies-domain-registration-activity-spikes-barnett/

[10] https://www.linkedin.com/posts/stuart-fuller-17a7411_what-cisos-can-do-about-brand-impersonation-activity-7027979839747846144-E6ak

[11] https://www.linkedin.com/pulse/holistic-brand-fraud-cyber-protection-using-domain-threat-barnett/

[12] https://www.cscdbs.com/en/resources-news/domain-security-report/ (2022)

[13] This follows from the fact that, mathematically, the expected value ('E[X]') of a variable ('X') is given by:

E[X] = Σ_i ( p(X_i) × X_i ), where p(X_i) is the probability of X taking its ith value

[14] https://www.efficientip.com/wp-content/uploads/2022/05/IDC-EUR149048522-EfficientIP-infobrief_FINAL.pdf

This article was first published on 9 February 2023 at:

https://www.linkedin.com/pulse/calculation-return-investment-brand-protection-thoughts-david-barnett/