Tuesday, 6 February 2024

Utilisation of relevance keywords for prioritising results in brand monitoring

BLOG POST

The use of keyword-based matching to identify the most significant findings within a larger set of candidate webpages is a key element of many brand-protection technologies. The approach can build efficiencies into the overall analysis process and can help to identify priority targets for content tracking or enforcement; it is therefore an essential component of effective tools used for brand-protection programmes.

In our latest study, we outline a new methodology for analysing the proximity of 'relevance keywords' to brand terms, in order to calculate a metric for the potential level of relevance of a webpage. The framework is based on the methodology previously outlined for calculating the sense of the sentiment of brand references in our 'top 100 brands' study[1], and is a flexible approach which can be tailored to a range of different contexts. The methodology is illustrated using a series of short case studies.

The concept of filtering using relevance keywords is central to many areas of brand monitoring, all broadly covered under the description of 'issue monitoring'. These ideas set the scene for a number of additional applications in the brand-protection arena, such as identifying instances of 'high-risk' e-commerce keywords in order to prioritise websites based on the likelihood of their association with the sale of counterfeit goods.

Reference

[1] https://www.iamstobbs.com/online-brand-prominence-and-sentiment-ebook

This article was first published on 6 February 2024 at:

https://www.iamstobbs.com/opinion/utilisation-of-relevance-keywords-for-prioritising-results-in-brand-monitoring

* * * * *

WHITE PAPER

Introduction

One of the primary requirements of brand-monitoring technologies is the ability to prioritise the identified results (i.e. detected webpages) by the likelihood that their content will be of relevance to the categories of material of interest. This prioritisation is of great importance for projects which may potentially involve the collection of very large numbers of candidate pages, to ensure that key findings can be identified in a timely manner, particularly when analyst time may be a high-cost resource. Prioritisation is also relevant in the identification of priority targets for enforcement and content tracking.

In this study, we investigate the use of 'relevance keywords' in the identification of specific pages relevant to a particular area of content, within a larger 'pool' of more general webpages. The methodology is an adaptation of that outlined previously for sentiment analysis of the top 100 global brands[1], but where relevance keywords are here used in place of sentiment keywords. Note that the approach is distinct from that outlined for cases where a particular brand name can occur in altogether relevant or non-relevant contexts (e.g. where the name of the Google 'Gemini' brand could occur in relation to AI (artificial intelligence) or in relation to astrology), where we suggest the use of a keyword-based overall content scoring approach to determine the subject area of the page as a whole[2].

The methodology is also different from the concept of content scoring for just the brand name (as is used in measurement of brand prominence), which - unless additional filtering is applied - will not distinguish between relevant and non-relevant / third-party references. In the relevance-keyword methodology outlined here, we consider only the appearance of the brand name in close proximity to instances of relevant keywords. In so doing, we can calculate a potential relevance score for the brand on each page, analogous to the sentiment score described in the 'top 100 brands' study.
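
The scoring idea described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the exponential decay, the measurement of distance in word tokens, the default `half_life` of 10 and the function names are all assumptions, the brand is assumed to be a single lower-case token, and the absolute scale of the scores will not match that of the tables in the case studies.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance_score(text, brand, keywords, half_life=10.0):
    """Sum a decaying weight over every (brand mention, keyword mention) pair.

    Assumption: the weight of a pair halves for every `half_life` tokens
    separating the brand mention from the keyword mention.
    """
    tokens = tokenize(text)
    brand_pos = [i for i, t in enumerate(tokens) if t == brand]
    kw_pos = [i for i, t in enumerate(tokens) if t in keywords]
    return sum(0.5 ** (abs(b - k) / half_life) for b in brand_pos for k in kw_pos)

# Hypothetical pages: the keyword is close to the brand in one, distant in the other.
page_a = "Acme issues recall of faulty kettles"
page_b = "Acme opens a new store. " + "Filler text. " * 30 + "A rival firm announces a recall."
print(relevance_score(page_a, "acme", {"recall"}))
print(relevance_score(page_b, "acme", {"recall"}))
```

Under these assumptions, a page scores zero if either the brand or the keywords are absent, and more mentions at closer range give a higher score.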

The methodology is broadly applicable to an area of brand protection referred to as 'issue monitoring' - i.e. where content relating to a brand is of interest only if it relates to a particular subject area. This might pertain to a specific sub-brand or product, a news story, or an association with a particular individual, other company, or category of content; these references can accordingly be identified by searching for references to the brand near to keywords relating to the issue in question. Although this 'targeting' approach can be addressed to some degree through the incorporation of the relevance keywords into the search queries used to return the set of candidate pages for analysis, it will not always result in a 'clean' set of results for a number of reasons, not least the way in which the search-source handles multi-word search terms (e.g. whether it requires the results to feature one or both terms, whether it suggests alternatives, and the usual inability of search engines to return results only where the terms appear in close proximity to each other). These points are explored below through a number of case studies[3].

Case studies

Case study 1: News relating to the WH Smith rebrand

This first case study relates to the January 2024 news story about the re-branding of a number of WH Smith stores with a new logo appearing similar to that of the NHS[4]. Superficially, we might expect to be able to identify references to this news story by searching for 'whs nhs' (which, on Google, adds an implicit Boolean 'AND' - i.e. requires the results returned to feature both 'whs' and 'nhs', but with no explicit condition on the proximity on the page of the two terms). However, there are a number of reasons why this is not effective. Firstly, Google suggests 'wsh' as an alternative for 'whs', and presents results for both terms together (which, in an automated monitoring tool, would just be added to the same dataset for processing) (Figure 1).

Figure 1: A search suggestion presented by google.com

Secondly, both search terms ('whs' and 'nhs') are sufficiently generic that they can occur in unrelated contexts. Our assertion is that a page would be more likely to relate to the WHS / NHS news story if both terms appear close to each other on the page. In order to test this, we treat 'WHS' (together with checking for variants such as 'WH Smith', 'W H Smiths', etc.) as the brand name, and calculate a relevance score based on each appearance of the brand name near to the relevance keyword(s) (just 'NHS' in this case), with greater numbers of closer mentions generating higher scores.
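
One way of handling the brand variants is to collapse them into a single placeholder token before measuring distances. The sketch below assumes this approach; the regular expression, the placeholder name and the exponential decay with a `half_life` measured in word tokens are illustrative assumptions and are not taken from the study.

```python
import re

# Hypothetical pattern covering 'WHS', 'WH Smith', 'W H Smiths', 'WHSmith', etc.
BRAND_VARIANTS = re.compile(r"\bw\s?h\s?(?:s\b|smith(?:'?s)?\b)", re.IGNORECASE)
PLACEHOLDER = "brandtoken"  # assumed not to occur naturally in the page text

def relevance_score(text, keywords, half_life=10.0):
    """Score brand mentions (any variant) by their proximity to the keywords."""
    # Replace every brand variant with one token so that multi-word
    # variants occupy a single position in the token sequence.
    normalised = BRAND_VARIANTS.sub(f" {PLACEHOLDER} ", text)
    tokens = re.findall(r"\w+", normalised.lower())
    brand_pos = [i for i, t in enumerate(tokens) if t == PLACEHOLDER]
    kw_pos = [i for i, t in enumerate(tokens) if t in keywords]
    # Assumed decay: each pair's weight halves every `half_life` tokens.
    return sum(0.5 ** (abs(b - k) / half_life) for b in brand_pos for k in kw_pos)

# Hypothetical snippets, not taken from the actual search results.
print(relevance_score("WH Smith's new logo has been compared to that of the NHS", {"nhs"}))
print(relevance_score("WHS stands for workplace health and safety", {"nhs"}))
```

The second snippet scores zero because the relevance keyword never appears, even though the brand pattern matches.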

In this case, only one of the pages returned by Google in the first page of results actually relates to the news story in question, and this result is correctly picked up as the highest scored result within the dataset (Table 1).

Table 1: All non-zero relevance score webpages from the first page of Google results for 'whs nhs'

Case study 2: Announcement of Havaianas' new CEO

In December 2023, Alpargatas - the owner of footwear brand Havaianas - announced Mondelēz executive Liel Miranda as the new CEO, to take effect from February 2024[5,6]. In monitoring for references to this story, we may wish to search for (say) 'havaianas ceo'. In practice, this returns a range of results, including references to previous CEOs and other news stories, and references to the CEOs of other companies in conjunction with references to Havaianas[7].

However, if we apply the same keyword-based filtering approach to the set of pages returned - namely, classifying on the basis of relevance score for mentions of Havaianas (or Alpargatas) near to the relevance keywords 'miranda' and 'mondelez' - we again find a relatively clean separation of the relevant results (Table 2) from the remainder.

Table 2: All non-zero relevance score webpages from the first page of Google results for 'havaianas ceo'

Case study 3: References to the Facebook 'news tag'

For the next case study, suppose that we need to search for explicit references to the Facebook news tag (for example, as referenced in the news story that Meta was planning to deprecate the feature[8]). A search for 'facebook news tag' is not a particularly efficient way of collecting relevant pages, because all three terms are generic and frequently occur together in unrelated contexts (i.e. many references to Facebook occur in conjunction with mentions of news, newsfeeds, etc., and of tags - e.g. photo tagging). Whilst this can be mitigated to some extent through the use of exact-phrase searching ('facebook "news tag"' or even "facebook news tag" explicitly), this will not capture content where the terms are phrased differently (e.g. an exact-phrase search for "facebook news tag" will not return content which refers to the "news tag on facebook"). Instead, it is possible to filter the pages by analysing for references to Facebook in conjunction with the relevance keywords 'news' and 'tag', which has the added benefit of further upweighting pages on which both terms appear near the brand name (i.e. multi-term matching) and which are therefore likely to be the most relevant (Table 3). This approach also provides a much better prioritisation than using just one keyword (say, 'news'), for which the scores are also shown in Table 3 (where several relevant results score zero and would have been missed).

* This is the link to the news story referenced above

Table 3: All webpages from the first page of Google results for 'facebook news tag' with relevance scores of 100 or greater
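
The effect of multi-term matching can be illustrated with a short sketch that scores each keyword separately and then sums the results. The additive combination, the exact-token matching (no stemming of 'tags' or 'tagging'), the exponential decay and the example page texts are all assumptions made for illustration.

```python
import re

def score_by_keyword(text, brand, keywords, half_life=10.0):
    """Return {keyword: proximity score} for a single page."""
    tokens = re.findall(r"\w+", text.lower())
    brand_pos = [i for i, t in enumerate(tokens) if t == brand]
    scores = {}
    for kw in keywords:
        kw_pos = [i for i, t in enumerate(tokens) if t == kw]
        # Assumed decay: each pair's weight halves every `half_life` tokens.
        scores[kw] = sum(0.5 ** (abs(b - k) / half_life)
                         for b in brand_pos for k in kw_pos)
    return scores

# Hypothetical page texts, not taken from the actual search results.
pages = {
    "page_a": "Facebook will retire the news tag for publishers",
    "page_b": "Facebook lets you tag friends in photos",
    "page_c": "Local news roundup with no brand mention",
}
for name, text in pages.items():
    per_kw = score_by_keyword(text, "facebook", ["news", "tag"])
    print(name, "| news only:", round(per_kw["news"], 3),
          "| news + tag:", round(sum(per_kw.values()), 3))
```

In this sketch, a page mentioning both keywords near the brand receives contributions from each and so ranks above a page matching only one, while a page that matches only 'tag' would score zero if 'news' were the sole keyword.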

Case study 4: Searches for Gucci handbags

As an illustration of how the same approach could be applied to a brand / product combination, we consider the case of searching for Gucci handbags (using 'gucci handbag' as our initial search term). Whilst the vast majority of the pages returned by a query of this type will be generally relevant, there may be a range of content types, including e-commerce sites, informational sites, and a mixture of sites dedicated specifically to Gucci handbags versus those also featuring content relating to other brands or products. However, use of the relevance-keyword approach (looking for references to Gucci near 'handbag(s)' or 'bag(s)') provides a basis for prioritising the results according to the degree to which the pages relate to Gucci handbags specifically (comparable to the content scoring approach for a single brand name or term) (Table 4).

Table 4: All webpages from the first page of Google results for 'gucci handbag' with relevance scores of 1000 or greater

As a further step, it would be possible to (for example) incorporate analysis of e-commerce-related keywords (on either a proximity or a content-scoring basis) to further rank the results on the basis of their likelihood to be offering the sale of relevant products. This type of analysis is central to the 'discovery' process for e-commerce sites (i.e. identifying sites which were not known at the outset of monitoring).
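
As a rough illustration of the content-scoring variant of this further step, the sketch below ranks pages by the density of e-commerce terms. The keyword list, the 'hits per 100 tokens' measure and the function names are hypothetical; the study does not specify how such a score would be calculated.

```python
import re

# Hypothetical list of e-commerce-related keywords.
ECOMMERCE_TERMS = {"buy", "cart", "checkout", "price", "shipping"}

def ecommerce_score(text):
    """Assumed content score: e-commerce term occurrences per 100 tokens."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in ECOMMERCE_TERMS)
    return 100.0 * hits / len(tokens)

def rank_pages(pages):
    """Return page names ordered from highest to lowest e-commerce score."""
    return sorted(pages, key=lambda name: ecommerce_score(pages[name]), reverse=True)

# Hypothetical page texts.
pages = {
    "blog": "A short history of the Gucci handbag",
    "shop": "Buy Gucci handbag now, add to cart, free shipping",
}
print(rank_pages(pages))
```

In practice, a score of this kind could be combined with the proximity-based relevance score, so that pages are ranked both on how closely they relate to the product and on how likely they are to be offering it for sale.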

Conclusion

The approach outlined in this study allows us to prioritise results gathered from search engines, according to the proximity of the name of the brand under consideration to any of a list of keywords pertaining to the content area(s) of interest. These ideas can be incorporated into automated monitoring tools to provide a means of building efficiency into the analysis process, ease of identification of the highest-relevance results, and a reduction in false positives. The methodology sits alongside other related ideas, such as the use of content scoring and prominence and sentiment analysis. As with these other areas, the approach is flexible and can be tailored to specific requirements (e.g. by varying the keyword configurations and the proximity range over which the matching is carried out (i.e. the 'half life' of the decaying proximity function)).
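
To make the role of the 'half life' concrete, the sketch below tabulates an assumed exponential proximity weight for several half-life settings. The functional form and the choice of word tokens as the unit of distance are assumptions; the study states only that the proximity function decays and has a tunable half-life.

```python
def proximity_weight(distance, half_life):
    """Assumed decay: the weight halves for every `half_life` tokens of separation."""
    return 0.5 ** (distance / half_life)

# Compare how quickly the weight falls away for different half-life settings.
for half_life in (5, 10, 20):
    weights = [round(proximity_weight(d, half_life), 3) for d in (0, 5, 10, 20, 40)]
    print("half-life", half_life, "->", weights)
```

A shorter half-life restricts scoring to keywords in the immediate vicinity of the brand name, whereas a longer one also gives credit to co-occurrences elsewhere in the same passage.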

These frameworks can be applied to a range of content types, including search results drawn from different search engines and platforms, and are of particular use in cases where multi-word search terms may not be handled by the platform in question in the expected way, resulting in a need for these types of post-processing to focus in on the desired findings.

References

[1] https://www.iamstobbs.com/online-brand-prominence-and-sentiment-ebook

[2] https://www.iamstobbs.com/google-gemini-ebook

[3] Findings based on results returned and analysis of content carried out on 03-Jan-2024

[4] https://www.theguardian.com/business/2023/dec/27/wh-smith-whs-rebrand-criticised-for-similarity-to-nhs-logo

[5] https://fashionunited.uk/news/people/havaianas-owner-alpargatas-appoints-liel-miranda-as-new-ceo/2023121373112

[6] https://www.drapersonline.com/news/havaianas-owner-announces-new-ceo

[7] e.g. https://www.opticaljournal.com/alpargatas-and-safilo-renew-havaianas-eyewear-licensing/ - "We are very proud of this early renewal, which aims to strengthen a project initiated in 2016," commented Angelo Trocchia, CEO of Safilo Group. "We want to grow the havaianas eyewear business through collections that reflect the unique personality and creative simplicity of this important Brazilian brand, which is receiving an exceptional reception, particularly in Southern Europe."

[8] https://www.wired.co.uk/article/facebook-is-giving-up-on-news-again

This article was first published as an e-book on 6 February 2024 at:

https://www.iamstobbs.com/utilisation-of-relevance-keywords-ebook
