Tuesday, 5 March 2024

Finding the fakes: another application of keyword-based filtering

Introduction

In previous Stobbs studies, we have outlined how relevance keywords can be used for filtering down large sets of candidate webpages into the subsets most likely to be relevant to a particular issue of interest. In general, the approach involves looking on each webpage for instances of the keywords in close proximity to the name of the brand of interest, and calculating a score for the page, based on the number of such pairs and the proximity of the terms to each other in each case. The pages with the highest scores are then the most likely to be of interest[1]. This methodology is based on the technique used to calculate brand sentiment (which utilises sentiment keywords, but is otherwise essentially identical)[2].
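To make the idea concrete, the following is a minimal sketch of a proximity-based page score. The exact formula used in the Stobbs studies is not given here, so the weighting (100 divided by the token distance), the five-token window, the keyword list, and the function names are all illustrative assumptions. The sketch also handles single-word keywords only.

```python
import re

# Illustrative keyword list only; the real list is not published in this article.
HIGH_RISK_KEYWORDS = frozenset({"replica", "dupe", "fake", "copy"})


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def proximity_score(text, brand, keywords=HIGH_RISK_KEYWORDS, window=5):
    """Score a page by summing a weight for each brand/keyword pair.

    Assumed weighting: each pair within `window` tokens contributes
    100 / distance, so more pairs and closer pairs both raise the score.
    """
    tokens = tokenize(text)
    brand_positions = [i for i, t in enumerate(tokens) if t == brand.lower()]
    keyword_positions = [i for i, t in enumerate(tokens) if t in keywords]

    score = 0.0
    for b in brand_positions:
        for k in keyword_positions:
            distance = abs(b - k)
            if 0 < distance <= window:
                score += 100 / distance
    return score


# Adjacent terms score highest; a page with no keyword near the brand scores zero.
print(proximity_score("Buy replica Gucci handbags", "Gucci"))
print(proximity_score("Gucci was founded in Florence", "Gucci"))
```

Under these assumptions, a page that mentions the brand without any nearby 'high-risk' keyword scores zero and drops out of the prioritised set.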

The ability to filter in this way is an essential component of any brand-monitoring solution, allowing the most relevant pages to be quickly and efficiently identified, and reducing the requirement for expensive analyst resource to manually review large volumes of results.

In this study, we apply the approach to the identification of e-commerce websites likely to be associated with the sale of counterfeit or otherwise infringing goods, from within a larger pool of product-related sites, based on the terminology typically used by infringing sellers. This is of key importance for any brand-protection programme aiming to tackle the most egregious infringements, since counterfeits often present the greatest risk to revenue, and are typically the most readily enforceable.

The methodology utilises a list of 'high-risk' e-commerce keywords, which have previously been identified as being highly diagnostic of non-legitimate products when used in conjunction with a brand name. Examples include 'dupe', 'mirror-quality', 'A-grade', and so on. These terms are most usually applied to fashion and luxury products, but can be applicable to other areas.

In this study, we focus on general Internet content (i.e. 'standalone' e-commerce websites identified through search-engine queries), rather than on marketplaces specifically. This is partly because issues with general Internet content are, in general, the hardest to solve (because of the greater difficulty in finding relevant content, and the limitless range of ways in which product information can be presented on the page), but also because some marketplaces explicitly prohibit the posting of listings with branded product references (meaning many sellers will resort to use of brand misspellings and variations). Accordingly, an approach able to handle the analysis of general content can usually be applied to more well-defined channels (though with appropriate modifications, where necessary).

Findings

Proof-of-concept 1: luxury brands

In the first proof-of-concept, we consider the online sale of luxury-branded products, using Gucci and Chanel as representative brand examples. In each case, we generate a 'pool' of candidate pages for analysis by collecting the results returned by search engines (the first page of Google.com results, in this case) in response to query-terms consisting of the brand name combined with product-related keywords ('handbag', 'shoes', etc.) and e-commerce-related (and especially 'high-risk') keywords (such as 'buy', 'shop', 'cheap', 'discount', 'replica', etc.). For each brand, this yielded just under 2,000 unique webpages for analysis. Of the pages which were accessible for analysis by our automated tool, the datasets produced 252 results yielding non-zero (positive) scores for Gucci, and 222 for Chanel. The assertion is that these pages are the ones most likely to be offering the sale of counterfeit products, based on the use of 'high-risk' keywords near to the brand names (and, moreover, that the pages with the higher scores will, in general, feature greater numbers of such references and/or brand/keyword pairs with closer proximity). These would therefore be the priority candidates for further analysis and potential enforcement.
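The query-generation step described above can be sketched as follows. This is a hedged illustration only: the article does not state exactly how the terms were combined, so the assumption here is that each query is the brand name followed by one product-related keyword and one e-commerce-related keyword. The function name and the short term lists are hypothetical.

```python
from itertools import product


def build_queries(brand, product_terms, commerce_terms):
    """Return one query string per (product term, e-commerce term) pair."""
    return [
        f"{brand} {item} {term}"
        for item, term in product(product_terms, commerce_terms)
    ]


# Short illustrative lists drawn from the examples quoted in the text.
queries = build_queries(
    "Gucci",
    ["handbag", "shoes"],
    ["buy", "cheap", "replica"],
)
for query in queries:
    print(query)
```

Each query would then be submitted to a search engine and the returned URLs de-duplicated to form the pool of candidate pages; that collection step is outside the scope of this sketch.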

Manual inspection of the results suggests that this is indeed the case, and almost all of the non-zero-scored pages feature what we would consider to be content of potential concern (Appendix A; Figures 1 and 2).
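Once each page has a score, prioritisation reduces to discarding the zero-scored pages and sorting the remainder. The sketch below assumes the scores have already been calculated. The first three titles and scores are taken from Table A.1, the last two entries are hypothetical, and the function name is an illustrative choice rather than part of any Stobbs tool.

```python
def prioritise(scored_pages, threshold=None):
    """Keep positively scored pages (optionally at or above `threshold`),
    ordered from highest to lowest score."""
    kept = [
        (title, score)
        for title, score in scored_pages
        if score > 0 and (threshold is None or score >= threshold)
    ]
    return sorted(kept, key=lambda page: page[1], reverse=True)


pages = [
    ("Gucci Dupe Belt in Black", 500),        # from Table A.1
    ("Replica Gucci Ring", 1396),             # from Table A.1
    ("gucci hats replica", 400),              # from Table A.1
    ("Hypothetical low-scoring page", 120),   # invented for illustration
    ("Hypothetical brand history page", 0),   # invented for illustration
]

# A threshold of 300 mirrors the cut-off used for the appendix tables.
for title, score in prioritise(pages, threshold=300):
    print(score, title)
```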

Figure 1: Examples of 'high-risk' pages for Gucci

Figure 2: Examples of 'high-risk' pages for Chanel

It is also worth noting that it may be necessary to 'tune' the keyword lists in response to the types of content being found and the terminology used. An earlier iteration using 'inspired' as a 'high-risk' keyword (e.g. in the context of 'Gucci-inspired' items) was found to lead to a less 'clean' categorisation of results, due to the more generic nature of this term and its tendency also to be used in other contexts (e.g. "It was not just the high quality clothes of the wealthy guests that inspired Gucci" and "During the 1930s Gucci became inspired by horse racing"[3]).

Proof-of-concept 2: electronics brands

The second case study considers electronics brands (using Samsung and Panasonic as representative examples, with appropriate product-related keywords - 'tv', 'smartphone', 'headphones', etc.), as an area where the sale of counterfeits is of particular concern because of the obvious safety implications.

In this case it was found that the keyword 'discount' (which was actually relatively diagnostic of 'high-risk' content for luxury brands) was less diagnostic for electronics, in part due to the large number of sites offering discount codes. This keyword was therefore removed from the list for this second analysis.
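One simple way to manage this kind of tuning is to hold a base keyword list and a set of per-sector exclusions. The sketch below is an assumption about how such a configuration might look, not a description of the tool used in the study. The base list is illustrative, and the exclusion of 'inspired' for electronics is assumed, since the text only reports that it performed poorly for luxury brands.

```python
# Illustrative base list; the full keyword list is not published in this article.
BASE_HIGH_RISK = frozenset({"replica", "dupe", "fake", "discount", "inspired"})

# Per-sector exclusions reflecting the tuning decisions described in the text.
SECTOR_EXCLUSIONS = {
    "luxury": frozenset({"inspired"}),
    # 'inspired' is assumed to be excluded here too; the article does not say.
    "electronics": frozenset({"discount", "inspired"}),
}


def keywords_for(sector):
    """Return the base keywords minus any exclusions for the given sector."""
    return BASE_HIGH_RISK - SECTOR_EXCLUSIONS.get(sector, frozenset())


print(sorted(keywords_for("luxury")))
print(sorted(keywords_for("electronics")))
```

Keeping the exclusions separate from the base list makes each tuning decision explicit and easy to revisit when a new sector is analysed.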

The approach was again effective, although a manual inspection of the positively-scored results found a smaller proportion of results of real potential concern than for the luxury brands, where the keywords are more directly applicable to infringing items. Nevertheless, as in the first study, significant numbers of results of interest were successfully identified, and effectively prioritised using the keyword / scoring approach (Figures 3 and 4).

Figure 3: Examples of 'high-risk' pages for Samsung

Figure 4: Examples of 'high-risk' pages for Panasonic

Conclusion

Overall, the use of 'high-risk' e-commerce keywords in an analysis of proximity to brand terms has been found to be an effective way of prioritising sets of 'candidate' pages, in order to identify the subset which are most likely to be associated with the sale of counterfeits (even if the exact list may need to be 'tuned' to be most applicable to specific industry or product areas). The approach is therefore an important consideration in the deployment of automated brand-monitoring tools, to build efficiency into the analysis process and reduce the incidence of 'false positives'.

In the specific cases of the live infringements identified in this analysis, the fact that all (by definition) have been live for long enough to have been indexed by search engines, and are highly ranked in response to relevant query terms, should be a cause for concern for the respective brand owners. Assuming that these brands do have brand protection services in place, the analysis suggests that (depending on the methodologies used), in many cases, relevant results may be being missed or may not be being effectively identified from within wider datasets of findings, or enforcement attempts may have been ineffective (which may particularly be the case for some of the more problematic jurisdictions, such as with .ru and .cn domains) - or some combination of these. This conclusion highlights the requirement for an effective programme for detection, prioritisation, and takedown to be employed by brand owners.

Appendix A: The most highly scored pages for luxury brands

Table A.1: Webpage titles for all results assigned a potential relevance/risk score of 300 or greater for Gucci

  Webpage title    Potential relevance score
  Replica Gucci Handbags Wallets Gucci replica purses ... 4652
  GUCCI REPLICA HANDBAG - SomaliNet Forums 4307
  Gucci replicas expert - buy the best quality fake Gucci 3966
  Gucci First Copy Shoes 3485
  A 1:1 quality Gucci replicas online sale store 2829
  Gucci Shoes Discount | ShopStyle UK 2515
  Discount Gucci Belt 2400
  Gucci Replica - RoyalPurse 2113
  75% OFF Gucci Discount Code: (3 ACTIVE) Jan 2024 1912
  Replica Gucci Womens Shoes Collection 1829
  Best Replica Gucci Accessories on Topbiz.md 1688
  Gucci Handbags Discount 1686
  Replica Gucci Trainers - Casual Sneakers 1534
  Sneakers for Men - Gucci Replica 1405
  Moccasins and Loafers for Men - Gucci Replica 1405
  Tech Accessories for Women - Gucci Replica 1405
  Men Accessories && Wallets - Gucci Replica 1405
  Fine Jewelry Archives - Gucci Replica 1405
  Silver Necklaces for Men Archives - Gucci Replica 1405
  Replica Gucci Ring 1396
  Replica Gucci Sneakers For Men 1395
  Elegant gucci replica For Stylish And Trendy Looks 1215
  replica Gucci jewelry 1214
  Bum Bag Gucci Dupe - Smart Accessories 1201
  Fake Men's and Women's Gucci Shoes 1023
  Gucci Replica Bamboo Bag Archives | Knock Off Designer ... 988
  Gucci Discount Code - 10 Vouchers in January 2024 915
  DesignerBagHUB cheap discount gucci replica jewelry ... 835
  Buy replica gucci with free shipping on AliExpress 800
  Finding The Perfect Replica Gucci Handbag 776
  Gucci Shoes Archives 721
  splurge and save: gucci princetown fur-lined leather slipper ... 692
  Guides To Spot A Good Replica Gucci Bag 665
  How Good Can Gucci Replicas Be? 664
  GG Marmont Bags - Gucci Replica Handbags 656
  Replica Gucci Shoes Wholesale Buying Guide 2020 649
  Stunning Replica Gucci Jewelry at DesignerBagHUB 604
  Vintage gucci replica (imitation) - Gem 598
  Gucci GG Black small messenger Bag 523599 582
  Gucci dupe shoes - thriftbiscuit 574
  The Aldo Gucci Bag Dupe MUST HAVE - 572
  Premium Gucci Formal Shoes On Full Cash On Delivery 539
  Replica Gucci GG Marmont Bags | Purse-Area 518
  Inspired Gucci Bags - The Same Look For Less - Pinterest 518
  17+ Best Gucci Inspired Bags that Look Designer 515
  Top Quality Gucci Replica Shoes 515
  Golden Gucci Copy Ladies Bracelet Watch 504
  THE GUCCI DUPE 500
  Gucci Dupe Belt in Black 500
  Gucci Replica Pendant for girls 403
  3d ring replica Gucci 3D print model 403
  gucci hats replica 400
  Best Deals for Knock Off Gucci Bags 362
  27 Gucci Replica Handbags ideas 361
  High Quality Replica Gucci Handbags For Sale | Purse-Area 350
  I thought I'd nabbed a bargain after buying Gucci bag dupe ... 350
  Gucci Shoe Alternatives - Linn Style 337
  first copy gucci bag 331
  Gucci Dupe Purses - Rio Clemens 325
  Gucci Bag Dupes - Linn Style 325
  9 Gucci Loafer Dupes (Including the Ones I Bought!) 325
  Handbags | Gucci Copy Hand Bag 🛍️ | Freeup 300
  Gucci 1955 Horsebit Bags - Replica Designer Handbags 300
  Slingbags | Gucci Replica Sling Bag - FreeUp 300
  Mirror copy Gucci Sneaker shoes with freebies. 300
  Replica Gucci handbag in Bristol 300

Table A.2: Webpage titles for all results assigned a potential relevance/risk score of 300 or greater for Chanel

  Webpage title    Potential relevance score
  Best 25+ Deals for Copy Chanel Bags 4092
  Best 25+ Deals for Copies Of Chanel Bags 3631
  Chanel Double Flap Medium Replica Bag Review (Caviar ... 2055
  Top Chanel replicas - Affordable Luxury Inspired Handbags 2038
  How to pick Chanel replica jewelry for love 1956
  Replica Chanel Classic Flap Bag Full Review 1483
  Chanel Replica Jewelry 1450
  Best 25+ Deals for Copy Chanel Bags - Poshmark 1399
  Replica Chanel Sandals 1375
  Authentic vs. Replica Chanel Flap Bag: A Detailed Comparison 1177
  Wholesale and retail best chanel Jewelry Replica 1173
  Best Chanel Replica Bag – Full Review 1104
  Chanel Replica Necklace 1100
  Cotton High Quality Replica Chanel Beach Towel 1077
  Stunning Chanel Necklace Dupes - Replica bags 993
  replica chanel shoes 961
  Chic Chanel Earrings Dupes - Budget-Friendly Luxury 944
  Replica CHANEL 934
  Chanel Handbags Replica 834
  Replica CHANEL 834
  Chanel Sneakers Real vs high quality Replica. - MyBizShare 717
  Chanel Replica Handbag Reviews and Shopping 704
  Chanel Promo Codes: 10% Off January 2024 604
  Auckland fashionistas duped by replica Chanel handbags 600
  12 Ways to Spot a Fake Chanel 600
  Best Chanel replica store - Xpurse 593
  Replica Chanel Shoes 507
  Replica Chanel Bags, Cheap Chanel Bags 100% Satisfaction ... 474
  Look For Less: 9+ Best Chanel Inspired Earrings 449
  Blair Cap Toe Ballet Flat (Women) curated on LTK 430
  Vintage, Chanel Replica, Huge Faux Pearl Necklace 409
  Brand New Quality Replica Chanel Designer Womens ... 400
  Vintage, Chanel Replica, Huge Faux Pearl Necklace 400
  Best 25+ Deals for Chanel Lace Up Heels 375
  REPLICA Chanel Sandals 362
  8 Chanel Shoes ideas 336
  Replica Chanel Chair 331
  Replica Chanel Chair 331
  Amazing Chanel Necklace with pearl replica, crystal and ... 318
  Replica Chanel CC Logo Brooch 315
  Dune's lockstockk sandals are back in stock for 2022 312
  How to spot a Fake Chanel Bag 301
  chanel replica handbag for sale 300
  Designer Chanel Shoe Dupes - Stylish & Affordable 300
  Chanel Replica Mirror 300

References

[1] https://www.iamstobbs.com/utilisation-of-relevance-keywords-ebook

[2] https://www.iamstobbs.com/online-brand-prominence-and-sentiment-ebook

[3] https://thevintagecompactshop.com/blogs/antique-and-collectible-history/gucci-a-brief-history

This article was first published on 5 March 2024 at:

https://www.iamstobbs.com/opinion/finding-the-fakes-another-application-of-keyword-based-filtering
