Thursday, 17 October 2024

Further developing a word mark similarity measurement framework - Part II: Defining an improved similarity score

Introduction

My initial study on mark similarity measurement[1] focused on formulations for quantifying the objective similarity of pairs of marks, with a particular focus on colour and word marks. As discussed in previous articles in this series, mark similarity assessment is a key part of the resolution of many intellectual property disputes, and a more objective approach could offer a number of advantages, including the potential to provide definitions which could be built into case law, to offer greater consistency across dispute decisions, and to specify thresholds for IP protection.

However, it is important to reiterate that any objective algorithms of this type should only ever be considered as tools to be used as part of the overall assessment process, which necessarily includes significant degrees of subjectivity. In the first instance, the algorithmic frameworks presented in this series for word marks address only visual (spelling) and aural (pronunciation, with a specific basis in American English) similarity, with no account taken of conceptual similarity (i.e. meaning) or of the influence of any associated logos, imagery or mark stylisation. Dispute decisions are often reliant on an assessment of the likelihood of confusion between the marks in question, which generally also depends on a range of other factors, including the distinctiveness, strength and degree of renown of the marks, the degree of overlap of the associated goods and services, documented evidence of actual confusion, and the degree of attention paid by a typical consumer - many of which may vary between different geographical regions[2,3]. Some of the factors generally considered for the components which can be measured algorithmically (such as typically putting greater weight on comparisons between elements appearing at the start of the marks in question, and greater emphasis on differences appearing within shorter marks[4]) can be, and have been, built into the proposed algorithms wherever possible.

The degree of similarity (of each type) between marks is often specified in dispute cases as 'high', 'medium' or 'low'; with this in mind, it seems reasonable (when constructing any measurement algorithm) to formulate the output as a similarity score (as proposed for colour marks in the previous article[5] in this series), which aligns broadly with this framework while offering a more quantitative basis for comparison (keeping in mind that all of the above caveats still apply!).

Formulation of the similarity score algorithm

The similarity score used for comparison of pairs of word marks (Swor), in both the previous study and this follow-up, reflects visual (spelling) and aural (pronunciation) similarity only.

As in the initial version, visual similarity between the marks (i.e. in terms of their spelling) is quantified using two distinct algorithms, each of which reflects different aspects of the similarity. The two algorithms (each of which generates a score which can be expressed as a percentage) are:

  • The fuzz.ratio metric (FLev), an algorithm implemented in the Python package 'fuzzywuzzy'[6], based on the concept of Levenshtein distance - a way of quantifying the number of edits required to transform one string into the other - but also taking account of other factors (including the length of the strings).
  • The Jaro-Winkler similarity algorithm (and score (simj)) (as implemented in the Python package 'Levenshtein'[7]), which also takes account of the proximity of the matching / non-matching characters to the start of the strings.

In the simplest formulation of the overall algorithm (and as retained here), the score component reflecting overall visual similarity (Svis) is expressed simply as the mean of the above two scores (as below), although different weightings could be applied if required.

Svis = (FLev + simj) / 2
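
As an illustration, this component could be computed along the following lines (a minimal sketch rather than a definitive implementation; it assumes the 'fuzzywuzzy'[6] and 'Levenshtein'[7] packages, and the function name is purely illustrative):

# A minimal sketch of the visual similarity component (Svis).
from fuzzywuzzy import fuzz
from Levenshtein import jaro_winkler

def visual_similarity(mark1: str, mark2: str) -> float:
    """Return Svis as a percentage: the mean of fuzz.ratio and Jaro-Winkler."""
    f_lev = fuzz.ratio(mark1, mark2)            # already expressed on a 0-100 scale
    sim_j = jaro_winkler(mark1, mark2) * 100.0  # returned as 0-1, scaled to a percentage
    return (f_lev + sim_j) / 2.0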

For aural similarity, the proposed calculation framework is based on the creation of a phonetic representation of the marks / strings in question, and then a comparison of these representations (again, using the fuzz.ratio metric). 

The initial formulation also made use of two distinct algorithms for generating the phonetic representations, based on the Soundex and NYSIIS (New York State Identification and Intelligence System) encodings. However, both of these have certain shortcomings, not least the poor handling of vowel sounds within the strings, and (in Soundex) the inability to encode any consonants beyond the first four.

In this improved version, therefore, I instead propose the use of the Phonemizer algorithm[8,9] for generating the phonetic versions of the strings. Phonemizer utilises IPA (International Phonetic Alphabet)[10] encoding, was explored in the previous follow-up study[11], and appears to perform well (although some data 'cleansing' is occasionally required to ensure that the algorithm interprets a string as intended). The aural similarity score (Saur) can then be calculated simply as the output of the fuzz.ratio metric applied to the IPA representations given by Phonemizer, i.e.:

Saur = FPho
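
A corresponding sketch for the aural component is given below; the espeak backend and 'en-us' language setting are assumptions consistent with the American English pronunciation basis noted above, and the function name is again illustrative.

# A minimal sketch of the aural similarity component (Saur).
from fuzzywuzzy import fuzz
from phonemizer import phonemize

def aural_similarity(mark1: str, mark2: str) -> float:
    """Return Saur as a percentage: fuzz.ratio applied to IPA representations."""
    # Backend and language are assumed settings, not prescribed by the framework.
    ipa1, ipa2 = phonemize([mark1, mark2], language="en-us", backend="espeak")
    return float(fuzz.ratio(ipa1.strip(), ipa2.strip()))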

As in the previous formulation, the overall (word mark) similarity score can then most simply be expressed just as the mean of the two individual components, i.e.:

Swor = (Svis + Saur) / 2
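
Combining the two components (again as a hedged sketch, reusing the illustrative functions above) then gives:

# Overall word mark similarity score (Swor), as the mean of Svis and Saur.
def word_mark_similarity(mark1: str, mark2: str) -> float:
    return (visual_similarity(mark1, mark2) + aural_similarity(mark1, mark2)) / 2.0

# Example usage (exact values will depend on package versions and the phonemizer backend):
# word_mark_similarity("casoria", "castoria")  # approximately 95, per Appendix A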

Similarity scores for test-pairs of marks

As an illustration of the performance of this algorithm, I consider a set of approximately 200 pairs of word marks, mostly the subjects of recent trademark disputes (several of which were also considered in previous articles in this series), and with a primary focus on single-word marks (for simplicity). The full set of mark-pairs, and the calculated similarity scores, are presented in Appendix A.

The first point to note is that, generally, little pre-processing of the data is required in order to utilise the algorithm. All marks have been converted to lower-case, though this is generally a matter of choice, just to ensure that upper- and lower-case versions of the same letter are treated identically. The algorithms do also appear to correctly handle accented characters (albeit that the phonetic representations will generally reflect an English pronunciation). The only two modifications to the data required in these cases were a rewriting of 'OrangeryOS' as 'orangery-o-s' (to ensure that the pronunciation is rendered as 'oh-es') and (as in a previous study) of 'likeme' to 'like-me'. 
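
For completeness, this pre-processing can be captured in a few lines (a minimal sketch; the substitution table simply records the two manual rewrites described above):

# Minimal pre-processing: lower-casing plus the two manual pronunciation rewrites.
SUBSTITUTIONS = {
    "OrangeryOS": "orangery-o-s",  # ensure 'OS' is pronounced 'oh-es'
    "likeme": "like-me",           # ensure the intended two-word pronunciation
}

def preprocess(mark: str) -> str:
    return SUBSTITUTIONS.get(mark, mark).lower()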

Elsewhere (as noted previously), the Phonemizer algorithm renders 'unreadable' strings as individual characters (e.g. 'immun44' as 'immun-four-four', '007' as 'zero-zero-seven', 'ch_t.' as 'see-aitch-tee', and 'mbfw' as 'em-bee-ef-doubleyu'), though these versions have been retained in an unmodified state in the analysis. Some of these representations may not be as originally intended when the marks were conceived, however - e.g. 'genv3rse' is rendered as 'genv-three-rse' (rather than the more likely 'genverse'), and 'm4tter' as 'em-four-tter' (rather than 'matter').

Overall, however, the algorithm does seem to provide a (subjectively) reasonable ranking of the mark-pairs by similarity. An attractive additional characteristic of this framework is that it is entirely repeatable and independent of the number and types of pairs in the dataset (i.e. a particular word-pair will always give the same score), so it is always possible to compare like-with-like. Accordingly, it is instructive to consider some representative examples of word-pairs giving particular (approximate) scores (Swor), to provide a 'reckoner' of what the scores represent, i.e.:

  • Approx. 90%:
    • boss / bossi
    • billionaire / zillionaire
    • thermacare / thermocare
    • prinker / prink
    • intellicare / intelecare
    • chooey / chooee
    • mahendra / mahindra
  • Approx. 80%:
    • zara / zarzar
    • rabe / rase
    • retaron / retlron
    • createme / create.
    • spa / spato
    • thermomix / termomatrix
  • Approx. 70%:
    • kelio / kleeo
    • terry / terrissa
    • tygrys / tigris
    • nike / nuke
  • Approx. 60%:
    • nutella / mixitella
    • airbnb / francebnb
    • gallo / rampingallo
    • iphone / mifon
    • joy / bjoie
    • jd / jdyaoying
  • Approx. 50%:
    • zara / zorazone
    • quirón / quiromasté
  • Approx. 40%:
    • book / restaubook
    • h10 / motel 10

An additional attractive aspect of this approach is that it is also possible, if required, to consider the visual and aural similarity components separately. For example, the top pairs of marks by visual similarity score (Svis) (only) are fashiongo / fashionego (96.50%), configon / configo (95.25%) and casoria / castoria (95.04%), and by aural similarity score (Saur) (only) are sanytol / sanitol, testex / test-x, hobbit / hobbyt, kramer / cramer, kresco / cresco, and cylance / sylence (all 100%, i.e. deemed phonetically identical).

Discussion

Overall (and again as noted previously), it would not be reasonable to expect any significant correlation between the similarity scores and the findings reached in the associated disputes, because of the significant additional (and subjective) points considered in the analysis, as discussed in the introduction to this article. For example, in the Initio / Vinicio case, the marks were found to have 'below average' visual similarity (despite the quantitative objective visual similarity score of 80.96%), with consideration having been given in the case to the differing impact of the various elements and the overall impression of the respective marks, which feature significant differences in visual presentation[12].

Nevertheless, the similarity score does offer a useful tool to consider the 'pure' visual and aural similarity (only) of the word marks, as part of an overall analysis (for example, in dispute cases), in a framework which is repeatable and quantitative, providing the potential for a consistent approach to the assessment of these characteristics. It also aligns with the familiar terminological descriptions of 'degrees' of similarity, whilst offering a more granular and continuous scale.

The algorithm also offers additional possible use-cases, such as (for example) the ability to post-process the outputs from trademark watching services, so as to better sort the results by relevance (in cases where the sorting algorithm offered by the service performs less satisfactorily), and thereby aid the review process.
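
As a sketch of this use-case (reusing the illustrative functions above; the candidate list would in practice come from the watching service):

# Re-sort a list of candidate marks by their overall similarity to a reference mark.
def rank_candidates(reference, candidates):
    scored = [(c, word_mark_similarity(preprocess(reference), preprocess(c)))
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)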

It is also worth noting that there is scope for possible future enhancements to the algorithms (some of which have been discussed previously), including (for example) assessments of the distinctiveness of the various elements or sub-elements (subsequences or substrings) of the marks, re-weighting the contribution of any trailing 's', and so on. Distinctiveness and analysis of the 'types' of elements present in the marks may, in particular, be key to making a more meaningful overall assessment of similarity and, ultimately, likelihood of confusion. Relevant examples for consideration in the dataset include Cylance / Sylence (both 'clearly' allusions to the same common word ('silence')), Doctolib / Avocatlib (where the first portion of each mark makes reference to a profession), BMW / BMV (where the only difference is manifested as a pair of 'similar' letters), Immun44 / Immuno-19 (both featuring a similar root and, unusually, followed specifically by a number), iPhone / Mifon (with the similarity between 'I' and 'me' being of potential relevance), and Align / Clickalign (relevant because of the range of additional names cited by the latter party, suggesting that the key point is the question of the distinctiveness of the term 'align' for the relevant goods and services).

Appendix A: Pairs of marks and their visual, aural and overall similarity scores

  Mark 1   Mark 2   Vis. sim. score (Svis)   Mark 1 (IPA)   Mark 2 (IPA)   Aur. sim. score (Saur)   Overall word mark sim. score (Swor)
  casoria   castoria 95.04   kæsoːɹiə   kæstoːɹiə 95.00 95.02
  sanytol   sanitol 89.67   sænɪtɑːl   sænɪtɑːl 100.00 94.83
  testex   test-x 88.17   tɛstɛks   tɛstɛks 100.00 94.08
  hobbit   hobbyt 88.17   hɑːbɪt   hɑːbɪt 100.00 94.08
  replay   re:play 94.10   ɹiːpleɪ   ɹiː pleɪ 94.00 94.05
  kramer   cramer 85.94   kɹeɪmɚ   kɹeɪmɚ 100.00 92.97
  kresco   cresco 85.94   kɹɛskoʊ   kɹɛskoʊ 100.00 92.97
  cintra   citra 93.28   sɪntɹə   sɪtɹə 92.00 92.64
  dekton   deton 93.28   dɛktən   dɛtən 92.00 92.64
  free   freen 92.50   fɹiː   fɹiːn 91.00 91.75
  goddess   godless 89.67   ɡɑːdəs   ɡɑːdləs 93.00 91.33
  boss   bossi 92.50   bɔs   bɔsi 89.00 90.75
  billionaire   zillionaire 92.47   bɪliənɛɹ   zɪliənɛɹ 89.00 90.73
  thermacare   thermocare 91.89   θɜːmɐkɛɹ   θɜːməkɛɹ 89.00 90.44
  prinker   prink 88.64   pɹɪŋkɚ   pɹɪŋk 92.00 90.32
  intellicare   intelecare 90.18   ɪntɛlɪkɛɹ   ɪntɛlᵻkɛɹ 90.00 90.09
  chooey   chooee 88.17   tʃuːi   tʃuːiː 92.00 90.08
  dcsl   dcs 90.08   diːsiːɛsɛl   diːsiːɛs 90.00 90.04
  mahendra   mahindra 91.08   mæhɛndɹə   mæhɪndɹə 89.00 90.04
  lucite   luci 86.67   luːsaɪt   luːsaɪ 93.00 89.83
  george   georgine 90.50   dʒɔːɹdʒ   dʒɔːɹdʒɪn 89.00 89.75
  tropico   tropicazo 91.78   tɹɑːpɪkoʊ   tɹɑːpɪkɑːzoʊ 87.00 89.39
  demiegod   demigods 91.50   dɛmɪeɪɡɑːd   dɛmɪɡɑːdz 86.00 88.75
  mbet   m-bets 85.00   ɛmbɛt   ɛmbɛts 92.00 88.50
  fashiongo   fashionego 96.50   fæʃəŋɡoʊ   fæʃəniːɡoʊ 80.00 88.25
  cylance   sylence 75.98   saɪləns   saɪləns 100.00 87.99
  ping   pingke 86.67   pɪŋ   pɪŋk 89.00 87.83
  pikdare   pi-kare 89.19   pɪkdɛɹ   paɪkɛɹ 86.00 87.60
  mbfw   mvfw 80.00   ɛmbiːɛfdʌbəljuː   ɛmviːɛfdʌbəljuː 94.00 87.00
  joy   joyme 82.83   dʒɔɪ   dʒɔɪm 91.00 86.92
  configon   configo 95.25   kənfɪɡən   kənfɪɡoʊ 78.00 86.62
  prinz   prinse 81.17   pɹɪnts   pɹɪns 92.00 86.58
  lovello   lovelle 90.14   lʌvloʊ   lʌvl 83.00 86.57
  energeo   enerjo 83.98   ɛnɚdʒeɪoʊ   ɛnɚdʒoʊ 89.00 86.49
  trucool   turcool 90.86   tɹuːkuːl   tɜːkuːl 82.00 86.43
  carbon   mycarbon 88.83   kɑːɹbən   maɪkɑːɹbən 84.00 86.42
  consiglieri   consigliera 93.68   kənsɪɡlɪɹi   kənsɪɡliɛɹə 78.00 85.84
  starbucks   charbucks 81.59   stɑːɹbʌks   tʃɑːɹbʌks 90.00 85.80
  realme   realmz 88.17   ɹɛlmi   ɹɛlmz 83.00 85.58
  axis   traxis 84.44   æksɪs   tɹæksɪs 86.00 85.22
  youtube   u-tubes 75.98   juːtuːb   juːtuːbz 94.00 84.99
  bimbo   gimbo 83.33   bɪmboʊ   ɡɪmboʊ 86.00 84.67
  tiktok   tiktaktok 85.00   tɪktɑːk   tɪktɐktɑːk 84.00 84.50
  z-biome   biome 86.74   ziːbaɪoʊm   baɪoʊm 82.00 84.37
  bacchus   cacchus 85.46   bækəs   kækəs 83.00 84.23
  philips   philzops 86.07   fɪlɪps   fɪlzəps 80.00 83.04
  patter   yatter 85.94   pæɾɚ   jæɾɚ 80.00 82.97
  noughty   naughtea 73.59   nɔːɾi   nɔːɾiə 92.00 82.79
  yorxs   yorks 85.33   joːɹksz   jɔːɹks 80.00 82.67
  jarlsberg   jørnsberg 82.33   dʒɑːɹlsbɜːɡ   dʒoːɹnsbɜːɡ 83.00 82.67
  globe-trotter   globetrotter xc 90.23   ɡloʊbtɹɑːɾɚ   ɡloʊbtɹɑːɾɚɹ ɛkssiː 75.00 82.62
  treca   trea 92.17   tɹɛkə   tɹiə 73.00 82.58
  resolution   resolute 84.75   ɹɛzəluːʃən   ɹɛzəluːt 80.00 82.38
  olympéa   olympe 83.98   əlɪmpeɪə   əlɪmp 80.00 81.99
  ellesse   elliss 83.22   ɛlɛs   ɛlɪs 80.00 81.61
  hugo   hug-o 92.17   hjuːɡoʊ   hʌɡoʊ 71.00 81.58
  initio   vinicio 80.96   ɪnɪɾɪoʊ   vɪnɪsɪoʊ 82.00 81.48
  bimbo   bimbolea 84.75   bɪmboʊ   baɪmboʊliə 78.00 81.38
  burgerme   burgerly 82.50   bɜːɡɚm   bɜːɡɚli 80.00 81.25
  1link   link 91.17   wʌn lɪŋk   lɪŋk 71.00 81.08
  repevax   epvax 86.74   ɹᵻpɛvæks   ɛpvæks 75.00 80.87
  free   freepour 78.50   fɹiː   fɹiːpɚ 83.00 80.75
  zara   zarzar 86.11   zɑːɹɹə   zɑːɹzɑːɹ 75.00 80.56
  rabe   rase 80.83   ɹeɪb   ɹeɪz 80.00 80.42
  retaron   retlron 89.67   ɹᵻtæɹən   ɹᵻtlɹɑːn 71.00 80.33
  createme   create. 86.07   kɹiːeɪɾiːm   kɹiːeɪt 74.00 80.04
  spa   spato 82.83   spɑː   spɑːɾoʊ 77.00 79.92
  thermomix   termomatrix 84.24   θɜːməmɪks   tɜːməmeɪtɹɪks 75.00 79.62
  atma   atmaspa 82.21   ætmə   ætmæspə 77.00 79.61
  live   vive 79.17   laɪv   vaɪv 80.00 79.58
  cana   canya 92.17   kɑːnə   kænjə 67.00 79.58
  l'oreal   joreal 80.96   ɛloːɹiəl   dʒoːɹiəl 78.00 79.48
  seiko   seycos 65.50   seɪkoʊ   seɪkoʊz 93.00 79.25
  pockit   mypocket 76.47   pɑːkɪt   maɪpɑːkɪt 82.00 79.24
  bisleri   bilseri 91.10   baɪslɜːɹi   bɪlsɚɹi 67.00 79.05
  kikkoman   kikomand 91.08   kɪkɑːmən   kɪkəmænd 67.00 79.04
  fido   fiio 80.83   faɪdoʊ   fɪɪoʊ 77.00 78.92
  waken   wakeful 77.21   weɪkən   weɪkfəl 80.00 78.61
  nutravita   nootrovita 79.17   nʌtɹɐviːɾə   nuːtɹəviːɾə 78.00 78.58
  um bongo   ubongo! 84.11   ʌm bɑːŋɡoʊ   juːbɑːŋɡoʊ 73.00 78.55
  pyra   prya 83.75   pɪɹə   pɹaɪə 73.00 78.38
  ulma   luma 83.33   ʌlmə   luːmə 73.00 78.17
  fransa   fanza 78.50   fɹænsə   fænzə 77.00 77.75
  chef   chefchy 82.21   ʃɛf   ʃɛftʃi 73.00 77.61
  boss   bossvel 82.21   bɔs   bɔsvəl 73.00 77.61
  hanson   hansol 88.17   hænsən   hænsɑːl 67.00 77.58
  lucozade   glucos-aid 72.67   luːkəzeɪd   ɡluːkoʊzeɪd 82.00 77.33
  asos   asas 80.83   ɐsoʊz   ɐsæz 73.00 76.92
  iqos   niccos 67.50   aɪkoʊz   nɪkoʊz 86.00 76.75
  zemo   zoomo 67.11   ziːmoʊ   zuːmoʊ 86.00 76.56
  hyprr   hypernft 72.83   haɪpɚ   haɪpɚnft 80.00 76.42
  free   freeyoung 75.44   fɹiː   fɹiːjʌŋ 77.00 76.22
  bimbo   bimbys 81.17   bɪmboʊ   bɪmbiz 71.00 76.08
  uber   youber 84.44   juːbɚ   jaʊbɚ 67.00 75.72
  dune   dne 89.25   duːn   diːɛniː 62.00 75.62
  scaffeze   scaffx 80.08   skæfɛz   skæfks 71.00 75.54
  foltene   foltex 83.98   foʊltiːn   foʊltɛks 67.00 75.49
  abanca   abaca 93.56   ɐbæŋkə   æbɑːkə 57.00 75.28
  ch   ch_t. 70.50   siːeɪtʃ   siːeɪtʃ tiː 80.00 75.25
  suntech   suntank 69.93   sʌntɛk   sʌntæŋk 80.00 74.96
  hotpatch   patch 78.92   hɑːtpætʃ   pætʃ 71.00 74.96
  huracán   huracanrace 77.53   hjʊɹɹɐkɑːn   hjʊɹɹɐkænɹeɪs 72.00 74.76
  free   freetalk 78.50   fɹiː   fɹiːɾɔːk 71.00 74.75
  free   freeloop 78.50   fɹiː   fɹiːluːp 71.00 74.75
  intelect   entelec 77.90   ɪntɛlᵻkt   ɛntɛlɛk 71.00 74.45
  maplab   maplab.world 78.50   mæplæb   mæplæb wɜːld 70.00 74.25
  sacher   sachi 81.17   sæʃɚ   sætʃaɪ 67.00 74.08
  fanta   fantarifa 81.06   fæntə   fæntɑːɹɹɪfə 67.00 74.03
  fiorelli   fioretto 73.50   fɪoːɹɛli   fɪoːɹɛɾoʊ 74.00 73.75
  sherco   charco 72.39   ʃɜːkoʊ   tʃɑːɹkoʊ 75.00 73.69
  vidas   vidya 85.33   viːdəz   vɪdɪə 62.00 73.67
  gobox   g-box 84.00   ɡoʊbɑːks   dʒiːbɑːks 63.00 73.50
  idee   idee-home 75.44   ɪdiː   ɪdiːhoʊm 71.00 73.22
  starbucks   sardarbuksh 76.21   stɑːɹbʌks   sɑːɹdɑːɹbʌkʃ 70.00 73.11
  orange   orangery-o-s 78.50   ɔɹɪndʒ   ɔɹɪndʒɚɹioʊɛs 67.00 72.75
  free   freeyond 78.50   fɹiː   fɹiːjɑːnd 67.00 72.75
  free   freepods 78.50   fɹiː   fɹiːpɑːdz 67.00 72.75
  sanytol   savisol 67.07   sænɪtɑːl   sævɪsɑːl 78.00 72.54
  snuggledown   snugglemore 81.05   snʌɡəldaʊn   snʌɡəlmoːɹ 64.00 72.52
  pez   pezeeu 77.67   pɛz   pɛziːuː 67.00 72.33
  zirco   cozirc 77.61   zɜːkoʊ   kɑːzɜːk 67.00 72.31
  glenfiddich   inverfiddich 74.10   ɡlɛnfɪdɪtʃ   ɪnvɜːfɪdɪtʃ 70.00 72.05
  salio   saliogen 84.75   sælɪoʊ   sælɪədʒən 59.00 71.88
  vallformosa   fermosa 70.77   vælfoːɹmoʊsə   fɜːmoʊsə 73.00 71.88
  noughty   nouti 76.17   nɔːɾi   naʊɾi 67.00 71.58
  tesla   teslapimp 81.06   tɛslə   tɛslɐpɪmp 62.00 71.53
  live   life's 70.00   laɪv   laɪfz 73.00 71.50
  e-bulli   bullit 80.96   iːbʊli   bʊlɪt 62.00 71.48
  bimbo   bims 75.92   bɪmboʊ   bɪmz 67.00 71.46
  genie   genai 85.33   dʒiːni   dʒɛnaɪ 57.00 71.17
  lakme   like-me 70.32   lækmi   laɪkmiː 71.00 70.66
  kelio   kleeo 70.25   kɛlɪoʊ   kliːoʊ 71.00 70.62
  terry   terrissa 74.00   tɛɹi   tɛɹɪsə 67.00 70.50
  tygrys   tigris 73.50   tɪɡɹiz   taɪɡɹɪs 67.00 70.25
  nike   nuke 80.00   naɪk   nuːk 60.00 70.00
  007   skx007 58.50   ziəɹoʊziəɹoʊ sɛvən   ɛskeɪɛks ziəɹoʊziəɹoʊ sɛvən 81.00 69.75
  geneverse   genv3rse 85.28   dʒɛnɪvɜːs   dʒɛnv θɹiː ɑːɹɹɛsiː 53.00 69.14
  lego   solego 76.11   lɛɡoʊ   sɑːliːɡoʊ 62.00 69.06
  perry   perryhome 81.06   pɛɹi   pɛɹɪhoʊm 57.00 69.03
  kadawe   kademae 80.89   kædɔː   keɪdmiː 57.00 68.94
  acutil   accudis 70.84   ɐkjuːɾɪl   ɐkjuːdiz 67.00 68.92
  bru   bruys 82.83   bɹuː   bɹaɪz 55.00 68.92
  bimbo   wimko 66.67   bɪmboʊ   wɪmkoʊ 71.00 68.83
  cazoo   carkoo 79.39   kæzuː   kɑːɹkuː 57.00 68.19
  doctolib   avocatlib 75.78   dɑːktəlɪb   ævəkætlɪb 60.00 67.89
  boss   kissboss 62.67   bɔs   kɪsbɔs 73.00 67.83
  bmw   bmv 74.61   biːɛmdʌbəljuː   biːɛmviː 61.00 67.81
  marca   plusmarca 57.35   mɑːɹkə   plʌsmɑːɹkə 78.00 67.68
  mdh   mhs 61.28   ɛmdiːeɪtʃ   ɛmeɪtʃɛs 74.00 67.64
  align   clickalign 60.17   ɐlaɪn   klɪkɐlaɪn 75.00 67.58
  ajona   avoma 68.00   ædʒoʊnə   ævoʊmə 67.00 67.50
  zara   zaraphora 75.44   zɑːɹɹə   zæɹɐfoːɹə 59.00 67.22
  levi's   levigo 76.83   lɛviz   lɛvɪɡoʊ 57.00 66.92
  zara   zareus 71.25   zɑːɹɹə   zɛɹəs 62.00 66.62
  zara   zareus 71.25   zɑːɹɹə   zɛɹəs 62.00 66.62
  naturli'   natureal 82.50   neɪɾɜːli   neɪtʃɚɹiəl 50.00 66.25
  moncler   northcler 70.29   mɔŋklɚ   nɔːɹθklɚ 62.00 66.14
  airbnb   airbrick 70.17   ɛɹbnb   ɛɹbɹɪk 62.00 66.08
  resolva   consolva 69.15   ɹᵻzɑːlvə   kənsɑːlvə 63.00 66.08
  sanytol   sanatio 78.83   sænɪtɑːl   sæneɪʃɪoʊ 53.00 65.92
  moncler   montec 73.39   mɔŋklɚ   mɔntɛk 57.00 65.19
  apiretal   a'peal 77.38   ɐpaɪɚɾəl   ɐpiːl 53.00 65.19
  very   veryco 86.67   vɛɹi   vɜːɹɪkoʊ 43.00 64.83
  bimbo   vibo 72.67   bɪmboʊ   viːboʊ 57.00 64.83
  head   headoniste 72.50   hɛd   hɛdəniːst 57.00 64.75
  saypha   shaype 73.50   seɪfə   ʃeɪp 55.00 64.25
  helios   delio 77.61   hɛlɪoʊz   dᵻliːoʊ 50.00 63.81
  coversyl   covixyl-v 69.94   kʌvɚsɪl   kɑːvɪksɪlviː 57.00 63.47
  simoniz   permanize 58.60   sɪmənɪz   pɜːmənaɪz 67.00 62.80
  vfh   vfhonline 67.22   viːɛfeɪtʃ   viːɛfhɑːnlaɪn 58.00 62.61
  rolex   dermarollex 49.03   ɹoʊlɛks   dɜːmɚɹoʊlɛks 76.00 62.52
  apple   alpineapple 62.89   æpəl   ælpɪniːpəl 62.00 62.45
  thermomix   zaubermix 63.19   θɜːməmɪks   zɔːbɚmɪks 60.00 61.59
  magnavox   multivox 58.33   mæɡnɐvɑːks   mʌltivɑːks 64.00 61.17
  nutella   mixitella 68.83   nuːtɛlə   mɪksaɪtɛlə 53.00 60.92
  airbnb   francebnb 59.65   ɛɹbnb   fɹænsɛbnb 62.00 60.82
  curve   crv 81.50   kɜːv   siːɑːɹviː 40.00 60.75
  gallo   rampingallo 52.52   ɡæloʊ   ɹæmpɪŋɡæloʊ 67.00 59.76
  iphone   mifon 62.50   aɪfoʊn   mɪfɑːn 57.00 59.75
  joy   bjoie 59.44   dʒɔɪ   bjɔɪ 60.00 59.72
  jd   jdyaoying 57.63   dʒeɪdiː   dʒeɪdaɪeɪɑːiɪŋ 61.00 59.31
  bally   ballyclare 78.50   bɔːli   bælɪklɛɹ 40.00 59.25
  swift   microswift 55.17   swɪft   maɪkɹoʊswɪft 63.00 59.08
  bloo   bluuwash 45.67   bluː   bluːwɑːʃ 71.00 58.33
  head   superhead 53.69   hɛd   suːpɚhɛd 62.00 57.84
  trek   gotrekfeel 68.50   tɹɛk   ɡɑːtɹɪkfiːl 47.00 57.75
  blippi   bbibbi 58.33   blɪpi   biːbɪbi 57.00 57.67
  immun44   immuno-19 73.70   ɪmʌn foːɹɾi foːɹ   ɪmjuːnoʊ naɪntiːn 40.00 56.85
  rolex   relxhome 57.17   ɹoʊlɛks   ɹᵻlkshoʊm 56.00 56.58
  kpn   opn 72.39   keɪpiːɛn   ɑːpən 40.00 56.19
  mc   macbeans 58.75   ɛmsiː   məkbiːnz 53.00 55.88
  ape   apecessories 61.25   eɪp   eɪpɪsɛsɚɹiz 50.00 55.62
  airbnb   marseillebnb 57.17   ɛɹbnb   mɑːɹseɪlɛbnb 53.00 55.08
  facebook   motherbook 60.08   feɪsbʊk   mʌðɚbʊk 50.00 55.04
  alaïa   azzaia 64.00   ɐlæiːə   æzeɪə 46.00 55.00
  puma   coma 58.33   puːmə   koʊmə 50.00 54.17
  bimbo   amorbimbi 55.17   bɪmboʊ   ɐmoːɹbɪmbaɪ 53.00 54.08
  azure   azurity 77.21   æʒɚ   æzjʊɹɹᵻɾi 29.00 53.11
  bimbo   binbokplay 65.83   bɪmboʊ   baɪnbɑːkpleɪ 40.00 52.92
  zara   zorazone 54.86   zɑːɹɹə   zoːɹɐzoʊn 47.00 50.93
  matters   m4tter 81.71   mæɾɚz   ɛm foːɹ tiːtɜː 19.00 50.36
  quirón   quiromasté 59.44   kwɜːɹɑːn   kwɪɹəmɐsteɪ 38.00 48.72
  joy   joïsta 55.33   dʒɔɪ   dʒɑːiːstə 40.00 47.67
  louboutin   lubov 61.74   laʊbaʊtɪn   luːbɑːv 33.00 47.37
  we   wecotton 60.00   wiː   wɛkəʔn̩ 33.00 46.50
  mcdonalds   mcsweet 44.13   məkdɑːnəldz   məkswiːt 48.00 46.07
  md   intimd 25.00   ɛmdiː   ɪntɪmdiː 67.00 46.00
  sane   cbdsane 36.50   seɪn   siːbiːdiːseɪn 53.00 44.75
  book   restaubook 28.50   bʊk   ɹᵻstaʊbʊk 57.00 42.75
  h10   motel 10 18.00   eɪtʃ tɛn   moʊtɛl tɛn 60.00 39.00
  coco   kokomarina 42.83   koʊkoʊ   kɑːkəmɚɹiːnə 30.00 36.42
  mi   lovmi 28.50   maɪ   lʌvmi 40.00 34.25

References

[1] https://circleid.com/posts/towards-a-quantitative-approach-for-objectively-measuring-the-similarity-of-marks

[2] https://bowmanslaw.com/insights/degrees-of-similarity-put-to-the-test/

[3] https://www.taylorwessing.com/en/insights-and-events/insights/2021/03/were-confused-how-the-general-court-decides-when-trade-marks-are-confusingly-similar

[4] https://guidelines.euipo.europa.eu/1803468/1787906/trade-mark-guidelines/3-5-conclusion-on-similarity

[5] https://circleid.com/pdf/similarity_measurement_of_marks_part_4.pdf

[6] https://pypi.org/project/fuzzywuzzy/

[7] https://rapidfuzz.github.io/Levenshtein/levenshtein.html#jaro-winkler

[8] M. Bernard and H. Titeux (2021). 'Phonemizer: Text to Phones Transcription for Multiple Languages in Python', J. Open Source Software, 6(68), p.3958.

[9] https://pypi.org/project/phonemizer/

[10] https://www.internationalphoneticassociation.org/content/ipa-chart

[11] https://circleid.com/posts/further-developing-a-word-mark-similarity-measurement-framework

[12] Stobbs CaseFest #16, London, 02-Oct-2024

This article was first published as a white paper on 17 October 2024 at:

https://circleid.com/pdf/similarity_measurement_of_marks_part_5.pdf
