Saturday, 19 October 2024

What degree of variability might be covered within a colour-mark protection framework?

Introduction

The concept of specific colours being protectable as brand-specific trademarks (or as components of broader or more complex marks) is now well-established, but colour-mark protection is not enormously robust, and lacks specific definition of the degree to which 'nearby' colours should also be protected (beyond the vague statement that protection should cover variants such that the difference between the shades is 'barely noticeable').

My recent series of articles on the subject[1,2,3,4] has outlined a number of potential definitions for specifying the similarity between colours, and has included suggestions as to how a more formalised protection framework could theoretically be constructed. In this framework, colours are specified according to their RGB (red/green/blue component) values, with each component expressed as an integer from 0 to 255, providing a colour 'universe' (ranging from [0,0,0] (black) to [255,255,255] (white)) of 16.8 million colours, which can be considered as points within a 3D colour space. From this, a geometric distance (d) (in RGB units) between any two colours can be calculated, from which a similarity score (Scol) can be defined.

Furthermore, it was proposed that it might be appropriate for the protection for a specific colour (within appropriate goods and services classes) to cover not only that colour exactly, but also a sphere of points (representing nearby similar colours) surrounding it in colour space (to account for - for example - variations in printing and digital display processes), up to a specified radius (of the order of, say, d = 10 RGB units). The assertion is that a maximum distance of order 10 units would encompass minor variations, whilst still covering a space of points which are all nominally 'more-or-less the same colour'. 

In this article, I consider the visualisation of the degree of variability which would be encompassed by a framework of this nature. 

Analysis

As defined above, the set of points covered by a protected 'bubble' (sphere) of radius d would sit wholly inside a cube of side-length 2d, with the surface of the sphere just touching the central points of the faces of the cube (Figure 1). For d = 10 units, the sphere would contain approximately 4,189 (i.e. (4/3)πd³) points (the number of individual protected colours), and the cube approximately 8,000 (i.e. (2d)³), the latter being the upper limit of the portion of colour space which would need to be searched to identify all protected variants. The total numbers would be lower if the colour at the centre ([Rcentral,Gcentral,Bcentral]) was near an edge or corner of the overall colour space (as the components of any colour cannot be less than 0 or greater than 255).

Figure 1: Schematic of a protected 'bubble' (sphere) (of radius d) of points within colour space

As an illustration, we can first consider a colour near the overall centre of the colour space (say, [128,128,128], a shade of mid-grey). In this case, there are actually 4,169 distinct colours within a sphere of radius d = 10 units (noting that R, G and B can only take integer values) (i.e. ranging from [118,128,128], [128,118,128] and [128,128,118] to [138,128,128], [128,138,128] and [128,128,138]). The amount of variability contained within this set of colours is shown in Figure 2, in which (for convenience of manual review) the colours are sorted by their H (hue) values (i.e. the position within the spectrum of their dominant / 'base' colour, neglecting saturation (intensity), value (darkness) and luminosity)[5].

Figure 2: Visualisations of the single colour [128,128,128] (left), and (shown as vertical bands) of the range of colours surrounding it up to a distance, d, of 10 RGB units (sorted by H (hue) values, left to right, then top to bottom) (right)
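The count of 4,169 quoted above can be reproduced by a brute-force search of the enclosing cube. The following is a minimal sketch only: the function name is hypothetical, the radius is assumed to be an integer, and points lying exactly at distance d are assumed to be included (consistent with the extreme colours listed above, such as [118,128,128]).

```python
def count_protected_colours(centre, radius=10):
    """Count integer RGB points within `radius` of `centre`, clipped to 0..255."""
    r0, g0, b0 = centre
    limit = radius * radius  # compare squared distances to avoid square roots
    count = 0
    # Only the enclosing cube (truncated at the edges of colour space) is searched.
    for r in range(max(0, r0 - radius), min(255, r0 + radius) + 1):
        for g in range(max(0, g0 - radius), min(255, g0 + radius) + 1):
            for b in range(max(0, b0 - radius), min(255, b0 + radius) + 1):
                if (r - r0) ** 2 + (g - g0) ** 2 + (b - b0) ** 2 <= limit:
                    count += 1
    return count


mid_grey = count_protected_colours((128, 128, 128))  # full sphere
black = count_protected_colours((0, 0, 0))           # truncated to one octant
print(mid_grey, black)
```

The same function handles colours at an edge or corner of the colour space, where the sphere is truncated and the count is correspondingly smaller.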

As mentioned above, for other colours near to (i.e. less than the distance d from) an edge of the colour space, there will be a smaller number of possible variants contained within the (truncated) sphere of radius d = 10 units. The statistics and visualisations for some other basic colours are shown below in Table 1 and Figure 3.

Table 1: A selection of basic colours and the number of distinct colours in RGB space within a surrounding (truncated) sphere of radius d = 10 units in each case

Figure 3: Visualisations of the colours shown in Table 1 (left) and, in each case, (shown as vertical bands) the range of colours surrounding it up to a distance, d, of 10 RGB units (sorted by H (hue) values, left to right, then (where more than one row shown) top to bottom) (right)

Conclusions

The visualisations provided in this article indicate that the proposed value of d = 10 RGB units, as the extent of the protection offered by a colour mark registration, does seem to be reasonable, in terms of allowing a small amount of variability as may arise from brand and product production and visualisation processes, whilst still covering only a range of colours which subjectively appear nominally similar. The application of a quantitative framework along the lines suggested in these studies does offer the potential for objective comparisons between colour marks, and for the removal of subjective descriptions of degrees of difference.

It is, however, important to note that the suggested value of 10 units is somewhat arbitrary, and would certainly be up for discussion if a specific value was to be adopted as part of a formalised protection framework. The association of colour with branding incorporates a number of psychological considerations - for example, a previous study by Kumar (2017)[6] found that colour increases brand recognition by 80%, and accounts for between 62% and 90% of a consumer's initial judgement of a product. Furthermore, recent comments by Lord Clement-Jones, following on from the Influence at Work / Stobbs study 'The Psychology of Lookalikes'[7], have highlighted the importance of considering psychological and behavioural analyses in IP disputes, particularly in relation to brand lookalikes[8]. Accordingly, if any such framework were to be adopted, it would likely require a foundation based on research into the impact of colour variations on subjective perceptions of brand association.

Finally, in order to construct an effective basis for a colour-mark protection framework, it would also be necessary to incorporate additional considerations. The degree of overlap of the areas of the goods and services of two potentially competing marks would be likely to be highly relevant, and the thresholds may need to vary depending on whether single colours or colour combinations were being protected, for example. 

References

[1] https://www.linkedin.com/pulse/measuring-similarity-marks-overview-suggested-ideas-david-barnett-zo7fe/

[2] https://circleid.com/pdf/similarity_measurement_of_marks_part_1.pdf

[3] https://circleid.com/posts/further-developing-a-colour-mark-similarity-measurement-framework-building-a-database

[4] https://circleid.com/pdf/similarity_measurement_of_marks_part_4.pdf

[5] https://circleid.com/pdf/similarity_measurement_of_marks_part_6.pdf

[6] https://www.semanticscholar.org/paper/The-Psychology-of-Colour-Influences-Consumers%E2%80%99-%E2%80%93-A-Kumar/f7c3b2a780a7a3bf907ef807085b86a63f0d8d0a?p2df

[7] https://www.iamstobbs.com/the-psychology-of-lookalikes

[8] https://www.linkedin.com/posts/geoff-steward-20404015_good-to-know-that-the-psychology-of-lookalikes-activity-7183447412542181377-6omH

This article was first published on 19 October 2024 at:

https://www.linkedin.com/pulse/what-degree-variability-might-covered-within-david-barnett-ajyoe/

Thursday, 17 October 2024

Measuring the similarity of marks: an overview of suggested ideas

Introduction

A comparison of the similarity between pairs of marks is a key component of many intellectual property disputes. A key point to note is that the overall assessment of similarity needs to take account of a number of components, several of which involve subjective determinations. These components might typically include: 'inherent' characteristics of the marks in question; the meaning of any terms (i.e. conceptual analysis); their distinctiveness, strength and degree of renown; the influence of any associated logos, imagery or mark stylisation; the degree of overlap of associated goods and services; documented evidence of actual confusion; and the degree of attention paid by a typical consumer - many of which may vary between different geographical regions. All of these factors contribute to the overall assessment of the likelihood of confusion between the marks. 

Nevertheless, there are certain characteristics (generally falling under the 'inherent' category referenced above) of some types of marks which do lend themselves to a quantitative, objective measurement of similarity. The most obvious such examples are colours, and the spelling and pronunciation of word marks (which contribute to visual and aural similarity, respectively). Whilst any measurement of such characteristics cannot provide a quantification of overall mark similarity, the associated algorithms can provide a useful tool to be utilised in the assessment process. 

Algorithms to measure colour similarity, and visual and aural similarity for word marks, have a number of obvious applications. Firstly, they offer the potential for greater consistency (and greater granularity) in the assessment of the respective types of similarity across dispute cases, and secondly, they offer the potential to be able to specify quantifiable thresholds up to which IP protection might apply. In addition, they have other applications, such as the option to post-process results from trademark watching services, to (better) sort and prioritise the findings and assist with the review process.

A group of suggested frameworks for formulating algorithms of this type was set out in a series of six articles[1] recently published on the CircleID website. This overview presents a summary of the key ideas from the series.

Similarity of colours

Colours occupy a unique position in the set of mark types, due to the fact that the specification for a colour can be exactly defined. One of the most common frameworks (particularly in the context of digital display systems) is the RGB framework, where a colour is defined in terms of its red, green and blue components, each expressed as an integer value from 0 to 255 (giving 256³ or 16.8 million definable colours in total). This 'universe' of possible colours can therefore be visualised as a three-dimensional cube (or colour 'space'), with red increasing along one axis, green along another, and blue along the third, with each distinct colour occupying a unique point within the space.

Using this framework, the (degree of) difference between any two colours can be expressed in terms of their geometric distance (d) from each other in the colour space. From this, a difference score (Dcol) can be formulated (by expressing d as a proportion of the maximum possible distance between two colours - i.e. between [0,0,0] (black) and [255,255,255] (white)), and from this, a colour similarity score (Scol) (equal to 1 (or 100%) minus Dcol).
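The definitions of d, Dcol and Scol given above can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation; the function names are hypothetical, and colours are assumed to be supplied as (R, G, B) tuples of integers from 0 to 255.

```python
import math

# Maximum possible distance: black [0,0,0] to white [255,255,255], about 441.67.
MAX_DISTANCE = math.sqrt(3 * 255 ** 2)


def colour_distance(c1, c2):
    """Geometric (Euclidean) distance d between two colours in RGB space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def difference_score(c1, c2):
    """Dcol: d expressed as a proportion of the maximum possible distance."""
    return colour_distance(c1, c2) / MAX_DISTANCE


def similarity_score(c1, c2):
    """Scol = 1 - Dcol (multiply by 100 for a percentage)."""
    return 1.0 - difference_score(c1, c2)


print(round(100 * similarity_score((128, 128, 128), (118, 128, 128)), 2))
```

Under these definitions, two colours separated by d = 10 RGB units have a similarity score of roughly 97.7%.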

The concept of a colour similarity measurement metric makes most sense (in the context of disputes) if there were to exist a framework in which the protection granted under a colour mark (within appropriate categories of goods and services) registration covered not just that colour exactly, but also very similar colours (up to a specified threshold). Current guidelines suggest that protection should cover variants such that the difference between the shades is 'barely noticeable', but the use of a numerical score would provide the potential to put in place a more explicit threshold and avoid ambiguity. 

Within a framework of this type, it would also be possible (and might be convenient) to maintain a database of protected colours (or the colours of elements within broader protected marks), to help determine the existence of possible clashes between existing or proposed new protected colours. A mock-up of how this might look is shown in Table 1, for a series of colours associated with well-known brands. In the table, the individual colours are sorted by their hue (H) values (part of an alternative (to RGB) framework for specifying colours), which orders them according to the position within the spectrum of their dominant colour, which can assist with visual review of data of this type.

Table 1: Mock-up of a database of protected colour marks, sorted by their H values

For context, the shades of orange used by Reese's and Home Depot (objectively the most similar pair of colours in the above table) have a similarity score of 95.9%.

Visual and aural similarity of word marks

For word marks (even just considering similarity in spelling (visual) and pronunciation (aural)), the situation is rather more complex. The frameworks suggested in the previous studies propose a separate similarity score for each of these two components (Svis and Saur, respectively), and an overall score (Swor) reflecting both types of word similarity, which is most simply calculated as the mean of the two components (but can be differently weighted if required).

The proposed algorithm for quantifying visual (spelling) similarity is itself composed of two components (i.e. utilises two distinct metrics), reflecting different aspects of the similarity in spelling. The first metric ('fuzz.ratio') is based on a measurement called Levenshtein distance (which quantifies the number of 'edits' (i.e. character insertions, deletions, or substitutions) necessary to transform one string into the other), but with the metric also including normalisation factors to take account of the length of the strings. The second (Jaro-Winkler similarity) is more complex, including an element which takes account of the proximity of the variations to the start of the strings (where, for example, a consumer might be more likely to be aware of any differences).
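To illustrate the idea of edit distance and length normalisation, the following sketch computes a plain Levenshtein distance and a length-normalised similarity percentage. It is not the 'fuzz.ratio' implementation itself: the normalised score here uses the standard library's difflib as a stand-in (an assumption), and the function names are hypothetical, so the values may differ slightly from those produced by the metric used in the studies.

```python
from difflib import SequenceMatcher


def levenshtein_distance(a, b):
    """Minimum number of single-character edits needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalised_ratio(a, b):
    """Length-normalised similarity on a 0-100 scale (difflib stand-in)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(levenshtein_distance("boss", "bossi"), normalised_ratio("boss", "bossi"))
```

The raw distance treats a one-character difference identically regardless of string length, whereas the normalised score penalises it more heavily in shorter strings.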

For aural (pronunciation) similarity, the calculation is carried out by first implementing an analysis process which converts each string to its IPA (International Phonetic Alphabet) representation, and then using the 'fuzz.ratio' metric to quantify the similarity between these representations.

For illustration, the similarity score values for a range of pairs of marks which were the subject of past disputes are shown in Table 2.

Table 2: Pairs of marks and their visual, aural and overall similarity scores

Conclusion

The algorithms proposed for quantifying the similarity of colour marks, and the visual and aural similarity of word marks, do seem to perform reasonably well, and (in the case of the word mark metrics) align with what might subjectively be reckoned according to manual analysis. The formulations are, of course, just one possible option, and it would certainly be possible to 'tune' the algorithms according to specific requirements.

Algorithms of this type do offer the potential for a more granular, continuous, repeatable and quantifiable expression of similarity and, with appropriate adoption into case analysis, offer a possible route towards greater consistency in dispute decisions. 

However, it is important to reiterate the statement made in the introduction, that such metrics cannot fully assess the overall similarity between marks, or replace the existing nuanced and multi-faceted approach of considering the full range of subjective factors which contribute to an assessment of the likelihood of confusion, but can provide a useful tool to be applied in such analyses and in other contexts.

Reference

[1] For colour marks:

For word marks:

This article was first published on 17 October 2024 at:

https://www.linkedin.com/pulse/measuring-similarity-marks-overview-suggested-ideas-david-barnett-zo7fe/

Further developing a colour mark similarity measurement framework - Part III: A method for sorting colours

Introduction

In my previous articles[1,2,3] looking at a framework for analysis of colour marks, I considered the use of the RGB definition for colours (specifying their individual red, green and blue components each as integer values between 0 and 255), and how this can be used to specify the 'distance' (d) between any two colours[4] and, equivalently, a similarity score (Scol)[5] for the pair. 

However, when considering colour marks, it can also be helpful to have an algorithm for sorting a list of colours into a convenient order for visual review. The obvious choice would be an ordering which resembles a spectrum of colours. This is distinct from a simple ordering based on just a sorting of (say) the numerical R values, followed by the G values and then the B values (as per Figure 1 in the previous article), which generates multiple, near-repeating series of coloured 'bands'.

The difficulty is that there is no simple algorithm for translating a 3D colour space into a (1D) linear series of colours in which all transitions are smooth and continuous. 

The situation can be appreciated by visualising the same set of 4,096 representative, 'regularly-spaced' colours as considered in the previous article (i.e. [8,8,8], [8,8,24], [8,8,40], … , [8,24,8], [8,24,24], [8,24,40], … , [24,8,8], … , [248,248,248]). 

Algorithms for sorting colours are frequently based on expression of the colours in HSV, rather than RGB, format[6]. This alternative representation also uses three components:

  • H (hue) - the 'base' colour (on a scale from 0 to 1 in 'spectral' order)
  • S (saturation) - the intensity of the colour
  • V (‘value’) - the darkness of the colour

Mathematical conversion of the RGB expression of a colour to its HSV equivalent involves a simple algorithm, and a number of pre-written library scripts[7] are available to implement it. 

The simplest method of sorting colours is just straightforwardly by their H (hue) values. For the set of 4,096 colours considered previously, this gives an ordering as shown in Figure 1 (left to right, then top to bottom).

Figure 1: Ordering of 4,096 colours occupying regularly-spaced positions in RGB space, according to their H (hue) values
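An ordering of this kind can be sketched using the standard-library colorsys module (the same module referenced in note [7]). The helper names below are hypothetical, and the grid is assumed to be the 16 × 16 × 16 set of values 8, 24, 40, … , 248 described above.

```python
import colorsys


def regular_grid(start=8, stop=256, step=16):
    """The 4,096 regularly-spaced colours: [8,8,8], [8,8,24], ..., [248,248,248]."""
    values = range(start, stop, step)
    return [(r, g, b) for r in values for g in values for b in values]


def hue(rgb):
    """H value (0 to 1) of an RGB colour with integer components 0..255."""
    r, g, b = (component / 255 for component in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[0]


colours = regular_grid()
sorted_by_hue = sorted(colours, key=hue)  # stable sort: ties keep grid order
print(len(sorted_by_hue), sorted_by_hue[0], sorted_by_hue[-1])
```

Note that colorsys expects components scaled to the range 0 to 1, hence the division by 255 before conversion.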

What is less satisfactory about this ordering is that it takes no account of the other two parameters, so there are (for example) rapid alternations between dark and light shades, but there is no way to entirely smooth out these discontinuities without losing the smoothness of the transitions according to the other parameter(s). 

One other option is the use of an additional parameter, L (luminosity). (Perceived) luminosity can be derived directly from the RGB values of a colour[8]; on its own, it does not provide a good basis for sorting colours, but can be combined with the use of H to provide smoother transitions. One such option is to divide the H values into 'blocks' and then sort by L within each block. However, this still results in sharp transitions between adjacent colours at the ends of blocks, so does not really add much value in many cases. In the remainder of this article, therefore, sorting by (just) the H parameter is utilised.
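The 'blocks' approach described above might be sketched as follows, using the luminosity formula given in note [8]. The number of blocks (eight here) and the function names are assumptions chosen purely for illustration.

```python
import colorsys
import math


def luminosity(rgb):
    """Perceived luminosity L, per the weighted formula in note [8]."""
    r, g, b = rgb
    return math.sqrt(0.241 * r + 0.691 * g + 0.068 * b)


def block_sort_key(rgb, n_blocks=8):
    """Sort key: hue block first, then luminosity within the block."""
    h = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))[0]
    block = min(int(h * n_blocks), n_blocks - 1)  # guard against h == 1.0
    return (block, luminosity(rgb))


sample = [(248, 8, 8), (8, 248, 8), (40, 8, 8), (8, 40, 8)]
ordered = sorted(sample, key=block_sort_key)
print(ordered)
```

In this small sample the two reds fall into one block and the two greens into another, with the darker shade appearing first within each block; the sharp transition at the boundary between blocks remains.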

Applications of colour sorting / ordering

The first point to note is that the H value (i.e. the position of the colour in an ordered spectrum) does not in itself provide the basis for an effective metric for comparing the similarity of colours (compared with the geometric distance (d) in colour space discussed previously), in part due to the disregarding of the other two components which affect a colour's visual appearance. 

As a related point, the relationship between hue (H) and colour distances (d) is complex, due to the distribution of colours in 3D space. As one illustration of this, it is instructive to visualise the numerical distance (d) of each of the 4,096 colours shown in Figure 1 from a fixed colour, as a function of the H value of the individual (variable) colour in each case. This relationship is shown in Figure 2, for three fixed colours: pure red ([255,0,0]), pure green ([0,255,0]) and pure blue ([0,0,255]).

Figure 2: Distances (d) of each of the 4,096 colours in Figure 1 from (pure) red, green and blue, as a function of their H (hue) values (0 = red; 1 = violet)

Figure 2 also reveals the 'circular' nature of the colour spectrum (when ordered according to hue), with both the 'red' and 'violet' ends of the spectrum 'close' to 'pure' red (i.e. [255,0,0]) - another reason why hue (H) is a less satisfactory basis (than d) for quantifying the proximity of colour pairs.

However, there are practical uses for sorting by H, predominantly where it is useful to be able to visually review sets of colours as part of the analysis process for marks (e.g. where assessing disputes and potential colour 'clashes').

For example, if maintaining a database of registered colour marks, it may be useful to have them sorted into a meaningful order, to be able to visually review the proximity of similar marks and determine whether new proposed colour-marks are close enough to others to present a potential problem. For example, the set of colour marks considered in the 'Building a database' article in this series is again presented in Table 1, but here with the colours sorted by their H values, providing a much more preferable basis for manual review. (The table also illustrates how visually 'darker' shades are, in general, associated with lower L values.)

Table 1: Mock-up of a database of protected colour marks, sorted by their H values

In a second application, it might be helpful to be able to visualise (in an ordered form) the set of colours which are similar to a particular degree to another fixed colour, building on the idea of the similarity score (Scol) presented in the previous article.  

For example, taking an arbitrary colour somewhere near the centre of the colour space (say, [136,72,56], a shade of brown; Figure 3), it might be instructive to be able to visualise the set of colours (and the extent of their variability!) which would be deemed (by Scol) as being (for example) 75% similar (i.e. those colours sitting on the surface of a sphere in RGB space of appropriate radius - in this case, d = 110 RGB units - surrounding the colour in question). Such analyses might be informative in formulating guidelines regarding the thresholds up to which colour-mark protection might apply. Figure 4 shows the range of such colours, again sorted by H values, taken from the dataset of 4,096 colours considered previously. The examples range from [40,24,24], [232,24,24] and [232,104,104] (all H = 0.000) to [232,24,40] (H = 0.987).

Figure 3: A rectangle of colour RGB = [136,72,56]

Figure 4: Subset of the group of 4,096 colours occupying regularly-spaced positions in RGB space which are 75% (to the nearest percent) similar (according to Scol) to the colour [136,72,56], sorted by their H values
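A selection of this kind can be sketched by combining the similarity score with the hue ordering. This is a minimal sketch under stated assumptions: 'to the nearest percent' is interpreted as the rounded percentage score being equal to 75, and the function names are hypothetical.

```python
import colorsys
import math

MAX_DISTANCE = math.sqrt(3 * 255 ** 2)  # black to white


def similarity(c1, c2):
    """Scol = 1 - d / (maximum possible distance)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
    return 1.0 - d / MAX_DISTANCE


def hue(rgb):
    """H value (0 to 1) of an RGB colour with integer components 0..255."""
    return colorsys.rgb_to_hsv(*(c / 255 for c in rgb))[0]


target = (136, 72, 56)
values = range(8, 256, 16)
grid = [(r, g, b) for r in values for g in values for b in values]

# Keep colours whose score rounds to 75%, then order them by hue for review.
subset = sorted((c for c in grid if round(100 * similarity(target, c)) == 75),
                key=hue)
print(len(subset), subset[0], round(hue((232, 24, 40)), 3))
```

The colours [40,24,24] and [232,24,40] quoted above both satisfy this criterion, lying at distances of 112 and approximately 108.5 RGB units from the target respectively.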

In summary, therefore, whilst the option for sorting colours into a meaningful order does not add much value to the framework for quantifying the degree of similarity between colours, it does provide a basis for being able to present colour information in a format which is more easily visually digestible. These ideas therefore could have applications in reviewing dispute cases, selecting options for new potential colour marks, and in formulating guidelines for IP protection thresholds.

References

[1] https://circleid.com/posts/towards-a-quantitative-approach-for-objectively-measuring-the-similarity-of-marks

[2] https://circleid.com/posts/further-developing-a-colour-mark-similarity-measurement-framework-building-a-database

[3] 'Further developing a colour mark similarity measurement framework - Part II: Defining a similarity score'

[4] d = √[(R1 – R2)² + (G1 – G2)² + (B1 – B2)²]

[5] Scol = 1 – [ d / √(3 × 255²) ]

[6] https://www.alanzucconi.com/2015/09/30/colour-sorting/

[7] e.g. https://github.com/python/cpython/blob/3.13/Lib/colorsys.py

def rgb_to_hsv(r, g, b):
    maxc = max(r, g, b)
    minc = min(r, g, b)
    rangec = (maxc-minc)
    v = maxc
    if minc == maxc:
        return 0.0, 0.0, v
    s = rangec / maxc
    rc = (maxc-r) / rangec
    gc = (maxc-g) / rangec
    bc = (maxc-b) / rangec
    if r == maxc:
        h = bc-gc
    elif g == maxc:
        h = 2.0+rc-bc
    else:
        h = 4.0+gc-rc
    h = (h/6.0) % 1.0
    return h, s, v

[8] L = √ [ 0.241 × R + 0.691 × G + 0.068 × B ]

This article was first published as a white paper on 17 October 2024 at:

https://circleid.com/pdf/similarity_measurement_of_marks_part_6.pdf

Further developing a word mark similarity measurement framework - Part II: Defining an improved similarity score

Introduction

My initial study on mark similarity measurement[1] focused on formulations for quantifying the objective similarity of pairs of marks, with particular focuses on colour- and word marks. As discussed in previous articles in this series, mark similarity assessment is a key part of the resolution of many intellectual property disputes, and a more objective approach could have a number of advantages, including the potential to provide definitions which could be built into case law, offer greater consistency across dispute decisions, and specify thresholds for IP protection.

However, it is important to reiterate the key point that any objective algorithms of these types should only ever be considered as tools to be used as part of the overall assessment process, which includes significant degrees of subjectivity. In the first instance, the algorithmic frameworks presented in this series for word marks focus only on visual (spelling) and aural (pronunciation - with a specific basis in American English) similarity, with no account taken of conceptual similarity (i.e. meaning) or the influence of any associated logos, imagery or mark stylisation. Overall, dispute decisions are often reliant on an assessment of the likelihood of confusion between the marks in question, which is generally also dependent on a range of other factors, including the distinctiveness, degree of overlap of associated goods and services, strength and degree of renown of the marks, documented evidence of actual confusion, and the degree of attention paid by a typical consumer - many of which may vary between different geographical regions[2,3]. Some of the factors generally considered for the components which can be measured algorithmically (such as typically putting greater weight on comparisons between elements appearing at the start of the marks in question, and greater emphasis on differences appearing within shorter marks[4]) can be, and have been, built into the proposed algorithms wherever possible.

The degree of similarity (of each type) between marks is often specified in dispute cases as 'high', 'medium' or 'low'; with this in mind, it seems reasonable (where constructing any measurement algorithm) to formulate the output as a similarity score (as proposed for colour marks in the previous article[5] in this series), which aligns broadly with this framework but offers a more quantitative basis for comparison (though keeping in mind that all of the above caveats also still apply!).

Formulation of the similarity score algorithm

The similarity score used for comparison of pairs of word marks (Swor), in both the previous study and this follow up, reflects both visual (spelling) and aural (pronunciation) similarity (only). 

As in the initial version, visual similarity between the marks (i.e. in terms of their spelling) is quantified using two distinct algorithms, each of which reflects different aspects of the similarity. The two algorithms (each of which generates a score which can be expressed as a percentage) are:

  • The fuzz.ratio metric (FLev), an algorithm implemented in the Python package 'fuzzywuzzy'[6], based on the concept of Levenshtein distance - a way of quantifying the number of edits required to transform one string into the other - but also taking account of other factors (including the length of the strings).
  • The Jaro-Winkler similarity algorithm (and score (simj)) (as implemented in the Python package 'Levenshtein'[7]), which includes an element of consideration of the proximity of the matching / non-matching characters to the start of the strings.
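To show how the prefix weighting in Jaro-Winkler similarity works, the following is a pure-Python sketch of the commonly described formulation. It is not the code from the 'Levenshtein' package; the prefix weight of 0.1 and maximum prefix length of 4 are the conventional defaults (assumed here), and the boost is applied unconditionally, which some implementations do not do.

```python
def jaro(s1, s2):
    """Jaro similarity (0 to 1) based on matching characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # how far apart matches may be
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1, max_prefix=4):
    """Jaro similarity boosted according to the length of the shared prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

Because the boost depends only on the shared prefix, two strings which differ near the end score higher than two which differ by the same amount near the start.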

In the simplest formulation of the overall algorithm (and as retained here), the score component reflecting overall visual similarity (Svis) is expressed just as the simple mean of the above two scores (as below), although it would be possible to apply different weightings if required.

Svis = (FLev + simj) / 2

For aural similarity, the proposed calculation framework is based on the creation of a phonetic representation of the marks / strings in question, and then a comparison of these representations (again, using the fuzz.ratio metric). 

The initial formulation also made use of two distinct algorithms for generating the phonetic representations, based on the Soundex and NYSIIS (New York State Identification and Intelligence System) encodings. However, both of these have certain shortcomings, not least the poor handling of vowel sounds within the strings, and (in Soundex) the inability to encode any consonants beyond the first four.

In this improved version, therefore, I instead propose the use of the Phonemizer algorithm[8,9] for generating the phonetic versions of the strings, which utilises IPA (International Phonetic Alphabet)[10] encoding, and which was explored in the previous follow-up study[11] and appears to perform well (although some data 'cleansing' is required in some cases, to ensure that the algorithm interprets the string as intended). The aural similarity score (Saur) can then be calculated simply as the output of the fuzz.ratio metric applied to the IPA representations as given by Phonemizer, i.e.:

Saur = FPho

As in the previous formulation, the overall (word mark) similarity score can then most simply be expressed just as the mean of the two individual components, i.e.:

Swor = (Svis + Saur) / 2
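The combination of the component scores can be sketched as below. The input values in the example are hypothetical (not taken from the dataset), and the weighting parameters are an assumption added to illustrate the option of applying different weightings; with the defaults, the functions reduce to the simple means given above.

```python
def weighted_mean(a, b, weight_a=0.5):
    """Mean of two scores; weight_a = 0.5 gives the simple mean."""
    return weight_a * a + (1 - weight_a) * b


def word_similarity(f_lev, sim_j, f_pho, lev_weight=0.5, visual_weight=0.5):
    """Return (Svis, Saur, Swor) from the three component scores (percentages)."""
    s_vis = weighted_mean(f_lev, sim_j, lev_weight)     # Svis = (FLev + simj) / 2
    s_aur = f_pho                                       # Saur = FPho
    s_wor = weighted_mean(s_vis, s_aur, visual_weight)  # Swor = (Svis + Saur) / 2
    return s_vis, s_aur, s_wor


# Hypothetical component scores, for illustration only.
print(word_similarity(80.0, 90.0, 100.0))
```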

Similarity scores for test-pairs of marks

As an illustration of the performance of this algorithm, I consider a set of approximately 200 pairs of word marks, mostly the subjects of recent trademark disputes (several of which were also considered in previous articles in this series), and with a primary focus on single-word marks (for simplicity). The full set of mark-pairs, and the calculated similarity scores, are presented in Appendix A.

The first point to note is that, generally, little pre-processing of the data is required in order to utilise the algorithm. All marks have been converted to lower-case, though this is generally a matter of choice, just to ensure that upper- and lower-case versions of the same letter are treated identically. The algorithms do also appear to correctly handle accented characters (albeit that the phonetic representations will generally reflect an English pronunciation). The only two modifications to the data required in these cases were a rewriting of 'OrangeryOS' as 'orangery-o-s' (to ensure that the pronunciation is rendered as 'oh-es') and (as in a previous study) of 'likeme' to 'like-me'. 

Elsewhere (as noted previously), the Phonemizer algorithm renders 'unreadable' strings as individual characters (e.g. 'immun44' as 'immun-four-four', '007' as 'zero-zero-seven', 'ch_t.' as 'see-aitch-tee', and 'mbfw' as 'em-bee-ef-doubleyu'), though these versions have been retained in an unmodified state in the analysis. Some of these representations may not be as originally intended when the marks were conceived, however - e.g. 'genv3rse' is rendered as 'genv-three-rse' (rather than the more likely 'genverse'), and 'm4tter' as 'em-four-tter' (rather than 'matter').

Overall, however, the algorithm does seem to provide a (subjectively) reasonable ranking of the mark-pairs by similarity. An attractive additional characteristic of this framework is that it is entirely repeatable, and unreliant on the number and types of pairs in the dataset (i.e. a particular word-pair will always give the same score), so it is always possible to compare like-with-like. Accordingly, it is instructive to consider some representative examples of word-pairs giving particular (approximate) scores (Swor), to provide a 'reckoner' of what the scores represent, i.e.:

  • Approx. 90%:
    • boss / bossi
    • billionaire / zillionaire
    • thermacare / thermocare
    • prinker / prink
    • intellicare / intelecare
    • chooey / chooee
    • mahendra / mahindra
  • Approx. 80%:
    • zara / zarzar
    • rabe / rase
    • retaron / retlron
    • createme / create.
    • spa / spato
    • thermomix / termomatrix
  • Approx. 70%:
    • kelio / kleeo
    • terry / terrissa
    • tygrys / tigris
    • nike / nuke
  • Approx. 60%:
    • nutella / mixitella
    • airbnb / francebnb
    • gallo / rampingallo
    • iphone / mifon
    • joy / bjoie
    • jd / jdyaoying
  • Approx. 50%:
    • zara / zorazone
    • quirón / quiromasté
  • Approx. 40%:
    • book / restaubook
    • h10 / motel 10
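A 'reckoner' of this type can be generated directly from the scored pairs. The sketch below uses a handful of (mark 1, mark 2, Swor) values copied from Appendix A; the banding rule (rounding each score to the nearest 10%) is an assumption which appears consistent with the groupings above, and the function names are hypothetical.

```python
import math
from collections import defaultdict

# A few (mark 1, mark 2, Swor) rows copied from Appendix A.
SCORED_PAIRS = [
    ("boss", "bossi", 90.75),
    ("zara", "zarzar", 80.56),
    ("thermomix", "termomatrix", 79.62),
    ("nike", "nuke", 70.00),
    ("iphone", "mifon", 59.75),
    ("zara", "zorazone", 50.93),
    ("book", "restaubook", 42.75),
    ("h10", "motel 10", 39.00),
]


def nearest_band(score: float, width: int = 10) -> int:
    """Round a score to the nearest multiple of `width` (assumed rule)."""
    return int(math.floor(score / width + 0.5)) * width


def build_reckoner(pairs):
    """Group 'mark 1 / mark 2' labels under their approximate score band."""
    bands = defaultdict(list)
    for mark_1, mark_2, score in pairs:
        bands[nearest_band(score)].append(f"{mark_1} / {mark_2}")
    return dict(bands)


reckoner = build_reckoner(SCORED_PAIRS)
for band in sorted(reckoner, reverse=True):
    print(f"Approx. {band}%: {', '.join(reckoner[band])}")
```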

A further attractive aspect of this approach is that the visual and aural similarity components can, if required, be considered separately. For example, the top pairs of marks by visual similarity score (Svis) (only) are fashiongo / fashionego (96.50%), configon / configo (95.25%) and casoria / castoria (95.04%), and by aural similarity score (Saur) (only) are sanytol / sanitol, testex / test-x, hobbit / hobbyt, kramer / cramer, kresco / cresco, and cylance / sylence (all 100%, i.e. deemed phonetically identical).
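The component-wise rankings can be reproduced with a simple sort over the scored rows. The sketch below uses five rows copied from Appendix A; the `ScoredPair` structure is a hypothetical container introduced for illustration. It also checks an observation rather than a stated definition: for these rows, the tabulated overall score (Swor) agrees with the simple mean of Svis and Saur to within rounding.

```python
from typing import NamedTuple


class ScoredPair(NamedTuple):
    mark_1: str
    mark_2: str
    svis: float  # visual similarity score (%)
    saur: float  # aural similarity score (%)
    swor: float  # overall word mark similarity score (%)


# Rows copied from Appendix A.
ROWS = [
    ScoredPair("casoria", "castoria", 95.04, 95.00, 95.02),
    ScoredPair("sanytol", "sanitol", 89.67, 100.00, 94.83),
    ScoredPair("fashiongo", "fashionego", 96.50, 80.00, 88.25),
    ScoredPair("cylance", "sylence", 75.98, 100.00, 87.99),
    ScoredPair("configon", "configo", 95.25, 78.00, 86.62),
]

# Rank by the visual component only, and pick out phonetically identical pairs.
top_visual = sorted(ROWS, key=lambda row: row.svis, reverse=True)[:3]
aurally_identical = [row for row in ROWS if row.saur == 100.0]

# Observation (not a definition taken from the text): Swor matches the
# mean of Svis and Saur to within rounding for these rows.
mean_matches = all(
    abs((row.svis + row.saur) / 2 - row.swor) <= 0.01 for row in ROWS
)

print([f"{r.mark_1} / {r.mark_2}" for r in top_visual])
print([f"{r.mark_1} / {r.mark_2}" for r in aurally_identical])
print(mean_matches)
```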

Discussion

Overall (and again as noted previously), it would not be reasonable to expect any significant correlation between the similarity scores and the findings reached in the associated disputes, because of the significant additional (and subjective) points considered in the analysis of each dispute, as discussed in the introduction to this article. For example, in the Initio / Vinicio case, the marks were found to have 'below average' visual similarity (despite the objective, quantitative visual similarity score of 80.96%), with consideration having been given in the case to the differing impact of the various elements and the overall impression of the respective marks, which feature significant differences in visual presentation[12].

Nevertheless, the similarity score does offer a useful tool to consider the 'pure' visual and aural similarity (only) of the word marks, as part of an overall analysis (for example, in dispute cases), in a framework which is repeatable and quantitative, providing the potential for a consistent approach to assessment of these characteristics. It also aligns with the familiar terminological descriptions of 'degrees' of similarity, whilst offering a more granular and continuous scale.

The algorithm does also offer additional possible use-cases, such as (for example) the ability to post-process the outputs from trademark watching services, so as to better sort the results by relevance (in cases where the sorting algorithm offered by the service performs less satisfactorily), and thereby aid in the review process.
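As an illustration of this post-processing idea, the sketch below re-sorts a list of hypothetical watch-service hits by their similarity to a protected mark. The function names are hypothetical, and the default scorer is a stand-in only: it uses the standard-library `difflib.SequenceMatcher` ratio, which is not the Svis / Saur algorithm described in this series. In practice, the framework's own scoring function would be passed in as `scorer`.

```python
from difflib import SequenceMatcher


def stand_in_similarity(a: str, b: str) -> float:
    """Placeholder score in [0, 100]; NOT the algorithm from this series."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_watch_hits(protected_mark, hits, scorer=stand_in_similarity):
    """Return (hit, score) tuples sorted from most to least similar."""
    scored = [(hit, scorer(protected_mark, hit)) for hit in hits]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Marks taken from Appendix A, used here as hypothetical watch-service hits.
hits = ["zorazone", "zarzar", "zaraphora", "zareus"]
for hit, score in rank_watch_hits("zara", hits):
    print(f"{hit}: {score:.1f}")
```

Keeping the scorer as a parameter means the ranking step stays the same whichever similarity measure is used.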

It is also worth noting that there is scope for possible future enhancements to the algorithms (some of which have been discussed previously), including (for example) assessments of the distinctiveness of the various elements or sub-elements (subsequences or substrings) of the marks, re-weighting the contribution of any trailing 's', and so on. Distinctiveness and analysis of the 'types' of elements present in the marks may, in particular, be key to making a more meaningful overall assessment of similarity and, ultimately, likelihood of confusion. Relevant examples for consideration in the dataset include Cylance / Sylence (both 'clearly' allusions to the same common word ('silence')), Doctolib / Avocatlib (where the first portion of each mark makes reference to a profession), BMW / BMV (where the only difference is manifested as a pair of 'similar' letters), Immun44 / Immuno-19 (both featuring a similar root and, unusually, followed specifically by a number), iPhone / Mifon (with the similarity between 'I' and 'me' being of potential relevance), and Align / Clickalign (relevant because of the range of additional names cited by the latter party, suggesting the key point is the question of the distinctiveness of the term 'align' for the relevant goods and services).

Appendix A: Pairs of marks and their visual, aural and overall similarity scores

  Mark 1   Mark 2   Vis. sim. score (Svis)   Mark 1 (IPA)   Mark 2 (IPA)   Aur. sim. score (Saur)   Overall word mark sim. score (Swor)
  casoria   castoria 95.04   kæsoːɹiə   kæstoːɹiə 95.00 95.02
  sanytol   sanitol 89.67   sænɪtɑːl   sænɪtɑːl 100.00 94.83
  testex   test-x 88.17   tɛstɛks   tɛstɛks 100.00 94.08
  hobbit   hobbyt 88.17   hɑːbɪt   hɑːbɪt 100.00 94.08
  replay   re:play 94.10   ɹiːpleɪ   ɹiː pleɪ 94.00 94.05
  kramer   cramer 85.94   kɹeɪmɚ   kɹeɪmɚ 100.00 92.97
  kresco   cresco 85.94   kɹɛskoʊ   kɹɛskoʊ 100.00 92.97
  cintra   citra 93.28   sɪntɹə   sɪtɹə 92.00 92.64
  dekton   deton 93.28   dɛktən   dɛtən 92.00 92.64
  free   freen 92.50   fɹiː   fɹiːn 91.00 91.75
  goddess   godless 89.67   ɡɑːdəs   ɡɑːdləs 93.00 91.33
  boss   bossi 92.50   bɔs   bɔsi 89.00 90.75
  billionaire   zillionaire 92.47   bɪliənɛɹ   zɪliənɛɹ 89.00 90.73
  thermacare   thermocare 91.89   θɜːmɐkɛɹ   θɜːməkɛɹ 89.00 90.44
  prinker   prink 88.64   pɹɪŋkɚ   pɹɪŋk 92.00 90.32
  intellicare   intelecare 90.18   ɪntɛlɪkɛɹ   ɪntɛlᵻkɛɹ 90.00 90.09
  chooey   chooee 88.17   tʃuːi   tʃuːiː 92.00 90.08
  dcsl   dcs 90.08   diːsiːɛsɛl   diːsiːɛs 90.00 90.04
  mahendra   mahindra 91.08   mæhɛndɹə   mæhɪndɹə 89.00 90.04
  lucite   luci 86.67   luːsaɪt   luːsaɪ 93.00 89.83
  george   georgine 90.50   dʒɔːɹdʒ   dʒɔːɹdʒɪn 89.00 89.75
  tropico   tropicazo 91.78   tɹɑːpɪkoʊ   tɹɑːpɪkɑːzoʊ 87.00 89.39
  demiegod   demigods 91.50   dɛmɪeɪɡɑːd   dɛmɪɡɑːdz 86.00 88.75
  mbet   m-bets 85.00   ɛmbɛt   ɛmbɛts 92.00 88.50
  fashiongo   fashionego 96.50   fæʃəŋɡoʊ   fæʃəniːɡoʊ 80.00 88.25
  cylance   sylence 75.98   saɪləns   saɪləns 100.00 87.99
  ping   pingke 86.67   pɪŋ   pɪŋk 89.00 87.83
  pikdare   pi-kare 89.19   pɪkdɛɹ   paɪkɛɹ 86.00 87.60
  mbfw   mvfw 80.00   ɛmbiːɛfdʌbəljuː   ɛmviːɛfdʌbəljuː 94.00 87.00
  joy   joyme 82.83   dʒɔɪ   dʒɔɪm 91.00 86.92
  configon   configo 95.25   kənfɪɡən   kənfɪɡoʊ 78.00 86.62
  prinz   prinse 81.17   pɹɪnts   pɹɪns 92.00 86.58
  lovello   lovelle 90.14   lʌvloʊ   lʌvl 83.00 86.57
  energeo   enerjo 83.98   ɛnɚdʒeɪoʊ   ɛnɚdʒoʊ 89.00 86.49
  trucool   turcool 90.86   tɹuːkuːl   tɜːkuːl 82.00 86.43
  carbon   mycarbon 88.83   kɑːɹbən   maɪkɑːɹbən 84.00 86.42
  consiglieri   consigliera 93.68   kənsɪɡlɪɹi   kənsɪɡliɛɹə 78.00 85.84
  starbucks   charbucks 81.59   stɑːɹbʌks   tʃɑːɹbʌks 90.00 85.80
  realme   realmz 88.17   ɹɛlmi   ɹɛlmz 83.00 85.58
  axis   traxis 84.44   æksɪs   tɹæksɪs 86.00 85.22
  youtube   u-tubes 75.98   juːtuːb   juːtuːbz 94.00 84.99
  bimbo   gimbo 83.33   bɪmboʊ   ɡɪmboʊ 86.00 84.67
  tiktok   tiktaktok 85.00   tɪktɑːk   tɪktɐktɑːk 84.00 84.50
  z-biome   biome 86.74   ziːbaɪoʊm   baɪoʊm 82.00 84.37
  bacchus   cacchus 85.46   bækəs   kækəs 83.00 84.23
  philips   philzops 86.07   fɪlɪps   fɪlzəps 80.00 83.04
  patter   yatter 85.94   pæɾɚ   jæɾɚ 80.00 82.97
  noughty   naughtea 73.59   nɔːɾi   nɔːɾiə 92.00 82.79
  yorxs   yorks 85.33   joːɹksz   jɔːɹks 80.00 82.67
  jarlsberg   jørnsberg 82.33   dʒɑːɹlsbɜːɡ   dʒoːɹnsbɜːɡ 83.00 82.67
  globe-trotter   globetrotter xc 90.23   ɡloʊbtɹɑːɾɚ   ɡloʊbtɹɑːɾɚɹ ɛkssiː 75.00 82.62
  treca   trea 92.17   tɹɛkə   tɹiə 73.00 82.58
  resolution   resolute 84.75   ɹɛzəluːʃən   ɹɛzəluːt 80.00 82.38
  olympéa   olympe 83.98   əlɪmpeɪə   əlɪmp 80.00 81.99
  ellesse   elliss 83.22   ɛlɛs   ɛlɪs 80.00 81.61
  hugo   hug-o 92.17   hjuːɡoʊ   hʌɡoʊ 71.00 81.58
  initio   vinicio 80.96   ɪnɪɾɪoʊ   vɪnɪsɪoʊ 82.00 81.48
  bimbo   bimbolea 84.75   bɪmboʊ   baɪmboʊliə 78.00 81.38
  burgerme   burgerly 82.50   bɜːɡɚm   bɜːɡɚli 80.00 81.25
  1link   link 91.17   wʌn lɪŋk   lɪŋk 71.00 81.08
  repevax   epvax 86.74   ɹᵻpɛvæks   ɛpvæks 75.00 80.87
  free   freepour 78.50   fɹiː   fɹiːpɚ 83.00 80.75
  zara   zarzar 86.11   zɑːɹɹə   zɑːɹzɑːɹ 75.00 80.56
  rabe   rase 80.83   ɹeɪb   ɹeɪz 80.00 80.42
  retaron   retlron 89.67   ɹᵻtæɹən   ɹᵻtlɹɑːn 71.00 80.33
  createme   create. 86.07   kɹiːeɪɾiːm   kɹiːeɪt 74.00 80.04
  spa   spato 82.83   spɑː   spɑːɾoʊ 77.00 79.92
  thermomix   termomatrix 84.24   θɜːməmɪks   tɜːməmeɪtɹɪks 75.00 79.62
  atma   atmaspa 82.21   ætmə   ætmæspə 77.00 79.61
  live   vive 79.17   laɪv   vaɪv 80.00 79.58
  cana   canya 92.17   kɑːnə   kænjə 67.00 79.58
  l'oreal   joreal 80.96   ɛloːɹiəl   dʒoːɹiəl 78.00 79.48
  seiko   seycos 65.50   seɪkoʊ   seɪkoʊz 93.00 79.25
  pockit   mypocket 76.47   pɑːkɪt   maɪpɑːkɪt 82.00 79.24
  bisleri   bilseri 91.10   baɪslɜːɹi   bɪlsɚɹi 67.00 79.05
  kikkoman   kikomand 91.08   kɪkɑːmən   kɪkəmænd 67.00 79.04
  fido   fiio 80.83   faɪdoʊ   fɪɪoʊ 77.00 78.92
  waken   wakeful 77.21   weɪkən   weɪkfəl 80.00 78.61
  nutravita   nootrovita 79.17   nʌtɹɐviːɾə   nuːtɹəviːɾə 78.00 78.58
  um bongo   ubongo! 84.11   ʌm bɑːŋɡoʊ   juːbɑːŋɡoʊ 73.00 78.55
  pyra   prya 83.75   pɪɹə   pɹaɪə 73.00 78.38
  ulma   luma 83.33   ʌlmə   luːmə 73.00 78.17
  fransa   fanza 78.50   fɹænsə   fænzə 77.00 77.75
  chef   chefchy 82.21   ʃɛf   ʃɛftʃi 73.00 77.61
  boss   bossvel 82.21   bɔs   bɔsvəl 73.00 77.61
  hanson   hansol 88.17   hænsən   hænsɑːl 67.00 77.58
  lucozade   glucos-aid 72.67   luːkəzeɪd   ɡluːkoʊzeɪd 82.00 77.33
  asos   asas 80.83   ɐsoʊz   ɐsæz 73.00 76.92
  iqos   niccos 67.50   aɪkoʊz   nɪkoʊz 86.00 76.75
  zemo   zoomo 67.11   ziːmoʊ   zuːmoʊ 86.00 76.56
  hyprr   hypernft 72.83   haɪpɚ   haɪpɚnft 80.00 76.42
  free   freeyoung 75.44   fɹiː   fɹiːjʌŋ 77.00 76.22
  bimbo   bimbys 81.17   bɪmboʊ   bɪmbiz 71.00 76.08
  uber   youber 84.44   juːbɚ   jaʊbɚ 67.00 75.72
  dune   dne 89.25   duːn   diːɛniː 62.00 75.62
  scaffeze   scaffx 80.08   skæfɛz   skæfks 71.00 75.54
  foltene   foltex 83.98   foʊltiːn   foʊltɛks 67.00 75.49
  abanca   abaca 93.56   ɐbæŋkə   æbɑːkə 57.00 75.28
  ch   ch_t. 70.50   siːeɪtʃ   siːeɪtʃ tiː 80.00 75.25
  suntech   suntank 69.93   sʌntɛk   sʌntæŋk 80.00 74.96
  hotpatch   patch 78.92   hɑːtpætʃ   pætʃ 71.00 74.96
  huracán   huracanrace 77.53   hjʊɹɹɐkɑːn   hjʊɹɹɐkænɹeɪs 72.00 74.76
  free   freetalk 78.50   fɹiː   fɹiːɾɔːk 71.00 74.75
  free   freeloop 78.50   fɹiː   fɹiːluːp 71.00 74.75
  intelect   entelec 77.90   ɪntɛlᵻkt   ɛntɛlɛk 71.00 74.45
  maplab   maplab.world 78.50   mæplæb   mæplæb wɜːld 70.00 74.25
  sacher   sachi 81.17   sæʃɚ   sætʃaɪ 67.00 74.08
  fanta   fantarifa 81.06   fæntə   fæntɑːɹɹɪfə 67.00 74.03
  fiorelli   fioretto 73.50   fɪoːɹɛli   fɪoːɹɛɾoʊ 74.00 73.75
  sherco   charco 72.39   ʃɜːkoʊ   tʃɑːɹkoʊ 75.00 73.69
  vidas   vidya 85.33   viːdəz   vɪdɪə 62.00 73.67
  gobox   g-box 84.00   ɡoʊbɑːks   dʒiːbɑːks 63.00 73.50
  idee   idee-home 75.44   ɪdiː   ɪdiːhoʊm 71.00 73.22
  starbucks   sardarbuksh 76.21   stɑːɹbʌks   sɑːɹdɑːɹbʌkʃ 70.00 73.11
  orange   orangery-o-s 78.50   ɔɹɪndʒ   ɔɹɪndʒɚɹioʊɛs 67.00 72.75
  free   freeyond 78.50   fɹiː   fɹiːjɑːnd 67.00 72.75
  free   freepods 78.50   fɹiː   fɹiːpɑːdz 67.00 72.75
  sanytol   savisol 67.07   sænɪtɑːl   sævɪsɑːl 78.00 72.54
  snuggledown   snugglemore 81.05   snʌɡəldaʊn   snʌɡəlmoːɹ 64.00 72.52
  pez   pezeeu 77.67   pɛz   pɛziːuː 67.00 72.33
  zirco   cozirc 77.61   zɜːkoʊ   kɑːzɜːk 67.00 72.31
  glenfiddich   inverfiddich 74.10   ɡlɛnfɪdɪtʃ   ɪnvɜːfɪdɪtʃ 70.00 72.05
  salio   saliogen 84.75   sælɪoʊ   sælɪədʒən 59.00 71.88
  vallformosa   fermosa 70.77   vælfoːɹmoʊsə   fɜːmoʊsə 73.00 71.88
  noughty   nouti 76.17   nɔːɾi   naʊɾi 67.00 71.58
  tesla   teslapimp 81.06   tɛslə   tɛslɐpɪmp 62.00 71.53
  live   life's 70.00   laɪv   laɪfz 73.00 71.50
  e-bulli   bullit 80.96   iːbʊli   bʊlɪt 62.00 71.48
  bimbo   bims 75.92   bɪmboʊ   bɪmz 67.00 71.46
  genie   genai 85.33   dʒiːni   dʒɛnaɪ 57.00 71.17
  lakme   like-me 70.32   lækmi   laɪkmiː 71.00 70.66
  kelio   kleeo 70.25   kɛlɪoʊ   kliːoʊ 71.00 70.62
  terry   terrissa 74.00   tɛɹi   tɛɹɪsə 67.00 70.50
  tygrys   tigris 73.50   tɪɡɹiz   taɪɡɹɪs 67.00 70.25
  nike   nuke 80.00   naɪk   nuːk 60.00 70.00
  007   skx007 58.50   ziəɹoʊziəɹoʊ sɛvən   ɛskeɪɛks ziəɹoʊziəɹoʊ sɛvən 81.00 69.75
  geneverse   genv3rse 85.28   dʒɛnɪvɜːs   dʒɛnv θɹiː ɑːɹɹɛsiː 53.00 69.14
  lego   solego 76.11   lɛɡoʊ   sɑːliːɡoʊ 62.00 69.06
  perry   perryhome 81.06   pɛɹi   pɛɹɪhoʊm 57.00 69.03
  kadawe   kademae 80.89   kædɔː   keɪdmiː 57.00 68.94
  acutil   accudis 70.84   ɐkjuːɾɪl   ɐkjuːdiz 67.00 68.92
  bru   bruys 82.83   bɹuː   bɹaɪz 55.00 68.92
  bimbo   wimko 66.67   bɪmboʊ   wɪmkoʊ 71.00 68.83
  cazoo   carkoo 79.39   kæzuː   kɑːɹkuː 57.00 68.19
  doctolib   avocatlib 75.78   dɑːktəlɪb   ævəkætlɪb 60.00 67.89
  boss   kissboss 62.67   bɔs   kɪsbɔs 73.00 67.83
  bmw   bmv 74.61   biːɛmdʌbəljuː   biːɛmviː 61.00 67.81
  marca   plusmarca 57.35   mɑːɹkə   plʌsmɑːɹkə 78.00 67.68
  mdh   mhs 61.28   ɛmdiːeɪtʃ   ɛmeɪtʃɛs 74.00 67.64
  align   clickalign 60.17   ɐlaɪn   klɪkɐlaɪn 75.00 67.58
  ajona   avoma 68.00   ædʒoʊnə   ævoʊmə 67.00 67.50
  zara   zaraphora 75.44   zɑːɹɹə   zæɹɐfoːɹə 59.00 67.22
  levi's   levigo 76.83   lɛviz   lɛvɪɡoʊ 57.00 66.92
  zara   zareus 71.25   zɑːɹɹə   zɛɹəs 62.00 66.62
  naturli'   natureal 82.50   neɪɾɜːli   neɪtʃɚɹiəl 50.00 66.25
  moncler   northcler 70.29   mɔŋklɚ   nɔːɹθklɚ 62.00 66.14
  airbnb   airbrick 70.17   ɛɹbnb   ɛɹbɹɪk 62.00 66.08
  resolva   consolva 69.15   ɹᵻzɑːlvə   kənsɑːlvə 63.00 66.08
  sanytol   sanatio 78.83   sænɪtɑːl   sæneɪʃɪoʊ 53.00 65.92
  moncler   montec 73.39   mɔŋklɚ   mɔntɛk 57.00 65.19
  apiretal   a'peal 77.38   ɐpaɪɚɾəl   ɐpiːl 53.00 65.19
  very   veryco 86.67   vɛɹi   vɜːɹɪkoʊ 43.00 64.83
  bimbo   vibo 72.67   bɪmboʊ   viːboʊ 57.00 64.83
  head   headoniste 72.50   hɛd   hɛdəniːst 57.00 64.75
  saypha   shaype 73.50   seɪfə   ʃeɪp 55.00 64.25
  helios   delio 77.61   hɛlɪoʊz   dᵻliːoʊ 50.00 63.81
  coversyl   covixyl-v 69.94   kʌvɚsɪl   kɑːvɪksɪlviː 57.00 63.47
  simoniz   permanize 58.60   sɪmənɪz   pɜːmənaɪz 67.00 62.80
  vfh   vfhonline 67.22   viːɛfeɪtʃ   viːɛfhɑːnlaɪn 58.00 62.61
  rolex   dermarollex 49.03   ɹoʊlɛks   dɜːmɚɹoʊlɛks 76.00 62.52
  apple   alpineapple 62.89   æpəl   ælpɪniːpəl 62.00 62.45
  thermomix   zaubermix 63.19   θɜːməmɪks   zɔːbɚmɪks 60.00 61.59
  magnavox   multivox 58.33   mæɡnɐvɑːks   mʌltivɑːks 64.00 61.17
  nutella   mixitella 68.83   nuːtɛlə   mɪksaɪtɛlə 53.00 60.92
  airbnb   francebnb 59.65   ɛɹbnb   fɹænsɛbnb 62.00 60.82
  curve   crv 81.50   kɜːv   siːɑːɹviː 40.00 60.75
  gallo   rampingallo 52.52   ɡæloʊ   ɹæmpɪŋɡæloʊ 67.00 59.76
  iphone   mifon 62.50   aɪfoʊn   mɪfɑːn 57.00 59.75
  joy   bjoie 59.44   dʒɔɪ   bjɔɪ 60.00 59.72
  jd   jdyaoying 57.63   dʒeɪdiː   dʒeɪdaɪeɪɑːiɪŋ 61.00 59.31
  bally   ballyclare 78.50   bɔːli   bælɪklɛɹ 40.00 59.25
  swift   microswift 55.17   swɪft   maɪkɹoʊswɪft 63.00 59.08
  bloo   bluuwash 45.67   bluː   bluːwɑːʃ 71.00 58.33
  head   superhead 53.69   hɛd   suːpɚhɛd 62.00 57.84
  trek   gotrekfeel 68.50   tɹɛk   ɡɑːtɹɪkfiːl 47.00 57.75
  blippi   bbibbi 58.33   blɪpi   biːbɪbi 57.00 57.67
  immun44   immuno-19 73.70   ɪmʌn foːɹɾi foːɹ   ɪmjuːnoʊ naɪntiːn 40.00 56.85
  rolex   relxhome 57.17   ɹoʊlɛks   ɹᵻlkshoʊm 56.00 56.58
  kpn   opn 72.39   keɪpiːɛn   ɑːpən 40.00 56.19
  mc   macbeans 58.75   ɛmsiː   məkbiːnz 53.00 55.88
  ape   apecessories 61.25   eɪp   eɪpɪsɛsɚɹiz 50.00 55.62
  airbnb   marseillebnb 57.17   ɛɹbnb   mɑːɹseɪlɛbnb 53.00 55.08
  facebook   motherbook 60.08   feɪsbʊk   mʌðɚbʊk 50.00 55.04
  alaïa   azzaia 64.00   ɐlæiːə   æzeɪə 46.00 55.00
  puma   coma 58.33   puːmə   koʊmə 50.00 54.17
  bimbo   amorbimbi 55.17   bɪmboʊ   ɐmoːɹbɪmbaɪ 53.00 54.08
  azure   azurity 77.21   æʒɚ   æzjʊɹɹᵻɾi 29.00 53.11
  bimbo   binbokplay 65.83   bɪmboʊ   baɪnbɑːkpleɪ 40.00 52.92
  zara   zorazone 54.86   zɑːɹɹə   zoːɹɐzoʊn 47.00 50.93
  matters   m4tter 81.71   mæɾɚz   ɛm foːɹ tiːtɜː 19.00 50.36
  quirón   quiromasté 59.44   kwɜːɹɑːn   kwɪɹəmɐsteɪ 38.00 48.72
  joy   joïsta 55.33   dʒɔɪ   dʒɑːiːstə 40.00 47.67
  louboutin   lubov 61.74   laʊbaʊtɪn   luːbɑːv 33.00 47.37
  we   wecotton 60.00   wiː   wɛkəʔn̩ 33.00 46.50
  mcdonalds   mcsweet 44.13   məkdɑːnəldz   məkswiːt 48.00 46.07
  md   intimd 25.00   ɛmdiː   ɪntɪmdiː 67.00 46.00
  sane   cbdsane 36.50   seɪn   siːbiːdiːseɪn 53.00 44.75
  book   restaubook 28.50   bʊk   ɹᵻstaʊbʊk 57.00 42.75
  h10   motel 10 18.00   eɪtʃ tɛn   moʊtɛl tɛn 60.00 39.00
  coco   kokomarina 42.83   koʊkoʊ   kɑːkəmɚɹiːnə 30.00 36.42
  mi   lovmi 28.50   maɪ   lʌvmi 40.00 34.25

References

[1] https://circleid.com/posts/towards-a-quantitative-approach-for-objectively-measuring-the-similarity-of-marks

[2] https://bowmanslaw.com/insights/degrees-of-similarity-put-to-the-test/

[3] https://www.taylorwessing.com/en/insights-and-events/insights/2021/03/were-confused-how-the-general-court-decides-when-trade-marks-are-confusingly-similar

[4] https://guidelines.euipo.europa.eu/1803468/1787906/trade-mark-guidelines/3-5-conclusion-on-similarity

[5] https://circleid.com/pdf/similarity_measurement_of_marks_part_4.pdf

[6] https://pypi.org/project/fuzzywuzzy/

[7] https://rapidfuzz.github.io/Levenshtein/levenshtein.html#jaro-winkler

[8] M. Bernard and H. Titeux (2021). 'Phonemizer: Text to Phones Transcription for Multiple Languages in Python', J. Open Source Software, 6(68), p.3958.

[9] https://pypi.org/project/phonemizer/

[10] https://www.internationalphoneticassociation.org/content/ipa-chart

[11] https://circleid.com/posts/further-developing-a-word-mark-similarity-measurement-framework

[12] Stobbs CaseFest #16, London, 02-Oct-2024

This article was first published as a white paper on 17 October 2024 at:

https://circleid.com/pdf/similarity_measurement_of_marks_part_5.pdf
