David Barnett's Brand Protection Articles: Further developing a word mark similarity measurement framework

EXECUTIVE SUMMARY

Following my previous article outlining a suggested framework for objectively quantifying mark similarity^[1], this follow-up study looks further at the algorithms proposed for use with word marks and explores some possible enhancements.

In Part 1, I consider the degree to which the similarity metric is consistent with assessments provided by other sources, namely the (subjective) decisions from recent UK trademark dispute cases, and the results from a trademark similarity search tool.

Part 2 considers the analysis of subsequences and substrings shared between pairs of marks, and proposes an additional metric for quantifying similarity, based on the proportions of the marks which are common to each other.

Part 3 explores the use of a tool for converting strings into their phonetic representations using International Phonetic Alphabet (IPA) syntax, as a means for assessing the aural (pronunciation) similarity between marks.

These ideas could be incorporated into a framework of greater sophistication for measuring the similarity of marks in a quantitative sense. Whilst in general there remains significant subjectivity in the relevant legal tests, an objective framework offers a tool allowing for the possibility of greater consistency if incorporated into relevant arguments, frameworks and - ultimately - case law.

Reference

[1] https://circleid.com/posts/towards-a-quantitative-approach-for-objectively-measuring-the-similarity-of-marks

This article was first published on 9 October 2024 at:

https://circleid.com/posts/further-developing-a-word-mark-similarity-measurement-framework

* * * * *

WHITE PAPER

Part 1 - Testing the algorithm

Introduction

In my introductory article on mark similarity measurement^[1], I presented an initial version of a suggested algorithm for objectively quantifying the degree of similarity of a pair of word marks. Similarity assessment is a key element of the decision-making process in many trademark disputes, relating to the likelihood of competing marks generating confusion with each other.

The methodology proposed in the previous study makes use of four calculated components: two of these^[2] generate scores (between 0 and 100) indicating the degree of visual (i.e. spelling) similarity; the other two^[3] quantify the degree of aural (i.e. pronunciation) similarity. These two similarity types can therefore be quantified separately, or can be combined to create an overall similarity metric (S) for the two marks^[4]. The basic metrics take no account of the meaning of the terms (i.e. conceptual similarity) or their level of distinctiveness (which is more relevant to the determination of likelihood of confusion), and also ignore any associated classes of goods and services, any logos, imagery or fonts, and the case (i.e. upper vs lower) of the marks.

In this first part of the follow-up, I compare the outputs of the similarity metric with those from other contexts in which similarity is assessed. Throughout the article, I use the term 'inconsistency' to refer to any deviation from the scenario in which a fixed 'decision tree' (i.e. the situation where, all other factors being equal, an objectively equivalent input will produce the same output) can be seen to have been applied. It is important to note, however, that in real dispute cases, the (manual) analysis will be highly nuanced and following established practice, so different decisions may (correctly) be reached in similar cases. Specific details of individual cases are not, however, considered further in this analysis.

Analysis and testing - Case study 1: Recent UK trademark dispute case decisions

As an initial test of the meaningfulness of the similarity metrics, it is informative to compare their output against the point-of-law decisions from a series of recent trademark disputes. The dataset utilises only word mark vs word mark cases from UK courts (in part, because some of the algorithms are 'tuned' to be most appropriate for English-language content) in the last year, comprising 243 cases. The data^[5] gives, for each case, the decision on whether the pairs of marks were deemed to be similar or different, according to each of four similarity types (aural, visual, conceptual, overall) (see Appendix A), but does not (in the summary overview) give more granular information (e.g. on the assessed degree of similarity in each case).

The first point to note is that - looking purely at the data as presented, with no deeper analysis of case background or context - the decisions include significant degrees of subjectivity, and the decision-making process is not consistent throughout. Arguably, this is why a more objective approach could add value to the process of decision-making, but it does seem unavoidable that some subjective elements will always be involved.

Examples of the inconsistencies in the decisions across the dataset (beyond just cases where different decisions may have been reached on the same case at different times) are outlined below.

Some pairs of marks which are clearly different if viewed in their entirety have been assessed to be similar, suggesting that the main consideration in these cases has been on (only) the primary distinctive element of the mark. Examples include: Dose & Co. vs Dose Labs ('Dose'); Catapult Vision / Catapult One vs Catapult Consulting ('Catapult'); Alpha Boxing vs Alpha Force ('Alpha'), etc. - though this is not always the case. Furthermore, in some cases where one mark is entirely contained within the other, the pair have been deemed to be similar (e.g. Amazon vs Amazon Food Trader Ltd; Very vs Veryco; Life vs ChopLife / Chop Life; Carbon vs MyCarbon; PXG Pharma vs PXG; Bad Boy vs BBCC / Bad Boy Chiller Crew) and, in some cases, different (in at least either an aural or visual sense) (e.g. Yoga vs Yoga Man; Ghost vs Ghostdancer CBD; Purple Computing / Purple vs Purplecube). Part of the rationale in some of these cases seems to have been the disregarding of non-distinctive terms (e.g. 'Co', 'My', etc.), but this also is not consistent, and requires a priori assumptions.

It might be reasonable to expect it not to be possible for a pair of marks to be similar (overall) if they differ according to at least one of the similarity types, but there are a number of exceptions to this (e.g. Boss vs Bossvel, Goddess vs Godless, VFH vs VFHOnline, Folium Science vs Folium, etc. - all deemed conceptually different, but similar overall; Sky / Sky X vs Ysky - deemed aurally and visually different, but similar overall). There is also no consistent application of 'rules' here - e.g. Matters vs M4tter and MFlor vs HFlor / H Flor were both considered to be aurally and visually similar but conceptually different, but the first pair was deemed similar overall and the second pair different.
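The 'decision tree' notion of inconsistency used in this article can be sketched as a simple grouping check: cases with the same per-type findings would be expected to reach the same overall outcome. The record layout and function name below are hypothetical, and the two records are just the Matters / MFlor decisions quoted above, not the full dataset.

```python
from collections import defaultdict

# Two decisions quoted in the text above (illustrative subset only).
decisions = [
    {"pair": "Matters vs M4tter", "aural": "similar", "visual": "similar",
     "conceptual": "different", "overall": "similar"},
    {"pair": "MFlor vs HFlor / H Flor", "aural": "similar", "visual": "similar",
     "conceptual": "different", "overall": "different"},
]


def find_inconsistencies(records):
    """Group cases by their three per-type findings and return the groups
    in which the overall outcomes disagree."""
    outcomes = defaultdict(set)
    pairs = defaultdict(list)
    for rec in records:
        key = (rec["aural"], rec["visual"], rec["conceptual"])
        outcomes[key].add(rec["overall"])
        pairs[key].append(rec["pair"])
    return {key: pairs[key] for key in outcomes if len(outcomes[key]) > 1}


inconsistent = find_inconsistencies(decisions)
for key, names in inconsistent.items():
    print(key, "->", names)
```

A check of this kind only flags differences in the summary data; as noted earlier, the underlying decisions may differ for legitimate case-specific reasons.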

There are also a number of other case-specific inconsistencies - for example, in two cases where the contested marks were almost identical between the cases (X12 / X14 / X16 / X15 / X13 vs vivo X15 / vivo X12 / vivo X14 / vivo X16 / vivo X13 and 6X / 7X / 9X / 8X / 5X vs vivo X6 / vivo X5 / vivo X7 / vivo X9 / vivo X8), they were found aurally and visually similar in one case, and different in the other.

With these inconsistencies in mind, it is difficult to formulate a test of the similarity algorithm(s) to determine whether it produces an output consistent with the case law. Furthermore, this would arguably not really be the desired behaviour for any such algorithm designed to produce a consistent, objective output.

The simplest test is to consider only those cases where both of the contested marks are a single word (to avoid having to make any judgement as to which terms within a multi-word mark are primarily being assessed). There are 90 such cases within the database.

Figures 1 and 2 show the visual and aural similarity scores^[6] (respectively) for the pairs of single-word marks assessed in the case decisions as being similar (shown in green) or different (shown in red) according to the same criterion / similarity type.

Figure 1: Visual similarity scores for the pairs of single-word marks assessed in the case decisions as being visually similar (green) or different (red), as a function of the average of the lengths of the two marks in each case

Figure 2: Aural similarity scores for the pairs of single-word marks assessed in the case decisions as being aurally similar (green) or different (red), as a function of the average of the lengths of the two marks in each case

The analysis shows that there is little meaningful correlation between the assessments given in the case decisions and the scores assigned by the algorithm (and this comment is true regardless of the length of the marks) - although all pairs assigned a visual similarity score greater than 86, and an aural similarity score greater than 93, were manually deemed to be similar according to the same criterion.

Part of this lack of overall correlation is, however, probably due to the significant subjectivity (and apparent 'inconsistency') in the manual decisions. Table 1 shows the scores assigned by the algorithms in each of the cases (with the pairs ranked by overall similarity score), and the results do appear - by manual inspection - to be meaningful (noting that there are some shortcomings, such as relatively poor reflection of distinct vowel sounds).

Mark 1	Mark 2	Avg. mark length	Visual sim. score	Aural sim. score	Overall sim. score
consiglieri	consigliera	11.0	94	100	97
fashiongo	fashionego	9.5	97	96	96
demiegod	demigods	8.0	92	100	96
1link	link	4.5	91	100	96
intellicare	intelecare	10.5	90	100	95
lovello	lovelle	7.0	90	100	95
configon	configo	7.5	95	93	94
chooey	chooee	6.0	88	100	94
asos	asas	4.0	81	100	90
prinker	prink	6.0	89	92	90
nutravita	nootrovita	9.5	79	100	90
resolution	resolute	9.0	85	94	89
naturli'	natureal	8.0	83	96	89
gobox	g-box	5.0	84	95	89
prinz	prinse	5.5	81	95	88
realme	realmz	6.0	88	88	88
cintra	citra	5.5	93	82	88
testex	test-x	6.0	88	87	87
curve	crv	4.0	82	93	87
energeo	enerjo	6.5	84	90	87
pyra	prya	4.0	84	90	87
kramer	cramer	6.0	86	88	87
billionaire	zillionaire	11.0	92	81	86
vidas	vidya	5.0	85	88	86
yorxs	yorks	5.0	85	88	86
burgerme	burgerly	8.0	83	90	86
mbet	m-bets	5.0	85	88	86
pikdare	pi-kare	7.0	89	83	86
geneverse	genv3rse	8.5	85	86	85
retaron	retlron	7.0	90	81	85
goddess	godless	7.0	90	81	85
snuggledown	snugglemore	11.0	81	89	85
philips	philzops	7.5	86	83	85
noughty	naughtea	7.5	74	95	84
tygrys	tigris	6.0	74	95	84
george	georgine	7.0	91	78	84
mbfw	mvfw	4.0	80	88	84
hanson	hansol	6.0	88	79	84
zemo	zoomo	4.5	67	100	84
ellesse	elliss	6.5	83	84	83
scaffeze	scaffx	7.0	80	87	83
maplab	maplab.world	9.0	79	86	82
matters	m4tter	6.5	82	82	82
createme	create.	7.5	86	78	82
patter	yatter	6.0	86	78	82
treca	trea	4.5	92	71	82
foltene	foltex	6.5	84	79	81
lucite	luci	5.0	87	75	81
dcsl	dcs	3.5	90	71	81
fransa	fanza	5.5	79	82	80
kelio	kleeo	5.0	70	90	80
zara	zareus	5.0	71	88	79
very	veryco	5.0	87	71	79
saypha	shaype	6.0	74	84	79
carbon	mycarbon	7.0	89	68	78
z-biome	biome	6.0	87	68	77
chef	chefchy	5.5	82	71	77
atma	atmaspa	5.5	82	71	77
rabe	rase	4.0	81	71	76
azure	azurity	6.0	77	74	76
noughty	nouti	6.0	76	75	76
suntech	suntank	7.0	70	81	75
live	vive	4.0	79	71	75
repevax	epvax	6.0	87	63	75
sherco	charco	6.0	72	75	74
cazoo	carkoo	5.5	79	66	73
coversyl	covixyl-v	8.5	70	75	72
zara	zarzar	5.0	86	59	72
hyprr	hypernft	6.5	73	71	72
fido	fiio	4.0	81	63	72
pockit	mypocket	7.0	76	67	71
axis	traxis	5.0	84	59	71
ulma	luma	4.0	83	59	71
idee	idee-home	6.5	75	66	71
live	life's	5.0	70	71	71
salio	saliogen	6.5	85	55	70
e-bulli	bullit	6.5	81	59	70
waken	wakeful	6.0	77	59	68
airbnb	airbrick	7.0	70	65	68
resolva	consolva	7.5	69	64	66
glenfiddich	inverfiddich	11.5	74	59	66
hotpatch	patch	6.5	79	51	65
boss	bossvel	5.5	82	40	61
bloo	bluuwash	6.0	46	71	58
vfh	vfhonline	6.0	67	45	56
boss	kissboss	6.0	63	33	48
swift	microswift	7.5	55	34	44
rolex	dermarollex	8.0	49	34	41
sane	cbdsane	5.5	37	34	35
md	intimd	4.0	25	25	25

Table 1: Visual, aural and overall similarity scores (as assigned by the algorithm(s)) for the pairs of single-word marks
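The tabulated overall scores appear to be consistent with a simple equal-weight mean of the visual and aural scores. The exact combination is defined in the earlier article and is not reproduced here, so the equal weighting in this short sketch is an assumption, and the function name is hypothetical. The rows are copied from Table 1.

```python
# A few (visual, aural, overall) rows copied from Table 1.
rows = {
    ("consiglieri", "consigliera"): (94, 100, 97),
    ("realme", "realmz"): (88, 88, 88),
    ("boss", "bossvel"): (82, 40, 61),
    ("md", "intimd"): (25, 25, 25),
}


def overall_score(visual, aural):
    """Equal-weight mean of the two components (an assumption, see above)."""
    return (visual + aural) / 2


for pair, (visual, aural, overall) in rows.items():
    estimate = overall_score(visual, aural)
    # The tabulated components are rounded, so allow a one-point tolerance.
    assert abs(estimate - overall) <= 1, pair
```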

Analysis and testing - Case study 2: Trademark similarity searches

As a second validation, it is informative to compare the similarity score(s) as described above (hereafter described as the 'similarity score(s)') against those given as outputs from a trademark watching service. As an example, I consider a set of around 1,800 findings from watches being run for around 30 distinct marks. The trademark watching service reports on identified third-party trademark applications deemed in each case to be similar to the mark being watched, and returns a similarity measurement (hereafter described as the 'TM-watch similarity measurement') (expressed as a percentage) for the pair of marks, also generated via a proprietary algorithm. This algorithm takes into account a number of factors, including numbers of letters in common and a series of alphabetical, phonetic and grammatical rules^[7].

For simplicity:

I consider only trademark watches where the watched mark is a single-word word mark.
I focus only on watches monitoring all product and service classes, to ensure that the similarity score output by the service reflects only the similarity of the word marks, and not the degree of overlap of the classes.
I exclude any results where the identified trademark contains non-English characters (approx. 300 cases).
I convert all marks (watched and identified) to lower-case text.
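The filtering steps listed above might be sketched as follows. This is a minimal illustration rather than the processing actually used: the function name and sample marks are hypothetical, 'non-English characters' is approximated here as 'non-ASCII characters' (an assumption), and the restriction to all-class watches is not modelled.

```python
def prepare_findings(findings):
    """Yield lower-cased (watched, identified) pairs that pass the filters."""
    for watched, identified in findings:
        # Keep only watches where the watched mark is a single word.
        if len(watched.split()) != 1:
            continue
        # Approximation (assumption): treat any non-ASCII character as non-English.
        if not identified.isascii():
            continue
        yield watched.lower(), identified.lower()


# Hypothetical sample data, for illustration only.
sample = [("Acme", "ACME Tools"), ("Acme", "Acmé"), ("Acme Group", "Acme")]
cleaned = list(prepare_findings(sample))
print(cleaned)
```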

In theory (if both the similarity score and the TM-watch similarity measurement algorithms work well), we would expect their values (for the same pairs of marks) to correlate well with each other. However (as for the case decisions considered in Case Study 1), there do seem to be some inconsistencies in the TM-watch similarity measurement algorithm. For example:

In some cases, identical matches, or cases where the identified mark is identical to the watched mark plus just an additional word, are given low TM-watch similarity measurement values (~5%).
In many cases, identified marks containing the watched mark are allocated very high values, regardless of the length of the identified mark and the degree to which the watched mark may feature just as an unrelated sub-string within it. This may be problematic / non-meaningful if (for example) the identified mark is a very long / multi-word string, and the corresponding watched mark is only (say) a two-character string contained within one of the words.

However, other apparent inconsistencies (such as instances of the same type of visual variation - e.g. the replacement of one character in a two-character mark - not always being allocated the same TM-watch similarity measurement value) might be intended behaviour (e.g. being intended to reflect the effect on pronunciation - i.e. aural similarity / difference). Admittedly, the full details of the algorithm used by the trademark watching service are not publicly available, and may include additional sophisticated elements which impact on the outputs.

Overall, it will be instructive to assess whether the similarity score is able to provide a better means of sorting (by potential relevance) the findings from the watch service, meaning that it could therefore be used to 'post-process' the watch-service data and build efficiencies into the review process.

The figures below show the relationship between the TM-watch similarity measurement value for each pair of marks and the overall similarity score (Figure 3), and the score reflecting just visual (i.e. spelling only) similarity (Figure 4), for the same pairs.

Figure 3: The relationship between TM-watch similarity measurement value and overall similarity score (i.e. considering both spelling and pronunciation) for the (~1,500) pairs of marks in the dataset

Figure 4: The relationship between TM-watch similarity measurement value and visual similarity score (i.e. considering spelling only) for the (~1,500) pairs of marks in the dataset

The data shows little meaningful correlation between the two metrics (with a weak positive correlation (corr. coefficient = +0.233) for the overall similarity score). In part, this seems to be because there is a lot of 'noise' / inconsistency in the TM-watch similarity measurement value (noting, for example, the anomalous cluster of pairs of marks assigned a very low value of ~5%).
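For reference, a correlation coefficient of this type can be computed as sketched below. The type of coefficient is not stated here, so the use of the standard Pearson coefficient is an assumption; the function name is hypothetical, and the five sample points are (overall similarity score, TM-watch value) pairs copied from Table 2, so the result will not reproduce the figure quoted above for the full dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)


# (overall similarity score, TM-watch value) for five rows of Table 2.
overall = [100, 100, 98, 95, 91]
tm_watch = [100, 5, 99, 5, 57]
print(round(pearson(overall, tm_watch), 3))
```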

The behaviour of the algorithms, and the relationship between them, may be easier to understand if we consider only the simpler cases where the identified mark (as well as the watched mark) is also just a single word (rather than a multi-word phrase). This relationship is shown in Figure 5, where the results are also categorised by the length (in characters) of the watched mark (to determine whether one or both algorithms perform better with marks of different lengths).

Figure 5: The relationship between TM-watch similarity measurement value and overall similarity score for the pairs of marks in which both the watched and the identified mark are single-word strings

Once again, there is no strong correlation (corr. coefficient = +0.231) and no obvious pattern in the data.

The lack of correlation actually seems to be largely due (as discussed) to the relatively poor performance of the TM-watch similarity measurement algorithm. The similarity score calculation seems to provide a much better measurement of (a subjective manual judgement of) the 'actual' similarity between the pairs of marks. This is illustrated by Tables 2 and 3, showing (in encoded form^[8]) (for the full dataset, including multi-word identified marks) the top- and bottom-ranked pairs of marks by overall similarity score - which seems to provide a (subjectively) reasonable assessment of similarity - and the fact that there is a wide range of TM-watch similarity measurement values present within the groups of most-similar and least-similar pairs. It does also seem, therefore, that the application of the similarity score could be an effective tool in post-processing the results from the trademark watch, and allowing for a quicker review process and identification of genuinely high-risk marks.

Mark 1 (watched)	Mark 2 (identified)	Visual sim. score	Aural sim. score	Overall sim. score	TM-watch sim. meas. value
jddi	jddi	100	100	100	100
dgwf	dgwf	100	100	100	100
rjoct	rjoct	100	100	100	100
fggwoc	fggwoc	100	100	100	5
fccjfg	fccjfg	100	100	100	100
qgffiwo	qgffiwo	100	100	100	5
dwbgcwi	dwbgcwi	100	100	100	100
bfigegiri	bfigegir	96	100	98	99
ajwbgeioj	ajwbg & eioj	91	100	95	5
dwbgcwi	dwbgcii	90	100	95	95
dgwf	dgw	90	100	95	95
dwbgcwi	dwjgcwi	89	100	95	99
dgjaaj	dgja	87	100	93	87
ft	ftj	86	100	93	96
ft	fti	86	100	93	96
qgffiwo	qgff fwof	83	100	91	57
dgwf	dgwc	82	100	91	95
dgwf	dgww	82	100	91	95
rjbjci	rjb	78	100	89	99
bfigegiri	bfigegirft	89	89	89	70
fccjfg	fccjii	77	100	88	82
dwbgcwi	dwpgcwi	89	88	88	99
dgwf	dgwci	76	100	88	88
dgwf	dgwcg	76	100	88	88
dwaf	dwwac	75	100	87	87
bfigegiri	jbfigegir	91	83	87	88
icge	ifgef	74	100	87	83
dwaf	dfafg	74	100	87	87
rjbjci	rfjjfi	74	100	87	84
qgffiwo	qgffiwo ffjc	83	90	86	98
dgwf	4dgwc	73	100	86	89
rjoct	rcojtc	72	100	86	87
ajwbgeioj	ajwbgefbw	77	95	86	90
dgwf	dgwc21	71	100	86	89
dgwf	dgwc 3	71	100	86	89

Table 2: Top-ranked pairs of marks (shown in encoded form) by overall similarity score

Mark 1 (watched)	Mark 2 (identified)	Visual sim. score	Aural sim. score	Overall sim. score	TM-watch sim. meas. value
rjbjci	igt tjdci tjggfqc	41	6	23	35
qp	op ocawwfwjid pwfgjww jdtce	29	18	23	86
fccjfg	tcaiw dfqf fcofg 77	40	7	23	35
rjbjci	ifiiwg tjdci	37	9	23	59
fccjfg	fdwfdbjfo fwjf	31	15	23	94
qp	qf ejdfdbjfg jdiwofdbc bifafdw	29	16	22	86
rjbjci	wdtco wgc tjdci	35	8	21	30
rjbjci	iwffco tjdci	33	9	21	61
rjbjci	iaojdq tjdci	33	9	21	61
qp	iwfo-qf	11	29	20	89
dgwf	ridtcodgwci	20	20	20	61
rjbjci	qgidfgtji qowai tjbfjff	31	9	20	11
gcggidcc	wfffd twf ichigj	28	11	20	11
qp	jd tj qi	10	29	20	86
dgwf	rjcdco dgww	20	18	19	89
qp	foiff qw	10	27	19	89
rjbjci	cfdofbc cwoiacfd tjdci	26	11	18	30
icge	fwigci wcf icdifij jcdjjffwfd & jcawfifd wcg ficgj jdtidcijf	30	7	18	32
rjbjci	biotjdf fe fjbgci & bctjbgci ffojipwcojf dfo fftojt-fe	31	5	18	30
qp	iifii fp	10	25	18	89
qp	gjec wi qi	9	25	17	86
ft	aoie c. & cggjc’i fjttgc ibgiig ie ibjcdbc ft fiwof aco fiacof	28	5	17	97
rjbjci	oiwg afgcwwcd wdt tjcgci fcgo	29	4	16	35
ft	gif bid fttw	15	18	16	36
jddi	wcoofj wcoofj igt tjdc	31	1	16	92
jddi	wcoofj wcoofj igt tjdc	31	1	16	92
dgwf	fcejbi bjww dgwci	15	15	15	35
ft	wgc iwaocfc tft	12	15	13	86
qp	ffjc wiwo qf gfaaw	5	21	13	86
rjbjci	gfofidw tjdci	16	8	12	61
ft	wgc iwaocfc tft.bif	10	12	11	86
rjbjci	dofdt tjdcq	12	9	11	61
ft	idc-bgjbj fa	7	9	8	29
qp	fa rrr.fwwi-afowi.qo	5	7	6	86
qp	gcw jw jd	3	5	4	86

Table 3: Bottom-ranked pairs of marks (shown in encoded form) by overall similarity score
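The post-processing idea discussed above can be sketched as a simple re-ranking of the watch findings by overall similarity score. The rows are copied from Tables 2 and 3; the function name and the review threshold of 85 are hypothetical choices for illustration, not values proposed in this study.

```python
# (watched, identified, overall similarity score, TM-watch value),
# copied from Tables 2 and 3 (encoded form).
findings = [
    ("qp", "gcw jw jd", 4, 86),
    ("fggwoc", "fggwoc", 100, 5),
    ("jddi", "wcoofj wcoofj igt tjdc", 16, 92),
    ("dgwf", "dgw", 95, 95),
    ("ajwbgeioj", "ajwbg & eioj", 95, 5),
]


def rank_for_review(rows, threshold=85):
    """Sort findings by overall similarity score (highest first) and split
    them into a priority group and the rest. The threshold is hypothetical."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    priority = [row for row in ranked if row[2] >= threshold]
    rest = [row for row in ranked if row[2] < threshold]
    return priority, rest


priority, rest = rank_for_review(findings)
for watched, identified, score, tm_value in priority:
    print(f"{watched} vs {identified}: score {score}, TM-watch {tm_value}")
```

In this small sample, two of the three prioritised pairs carry a TM-watch value of only 5, which is the kind of finding a reviewer sorting by the TM-watch value alone would reach last.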

Conclusions

The analysis does not show any strong correlation between the overall similarity score and the other two measures of similarity considered in this article - namely, (a) the subjective case decisions from recent UK trademark disputes, and (b) the more deterministic TM-watch similarity measurement algorithm. However, a significant factor in this lack of correlation seems to be the apparent inconsistencies (i.e. non-determinism) in these other types of assessment, rather than reflecting significant shortcomings in the similarity score algorithm in itself.

The similarity score does appear (subjectively) to provide a meaningful measure of 'actual' similarity, meaning that it could lend itself to use in post-processing data drawn from a range of sources (such as trademark watching services), and potentially in helping to allow greater consistency and a more scalable, reproducible, quantitative and objective basis for dispute resolution and case-law formulation.

There are, however, a number of enhancements still to be made to the algorithm. The current version takes no account of the meaning of terms (i.e. conceptual similarity) or of other factors such as the similarity of associated goods and services classes, or the level of distinctiveness of the marks (which is perhaps more relevant specifically to a determination of likelihood of confusion, rather than simply similarity).

Finally, it would certainly not be reasonable to suggest that this type of formulaic approach is preferable to a full manual assessment, taking into account the wide range of additional nuanced and subjective factors which are relevant in a dispute case. Rather, I am suggesting that these types of quantitative algorithm can be used as (consistent, reproducible) 'tools' to be utilised as part of the overall similarity assessment frameworks relevant to the resolution of intellectual property disputes.

Part 2 - Subsequences and substrings

Introduction

Following on from the analysis presented in Part 1, it is useful to consider some additional characteristics of word marks which can further be used in a determination of similarity. In this follow-up, I again consider the dataset of recent UK trademark disputes (Case study 1 of Part 1), though now considering all 243 cases, rather than just the single-word marks^[9].

In Part 1, I noted that some of the case decisions appeared to have been influenced, at least in part, by a subjective assessment of the elements of the marks which were deemed to be the 'distinctive' parts (e.g. 'Dose' in Dose & Co. vs Dose Labs).

In order to build a formulation which addresses this point, it is useful to consider characteristics of the subsequences and substrings (i.e. groups of characters) contained within the marks.

First steps towards formulating a methodology

One characteristic often considered in analyses of these types is the concept of the longest common subsequence (hereafter referred to as the 'LCSSQ') - as also referenced in my original study on mark similarity - between a pair of strings. This is defined as the longest set of characters which appears in the same order (though not necessarily in consecutive positions) in both strings (i.e. generally (though not necessarily) non-contiguous characters). This parameter is also the basis of the form of the Ratcliff-Obershelp similarity metric^[10,11] used here, defined as twice the length of the LCSSQ divided by the sum of the lengths of the two strings (giving a value between 0 and 1) - essentially, the length of the LCSSQ expressed as a proportion of the average of the lengths of the two strings.
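As an illustration of the definitions above, the following is a minimal sketch of an LCSSQ calculation (using the standard dynamic-programming approach) and the corresponding similarity ratio. The function names are hypothetical, and this is not necessarily the implementation used to generate the data in this study.

```python
def lcssq(a, b):
    """Return one longest common subsequence of strings a and b."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCSSQ length of the suffixes a[i:] and b[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk forward through the table to recover the characters.
    out = []
    i = j = 0
    while i < rows and j < cols:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def lcssq_similarity(a, b):
    """Twice the LCSSQ length divided by the sum of the string lengths."""
    if not a and not b:
        return 1.0
    return 2 * len(lcssq(a, b)) / (len(a) + len(b))


print(lcssq("sole sister", "soulsistar"))
print(round(lcssq_similarity("sole sister", "soulsistar"), 3))
```

Note that Python's built-in difflib.SequenceMatcher.ratio() uses the same '2 x matches / total length' form but counts the matches by recursively locating common substrings, so its values can differ from those given by the LCSSQ-based definition used in this article.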

Whilst the LCSSQ is a useful characteristic - particularly in cases of alternative spellings, such as Sole Sister vs Soulsistar, where it is equal to 'solsistr' - it can lead to some misleading results. Consider, for example, the case of Dreams vs Dream Big Make Dua Move Mountains; the LCSSQ in this case is (the apparent full word) 'dreams', although the 's' identified in the latter case is actually the character at the end of 'mountains'. Accordingly, it can be instructive also to consider the longest common substring (hereafter referred to as the 'LCSST'), in which the characters must appear in both strings in the same order and in consecutive positions (i.e. contiguous characters) (which, in the case of the above marks is 'dream').
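The LCSST can be found with a similar dynamic-programming approach, tracking runs of consecutive matching characters. The sketch below is illustrative only (the function name is hypothetical); where several substrings share the maximum length, it returns the one appearing first in the first string.

```python
def lcsst(a, b):
    """Return the longest common (contiguous) substring of a and b."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best run within a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run of matching characters ending just before here.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


print(lcsst("dreams", "dream big make dua move mountains"))
```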

Having identified the LCSSQ and LCSST, useful insights can be obtained by determining the 'remainders' of each string, i.e. the parts which remain after (the first instance of) the common sub-elements are removed. The analysis of the full set of pairs of disputed marks is shown in Appendix B. Arguably, a better approach than using (just) the Ratcliff-Obershelp similarity is also to calculate an equivalent score using the LCSSTs (rather than the LCSSQs), and then to take the average of these two scores (to produce a metric which is here termed the 'modified Ratcliff-Obershelp similarity score'); the data in Appendix B is sorted by this score.
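The modified score described above can be sketched as follows, using length-only versions of the LCSSQ and LCSST calculations. The function names are hypothetical, and the handling of two empty strings (returning 1) is an assumption.

```python
def lcssq_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, other in enumerate(b, start=1):
            if ch == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsst_length(a, b):
    """Length of the longest common (contiguous) substring of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, other in enumerate(b, start=1):
            run = prev[j - 1] + 1 if ch == other else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


def modified_ro_score(a, b):
    """Average of the LCSSQ-based and LCSST-based similarity ratios."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty strings are treated as identical
    subsequence_score = 2 * lcssq_length(a, b) / total
    substring_score = 2 * lcsst_length(a, b) / total
    return (subsequence_score + substring_score) / 2


print(round(modified_ro_score("dreams", "dream big make dua move mountains"), 3))
```

For the Dreams example, the LCSSQ has length 6 and the LCSST length 5, against a combined string length of 39, so the two component scores are 12/39 and 10/39 and the modified score is 11/39 (approximately 0.28).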

Inevitably, the data produced by this purely mathematical approach may require some 'cleaning up' prior to any further analysis, as it will generate some apparent anomalies. For example, in Holiday People vs The Holiday People, the LCSSQ is 'holiday people', which actually generates a remainder for the second mark of 'te h' - since the extraction algorithm removes the LCSSQ's initial 'h' from the word 'the' (the first place it appears in the overall string) rather than the word 'holiday' - and so the remainder should instead probably be 'corrected' to 'the'. Similarly, for PT Powerpod vs Powerpod, the remainder for the first mark is extracted as 't p', rather than the 'pt' it arguably 'should' be.
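The behaviour described above is consistent with a simple left-to-right extraction, in which each character of the common subsequence is removed at the first position where it can be matched. The sketch below (with hypothetical function names) reproduces the two anomalies quoted; it is an illustration of that behaviour, not necessarily the exact code used in the analysis.

```python
def remainder_after_subsequence(mark, common):
    """Remove the characters of `common` from `mark`, matching each one
    at the first available position when scanning left to right."""
    kept = []
    k = 0  # index of the next character of `common` still to be matched
    for ch in mark:
        if k < len(common) and ch == common[k]:
            k += 1
        else:
            kept.append(ch)
    return "".join(kept)


def remainder_after_substring(mark, common):
    """Remove the first instance of the contiguous substring `common`."""
    return mark.replace(common, "", 1)


print(repr(remainder_after_subsequence("the holiday people", "holiday people")))
print(repr(remainder_after_subsequence("pt powerpod", "powerpod")))
print(repr(remainder_after_substring("dose labs", "dose ")))
```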

Taking the above caveat on board, the analysis does provide some useful insights. The most obvious (mathematically trivial) case is where the two marks are identical, in which case both the remainders are non-existent (blank / null / empty), and the modified Ratcliff-Obershelp similarity score is 1 (as for the top four examples in the Table in Appendix B). Cases where one remainder is blank imply that one mark is contained within the other.

Beyond this, the LCSSQ / LCSST analysis does provide a framework making it possible to conveniently review the elements which the marks have in common. For example, in the Dose & Co. vs Dose Labs case, the remainders after extracting the LCSST are, respectively, 'and co.' and 'labs'. When assessing overall similarity, it might be reasonable to disregard remainders which are extremely non-distinctive (such as 'and co.'); arguably, however, 'labs' might be deemed to be a more distinctive / descriptive term, which might mean that the overall assessed level of similarity should be lower than if the remainders for both marks were less distinctive terms. Other non-distinctive terms in the dataset include 'the', 'my', and so on.

Attempting to quantify the degree of distinctiveness (or non-distinctiveness) of the terms in the mark remainders is a more difficult prospect. In the original study, the numbers of results returned by search engines were explored as a proxy for this parameter. However, in these types of analysis, this approach may not be very productive, as even more distinctive terms such as 'labs' would generate large numbers of results. Perhaps a better question is whether the remainder keywords overlap significantly with the goods and services of the other mark (e.g. if Dose & Co. offered laboratory services, the assessed degree of similarity (by virtue of the use of the term 'labs' in the other mark) should potentially be assessed as being much higher).

Similar comments might also apply to other pairs which differ only in a keyword relating to business area. Examples in the dataset include Stones vs Stone Brewing, Whitehorse Liquidity Partners vs Whitehorse, PXG vs PXG Pharma, and Unity vs Unity Real Estate Ltd. It might be reasonable to expect that similarity should be assessed to be lower if the remainders (i.e. the differences between the marks) are both distinctive and relate to different business areas or brand descriptors - such as Catapult Vision vs Catapult Consulting or Spear & Jackson Predator vs Predator Gutter Vacuum.

It is essential that consideration of goods and services - and the degree to which these are likely to be familiar to general consumers - should always be incorporated into any overall similarity assessment framework. For example, in Honda vs Honbike - objectively not highly similar as words - relevant points to consider might be the extent to which the Honda brand is known to be associated with motorbikes, and whether it is well-known enough that even an abbreviated variant ('Hon') would evoke a brand association in the mind of the average consumer.

In other cases, similarity between the word types in the remainders might imply an overall greater similarity between the marks - such as Lemon Perfect vs Peach Perfect, where the remainders after the removal of the LCSST are 'lemon' and 'peach' (both fruits), Glenfiddich vs Inverfiddich, where the remainders are 'Glen' and 'Inver' (both common terms in Scottish place-names), or Karmacoin vs Karmacash.

Another consideration is that if the remainders are very short and/or meaningless (say, single letters), this might also imply a greater degree of similarity between the marks. However, this assertion may also be dependent on the positions of the distinct elements within the strings - for example, 'Consiglieri' and 'Consigliera' might be deemed to be more similar to each other than are 'Lotus' and 'Motus' (both cases differing by a single character) - particularly when factors such as local-language significance (e.g. just a potential difference in gender) are taken into account.

Conclusions

The discussion presented in this article has stopped a long way short of attempting to define a full framework for mark comparison using analysis of LCSSQs and LCSSTs. Nevertheless, an analysis of pairs of marks (as presented in Appendix B) does provide a useful framework for conveniently reviewing the elements that pairs of marks have in common, and those which are distinct from each other - which is key to an overall assessment of similarity. In this analysis, it is useful to consider both LCSSQs and LCSSTs together, as marks which are distinct in differing ways may be more suitable for analysis using one characteristic than the other.

The modified Ratcliff-Obershelp similarity score - which quantifies the size of the common sub-elements as a proportion of the length of the original strings - is also a useful metric in its own right, which could easily be incorporated into an enhanced version of the similarity score presented in the previous studies.

Building on these ideas, it should be possible to build the beginnings of a framework for mark similarity comparison utilising sub-element analysis, to augment the ideas presented previously. Such a framework would likely need to incorporate a number of key ideas, such as analysis of the 'remainders' for each of the marks (i.e. the portions which are left when the common elements are removed) and how descriptive they are, how closely they overlap with the goods and services of the other mark, how thematically similar they are to each other, and the positions within the original marks at which they appear.

Part 3 - Phonemizing

Introduction

The initial study on mark similarity measurement utilised two distinct models to generate the phonetic representation of word marks^[12], from which the degree of (aural) similarity in the pronunciation of pairs of marks could be quantified (by measuring the similarity of the phonetic representations using the Python-based fuzz.ratio algorithm^[13]). The two models were Soundex, which represents each mark as a four-character string, and NYSIIS (New York State Identification and Intelligence System). However, both of these encodings have certain shortcomings, not least the fact that they poorly handle (or disregard entirely) the vowel sounds within the strings and - in the case of Soundex - do not encode anything beyond the fourth consonant.
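The shortcomings noted above can be seen in a minimal sketch of the standard American Soundex encoding. This is an illustrative implementation, not necessarily the one used in the initial study, and the scoring function uses difflib.SequenceMatcher from the Python standard library as a stand-in for fuzz.ratio (an assumption, made so that the example has no third-party dependencies).

```python
from difflib import SequenceMatcher

# Standard Soundex consonant groups; vowels, h, w and y carry no code
_SOUNDEX_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(word):
    """Four-character Soundex code: first letter plus up to three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        # h and w do not separate repeated codes; vowels do
        if ch not in "hw":
            prev = code
    # Pad with zeros, then truncate: anything after the fourth character is lost
    return (encoded + "000")[:4]


def ratio_score(s1, s2):
    """Stand-in for fuzz.ratio: SequenceMatcher ratio scaled to 0-100 (assumption)."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


for mark_1, mark_2 in [("kresco", "cresco"), ("nike", "nuke")]:
    code_1, code_2 = soundex(mark_1), soundex(mark_2)
    print(mark_1, code_1, mark_2, code_2, ratio_score(code_1, code_2))
```

Because the vowels are discarded, 'nike' and 'nuke' receive the same code, while 'kresco' and 'cresco' differ only through the retained first letter, despite being pronounced identically.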

In this article, I consider the application of a (Python-based) model (named 'Phonemizer') for encoding the marks in International Phonetic Alphabet^[14] (IPA) format, which better handles the full string in each case and - unlike some other text-to-IPA encoders - can handle arbitrary strings, rather than just dictionary terms.

Methodology

The IPA is a means of representing strings phonetically using (mainly) Latin or Greek script^[15], but also utilising other special characters, such as a colon-like character denoting that the preceding sound is long and, in some versions, high or low vertical lines denoting the primary and secondary stressed syllables^[16].

The Phonemizer package is a Python-based tool for converting text strings to IPA format using a back-end text-to-speech software application named espeak-ng^{[17,18,19,20,21]}.

In this article, I again consider the same pairs of marks, taken from previous dispute cases, as utilised in the initial study. The marks are converted to their IPA representations using the Phonemizer package, and the phonetic representations again compared using the fuzz.ratio algorithm.
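The steps described above might be sketched as follows. This assumes that the phonemizer package and the espeak-ng back end are installed; the options passed to phonemize (language='en-us', backend='espeak', strip=True) are assumed settings, as the exact configuration is not stated here. The helper names are hypothetical, and difflib.SequenceMatcher is again used as a dependency-free stand-in for fuzz.ratio.

```python
from difflib import SequenceMatcher


def to_ipa(mark, language="en-us"):
    """Convert a mark to IPA via Phonemizer (needs phonemizer and espeak-ng)."""
    # Imported lazily so the scoring helper below works without the package
    from phonemizer import phonemize
    return phonemize(mark, language=language, backend="espeak", strip=True)


def ipa_similarity(ipa_1, ipa_2):
    """Similarity of two IPA strings on a 0-100 scale (stand-in for fuzz.ratio)."""
    return int(round(100 * SequenceMatcher(None, ipa_1, ipa_2).ratio()))


def aural_similarity(mark_1, mark_2):
    """Full pipeline: phonemize both marks, then compare the IPA strings."""
    return ipa_similarity(to_ipa(mark_1), to_ipa(mark_2))


# The scoring step can be tried directly on IPA strings taken from Table 4
print(ipa_similarity("kɹɛskoʊ", "kɹɛskoʊ"))
print(ipa_similarity("naɪk", "nuːk"))
```

Scores of this kind are sensitive to exactly how the IPA strings are prepared (for example, whether trailing separators or stress marks are retained), so values produced by this sketch need not match Table 4: it gives 50 for naɪk vs nuːk, where the table reports 60.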

Analysis

Table 4 shows the IPA representations of each of the marks, as given by Phonemizer, and the corresponding (phonetic) similarity scores for the pairs of marks (compared with the scores obtained using the Soundex and NYSIIS representations, from the initial study).

Mark 1	Mark 2	Mark 1 (IPA)	Mark 2 (IPA)	Sim. score: IPA	Sim. score: Soundex	Sim. score: NYSIIS
kresco	cresco	kɹɛskoʊ	kɹɛskoʊ	100	75	100
casoria	castoria	kæsoːɹiə	kæstoːɹiə	95	75	91
seiko	seycos	seɪkoʊ	seɪkoʊz	93	100	67
starbucks	charbucks	stɑːɹbʌks	tʃɑːɹbʌks	90	50	77
mahendra	mahindra	mæhɛndɹə	mæhɪndɹə	89	100	100
bacchus	cacchus	bækəs	kækəs	83	75	67
trucool	turcool	tɹuːkuːl	tɜːkuːl	82	100	83
lucozade	glucos-aid	luːkəzeɪd	ɡluːkoʊzeɪd	82	50	93
louis vuitton	chewy vuiton	luːi vjuːɪʔn̩	tʃuːi vjuːɪtən	76	50	71
mdh	mhs	ɛmdiːeɪtʃ	ɛmeɪtʃɛs	74	75	80
intelect	entelec	ɪntɛlᵻkt	ɛntɛlɛk	71	75	80
starbucks	sardarbuksh	stɑːɹbʌks	sɑːɹdɑːɹbʌkʃ	70	75	71
cana	canya	kɑːnə	kænjə	67	100	100
simoniz	permanize	sɪmənɪz	pɜːmənaɪz	67	50	62
zirco	cozirc	zɜːkoʊ	kɑːzɜːk	67	50	60
bisleri	bilseri	baɪslɜːɹi	bɪlsɚɹi	67	75	83
magnavox	multivox	mæɡnɐvɑːks	mʌltivɑːks	64	50	75
nike	nuke	naɪk	nuːk	60	100	100
lakme	likeme	lækmi	laɪkiːm	57	100	89
puma	coma	puːmə	koʊmə	50	75	67
hpnotiq	hopnotic	eɪtʃpiːnoʊɾɪk	həpnɑːɾɪk	50	100	80
mcdonalds	mcsweet	məkdɑːnəldz	məkswiːt	48	75	43
louboutin	lubov	laʊbaʊtɪn	luːbɑːv	33	50	67

Table 4: IPA representations (from Phonemizer) and similarity scores (using the fuzz.ratio algorithm) for the pairs of marks, compared with the scores using the Soundex and NYSIIS representations

Overall (subjectively!), the IPA-based similarity score seems to perform well in ranking the pairs of marks by aural similarity, and provides a more satisfactory analysis than either of the two other phonetic models explored previously (which is perhaps unsurprising, in view of the shortcomings discussed above).

There are also a number of other specific observations of note:

The algorithm correctly (in my opinion!) rates the 'kresco' and 'cresco' marks as phonetically identical, and with the same IPA representation.

The IPA representation provides a convenient way of comparing the degree of similarity (according to Phonemizer) between sub-elements of the strings when the primary difference is disregarded; for example, with Lucozade vs Glucos-Aid, if the initial 'ɡ' is removed from the IPA representation of the latter, the remaining strings are luːkəzeɪd and luːkoʊzeɪd, i.e. differing only in the middle vowel sound.

In portions of the words which the algorithm deems 'unreadable', the phonetic representation conveys a series of letter names. For example, 'mdh' is expressed as 'em-dee-aitch', and 'hpnotiq' as 'aitch-pee-notic'. Whilst this may be the desired behaviour in some cases, it may not always be appropriate.

This particular implementation of IPA is built around American, rather than English, pronunciation - for example, syllables such as 'vox' and 'not' are encoded using a long 'ah' sound (ɑː), and 'nuke' is represented as 'nooke' rather than 'nyooke'. Again, this may not always be appropriate for marks targeting an English audience (although perhaps less of an issue if comparing like-with-like).
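The sub-element comparison mentioned in the Lucozade example can be made explicit by listing only the segments at which two IPA strings differ. The following is a minimal sketch using difflib from the Python standard library; the function name is a hypothetical one of my own, and the approach is one possible way of surfacing the differences rather than part of the scoring method itself.

```python
from difflib import SequenceMatcher


def differing_segments(ipa_1, ipa_2):
    """Return (segment in ipa_1, segment in ipa_2) for each non-matching region."""
    matcher = SequenceMatcher(None, ipa_1, ipa_2, autojunk=False)
    return [
        (ipa_1[i1:i2], ipa_2[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# IPA strings for 'lucozade' and 'glucos-aid', as given in Table 4
print(differing_segments("luːkəzeɪd", "ɡluːkoʊzeɪd"))
```

An empty segment on one side indicates a sound present in only one of the marks (here, the initial 'ɡ'), while a pair of non-empty segments indicates a substitution (here, the middle vowel).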

One option for handling the above issues wherever they arise is to manually modify the strings, to ensure that they are encoded 'correctly' (or simply to modify the IPA representation before calculating the similarity score) - though of course this removes the objectivity of the approach. Nevertheless, there may be cases where this is unavoidable, if it is relatively indisputable that the algorithm has got the encoding 'wrong' (based on the - admittedly subjective - intended pronunciation). Examples in the dataset include:

'hpnotiq' - encoded as 'aitch-pee-notic', where it would be preferable to modify the mark or directly edit the IPA representation to ensure that it is represented as həpnɑːɾɪk (in which case the similarity score against 'hopnotic' will be 100)

'likeme' - this has been encoded as it would be pronounced if intended to be a single readable word (laɪkiːm - 'lyekeem'), whereas it is presumably intended to be read as 'like me'. This can be addressed by re-writing the mark as 'like-me', in which case it is encoded as laɪkmiː, giving an increased similarity score (compared with 'lakme') of 71.
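If manual edits of this kind are used, one way of limiting the loss of objectivity is to hold them in an explicit, reviewable list and to record whenever one has been applied. The sketch below illustrates the idea; the dictionary contents are taken from the two examples above, while the names and structure are hypothetical, and the encoder argument stands for whatever text-to-IPA function is in use.

```python
# Marks to be re-written before encoding (hypothetical, auditable list)
RESPELLINGS = {"likeme": "like-me"}

# Marks whose IPA representation is supplied directly (hypothetical list)
IPA_OVERRIDES = {"hpnotiq": "həpnɑːɾɪk"}


def resolve_ipa(mark, encoder):
    """Return (IPA string, provenance) so that manual interventions stay visible."""
    key = mark.strip().lower()
    if key in IPA_OVERRIDES:
        return IPA_OVERRIDES[key], "ipa-override"
    if key in RESPELLINGS:
        return encoder(RESPELLINGS[key]), "respelled"
    return encoder(key), "automatic"


# Demonstration with a dummy encoder that simply tags its input
def dummy_encoder(text):
    return "<" + text + ">"


for mark in ["hpnotiq", "likeme", "lakme"]:
    print(mark, resolve_ipa(mark, dummy_encoder))
```

Reporting the provenance alongside each score makes it straightforward to see which results rest on a subjective intervention and which were generated entirely automatically.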

Conclusion

Converting marks to their full IPA representations, as a means of comparing aural (pronunciation) similarity, appears to provide better performance than using the Soundex or NYSIIS algorithms described previously, and would probably be preferable for inclusion in an improved metric for quantifying the overall similarity of marks.

The Phonemizer package offers a convenient method for generating the required phonetic encoding, although there remain some potential issues to be addressed, such as the emphasis on American pronunciation, which may not be appropriate in all cases, and the handling of lower-readability strings, or those marks which (subjectively) appear intended to be read in particular ways. These issues can be addressed by employing manual edits, but this runs the risk of breaching the wholly objective nature of the approach.

Appendix A: Assessments of word mark* similarity in recent UK trademark dispute cases

*Neglecting certain variants which differ only in the case of the characters

s = similar
d = different
? = inconsistent - i.e. different decisions reached at different times

Ref #	Mark 1	Mark 2	Aural (pronunciation)	Visual (spelling)	Conceptual (meaning)	Overall
1	GIZEBRA	DEBRA THE GIZEBRA				s
2	SUNTECH	SunTank	s	s	d	d
3	IBACCY	Biccy Baccy	d	d	d	d
4	JOLLY	JOLLY PECKISH	s	s	s	s
5	DREAM COACH / DREAM BIGGER / DREAMS	Dream Rite	s	s	s	s
6	DOSE & CO.	DOSE LABS	s	s	s	s
7	AMAZON	AMAZON FOOD TRADER LTD	s	s	s	s
8	SKULLCANDY / SKULL-IQ	SKULL GAMING	s	s	d	d
9	PRINKER	Prink	s	s		s
10	3MONKEYS	3 Monkeys Communications	s	s	s	s
11	BOSS	Bossvel	s	s	d	s
12	CAZOO	CARKOO	d	s		d
13	LOTUS	Motus Group UK (and variants)	d	d		d
14	CARTILS	CARTEL DESIGN	s	s	d	d
15	PRINZ	Prinse		s		s
16	SHERCO	CHARCO	s	s		s
17	WATERFORD / WATERFORD TEIREOIR	LADY LOUISA WATERFORD			d	d
18	LIVE	LIFE'S			d	d
19	FLOWERS	FLOWER CAFE / FLOWER DRINKS	s	s	?	?
20	STR8 GO FOR GREAT / STR8	ST8	s	d	d	d
21	myGeneCare / myGeneWisdom / myGeneDiary / myGenePredict / myGeneHelp	MYGENES	s	s	s	s
22	NRJ / ENERGY NRj	NRG	d	d	d	d
23	OYSTER	Oyster and Pop	s	s	s	s
24	SKY	Sky Force	s	s	s	s
25	THE GOOD SCHOOLS GUIDE	The Good School	s	s	s	s
26	NOUGHTY	NAUGHTEA	s	s	s	s
27	GLOW UP / Glow Up: Britain's Next Make-Up Star	glow up: britain's next make-up star	s	s	s	s
28	DEMIEGOD	DEMIGODS	s	s	s	s
29	Retaron	Retlron	s	s		s
30	THIS GIRL CAN	This Girl Came	s	s	d	d
31	GODDESS	GODLESS	s	s	d	s
32	PARIS-MATCH	PARIMATCH / PARiMATCH TECH / PARI MATCH	s	s	s	s
33	ANYTIME FITNESS / ANYTIME HEALTH	ANYTIME PRO	s	s	s	s
34	PIKDARE	PI-KARE	s	s		s
35	Kramer	CRAMER		s	s	s
36	FIDO	FIIO	s	s		s
37	EVOLUTIQ	ESSENTIAL EVOLUTION	d	d	d	d
38	TRECA	TREA	s	s		s
39	EASYJET / EASYGYM / EASYHOTEL / EASYBUS / EASYCAR / easyProperty / EASYCOFFEE	Easycosmetic	d	d	d	d
40	X12 / X14 / X16 / X15 / X13	vivo X15 / vivo X12 / vivo X14 / vivo X16 / vivo X13	s	s	d	d
41	Victoria / victoria Dear World	Victoro	s	s	s	s
42	VIZRT	Vizst TECHNOLOGY / Vizst	s	s	s	s
43	ASOS	ASAS	s	s		s
44	MAKEUP WARDROBE	MAKEUP WARDROBING	s	s	s	s
45	LUSH	Lush Lights	s	s	s	s
46	SIX DAYS	DAY6	s	s	s	s
47	GENEVERSE	GENV3RSE	s	s		s
48	DIAMOND MIST	VAPES BARS DIAMOND / DIAMOND BAR 600 / MAX DIAMOND / DIAMOND MAX / DIAMOND PRO	d	d	d	d
49	EATALIANO	EATalia / EAT-alia	s	s	s	s
50	VFH	VFHOnline	s	s	d	s
51	MBFW	MVFW	s	s		s
52	TYGRYS	TIGRIS	s	s		s
53	Meta Technology / META META / Meta	META PORTAL / META PLATFORMS / META / META QUEST / META HORIZON / META VIEW	s	s	s	s
54	Burgerme	BURGERLY	s	s	d	d
55	SiR / SIR	SIRO	s	s	d	d
56	OYSTER PERPETUAL / PERPETUAL	PERPÉTUEL / PERPETUEL	s	s	s	s
57	Spear & Jackson Predator	Predator Gutter Vacuum	s	s	s	s
58	ULMA	LUMA	s	s		s
59	AZURE	Azurity	s	s	s	s
60	1LINK	LINK	s	s	s	s
61	CATAPULT VISION / CATAPULT ONE	Catapult Consulting	s	s	s	s
62	6X / 7X / 9X / 8X / 5X	vivo X6 / vivo X5 / vivo X7 / vivo X9 / vivo X8	d	d		d
63	ASPREY / Asprey LONDON	DAVE ASPREY	s	s	d	d
64	ENERGEO	ENERJO	s	s	s	s
65	MUTANT	MEGA MUTANT	s	s	s	s
66	PHILIPS	PhilzOps	d	s		d
67	Last Shelter:Survival	Doomsday: Last Survivors	d	d	s	d
68	ACTIVIST	Activist Ingredients / Davines Activist Ingredients	?	?	?	?
69	FiTTiPALDi / FITTIPALDI	EMERSON FITTIPALDI / eFittipaldi / FITTIPALDI AUTOMOBILI	s	s	s	s
70	SWIFT	MicroSwift	s	s	s	s
71	ArmaLight / ArmaGel	ARMATHERM	s	s	d	s
72	FlexoLid	kp FlexiLid	s	s		s
73	VERY	VERYCO	s	s	s	s
74	ZARA	ZARZAR	s	d		d
75	VAULT IP / VAULT INTELLECTUAL PROPERTY	BRANDVAULT	d	d	d	d
76	Alpha Boxing	ALPHA FORCE	s	s	s	s
77	LEMON PERFECT	PEACH PERFECT	s	s	s	s
78	life	ChopLife / Chop Life	s	s	s	s
79	RYZEN / AMD RYZEN	RYZEUP / RyzeUp	s	s	d	s
80	IPING 2.0 / PING	pingNpay	d	s	s	s
81	NUTRAVITA	Nootrovita	s	s	s	s
82	PUSHER	Pushers Only	s	s	s	s
83	Zemo	ZOOMO	s	d	d	d
84	WEAR THE CHANGE	WEAR THE FUTURE	s	s	s	s
85	IntelliCare	Intelecare	s	s	s	s
86	PT Powerpod / PT Powerpods	POWERPOD	s	s		s
87	AGRHO S-ROX / AGRHO	agro S	s	s		s
88	SIZZLING FORTUNES / SIZZLING COIN / SIZZLING HOT	SIZZLING BELLS / SIZZLING MOON / SIZZLING REELS / SIZZLING KINGDOM	s	s	s	s
89	FOLTENE	FOLTEX	s	s		s
90	du Feu	DU FEU DESIGN	s	s		s
91	VIDAS	VIDYA	s	s		s
92	VOLVO	VOLTA TRUCKS / VOLTA ZERO / VOLTA / VTRUCKS / V TRUCKS / V-TRUCKS	d	?	d	d
93	CROC odor WC / CROC odor / Croc'Odor / Croc Odor the kitchen expert	cocod'or	s	s	d	d
94	RABE	RASE	s	s		s
95	CINTRA	CITRA	s	s	d	d
96	MATTERS	M4TTER	s	s	d	s
97	MFLOR	HFLOR / H FLOR	s	s	d	d
98	MBET	M-Bets	s	s	s	s
99	next	NXTWEAR S	s	s	s	s
100	STONES	STONE BREWING	s	s	s	s
101	CHEF	CHEFCHY	s	s	s	s
102	Bones	Bones Of Barbados	s	s	d	d
103	Satisfyer	SIMPLY SATISFY	d	d	s	d
104	FRANSA	FANZA	s	s	s	s
105	GAP	GAL London	s	d	d	d
106	CARBON	MyCarbon	s	s	s	s
107	COMPAL	COPALLI / COPAL TREE	?	?	d	?
108	DENNIS / DENNIS AND GNASHER	Dennis G	s	s	s	s
109	SAVANT	SAVANT POWER	s	s	s	s
110	HYPRR	HYPERNFT	s	s	s	s
111	POCKIT	MyPocket	s	s	s	s
112	SHEPHERD	WOLF & SHEPHERD	s	s	s	s
113	HONWAVE / HONDA / Honda e	Honbike	s	s	d	s
114	skin² / NITRILE SKIN²	SKINS	s	s	s	s
115	Hanson	HANSOL	s	s		s
116	CULT BEAUTY / CULT CONCIERGE	PERFUME CULT	s	s	s	s
117	BOSS / HUGO BOSS / BOSS HUGO	BOSSEUR	s	d		d
118	Kelio	KLEEO	s	s		s
119	e-BULLI	BULLIT	s	s		s
120	realme	REALMZ	s	s	d	d
121	MUSTANG	MUSTANG / FORD MUSTANG / MUSTANG MACH-E	s	s		s
122	EUREKA!	EUREKA EDUCATION	s	s	s	s
123	Rebelle Copenhagen	reBELLE BEAUTY	s	s	s	s
124	PI DATABOOK / PI	PIANYWHERE	d	d	s	d
125	CONSIGLIERI	CONSIGLIERA	s	s		s
126	Saypha	SHAYPE	d	s	d	d
127	FOLIUM SCIENCE	FOLIUM	s	s	d	s
128	MD	IntiMD	s	s	s	s
129	Higicol – AMMA / amma / AMMA COLORS	Amma Wellness	s	s		s
130	LIP INJECTION	LIPJECTION GLOSS	s	s		s
131	RESOLVA	CONSOLVA	s	s		s
132	CHOOEY	CHOOEE		s		s
133	Bloo	bluuwash	s	s	s	s
134	COVERSYL	COVIXYL-V	d	s		d
135	CLEAN V / CLEAN W / CLEANCO / CLEAN G / CLEAN R / CLEAN T	Drink Clean.	d	d	s	d
136	ZARA	ZAREUS	d	s	d	d
137	PXG Pharma	PXG	s	s	s	s
138	Click-EAT / CLICK EAT	SUBWAY CLICK & EAT	s	d	s	s
139	REPEVAX	EPVAX	s	s		s
140	CURVE	CRV	d	d	d	d
141	GEORGE	GEORGINE	s	s	s	s
142	One4All Favourites / One4all	OneFor / ONE FOR		s	s	s
143	PYRA	PRYA	s	s		s
144	HALLOUMI	GRILLOUMAKI / GRILLOUMI	s	s	d	d
145	CANE AND GRAIN	CANE & GRAIN INTERNATIONAL	s	s		s
146	BAD BOY	BBCC / BAD BOY CHILLER CREW	s	s	s	s
147	XACTIMATE / XACTANALYSIS / XACTWARE	BUILDXACT	d	d	d	d
148	SALIO	SALIOGEN	s	s	d	s
149	HotPatch	Patch	s	s	s	s
150	JUST	Just The Ticket	s	s	d	d
151	THINKSMART / THINKPAD / THINKBOOK / THINKSHIELD	XHINKCAR	d	d	d	d
152	AESCULAP	AESKUCARE / AESKUCARE Allergy / AESKUCARE Food Intolerance	d	d		d
153	RESOLUTION	RESOLUTE	s	s	s	s
154	PUMA	Huma / Huma London			d	d
155	YOGA	Yoga Man		d		d
156	LIVE	VIVE	d	s	s	d
157	PLANET BOTTLE	One Planet	s	s	d	d
158	MySugardaddy	Sugar Daddy / Sugardaddy	s	s	s	s
159	PATTER	Yatter		d		d
160	HOLIDAY PEOPLE	The Holiday People	s	s	s	s
161	PRADA Invites / RE-PRADA / PRADA TIMECAPSULE SERIAL N	PADRA	s	s		s
162	THE IVY LEAGUE	The New Ivy League	s	s	s	s
163	UNITY TRUST BANK / UNITY	UNITY REAL ESTATE LTD	s	s	s	s
164	DREAMS / DREAM BIGGER Dreams / DREAM BIGGER / DREAM COACH	Dream Big Make Dua Move Mountains	s	s	s	s
165	MOON PRINCESS	Time Princess		d		d
166	FASHIONGO	FASHIONEGO	s	s	s	s
167	THE SECRET GARDEN PARTY	The Secret Garden Glamping / The Secret Garden				s
168	ELLESSE	ELLISS	s	d	d	d
169	SOLE TRADER / SOLE / SOLE SOLE / SOLE SISTER	SoulSistar	s	s	s	s
170	SWOOP / FLY SWOOP	SWOOP TAXIS / swooptaxis	s	s	s	s
171	WASHTOWER	Washing Tower / Washer Tower	s	s	s	s
172	POTETTE PLUS	Pote Plus	s	s	s	s
173	IDÉE	idee-home	s	s	d	s
174	JOY	BODY IN JOY	s	s	s	s
175	ROLEX	DERMAROLLEX	d	d	d	d
176	BARRIER	Barrier Coat	?	?	?	?
177	FLEX	FLEXX BY BBOXX	s	s	s	s
178	LOVELLO	LOVELLE	s	s		s
179	easyLand / EASYNETWORKS / EASYHUB	EasyMap	s	d	d	d
180	VARSITY / VARSITY SPIRIT FASHIONS	VARSITY HEADWEAR	s	s		s
181	WAKEN	Wakeful	s	s	s	s
182	AK DAMM	BLACK ADAM	d	d	d	d
183	YALU	[HYALU] BIOTIC				d
184	Z-BIOME	BIOME	s	s		s
185	Sleep Doctor / Sleep Dr	Dr.sleep	s	s	s	s
186	sane	CBDSANE	s	s	s	s
187	Closet. LONDON / CLOSET	THE LUXURY CLOSET	s	s	s	s
188	Lucky Strike	LUCKY BAR	s	s	s	s
189	ACTIVON MANUKA / ACTIVON	ARTIVION	s	s	d	s
190	YORXS	Yorks		s		s
191	AGROS	agro S	s	s	s	s
192	IVALUA / IVALUA VALUE BEYOND SAVINGS / IVALUA BUYER	iValue Solutions	s	s	s	s
193	Maplab	maplab.world	s	s	s	s
194	EVERY BODY	Everybodies	s	s	s	s
195	MATCH.COM / match / MG MatchGroup	MatchMate	s	s	s	s
196	NOUGHTY	NOUTI	s	d		d
197	Red Bull	ELFBULL	s	s	s	s
198	SMART	SMARTCARE / SMART CARE / SMARTBUSINESS / SMART BUSINESS / SMARTCLASS / SMART CLASS / SMARTSPACE	s	s	s	s
199	SNAP	Snap Nurse / SnapNurse	s	s	s	s
200	IV BOOST	IV PATCH	s	s	s	s
201	UNITY	INVIVO X UNITY	s	s	s	s
202	Eye of Horus	EYE OF ATUM	s	s	d	d
203	THUNDER PRODUCTIONS	SUN AND THUNDER	d	d	d	d
204	VISTAINTRA / VISTAVOX / VISTAPANO	VISTO	s	s	d	d
205	X WAY / XWAY	Exway	s	s		s
206	XCODE / CORE ANIMATION / CORE HAPTICS	XCORE	s	s	d	s
207	PETROL	PETROL REVOLT	s	s	s	s
208	SCAFFEZE	SCAFFX	s	s		s
209	BUBBLE / BUBL	BUBBLE ROCKET	s	s	s	s
210	THE LEONARDO COLLECTION	LEONARDO	s	s	s	s
211	LUCITE	Luci	s	s	d	d
212	Sense / Essence	SENSE	s	s	d	d
213	TESTEX	TEST-X		s	d	s
214	Lister's Brewery / Lister's	Listers		s		s
215	Nisha	Misha Cosmetics	s	s	d	s
216	AIRBNB	AIRBRICK	s	s		s
217	MOAT	Moat Systems / MOATSYSTEMS	s	s	s	s
218	ME YOU	YOU ME	s	s		s
219	KARMACOIN	KARMACASH / KARMAGIVES / KARMAPAY / KARMASHOPPING	s	s	s	s
220	GOBOX	G-Box	s	s	s	s
221	ATMA	AtmaSPA	s	s	s	s
222	BOSS	Kissboss	s	s	s	s
223	CHAINLINK / CHAINLINK LABS	LINK	s	s	d	d
224	JESSICA LONDON	JESSICA JOY LONDON	s	s	s	s
225	SNUGGLEDOWN	Snugglemore	s	s	s	s
226	AXIS	TRAXIS	d	d	d	d
227	Gadget Centre	Gadget Centre UK Ltd	s	s	s	s
228	WHITEHORSE LIQUIDITY PARTNERS	WHITEHORSE / H.I.G. WHITEHORSE	s	s	s	s
229	BILLIONAIRE	ZILLIONAIRE	s	s	s	s
230	GOLFHER	The Golphers	s	s	s	s
231	PRED	PRD TECHNOLOGY	d	d		d
232	GHOST	GHOSTDANCER CBD	s	d	s	d
233	NATURLI'	NATUREAL	s	s	s	s
234	GLENFIDDICH	Inverfiddich	s	s	s	s
235	CONFIGON	CONFIGO	s	s		s
236	PURPLE COMPUTING / PURPLE	PURPLECUBE	d	d	d	d
237	CREATEME	Create.	s	s	s	s
238	sky / SKY X	YSKY	d	d	s	s
239	RING	Home Ring	s	s	s	s
240	CHLOE / Chloé	chloédigital	s	s	s	s
241	YOU BEAUTY DISCOVERY / YOU BEAUTY / YOU	YOU·OLOGY	d	d	d	d
242	SIO / SIO BEAUTY	RIO COSMETICS	d	d	d	d
243	DCSL	DCS	s	s		s

Appendix B: LCSSQs and LCSSTs, and 'remainders', for the 243 pairs of marks

Mark 1	Mark 2	LCSSQ	LCSSQ rem. 1	LCSSQ rem. 2	LCSST	LCSST rem. 1	LCSST rem. 2	Modified Ratcliff-Obershelp similarity
glow up: britain's next make-up star	glow up: britain's next make-up star	glow up: britain's next make-up star			glow up: britain's next make-up star			1.000
meta	meta	meta			meta			1.000
mustang	mustang	mustang			mustang			1.000
sense	sense	sense			sense			1.000
fittipaldi	efittipaldi	fittipaldi		e	fittipaldi		e	0.957
configon	configo	configo	n		configo	n		0.941
consiglieri	consigliera	consiglier	i	a	consiglier	i	a	0.917
billionaire	zillionaire	illionaire	b	z	illionaire	b	z	0.917
mysugardaddy	sugardaddy	sugardaddy	my		sugardaddy	my		0.917
1link	link	link	1		link	1		0.909
xway	exway	xway		e	xway		e	0.909
this girl can	this girl came	this girl ca	n	me	this girl ca	n	me	0.897
dcsl	dcs	dcs	l		dcs	l		0.889
eataliano	eatalia	eatalia	no		eatalia	no		0.889
sir	siro	sir		o	sir		o	0.889
sky	ysky	sky		y	sky		y	0.889
lister's	listers	listers	'		lister	's	s	0.882
makeup wardrobe	makeup wardrobing	makeup wardrob	e	ing	makeup wardrob	e	ing	0.882
holiday people	the holiday people	holiday people		te h	holiday people		the	0.882
lovello	lovelle	lovell	o	e	lovell	o	e	0.875
carbon	mycarbon	carbon		my	carbon		my	0.875
dennis	dennis g	dennis		g	dennis		g	0.875
fashiongo	fashionego	fashiongo		e	fashion	go	ego	0.857
chooey	chooee	chooe	y	e	chooe	y	e	0.857
prinker	prink	prink	er		prink	er		0.857
realme	realmz	realm	e	z	realm	e	z	0.857
kramer	cramer	ramer	k	c	ramer	k	c	0.857
hanson	hansol	hanso	n	l	hanso	n	l	0.857
patter	yatter	atter	p	y	atter	p	y	0.857
z-biome	biome	biome	z-		biome	z-		0.857
pt powerpod	powerpod	powerpod	t p		powerpod	pt		0.857
the secret garden party	the secret garden	the secret garden	party		the secret garden	party		0.857
perpetual	perpetuel	perpetul	a	e	perpetu	al	el	0.850
agros	agro s	agros			agro	s	s	0.846
lucite	luci	luci	te		luci	te		0.833
very	veryco	very		co	very		co	0.833
axis	traxis	axis		tr	axis		tr	0.833
lotus	motus	otus	l	m	otus	l	m	0.833
mflor	hflor	flor	m	h	flor	m	h	0.833
skin²	skins	skin	²	s	skin	²	s	0.833
createme	create.	create	me	.	create	me	.	0.824
victoria	victoro	victor	ia	o	victor	ia	o	0.824
the good schools guide	the good school	the good school	s guide		the good school	s guide		0.821
treca	trea	trea	c		tre	ca	a	0.818
george	georgine	george		in	georg	e	ine	0.813
resolution	resolute	resolut	ion	e	resolut	ion	e	0.800
foltene	foltex	folte	ne	x	folte	ne	x	0.800
live	vive	ive	l	v	ive	l	v	0.800
salio	saliogen	salio		gen	salio		gen	0.800
e-bulli	bullit	bulli	e-	t	bulli	e-	t	0.800
hotpatch	patch	patch	hot		patch	hot		0.800
diamond mist	diamond max	diamond m	ist	ax	diamond m	ist	ax	0.800
puma	huma	uma	p	h	uma	p	h	0.800
gadget centre	gadget centre uk ltd	gadget centre		uk ltd	gadget centre		uk ltd	0.800
the ivy league	the new ivy league	the ivy league		new	ivy league	the	the new	0.794
testex	test-x	testx	e	-	test	ex	-x	0.786
potette plus	pote plus	pote plus	tte		te plus	potet	po	0.783
burgerme	burgerly	burger	me	ly	burger	me	ly	0.778
purple	purplecube	purple		cube	purple		cube	0.778
str8	st8	st8	r		st	r8	8	0.778
cintra	citra	citra	n		tra	cin	ci	0.769
prinz	prinse	prin	z	se	prin	z	se	0.769
chef	chefchy	chef		chy	chef		chy	0.769
atma	atmaspa	atma		spa	atma		spa	0.769
boss	bossvel	boss		vel	boss		vel	0.769
sane	cbdsane	sane		cbd	sane		cbd	0.769
ryzen	ryzeup	ryze	n	up	ryze	n	up	0.769
boss	bosseur	boss		eur	boss		eur	0.769
barrier	barrier coat	barrier		coat	barrier		coat	0.762
gobox	g-box	gbox	o	-	box	go	g-	0.750
vidas	vidya	vida	s	y	vid	as	ya	0.750
yorxs	yorks	yors	x	k	yor	xs	ks	0.750
mbet	m-bets	mbet		-s	bet	m	m-s	0.750
zara	zarzar	zara		zr	zar	a	zar	0.750
vizrt	vizst	vizt	r	s	viz	rt	st	0.750
xcode	xcore	xcoe	d	r	xco	de	re	0.750
moon princess	time princess	m princess	oon	tie	princess	moon	time	0.750
scaffeze	scaffx	scaff	eze	x	scaff	eze	x	0.750
nrj	nrg	nr	j	g	nr	j	g	0.750
match	matchmate	match		mate	match		mate	0.750
mygenecare	mygenes	mygene	care	s	mygene	care	s	0.737
asprey	dave asprey	asprey		dve a	asprey		dave	0.737
mutant	mega mutant	mutant		ega m	mutant		mega	0.737
halloumi	grilloumi	lloumi	ha	gri	lloumi	ha	gri	0.737
energeo	enerjo	enero	ge	j	ener	geo	jo	0.733
matters	m4tter	mtter	as	4	tter	mas	m4	0.733
demiegod	demigods	demigod	e	s	demi	egod	gods	0.722
naturli'	natureal	naturl	i'	ea	natur	li'	eal	0.722
azure	azurity	azur	e	ity	azur	e	ity	0.714
waken	wakeful	wake	n	ful	wake	n	ful	0.714
boss	kissboss	boss		kiss	boss		kiss	0.714
ping	pingnpay	ping		npay	ping		npay	0.714
yoga	yoga man	yoga		man	yoga		man	0.714
repevax	epvax	epvax	re		vax	repe	ep	0.714
snuggledown	snugglemore	snuggleo	dwn	mre	snuggle	down	more	0.708
resolva	consolva	solva	re	con	solva	re	con	0.706
swift	microswift	swift		micro	swift		micro	0.706
smart	smart care	smart		care	smart		care	0.706
jessica london	jessica joy london	jessica london		joy	jessica	london	joy london	0.706
philips	philzops	philps	i	zo	phil	ips	zops	0.706
asos	asas	ass	o	a	as	os	as	0.700
curve	crv	crv	ue		rv	cue	c	0.700
mbfw	mvfw	mfw	b	v	fw	mb	mv	0.700
rabe	rase	rae	b	s	ra	be	se	0.700
fido	fiio	fio	d	i	fi	do	io	0.700
ulma	luma	uma	l	l	ma	ul	lu	0.700
maplab	maplab.world	maplab		.world	maplab		.world	0.700
flowers	flower cafe	flower	s	cafe	flower	s	cafe	0.700
pusher	pushers only	pusher		s only	pusher		s only	0.700
savant	savant power	savant		power	savant		power	0.700
karmacoin	karmacash	karmac	oin	ash	karmac	oin	ash	0.700
intellicare	intelecare	intelcare	li	e	intel	licare	ecare	0.696
paris-match	pari match	parimatch	s-		match	paris-	pari	0.696
washtower	washer tower	washtower		er	tower	wash	washer	0.696
agrho	agro s	agro	h	s	agr	ho	o s	0.692
pikdare	pi-kare	pikare	d	-	are	pikd	pi-k	0.688
retaron	retlron	retron	a	l	ret	aron	lron	0.688
goddess	godless	godess	d	l	god	dess	less	0.688
pockit	mypocket	pockt	i	mye	pock	it	myet	0.688
ibaccy	biccy baccy	ibaccy		bccy	baccy	i	biccy	0.684
cane and grain	cane and grain international	cane and grain		international	cane and grain		international	0.682
glenfiddich	inverfiddich	efiddich	gln	invr	fiddich	glen	inver	0.680
eye of horus	eye of atum	eye of u	hors	atm	eye of	horus	atum	0.680
lemon perfect	peach perfect	e perfect	lmon	pach	perfect	lemon	peach	0.679
ellesse	elliss	ellss	ee	i	ell	esse	iss	0.667
compal	copalli	copal	m	li	pal	com	coli	0.667
sizzling fortunes	sizzling bells	sizzling es	fortun	bll	sizzling	fortunes	bells	0.667
shepherd	wolf and shepherd	shepherd		wolf and	shepherd		wolf and	0.667
zara	zareus	zar	a	eus	zar	a	eus	0.667
dreams	dream rite	dream	s	rite	dream	s	rite	0.667
life	chop life	life		chop	life		chop	0.667
du feu	du feu design	du feu		design	du feu		design	0.667
volvo	volta	vol	vo	ta	vol	vo	ta	0.667
swoop	swoop taxis	swoop		taxis	swoop		taxis	0.667
sleep dr	dr.sleep	sleep	dr	dr.	sleep	dr	dr.	0.667
petrol	petrol revolt	petrol		revolt	petrol		revolt	0.667
bubble	bubble rocket	bubble		rocket	bubble		rocket	0.667
chainlink	link	link	chain		link	chain		0.667
ring	home ring	ring		home	ring		home	0.667
idee	idee-home	idee		-home	idee		-home	0.667
wear the change	wear the future	wear the e	chang	futur	wear the	change	future	0.656
every body	everybodies	everybod	y	ies	every	body	bodies	0.652
lucky strike	lucky bar	lucky r	stike	ba	lucky	strike	bar	0.652
noughty	naughtea	nught	oy	aea	ught	noy	naea	0.647
easyland	easymap	easya	lnd	mp	easy	land	map	0.647
red bull	elfbull	ebull	rd	lf	bull	red	elf	0.647
activon	artivion	ativon	c	ri	tiv	acon	arion	0.647
anytime fitness	anytime pro	anytime	fitness	pro	anytime	fitness	pro	0.643
saypha	shaype	sayp	ha	he	ayp	sha	she	0.643
noughty	nouti	nout	ghy	i	nou	ghty	ti	0.643
sherco	charco	hrco	se	ca	rco	she	cha	0.643
satisfyer	simply satisfy	satisfy	er	imply s	satisfy	er	simply	0.640
varsity	varsity headwear	varsity		headwear	varsity		headwear	0.640
catapult vision	catapult consulting	catapult sin	vio	conultg	catapult	vision	consulting	0.639
zemo	zoomo	zmo	e	oo	mo	ze	zoo	0.636
oyster	oyster and pop	oyster		and pop	oyster		and pop	0.636
folium science	folium	folium	science		folium	science		0.636
geneverse	genv3rse	genvrse	ee	3	rse	geneve	genv3	0.632
chloe	chloedigital	chloe		digital	chloe		digital	0.632
suntech	suntank	sunt	ech	ank	sunt	ech	ank	0.625
airbnb	airbrick	airb	nb	rick	airb	nb	rick	0.625
waterford	lady louisa waterford	waterford		lady louisa	waterford		lady louisa	0.625
snap	snap nurse	snap		nurse	snap		nurse	0.625
nutravita	nootrovita	ntrvita	ua	ooo	vita	nutra	nootro	0.619
flexolid	kp flexilid	flexlid	o	kp i	flex	olid	kp ilid	0.619
fransa	fanza	fana	rs	z	an	frsa	fza	0.615
cazoo	carkoo	caoo	z	rk	ca	zoo	rkoo	0.615
gizebra	debra the gizebra	gizebra		debra the	gizebra		debra the	0.615
x15	vivo x15	x15		vivo	x15		vivo	0.615
lip injection	lipjection gloss	lipjection	in	gloss	jection	lip in	lip gloss	0.613
sole sister	soulsistar	solsistr	e e	ua	sist	sole er	soular	0.609
pyra	prya	pya	r	r	a	pyr	pry	0.600
alpha boxing	alpha force	alpha o	bxing	frce	alpha	boxing	force	0.600
thinksmart	xhinkcar	hinkar	tsmt	xc	hink	tsmart	xcar	0.600
hyprr	hypernft	hypr	r	enft	hyp	rr	ernft	0.600
md	intimd	md		inti	md		inti	0.600
jolly	jolly peckish	jolly		peckish	jolly		peckish	0.600
activist	activist ingredients	activist		ingredients	activist		ingredients	0.600
vault ip	brandvault	vault	ip	brand	vault	ip	brand	0.600
rebelle copenhagen	rebelle beauty	rebelle ea	copnhgen	buty	rebelle	copenhagen	beauty	0.588
lush	lush lights	lush		lights	lush		lights	0.588
vistaintra	visto	vist	aintra	o	vist	aintra	o	0.588
live	life's	lie	v	f's	li	ve	fe's	0.583
skullcandy	skull gaming	skullan	cdy	gmig	skull	candy	gaming	0.583
croc'odor	cocod'or	cocodor	r'	'	od	croc'or	coc'or	0.579
ak damm	black adam	ak dam	m	blca	dam	ak m	black a	0.579
tygrys	tigris	tgrs	yy	ii	gr	tyys	tiis	0.571
easyjet	easycosmetic	easyet	j	cosmic	easy	jet	cosmetic	0.571
vfh	vfhonline	vfh		online	vfh		online	0.571
sky	sky force	sky		force	sky		force	0.571
six days	day6	day	six s	6	day	six s	6	0.571
stones	stone brewing	stone	s	brewing	stone	s	brewing	0.571
honda	honbike	hon	da	bike	hon	da	bike	0.571
clean v	drink clean.	clean	v	drink .	clean	v	drink .	0.571
unity	invivo x unity	unity		invivo x	unity		invivo x	0.571
me you	you me	you	me	me	you	me	me	0.571
you	you·ology	you		·ology	you		·ology	0.571
dose and co.	dose labs	dose a	nd co.	lbs	dose	and co.	labs	0.565
eureka!	eureka education	eureka	!	education	eureka	!	education	0.560
planet bottle	one planet	planet	bottle	one	planet	bottle	one	0.560
closet	the luxury closet	closet		the luxury	closet		the luxury	0.560
rolex	dermarollex	rolex		demarl	lex	ro	dermarol	0.556
moat	moat systems	moat		systems	moat		systems	0.556
evolutiq	essential evolution	evoluti	q	ssential eon	evoluti	q	essential on	0.552
bad boy	bad boy chiller crew	bad boy		chiller crew	bad boy		chiller crew	0.552
armalight	armatherm	armah	ligt	term	arma	light	therm	0.550
click eat	subway click and eat	click eat		subway and	click	eat	subway and eat	0.548
cartils	cartel design	cartls	i	e deign	cart	ils	el design	0.545
the leonardo collection	leonardo	leonardo	the collection		leonardo	the collection		0.545
ghost	ghostdancer cbd	ghost		dancer cbd	ghost		dancer cbd	0.545
whitehorse liquidity partners	whitehorse	whitehorse	liquidity partners		whitehorse	liquidity partners		0.537
re-prada	padra	pada	re-r	r	ra	re-pda	pad	0.533
pxg pharma	pxg	pxg	pharma		pxg	pharma		0.533
one4all	onefor	one	4all	for	one	4all	for	0.533
coversyl	covixyl-v	covyl	ers	ix-v	cov	ersyl	ixyl-v	0.526
aesculap	aeskucare	aesca	ulp	kure	aes	culap	kucare	0.526
amma	amma wellness	amma		wellness	amma		wellness	0.526
golfher	the golphers	golher	f	the ps	her	golf	the golps	0.524
kelio	kleeo	keo	li	le	k	elio	leeo	0.500
iv boost	iv patch	iv t	boos	pach	iv	boost	patch	0.500
3monkeys	3 monkeys communications	3monkeys		communications	monkeys	3	3 communications	0.500
bones	bones of barbados	bones		of barbados	bones		of barbados	0.500
xactimate	buildxact	xact	imate	build	xact	imate	build	0.500
joy	body in joy	joy		body in	joy		body in	0.500
flex	flexx by bboxx	flex		x by bboxx	flex		x by bboxx	0.500
yalu	[hyalu] biotic	yalu		[h] biotic	yalu		[h] biotic	0.500
ivalua	ivalue solutions	ivalu	a	e solutions	ivalu	a	e solutions	0.500
just	just the ticket	just		the ticket	just		the ticket	0.476
next	nxtwear s	nxt	e	wear s	xt	ne	nwear s	0.467
amazon	amazon food trader ltd	amazon		food trader ltd	amazon		food trader ltd	0.467
nisha	misha cosmetics	isha	n	m cosmetics	isha	n	m cosmetics	0.455
thunder productions	sun and thunder	thunder	productions	sun and	thunder	productions	sun and	0.444
bloo	bluuwash	bl	oo	uuwash	bl	oo	uuwash	0.429
pi	pianywhere	pi		anywhere	pi		anywhere	0.429
unity	unity real estate ltd	unity		real estate ltd	unity		real estate ltd	0.429
last shelter:survival	doomsday: last survivors	last surviv	helter:sal	doomsday: ors	last s	helter:survival	doomsday: urvivors	0.404
gap	gal london	ga	p	l london	ga	p	l london	0.400
cult beauty	perfume cult	cult	beauty	perfume	cult	beauty	perfume	0.400
spear and jackson predator	predator gutter vacuum	pear ac	sandjkson predator	rdtoguttervuum	predator	spear and jackson	gutter vacuum	0.360
pred	prd technology	pre	d	d tchnology	pr	ed	d technology	0.350
sio	rio cosmetics	si	o	rio cometcs	io	s	r cosmetics	0.333
dreams	dream big make dua move mountains	dreams		big make dua move mountain	dream	s	big make dua move mountains	0.317
6x	vivo x6	6	x	vivo x	6	x	vivo x	0.182

References

[1] https://circleid.com/posts/towards-a-quantitative-approach-for-objectively-measuring-the-similarity-of-marks

[2] The Python-based fuzz.ratio metric (F_lev), and Jaro-Winkler similarity (sim_j)

[3] The fuzz.ratio metric applied to the Soundex (F_sou) and NYSIIS (F_NYSIIS) phonetic representations of the pair of marks

[4] The overall visual similarity (S_vis) can simply be quantified as S_vis = (F_lev + sim_j) / 2; the overall aural similarity (S_aur) as S_aur = (F_sou + F_NYSIIS) / 2; and the overall (total) similarity (S) as S = (S_vis + S_aur) / 2 (or, equivalently, as the mean of the four individual components)

[5] Taken from the Darts-ip tool (https://app.darts-ip.com/darts-web/login.jsf)

[6] Noting that, in the calculation, any accented characters have been replaced with their non-accented equivalents

[7] Trademark watch service provider, pers. comm., 09-Aug-2024

[8] In these tables, all marks are shown in an encoded form, to obfuscate the names of the actual marks being watched, so as to maintain confidentiality. The encoding is carried out by replacing every instance of each letter with the same alternative letter. This thereby generates pseudo-random strings, but in which a visual assessment of similarity across the individual pairs can still be carried out.

[9] In creating a 'clean' dataset, the following modifications to the 'raw' data have been made:

In cases where multiple distinct marks have been cited in a case, just one has been considered. The selected mark is generally the 'simplest' of the set and/or the one (subjectively) most similar to the other disputed mark or, all other factors being equal, the first mark in the list
All marks have been converted to lower-case characters (i.e. case is disregarded)
All accented characters have been replaced by their non-accented equivalents
All ampersands (&) appearing as space-separated words have been replaced by the word 'and'

[10] https://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970?pgno=5

[11] https://xlinux.nist.gov/dads/HTML/ratcliffObershelp.html

[12] In addition to other models for quantifying the spelling-based (visual) similarity

[13] https://pypi.org/project/fuzzywuzzy/

[14] https://www.internationalphoneticassociation.org/content/ipa-chart

[15] https://en.wikipedia.org/wiki/International_Phonetic_Alphabet_chart; a pronunciation guide can be found at: https://www.vocabulary.com/resources/ipa-pronunciation/

[16] https://en.wikipedia.org/wiki/Stress_(linguistics)

[17] M. Bernard and H. Titeux (2021). 'Phonemizer: Text to Phones Transcription for Multiple Languages in Python', J. Open Source Software, 6(68), p.3958.

[18] https://pypi.org/project/phonemizer/

[19] https://github.com/bootphon/phonemizer

[20] https://bootphon.github.io/phonemizer/install.html

[21] https://github.com/espeak-ng/espeak-ng#espeak-ng-text-to-speech

This article was first published as a white paper on 9 October 2024 at:

https://circleid.com/pdf/similarity_measurement_of_marks_part_3.pdf