Replace "only as whole word", not working if word is followed by a period

KPexEA · March 17, 2020, 5:01pm

Not sure if this is "as designed" or a bug.

I have a set of actions for fixing Roman numerals in track names (changing all letters to uppercase). They use the Replace action with the "only as whole word" option selected.

They seem to work correctly except when the Roman numeral is immediately followed by a period character; in that case the string is not replaced.

Example Actions: (all with 'only as whole word' ticked)
Replace "ii" -> "II"
Replace "iii" -> "III"
Replace "iv" -> "IV"

This works:
Foo Part Ii -> Foo Part II

This fails:
Foo Part Ii. -> Foo Part Ii.

ohrenkino · March 17, 2020, 6:00pm

I can confirm that the action "Replace" behaves in the way that you described.
Fascinatingly, the function $replace() works on a string like this:
$replace('A New Found Glory Ii A New Found Glory Ii.',Ii,II) -> A New Found Glory II A New Found Glory II.

Edit: I would like to add that any punctuation following the search term prevents the Replace action from replacing it.
I tested it with !?;:,

Florian · March 21, 2020, 8:24pm

It's correct, the option "only as whole word" currently only uses the blank as word separator. This is by design, however I'm open for discussion.

Is it safe to assume that any punctuation can always be treated as word separator?
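
To illustrate the behavior described here, a minimal Python sketch of a whole-word replace that treats only the blank as a word separator. This is not Mp3tag's actual implementation; the function name is hypothetical, and case-insensitive matching is an assumption based on the report, where "ii" matches "Ii".

```python
def replace_whole_word_space_only(text: str, search: str, replacement: str) -> str:
    """Replace `search` only where it forms a complete space-delimited token."""
    # Splitting on a single space keeps runs of spaces intact when re-joined.
    tokens = text.split(" ")
    return " ".join(
        # Assumption: the comparison ignores case, as in the original report.
        replacement if token.lower() == search.lower() else token
        for token in tokens
    )


print(replace_whole_word_space_only("Foo Part Ii", "ii", "II"))   # Foo Part II
print(replace_whole_word_space_only("Foo Part Ii.", "ii", "II"))  # Foo Part Ii.
```

In the second call the token is "Ii." rather than "Ii", so it never compares equal to the search term and stays unchanged.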

ohrenkino · March 21, 2020, 8:42pm

I think that DIN 5008 defines typographic rules for where to put a space character and what to do with punctuation (Satzzeichen ";:,.!?) plus non-alphabetical characters (Schriftzeichen §$%&/=\#'+*-<>|^°(){}[])
So I would appreciate it if some more characters could be added to define words.

KPexEA · March 21, 2020, 10:07pm

As ohrenkino said, it would be great if in addition to spaces, it could also treat punctuation characters and non-alphabetical characters as word separators.

poster · March 22, 2020, 5:38am

Who knows? There are so many countries and so many alphabets.

I doubt that the DIN has any relevance in countries other than Germany. And there are more languages than just German and English. And there are a lot of alphabets.

ohrenkino · March 22, 2020, 8:19am

Could you supply examples where the classical punctuation characters appear in the middle of a word?
Otherwise I would still assume that punctuation acts as a word separator in addition to the classical space character and is helpful in those languages that use punctuation.

poster · March 22, 2020, 9:31am

As I already wrote: Who knows?
I know nothing about foreign alphabets such as Indian, Arabic, Thai ...

As problematic examples there are numbers
10,000 Maniacs
and abbreviations
U.S.A.

ohrenkino · March 22, 2020, 10:00am

They more or less prove the point: abbreviations should be capitalized and not become U.s.a as they do now. With respect to numbers, I don't know what a lowercase or capitalized number would look like; they simply stay the same.
So the question remains: is there a language where a punctuation character appears in the middle of a word?

An alternative for the "Replace only as Word" function would be to add the option to add user-defined characters like in the case conversion action. This list should be remembered and only become active if the option "as word" has been selected.
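
A rough Python sketch of what such a user-defined separator list could look like. The function name, the `separators` parameter, and the space-only default are hypothetical and not taken from Mp3tag; matching is assumed to be case-insensitive.

```python
import re


def replace_whole_word(text: str, search: str, replacement: str,
                       separators: str = " ") -> str:
    """Replace `search` where it is bounded by separators or the string edges."""
    if not separators:
        raise ValueError("separators must contain at least one character")
    # Negated character class: matches any character that is NOT a separator.
    non_sep = "[^" + re.escape(separators) + "]"
    # The match must be neither preceded nor followed by a non-separator,
    # which also accepts the start and the end of the string as boundaries.
    pattern = re.compile(
        rf"(?<!{non_sep}){re.escape(search)}(?!{non_sep})",
        re.IGNORECASE,
    )
    # A function as replacement avoids backslash processing in `replacement`.
    return pattern.sub(lambda match: replacement, text)


print(replace_whole_word("Foo Part Ii.", "ii", "II"))              # Foo Part Ii.
print(replace_whole_word("Foo Part Ii.", "ii", "II", " .,;:!?"))   # Foo Part II.
```

With the default the behavior stays as it is today; only users who add characters to the list get the wider word boundaries.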

Crissov · March 22, 2020, 3:12pm

Apostrophes may occur inside words in a number of languages, as can hyphens. In others, they may be separators instead. Considering gender-inclusive and other creative language practice (as is common in band names and song titles at least in some genres), exclamation marks, colons, middle dots, currency symbols, digits etc. can also be part of words. Furthermore, some of these characters are often replaced by similar ones, e.g. ', ʼ, ` and more for apostrophes.

In conclusion, provide a useful default, but make it customizable.
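
As a small illustration of how differently these look-alike characters are classified, here is a Python sketch that prints the Unicode general category for a few of the characters mentioned above. The helper name `describe` and the selection of samples are only for illustration.

```python
import unicodedata


def describe(chars: str) -> list[tuple[str, str, str]]:
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in chars
    ]


# ASCII apostrophe, modifier letter apostrophe, grave accent, hyphen-minus,
# exclamation mark, colon, middle dot, dollar sign, and a digit.
samples = "'\u02bc`-!:\u00b7$7"

for code_point, name, category in describe(samples):
    print(code_point, category, name)
```

The ASCII apostrophe is reported as punctuation (Po), the modifier letter apostrophe as a letter (Lm), and the grave accent as a symbol (Sk), so a rule based purely on "punctuation" would treat three visually similar characters in three different ways.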

ohrenkino · March 22, 2020, 3:23pm

Apostrophes are no punctuation characters and neither are currency symbols, digits, accents.
In my understanding punctuation includes only the classical characters ,.;:!?
But making it customizable is probably the silver bullet.

Crissov · March 22, 2020, 5:59pm

As usual, it is not as simple as it may seem at first, especially if you start to consider non-European scripts. Even without looking at particular languages (which CLDR does), Unicode has dedicated categories for

vilsen · March 23, 2020, 4:58pm

This is how dots and colons could be problematic:

10 > ten => 10,000 Maniacs > ten,000 Maniacs

20 > twenty => 20.5 > twenty.5

usa > U.S.A. => www.usa.net > www.U.S.A..net

A > a => A. Anderson > a. Anderson

That would solve it. A field where you could enter word separators, next to the "only as whole word" box, would be great.
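
The four examples above can be reproduced with a short Python sketch. It assumes a hypothetical separator set of space plus `.,;:!?` and case-insensitive matching; the characters Mp3tag actually uses are not listed in this thread.

```python
import re

# Hypothetical separator sets, for comparison only.
SPACE_ONLY = " "
WITH_PUNCTUATION = " .,;:!?"


def replace_whole_word(text: str, search: str, replacement: str,
                       separators: str) -> str:
    """Replace `search` where it is bounded by separators or the string edges."""
    non_sep = "[^" + re.escape(separators) + "]"
    pattern = re.compile(
        rf"(?<!{non_sep}){re.escape(search)}(?!{non_sep})",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: replacement, text)


cases = [
    ("10", "ten", "10,000 Maniacs"),
    ("20", "twenty", "20.5"),
    ("usa", "U.S.A.", "www.usa.net"),
    ("A", "a", "A. Anderson"),
]

for search, replacement, title in cases:
    before = replace_whole_word(title, search, replacement, SPACE_ONLY)
    after = replace_whole_word(title, search, replacement, WITH_PUNCTUATION)
    print(f"{title!r}: space only -> {before!r}, with punctuation -> {after!r}")
```

With space as the only separator all four titles stay unchanged; with punctuation added, each one is altered in the way shown above.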

Florian · March 23, 2020, 6:07pm

Thanks all for your input on this. I've chosen a fairly pragmatic approach and extended the list of characters that mark word boundaries with Mp3tag v3.00e.

Currently, I prefer to handle the few edge cases that have been pointed out here individually rather than make a much larger group of standard users deal with yet another configuration option (even if it would provide a sensible default).

Let's see. The report from the OP should be addressed by the new version.

Crissov · March 24, 2020, 7:41am

Like custom lists for certain tag fields, this would be fine as an advanced configuration option only available by manually editing a text file.

vilsen · March 24, 2020, 2:17pm

I'm sorry, but I think it's unfortunate not having user control over this. I have too many Replace actions in my often very large Action groups to feel safe about the change. IMHO the previous default should not be changed - so either don't change anything or make it optional.

For me it's better to be safe than sorry, so I'll have to stick with v3.00d.

FWIW, the OP could simply use multiple Replace actions to solve the problem.
