RegEx format help needed

I'm trying to create an action to clean up artist tags that have single letters.
Like "b j Thomas", which I'd like to convert to "B.J. Thomas".
I've tried to do this with a regular expression action:

regex: "\b(\w\.?)\s?\b"
replace with: "$upper($1). "

The idea was to look for single letters \w followed by an optional period (capture group 1). After that an optional white space is allowed. The whole thing is embedded in \b anchors.
This works for the example above, but unfortunately \b also seems to trigger on single apostrophes ('), although it should only do so for the \w character class.
So "Rock 'n Roll" is converted to "Rock 'N. Roll" and "Mother's Finest" -> "Mother'S. Finest"...
Any idea how to improve this?
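
For anyone who wants to reproduce this outside Mp3tag, here is a minimal Python sketch of the attempt above. It assumes that Python's re module treats \b and \w like Mp3tag's regex engine does for these inputs. The helper name first_attempt is made up for illustration, and $upper($1) is emulated with a replacement function.

```python
import re

# Same pattern as in the action above.
PATTERN = re.compile(r"\b(\w\.?)\s?\b")

def first_attempt(artist):
    """Hypothetical helper: uppercase capture group 1, then append '. '."""
    return PATTERN.sub(lambda m: m.group(1).upper() + ". ", artist)

for name in ["b j Thomas", "Rock 'n Roll", "Mother's Finest"]:
    print(repr(name), "->", repr(first_attempt(name)))
```

In this sketch the last two names come out as "Rock 'N. Roll" and "Mother'S. Finest", which matches what the action does.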

I don't think that replacing "a single letter followed by a space" with "same letter in uppercase followed by a dot and a space" will work as expected.

Just think about ARTIST names like
Cardi B --> without dot at all
B. B. Gabor --> with spaces after the dot
B.B. King --> without spaces after the dot
BB Bronx --> without dot at all

You could work around the problem with letters after an apostrophe if you filter them out (manually with the F3 filter in Mp3tag) or expand the regular expression, maybe with some kind of negative lookbehind.
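
The negative lookbehind idea could look roughly like this. It is a minimal Python sketch, assuming lookbehind behaves the same way in Mp3tag's regex engine. The names SINGLE_LETTER and find_initials are made up for illustration.

```python
import re

# (?<!') rejects any position directly preceded by an apostrophe,
# so the "n" in 'n and the "s" in Mother's are skipped.
SINGLE_LETTER = re.compile(r"(?<!')\b([A-Za-z])\b")

def find_initials(artist):
    """Hypothetical helper: list the stand-alone letters the pattern would touch."""
    return SINGLE_LETTER.findall(artist)

for name in ["b j Thomas", "Rock 'n Roll", "Mother's Finest"]:
    print(repr(name), "->", find_initials(name))
```

This only covers the plain ASCII apostrophe; other quote characters would need to be added to the lookbehind.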

Finding a regular expression for all thinkable cases seems to be overly complicated (and quite dangerous), if possible at all.

You're right - you'll never catch every thinkable combination, but that was not my objective anyway.
Even if you refer to "official/reliable" sources like Discogs, MusicBrainz or AllMusic, you'll find spellings different from what the artist wants it to be:
"B. B. Gabor" is "B.B. Gabor" @ AllMusic, "BB Gabor" @ Discogs and "B. B. Gabor" @ MusicBrainz and Wikipedia.
Therefore I decided to consistently use "X.X. Name" (like B.B. King) if I find the track tagged like "X X Name" or "X. X. Name" and leave the rest as it is (for manual editing).
I've never used lookbehind/lookahead expressions in a regex, maybe it's about time to start...
Btw. the above regex has another flaw: It replaces "X." with "X.." :confused:

P.S. Ugly workaround (until I've done my homework on advanced regex):

a) replace ' (single apostrophe) with "000" (Rock 'n Roll -> Rock 000n Roll)
b) regex replace "\b(\w)\.?\s?\b" with $upper($1)
c) replace back "000" with '

If you really only want to search for x x and replace them with X.X. you could use:

$regexp(%ARTIST%,'([a-zA-Z])\s([a-zA-Z])\s','\U$1.$2. ')

This regex looks for a letter (a-z or A-Z) followed by a whitespace character, twice in a row.
Then it returns the same letters in UPPERCASE, each followed by a dot, with a space after the second one.

As said, this only covers exactly two single letters, each followed by a space.
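
A rough Python equivalent, in case you want to try it outside Mp3tag. This is only a sketch: Python's re has no \U in replacement strings, so the uppercasing is done with a replacement function, and the name dot_two_initials is made up for illustration.

```python
import re

# Two single letters, each followed by one whitespace character.
TWO_INITIALS = re.compile(r"([a-zA-Z])\s([a-zA-Z])\s")

def dot_two_initials(artist):
    """Hypothetical helper: 'x x ' -> 'X.X. '."""
    return TWO_INITIALS.sub(
        lambda m: m.group(1).upper() + "." + m.group(2).upper() + ". ",
        artist,
    )

print(dot_two_initials("b j Thomas"))
print(dot_two_initials("Rock n Roll"))
```

Because the pattern is not anchored, the last letter of a preceding word can match as well: in this sketch "Rock n Roll" becomes "RocK.N. Roll", so a leading \b might be worth testing.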

Thanks for this. I could use two of them, each for 1 or 2 single letters.
I'll let you know if I find a solution with a single regex eventually...

You can easily test your use cases at online sites like

On a side note: How do you insert pics in your post?

Just copy & paste (or drag & drop) them.

Or use the forum's "upload" button.

And I was wrong when I thought that \b was triggered by the apostrophe. Of course it was triggered by the single "n" after the apostrophe (like the "s" in the word "varmint's" in the above example)!
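
A quick way to see where \b actually fires, as a small Python sketch (assuming \b works the same way in Mp3tag's regex engine; LONE_CHAR is a made-up name):

```python
import re

# A lone word character with a word boundary on each side.
LONE_CHAR = re.compile(r"\b\w\b")

for text in ["Rock 'n Roll", "Mother's Finest"]:
    print(repr(text), "->", LONE_CHAR.findall(text))

# Index of the first word boundary in "'n": it sits after the apostrophe.
print(re.search(r"\b", "'n").start())
```

The apostrophe is not a \w character, so the boundary lies between it and the following letter, which makes that letter look like a word of its own.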

Back with a somewhat less ugly solution, although it is not quite the one-liner I wanted to write.
This one does it in two steps, which could be combined into a nested $regexp(), but I refrained from doing so for better readability:

regex 1: "(?<![\S])([A-Za-z])\.*(?=[ ])"
replace: "$upper($1)."

which finds a single letter with no non-whitespace character to its left, followed by any number of periods (including none) and then a space. That gives me e.g. "B. J. Thomas". I was not able to eliminate the space between B. and J. while leaving the space before "Thomas" alone. That is done in the next step:

regex 2: "(\w\.)\s+(?=\w\.)"
replace: "$1"

which finds a single letter followed by a period, then one or more whitespace characters, and checks via lookahead that another single letter with a period follows.
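
Here are the two steps chained together as a minimal Python sketch. It assumes Python's re handles these lookarounds the same way as Mp3tag's regex engine; fix_initials is a made-up name, and $upper($1) is again emulated with a replacement function.

```python
import re

# Step 1: a single letter not preceded by non-whitespace, optional periods,
# and a space right after it (lookahead only, so the space is kept).
STEP1 = re.compile(r"(?<![\S])([A-Za-z])\.*(?=[ ])")
# Step 2: drop the whitespace between two dotted initials.
STEP2 = re.compile(r"(\w\.)\s+(?=\w\.)")

def fix_initials(artist):
    """Hypothetical helper that applies both regex steps in order."""
    dotted = STEP1.sub(lambda m: m.group(1).upper() + ".", artist)
    return STEP2.sub(r"\1", dotted)

for name in ["b j Thomas", "B. B. Gabor", "Rock 'n Roll", "Cardi B"]:
    print(repr(name), "->", repr(fix_initials(name)))
```

In this sketch the first two names end up as "B.J. Thomas" and "B.B. Gabor", while "Rock 'n Roll" and "Cardi B" are left alone.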

But anyway, like you already noticed, this works for a lot of cases, but you never know when some name will surface where it doesn't. My action group "Correct Artist Spellings" already contains a lot of actions to add a missing "The" or remove a wrong "The", correct upper/lowercase, correct wrong notations (I was surprised how many different ways there are to spell "Booker T. & The MG's") etc., so it doesn't really hurt to add some more lines for artists with initials. And that comes with the bonus of taking various notations into account.

regex:   "\bB[ .]*J[ .]*[ ]+Thomas\b"
replace: "B.J. Thomas"
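
The same per-artist idea as a small Python sketch, in case someone wants to test such rules before adding them to an action group. ARTIST_RULES and apply_rules are made-up names, and only the B.J. Thomas rule comes from this post. The match is case-sensitive here; pass re.IGNORECASE to re.compile if lowercase input should match too.

```python
import re

# (pattern, canonical spelling) pairs; more artists could be appended.
ARTIST_RULES = [
    (re.compile(r"\bB[ .]*J[ .]*[ ]+Thomas\b"), "B.J. Thomas"),
]

def apply_rules(artist):
    """Hypothetical helper: run every rule over the artist string in order."""
    for pattern, canonical in ARTIST_RULES:
        artist = pattern.sub(canonical, artist)
    return artist

for name in ["B J Thomas", "B. J. Thomas", "BJ Thomas"]:
    print(repr(name), "->", repr(apply_rules(name)))
```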

In the end it was fun and I learned a lot about advanced regex, esp. "lookaround" and non-capturing groups... :wink:
