Using regexp on multiple tag fields

I have searched high and low, but cannot find an answer to what I would imagine is a common problem.

I have several tags with multiple instances, Artist for example. I can have from one to 20 or more artists, each having its own tag (using \\ to separate them). This works well with my music players.

However, if I want to change something to do with each individual Artist, such as swapping the surname from first to last (or vice versa) in one instance, there seems to be no way to do this on multiple occurrences. In my case I want to enter the Artist name naturally, with surname last, and then copy that to ArtistSort with the surname first.

I can perform all sorts of operations on one instance successfully. I can even operate on the nth instance (using $meta(x,n)). But there is no count, or 'for' (or do until etc) loop as far as I know to operate on an unknown number of instances.

I can merge all instances into one string (and separate them again), but then the regexp doesn't work on each Artist name.

I understand, at least in principle, that regular expressions can include procedures, but I don't see how that fits in with the actions as defined in mp3tag. If there is a way of using them I'd appreciate an example, since I find regular expressions quite difficult to understand.

There must be a way, isn't there? Am I really the only one?

Maybe ... see there ...
Need help with regex for ARTISTSORT
Multiple value artist tag to multiple value artistsort
Automating Changes to Artist Values

DD.20140911.1909.CEST

This is very helpful. I have read reams of stuff on regular expressions, but frankly I don't really understand them. In the solutions above I don't understand how they work on multiple occurrences, but they do.

However, I have one more wrinkle, and I've failed to get it to work.

If the artistsort name (all uppercase) starts with any one of a number of strings (e.g. THE) I don't want the regexp to do anything on that artist value. I got this to work on a single value artist, but am stuck making it work on multiples, no doubt because I don't understand what I am doing well enough.

The regular expression

\A(?!(?:THE|EL|LA|LOS|LAS|LE|LES|QUATUOR|QUARTETTO|QUINTETTO|I)\s.+$)\A(.+)\s(\w+)$  
to $2 $1

works by failing on a string starting with THE or EL etc, followed by space and characters to the end of string, and hence leaving the string alone. Otherwise it starts at the beginning of the string again and reverses the last part of the name to be the first.

As it stands this can't work on multiple values because \A starts at the beginning of the string, and $ is the end of the string.
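
To make the behaviour described above concrete, here is a minimal sketch in Python rather than Mp3tag, assuming Python's re module treats this particular pattern the same way the Mp3tag engine does. The helper name sort_name is made up for illustration, and Python writes the replacement as \2 \1 instead of $2 $1.

```python
import re

# Python port of the single-value pattern quoted above.
SINGLE_VALUE = re.compile(
    r"\A(?!(?:THE|EL|LA|LOS|LAS|LE|LES|QUATUOR|QUARTETTO|QUINTETTO|I)\s.+$)"
    r"\A(.+)\s(\w+)$"
)

def sort_name(value):
    """Move the last word to the front unless the value starts with an excluded word."""
    return SINGLE_VALUE.sub(r"\2 \1", value)

print(sort_name("TREVOR PINNOCK"))       # PINNOCK TREVOR
print(sort_name("THE ALLEGRI QUARTET"))  # THE ALLEGRI QUARTET (lookahead rejects it)
print(sort_name("JIM"))                  # JIM (no space, so no match)

# On a joined string the anchors see one long value, so only the very last
# word of the whole string is moved:
print(sort_name("TREVOR PINNOCK/THE ALLEGRI QUARTET"))
# QUARTET TREVOR PINNOCK/THE ALLEGRI
```

The last call shows the problem with multiple values: \A and $ only ever see the whole string, never the individual names inside it.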

So to take one of the solutions to multiple artists above, eg:

([^;]+)(\s[^;]+)
to $2 $1

I try to include the exclusion strings I used before but get all sorts of strange answers, none of which are close to what I want.

Help!

I have now managed to solve almost all of my issues with this, with the help of http://regex101.com/, http://www.regular-expressions.info/ and DetlevD.

Firstly, what I want to do is get ArtistSort values as follows:

  1. Take a set of Artist names, stored as separate values, such as:

Trevor Pinnock\\The Allegri Quartet\\Steuart John Rudolph Bedford\\Jim\\Quintetto Italiano

  2. For those names which start with The, El, etc, or some others such as Quintetto, do not change the order of the names

  3. For all the others, move the last name (surname) to be the first name

  4. Then remove any leading The, El, etc.

Leaving me with the Artist having its natural name, but sorted by surname except where in a band or group, and ignoring leading definite articles.

I have spent months, on and off, trying to get all this to work. I managed with single value Artists, but until recently failed with multi-valued ones. I now have a complete solution, with one minor niggle.

I have three actions, as follows:

A. Get ArtistSort to contain one string with / as separator, and invert the surname where necessary

Format value

Field:
ARTISTSORT

Format string:
$regexp($upper($meta_sep(artist,/)),'\b(THE|EL|LA|LOS|LAS|LE|LES|QUATUOR|QUARTETTO|QUINTETTO|I)\s([^/]+)|([^/]+)\s([^/]+)',$4 $3$1 $2)

B. Remove leading definite articles

Replace with regular expression

Field:
ARTISTSORT

Regular Expression:
\b(THE|EL|LA|LOS|LAS|LE|LES)\s([^/]+)

Replace matches with:
$2

C. Convert Artistsort into separate value tags

Split field by separator

Field:
ARTISTSORT

Separator:
/
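
The three actions above can be traced outside Mp3tag with a rough Python sketch. It assumes Python's re module behaves like the Mp3tag engine for these two patterns; the function name artist_sort and the two constants are invented for the example, and $upper and $meta_sep are imitated with str.upper and "/".join.

```python
import re

EXCLUDED = "THE|EL|LA|LOS|LAS|LE|LES|QUATUOR|QUARTETTO|QUINTETTO|I"
ARTICLES = "THE|EL|LA|LOS|LAS|LE|LES"

def artist_sort(artists):
    # A. Join with "/", uppercase, and move the last word of each value to
    #    the front unless the value starts with an excluded word.
    joined = "/".join(artists).upper()
    step_a = re.sub(
        rf"\b({EXCLUDED})\s([^/]+)|([^/]+)\s([^/]+)", r"\4 \3\1 \2", joined
    )
    # B. Remove a leading definite article.
    step_b = re.sub(rf"\b({ARTICLES})\s([^/]+)", r"\2", step_a)
    # C. Split back into separate values. In this Python port the empty
    #    groups leave a stray space around each value, so strip it here.
    return [value.strip() for value in step_b.split("/")]

print(artist_sort([
    "Trevor Pinnock",
    "The Allegri Quartet",
    "Steuart John Rudolph Bedford",
    "Jim",
    "Quintetto Italiano",
]))
# ['PINNOCK TREVOR', 'ALLEGRI QUARTET', 'BEDFORD STEUART JOHN RUDOLPH', 'JIM', 'QUINTETTO ITALIANO']
```

The strip() call is something only the Python version needs for certain; I have not checked whether Mp3tag leaves the same stray spaces.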

This all works well, with one minor problem.

It would make life much easier if all non-ASCII characters were converted to ASCII in the sorted form. I know this is possible, character by character, but there are an awful lot when all European languages are included. If mp3tag had a function to do this (eg $to_ascii) it would make the actions an awful lot shorter and simpler. The current $ansi function doesn't do the same job.

I could try to explain here how part A works. To be honest I'm not sure why it works on the concatenated string of all Artist names, but the rest says something like:

$1 is a leading definite article (or one of the other excluded words)
followed by a space
$2 is the rest of the name after such an article (and space)

OR (using |):

$3 is the first part of the name ('greedy')
$4 is the very last part of the name

$3 and $4 are empty if there is a definite article at the start, and $1 and $2 are empty if there is no definite article. Stringing the result together as $4 $3$1 $2 inverts $3 and $4 and leaves $1 and $2 alone.
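
As for why this works on the concatenated string: as far as I can tell, the replacement is applied to every match in turn, and since [^/] can never match the separator, no match can run from one name into the next. The Python sketch below (an assumption-laden port, with the made-up helper show_groups) lists what each group holds for every match.

```python
import re

PATTERN = re.compile(
    r"\b(THE|EL|LA|LOS|LAS|LE|LES|QUATUOR|QUARTETTO|QUINTETTO|I)\s([^/]+)"
    r"|([^/]+)\s([^/]+)"
)

def show_groups(joined):
    """Return (matched text, (g1, g2, g3, g4)) for every match in the string."""
    return [(m.group(0), m.groups(default="")) for m in PATTERN.finditer(joined)]

for text, groups in show_groups("TREVOR PINNOCK/THE ALLEGRI QUARTET/JIM"):
    print(text, groups)
# TREVOR PINNOCK ('', '', 'TREVOR', 'PINNOCK')
# THE ALLEGRI QUARTET ('THE', 'ALLEGRI QUARTET', '', '')
```

JIM produces no match at all because it contains no space, so it is passed through unchanged.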

You can try ... regular expression equivalent classes ... for example ...
$regexp('ÀÁÂÃÄÅĀĂĄ','[[=A=]]','A') ==> AAAAAAAAA
$regexp('a, á, à and â','[[=a=]]','a') ==> "a, a, a and a"
$regexp('e, é, è and ê','[[=e=]]','e') ==> "e, e, e and e"
$regexp('i, Í, ì and î','[[=i=]]','i') ==> "i, i, i and i"
$regexp('o, ó, ò and ô','[[=o=]]','o') ==> "o, o, o and o"
$regexp('u, ú, ù and û','[[=u=]]','u') ==> "u, u, u and u"

Equivalence classes
An expression of the form [[=col=]], matches any character or collating element whose primary sort key is the same as that for collating element col, as with collating elements the name col may be a symbolic name. A primary sort key is one that ignores case, accentation, or locale-specific tailorings; so for example [[=a=]] matches any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately implementation of this is reliant on the platform's collation and localisation support; this feature can not be relied upon to work portably across all platforms, or even all locales on one platform.
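
For comparison only: Python's re module has no [[=a=]] syntax, so the sketch below swaps in a different technique, Unicode decomposition with the standard unicodedata module, to get a similar accent-folding effect. The function name strip_accents is made up, and this is not what Mp3tag does internally.

```python
import unicodedata

def strip_accents(text):
    """Decompose each character and drop the combining marks (accents)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("ÀÁÂÃÄÅĀĂĄ"))       # AAAAAAAAA
print(strip_accents("a, á, à and â"))   # a, a, a and a
print(strip_accents("Dvořák"))          # Dvorak
```

Unlike the equivalence class, this keeps the original case, and letters that have no decomposition (ø, ß, Ł for example) are left as they are, so it is not a complete conversion to ASCII.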
http://www.boost.org/doc/libs/1_44_0/libs/...erl_syntax.html

You can try ... Mp3tag action groups for "Unicode to ASCII" ...
/t/7279/1

DD.20140914.1030.CEST

Thanks very much.

The only hassle now is that I have 9 separate fields to process every time, which is rather a lot of actions, but so be it.

What I want to do is enter the basic data and then run one action to process the whole lot in a standard way. Is there a way of invoking one action group from another action group? I can't find one, but I may be blind.

If there was such a method I could create the 9 sets of action groups for the nine fields that need processing, and invoke each of them from my main action group. This technique could apply to other actions as well, when I may on occasion just want to run a lower level action without doing the whole lot.

You may use this simple code, if it meets your needs for sorting.

$regexp(%ARTIST%,'[[:unicode:]]',' ')

Not yet.

DD.20140914.1413.CEST

I'm going to have to guess what this does, since I can't find any reference to :unicode: in my regex references. Does it take any non-ASCII character (ie not a-z, A-Z, 0-9 etc) and replace it with space, or is it more subtle than that? If that is what it does then it isn't what I need, so I'll have to bite the bullet and generate all the conversion actions I need using an editor.

I really appreciate your prompt and detailed help, but how could I have found this out for myself? Is it defined somewhere?

I think Mp3tag uses the Boost.Regex engine ... http://www.boost.org/
... and there is a character class supported ...
unicode ... Any extended character whose code point is above 255 in value.
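
Taking that description at face value, the effect of the [[:unicode:]] replacement can be imitated in Python with an explicit code point range (Python's re has no such class, so the range and the helper name blank_above_255 are my own stand-ins):

```python
import re

def blank_above_255(text):
    """Replace every character whose code point is above 255 with a space."""
    return re.sub(r"[^\x00-\xff]", " ", text)

# ř is U+0159 (above 255) and is blanked; á is U+00E1 (below 256) and stays.
print(blank_above_255("Dvo\u0159\u00e1k"))  # Dvo ák
```

So accented letters in the Latin-1 range survive untouched, while everything above it becomes a space rather than its unaccented base letter.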

DD.20140914.1618.CEST

Thanks very much indeed. I now have enough to achieve what I am trying to do :)