Reg ex problem, please help.


#1

Hi! I have a rather large album with an odd filename format and blank artist and title fields. I need to use reg ex with the filename to somehow fill in the missing fields.

example: 103 Jefferson Airplane embryonic_journey

A pretty strange format, wouldn't you agree?

So I want to end up with:
%artist% = Jefferson Airplane
%title% = embryonic_journey
%track% = optional

All titles in the filename are lower case and have an underscore between words. All artist names are capitalised with no underscores. The main problem I can see is that the number of words in the artist name varies - some have 1 and others have 3 or 4.

Thanks, your help is appreciated.


#2

But the title is always one word thanks to the underscores.

Step 1:
Action: Guess Values
Sourceformat: $regexp(%_filename%,(\d*) (.*) (.*),$1xxxxx$2xxxxx$3)
Formatstring: %track%xxxxx%artist%xxxxx%title%

Step 2:
Action: Case Conversion
Field: TITLE
Case Conversion: Mixed Case
Words begin after any of: _

Step 3:
Action: Replace
Field: TITLE
Original: _
Replace with: " " (without quotation marks, just one space)
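
For anyone who wants to see what Steps 2 and 3 do to the title, here is a rough sketch in Python. It is only an approximation: `mixed_case_after` is a made-up helper, and I am assuming that Mixed Case capitalises the first letter of each word and lowercases the rest, with words starting after each listed character.

```python
def mixed_case_after(text: str, delimiters: str = "_") -> str:
    """Hypothetical approximation of Case Conversion: Mixed Case."""
    out = []
    start_of_word = True  # the first character always starts a word
    for ch in text:
        out.append(ch.upper() if start_of_word else ch.lower())
        # The next character starts a word if this one is a delimiter.
        start_of_word = ch in delimiters
    return "".join(out)

title = "embryonic_journey"
title = mixed_case_after(title)   # Step 2: Embryonic_Journey
title = title.replace("_", " ")   # Step 3: Embryonic Journey
print(title)
```

Doing the case conversion before the replace matters here, because the underscore is what tells Step 2 where a new word begins.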


#3

That works perfectly, thank you very much! Your regex Jedi skills are truly impressive. Personally, I'm finding the whole thing a bit difficult to get into. Do you have any tips on what books/tutorials I should read to learn the basics?


#4

Actually, it is all explained here:
http://help.mp3tag.de/options_format.html#regexp
The explanation there is very brief, but search the forum and ask, and you will find plenty of examples and some explanations.

I didn't learn it from anywhere else, just Mp3tag and here in the forum.
Use the Tag-Filename Converter as a quick preview for regular expressions and experiment a bit with it. I always use it, even when I want to use the $regexp function in another action as above, because the Tag-Filename Converter gives a good preview of whether the regexp string does what I want.


#5

Using that link as a reference, I can identify all the individual commands within "$regexp(%_filename%,(\d*) (.*) (.*),$1xxxxx$2xxxxx$3)", but I just can't make sense of the syntax. The use of "," also confuses me, and isn't explained in that reference table. Do you think you could break it down for me so I can see how you did it? Thanks for your patience, I'm a total n00b I know. :smiley:


#6

Help for the scripting functions such as $regexp:
http://help.mp3tag.de/main_scripting.html

from that link:
$regexp(x,expr,repl):
replaces the pattern specified by the regular expression expr in the string x by repl. The fourth optional parameter enables ignore case (1) or disables the ignore case setting (0). Please note that you have to escape comma and other special characters in expr.

So basically, with the function $regexp(x,expr,repl) you can use the action Replace with regular expression inside other actions, such as Guess Values in our case. The "," in the function separates what you would otherwise write into the different fields of the action window.
Function:
$regexp(x,expr,repl)
Action:
Field = x
Regular Expression = expr
Replace Matches With = repl
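
If you know a general-purpose language, the same three-argument idea can be sketched in Python. This is only an analogy, not how Mp3tag works internally: the `regexp` helper below is hypothetical, Python's `re.sub` takes its arguments in a different order (pattern, replacement, string), and it writes group references as `\g<1>` rather than `$1`, so the helper translates them first.

```python
import re

def regexp(x: str, expr: str, repl: str, ignore_case: int = 0) -> str:
    """Hypothetical stand-in for $regexp(x,expr,repl)."""
    # Translate Mp3tag-style $1, $2, ... into Python's \g<1>, \g<2>, ...
    py_repl = re.sub(r"\$(\d+)", r"\\g<\1>", repl)
    flags = re.IGNORECASE if ignore_case else 0
    # Python order: pattern, replacement, source string.
    return re.sub(expr, py_repl, x, flags=flags)

filename = "103 Jefferson Airplane embryonic_journey"
result = regexp(filename, r"(\d*) (.*) (.*)", "$1xxxxx$2xxxxx$3")
print(result)  # 103xxxxxJefferson Airplanexxxxxembryonic_journey
```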

You also need the information about the Guess Values action from here:
http://help.mp3tag.de/options_format.html#guess

$regexp(%_filename%,(\d*) (.*) (.*),$1xxxxx$2xxxxx$3)
So, x is the string which provides the source of the operation. In our case it is the filename, but in other cases it could also be a combination of different placeholders/fields.
%_filename% = 103 Jefferson Airplane embryonic_journey

$regexp(%_filename%,(\d*) (.*) (.*),$1xxxxx$2xxxxx$3)
The expression which should be replaced is given here. In our case, it matches the whole filename, separated into different groups.
You can also write expressions which match only part of the source; in those cases the rest of the source stays untouched (see here for an example: Need help with RegExp ).
\d* matches a digit repeated any number of times
.* matches any character repeated any number of times
The spaces are essential. They are the only fixed characters here, and they mark the separation between the different matches above.
The parentheses make no difference to what the expression matches. They are only a reference for $1, $2 and $3 later.
So Mp3tag checks for a sequence of characters in the source which matches \d* .* .*

\d* .* .* explained step by step with the example 103 Jefferson Airplane embryonic_journey:
\d = 1 is one digit.
* = 03 is every digit that comes directly after that digit, or nothing if there are no more digits.
" " = " " is the one space that comes directly after the digit(s).
. = J is the character after that space.
* = efferson Airplane is all the characters after that character. If nothing came after it in the expression, it would be everything to the end (spaces are also characters). But in our case " .*" comes after it, so it is the longest possible sequence of characters which still has a space after it, i.e. everything before the last space in the source (%_filename%). If I had written \d* .*? .* instead, .*? would be the shortest possible sequence of characters which has a space after it, i.e. only the first word.
" " = " " is, as said, the last space in the source.
. = e is the character after that space.
* = mbryonic_journey is all the characters after that character, this time without limit, i.e. everything to the end of the filename.
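
The difference between the greedy .* and the lazy .*? can be tried out in Python as well. Python's `re` module is a different regex engine from the one in Mp3tag, so take this as an illustration of the general behaviour rather than of Mp3tag itself.

```python
import re

filename = "103 Jefferson Airplane embryonic_journey"

# Greedy: the second group grabs as much as it can, up to the LAST space.
greedy = re.fullmatch(r"(\d*) (.*) (.*)", filename)

# Lazy: the second group grabs as little as it can, up to the FIRST space.
lazy = re.fullmatch(r"(\d*) (.*?) (.*)", filename)

print(greedy.groups())  # ('103', 'Jefferson Airplane', 'embryonic_journey')
print(lazy.groups())    # ('103', 'Jefferson', 'Airplane embryonic_journey')
```

The greedy form is the right one for this album because the title never contains a space, so the last space is always the border between artist and title.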

$regexp(%_filename%,(\d*) (.*) (.*),$1xxxxx$2xxxxx$3)
This is the replacement for the expression. $1, $2 and $3 are what was found in the first, second and third parentheses of the match:
$1 = 103
$2 = Jefferson Airplane
$3 = embryonic_journey
xxxxx is a freely chosen sequence of characters which serves as a marker between the desired field values $1, $2 and $3. As the space, which was the marker before, is also part of the artist name, I replaced it with something which is surely not part of the artist or title name. If there were a title such as "fxxxxxck", it would not work.

So
$regexp(%_filename%,(\d*) (.*) (.*),$1xxxxx$2xxxxx$3)
is, without placeholders,
$regexp(103 Jefferson Airplane embryonic_journey,(103) (Jefferson Airplane) (embryonic_journey),103xxxxxJefferson Airplanexxxxxembryonic_journey)

103xxxxxJefferson Airplanexxxxxembryonic_journey is the output of our $regexp function.

And this serves as the Sourceformat for our Guess Values action.
The Guess Values Formatstring now makes use of the xxxxx as a splitter between the desired field names:
Sourceformat written as function: $regexp(%_filename%,(\d*) (.*) (.*),$1xxxxx$2xxxxx$3)
Sourceformat as output of the function: 103xxxxxJefferson Airplanexxxxxembryonic_journey
Formatstring: %track%xxxxx%artist%xxxxx%title%
Output of the Formatstring = Output of the Guess Values action:

TRACK = 103
ARTIST = Jefferson Airplane
TITLE = embryonic_journey
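
The splitting step can be pictured with a small Python sketch. `guess_values` here is a hypothetical helper that simply splits on the marker; I do not know how Mp3tag implements Guess Values internally, so this only shows the idea, including why a marker that also occurs inside a value causes trouble.

```python
def guess_values(source: str, fields: list[str], marker: str = "xxxxx") -> dict[str, str]:
    """Hypothetical stand-in: split source on the marker and name the pieces."""
    values = source.split(marker)
    if len(values) != len(fields):
        # Happens when the marker also occurs inside one of the values.
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))

source = "103xxxxxJefferson Airplanexxxxxembryonic_journey"
tags = guess_values(source, ["TRACK", "ARTIST", "TITLE"])
print(tags)
# {'TRACK': '103', 'ARTIST': 'Jefferson Airplane', 'TITLE': 'embryonic_journey'}
```

With a source such as 1xxxxxSome Bandxxxxxfxxxxxck, this simple split yields four pieces instead of three and the helper raises an error, which is the "fxxxxxck" problem mentioned above.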

#7

Pone, wanna say WAW! What A Work!

Here is a proposal that takes a simple step-by-step approach.

Begin Action Group Test 2011#20110202.yog-sothoth

Action #1
Actiontype 5: Format value
Field: TRACK
Formatstring: %_FILENAME%

Action #2
Actiontype 5: Format value
Field: ARTIST
Formatstring: %_FILENAME%

Action #3
Actiontype 5: Format value
Field: TITLE
Formatstring: %_FILENAME%

Action #4
Actiontype 4: Replace with regular expression
Field: TRACK
Regular expression: (\d*)÷(.*)÷(.*)
Replace matches with: $1

[_] Case sensitive comparison

Action #5
Actiontype 4: Replace with regular expression
Field: ARTIST
Regular expression: (\d*)÷(.*)÷(.*)
Replace matches with: $2

[_] Case sensitive comparison

Action #6
Actiontype 4: Replace with regular expression
Field: TITLE
Regular expression: (\d*)÷(.*)÷(.*)
Replace matches with: $3

[_] Case sensitive comparison

Note: Replace each special ÷ character with one space character.
End Action Group Test 2011#20110202.yog-sothoth (6 Actions)

From:
_FILENAME=103 Jefferson Airplane embryonic_journey
To:
TRACK=103
ARTIST=Jefferson Airplane
TITLE=embryonic_journey
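
The same idea as a rough Python sketch, for comparison only: every field starts as a copy of the filename, and then the whole match is replaced by just one of its groups. Python writes the group reference as `\g<1>` instead of `$1`, and the dictionary `tags` is only a stand-in for the real tag fields.

```python
import re

PATTERN = re.compile(r"(\d*) (.*) (.*)")
filename = "103 Jefferson Airplane embryonic_journey"

# Actions #1 to #3: copy the filename into every field.
tags = {"TRACK": filename, "ARTIST": filename, "TITLE": filename}

# Actions #4 to #6: in each field, replace the whole match by one group.
for field, group in (("TRACK", 1), ("ARTIST", 2), ("TITLE", 3)):
    tags[field] = PATTERN.sub(rf"\g<{group}>", tags[field])

print(tags)
# {'TRACK': '103', 'ARTIST': 'Jefferson Airplane', 'TITLE': 'embryonic_journey'}
```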

DD.20110202.1550.CET


#8

It also works this way:

Begin Action Group Test 2011#20110202.yog-sothoth.2

Action #1
Actiontype 5: Format value
Field: TRACK
Formatstring: $num(%_FILENAME%,1)

Action #2
Actiontype 5: Format value
Field: ARTIST
Formatstring: %_FILENAME%

Action #3
Actiontype 5: Format value
Field: ARTIST
Formatstring: $cutLeft(%ARTIST%,$strchr(%ARTIST%,'÷'))

Action #4
Actiontype 5: Format value
Field: ARTIST
Formatstring: $cutRight(%ARTIST%,$strrchr(%ARTIST%,'÷'))

Action #5
Actiontype 5: Format value
Field: TITLE
Formatstring: $cutLeft(%_FILENAME%,$strrchr(%_FILENAME%,'÷'))

Note: Replace each special ÷ character with one space character.
End Action Group Test 2011#20110202.yog-sothoth.2 (5 Actions)

From:
_FILENAME=103 Jefferson Airplane embryonic_journey
To:
TRACK=103
ARTIST=Jefferson Airplane
TITLE=embryonic_journey

DD.20110202.1652.CET


#9

That's awesome, pone. Very well articulated. I think I'm finally beginning to grasp it now. I tip my hat to you, good sir!

Thanks to you as well, DetlevD. Your method seems a bit laborious, but I can see what you're trying to do.

Thanks guys, you put a lot of work into this community and I'm very grateful for it. Cheers!