Regular Expressions

Regular Expression Tutorial

Beginner or Professional!
Please take a few minutes to look at this presentation (slideshow or PDF):

Andrei’s Regex Clinic

This is an outstanding work that gives a broad, visual overview of the world of Regular Expressions.
The tutorial can help to open your mind.

http://zmievski.org/c/dl.php?file=talks/co...egex-clinic.pdf
http://www.slideshare.net/andreizm/andreis-regex-clinic
http://zmievski.org/2010/05/regex-clinic-on-slideshare

DD.20110110.1912.CET

Are all the things described there useable for mp3tag?

Probably not, one must abstract.

DD.20110110.2111.CET

To Juozas V:
Thank you very much for this. It does the best English title case conversion that I have seen here.

However, I have one minor quibble. Your word list includes some words and phrases that I see used in song titles more often as subordinate conjunctions than as prepositions, for example "after", "because", and "although". From my reading on capitalization in titles, subordinate conjunctions should always be capitalized. Since it's not practical to use a script to detect how a word is used, I chose to remove those words from your RegEx on the basis that there should be fewer errors without them than with them. The words that I removed are:

After, As, Although, Because, Even If, Since, Till, Until, When, and While.

Here is my revised word list (re-sorted alphabetically):

A|About|Above|Across|Against|Along|Alongside|An|And|As|At|Before|Below|But|By|During|For|From|In|Into|Nor|Of|Off|On|Onto|Or|Out|Over|So|Than|The|Through|To|Under|Up|With|Within|Without

Note that I also added the coordinating conjunction "Nor" to your list.
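
For anyone who wants to try the word list outside Mp3tag, here is a rough Python sketch. The original action from Juozas V is not quoted in this thread, so the pattern and the function name `title_case` are my own assumptions and only show the general idea: capitalize every word, then lowercase a listed word when it is neither the first nor the last word of the title.

```python
import re

# Word list copied from the post above.
SMALL_WORDS = (
    "A|About|Above|Across|Against|Along|Alongside|An|And|As|At|Before|Below|"
    "But|By|During|For|From|In|Into|Nor|Of|Off|On|Onto|Or|Out|Over|So|Than|"
    "The|Through|To|Under|Up|With|Within|Without"
)

# Hypothetical pattern (not the original Mp3tag action): a listed word with
# whitespace on both sides, so the first and last word are never touched.
SMALL_WORD_RE = re.compile(r"(?<=\s)(?:" + SMALL_WORDS + r")(?=\s)")


def title_case(title):
    # Step 1: capitalize every space-separated word.
    capitalised = " ".join(word.capitalize() for word in title.split(" "))
    # Step 2: lowercase the listed words that sit inside the title.
    return SMALL_WORD_RE.sub(lambda m: m.group(0).lower(), capitalised)


print(title_case("the lord of the rings"))  # The Lord of the Rings
print(title_case("wait until the end"))     # Wait Until the End
```

"Until" stays capitalized in the second title because it is no longer in the list.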

Best regards,
Doug M. in NJ

Convert MP3tag's date to ISO date

Converts (example) 18.04.2011 to 2011-04-18.

format tag field
Field: date added
Format string: %_date%

replace with regular expression
Field: date added
RegExp: ^(\d+)\.(\d+)\.(\d+)$
Replace with: $3-$2-$1
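
To test the pattern outside Mp3tag, something like this Python sketch should do. The function name `to_iso` is made up for the example, and the dots are escaped here so that only a literal dot counts as a separator.

```python
import re

# Day, month and year separated by literal dots.
DATE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def to_iso(date_text):
    # Python writes group references as \3 where Mp3tag writes $3.
    return DATE_RE.sub(r"\3-\2-\1", date_text)


print(to_iso("18.04.2011"))  # 2011-04-18
```

Text that does not match the pattern is returned unchanged.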

I could not get "DetlevD - Splitting an Upper Camel Case string" to work, although I like the idea!

These together:

RE: Adds a space between a lowercase letter or digit and the capital letter that follows it.
Field:_Tag
re:([^A-Z\W\_])([A-Z])(?=[^A-Z])
($1) ($2)

RE: Adds a space between a letter and the digit that follows it.
Field:_Tag
re:([^\W\d\_])(\d)
($1) ($2)

Will...

Example.
From:
"ThisIsThe2ndSongFromD.D.'sFirstAlbum30YearsAgo."
To:
"This Is The 2nd Song From D.D.'s First Album 30 Years Ago."

This is the two above combined:

RE: Adds a space between a lowercase letter or digit and the capital letter that follows it, and between a letter and the digit that follows it.
Field:_Tag
re:([^A-Z\W\_])([A-Z])(?=[^A-Z])|([^\W\d\_])(\d)
$1$3 $2$4

Does the same except in the case of...

CapitalWord9CapitalWord2ndSong30Years

Example.
First pass will: Capital Word 9Capital Word 2nd Song 30 Years
Second pass will: Capital Word 9 Capital Word 2nd Song 30 Years
Third pass will: Reveal A Latent O.C. Disorder

This happens because the single digit '9' can only be captured once per pass: once one match has consumed it, it is not available for the match on its other side.
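
Here is a small Python sketch that compares the two variants. I am assuming the replacements simply mean "first group, a space, second group", and the function names are made up for the example. The two-step version handles the tricky string in one run, while the combined version is repeated until nothing changes any more.

```python
import re

STEP_1 = re.compile(r"([^A-Z\W_])([A-Z])(?=[^A-Z])")  # lowercase/digit, then capital
STEP_2 = re.compile(r"([^\W\d_])(\d)")                 # letter, then digit
COMBINED = re.compile(r"([^A-Z\W_])([A-Z])(?=[^A-Z])|([^\W\d_])(\d)")


def split_two_steps(text):
    # The first set: two separate replacements, one after the other.
    return STEP_2.sub(r"\1 \2", STEP_1.sub(r"\1 \2", text))


def split_combined(text, max_passes=5):
    # The combined regex, repeated until a pass changes nothing.
    for _ in range(max_passes):
        result = COMBINED.sub(r"\1\3 \2\4", text)
        if result == text:
            break
        text = result
    return text


tricky = "CapitalWord9CapitalWord2ndSong30Years"
print(split_two_steps(tricky))  # Capital Word 9 Capital Word 2nd Song 30 Years
print(split_combined(tricky))   # Capital Word 9 Capital Word 2nd Song 30 Years
```

In an unmatched alternative the groups are empty, so `\1\3 \2\4` always expands to the two groups that actually matched.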

I would use the first set for completeness, through the "Action Groups".
I would use the second for brevity, through "Actions (Quick)".

To remove any website in the filename

Example:
artist - title[www.whateverwebsite.com].mp3
becomes
artist - title.mp3

Regular expression: \[.{3}\.\w*\..{3}\]
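
A quick way to check the pattern outside Mp3tag is a Python sketch like the one below. The brackets and dots are written escaped so that they match literally, and the function name `strip_site` is only for illustration.

```python
import re

# "[", three characters, ".", a run of word characters, ".", three characters, "]".
SITE_TAG = re.compile(r"\[.{3}\.\w*\..{3}\]")


def strip_site(filename):
    # Replace the bracketed website with nothing.
    return SITE_TAG.sub("", filename)


print(strip_site("artist - title[www.whateverwebsite.com].mp3"))  # artist - title.mp3
```

Note that the pattern expects exactly three characters before the first dot and after the last one, so a tag like [site.co.uk] is left alone.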

OK, i know im being dumb, just this reg ex stuff is over my head..u might just as well speak japanese to me.
All the examples i find talk of removing ## - [track title] to [track title]
i just want ## [track title] to [track title]...no dash.
appreciate a 'simple' answer for simpleton.

I've added an example without dash to that post.

Thank u so much...

Convert from RFC822/1123 Date string to ISO-8601 Date string.

Examples:
RFC822/RFC 1123 ==> ISO-8601
1 Feb 2009 ==> 2009-02-01
30 Sep 2010 ==> 2010-09-30

$replace($regexp($right('0'%YEAR%,11), '(\d{1,2}) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) (\d{2,4})','$3-$2-$1'), 'Jan','01','Feb','02','Mar','03','Apr','04','May','05','Jun','06', 'Jul','07','Aug','08','Sep','09','Oct','10','Nov','11','Dec','12')
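
To follow what the format string does step by step, here is a rough Python equivalent. The name `rfc_to_iso` is made up; the three steps mirror $right (pad the day to two digits), $regexp (reorder the parts) and $replace (swap the month name for its number).

```python
import re

MONTHS = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04", "May": "05", "Jun": "06",
    "Jul": "07", "Aug": "08", "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}
DATE_RE = re.compile(r"(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{2,4})")


def rfc_to_iso(date_text):
    # $right('0'%YEAR%,11): prepend a zero, then keep the last 11 characters.
    padded = ("0" + date_text)[-11:]
    # $regexp(...): reorder into year-month-day, month still as a name.
    iso = DATE_RE.sub(r"\3-\2-\1", padded)
    # $replace(...): turn the month name into its number.
    for name, number in MONTHS.items():
        iso = iso.replace(name, number)
    return iso


print(rfc_to_iso("1 Feb 2009"))   # 2009-02-01
print(rfc_to_iso("30 Sep 2010"))  # 2010-09-30
```

The padding trick relies on a four-digit year: "01 Feb 2009" is exactly 11 characters long.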

See also:
http://www.w3.org/Protocols/rfc822/
http://www.freesoft.org/CIE/RFC/1123/99.htm

DD.20110801.0623.CEST

Example:
1 Feb 2009 HH:MM ==> 2009-02-01 HH:MM

$replace($regexp($right('0'$cutRight(%YEAR%,6),11), '(\d{1,2}) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) (\d{2,4})','$3-$2-$1'), 'Jan','01','Feb','02','Mar','03','Apr','04','May','05','Jun','06', 'Jul','07','Aug','08','Sep','09','Oct','10','Nov','11','Dec','12')$right(%YEAR%,6)
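
The only new part here is the handling of the time: $cutRight removes the last six characters before the date is converted, and $right puts them back afterwards. A minimal Python sketch of that idea, with the hypothetical name `convert` and assuming the time always has the form " HH:MM":

```python
import re

MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
DATE_RE = re.compile(r"(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{2,4})")


def convert(stamp):
    # $cutRight(%YEAR%,6) and $right(%YEAR%,6): split off " HH:MM".
    date_part, time_part = stamp[:-6], stamp[-6:]
    # Pad the day to two digits, as in the date-only version.
    padded = ("0" + date_part)[-11:]

    def to_iso(match):
        month_number = MONTHS.index(match.group(2)) + 1
        return "%s-%02d-%s" % (match.group(3), month_number, match.group(1))

    return DATE_RE.sub(to_iso, padded) + time_part


print(convert("1 Feb 2009 12:34"))  # 2009-02-01 12:34
```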

DD.20140310.2236.CET

How to copy a list of artists and their roles ...
... from tag-field COMMENT
... to tag-field INVOLVEDPEOPLE

Example 1
From: COMMENT
Person1:Role1
Person2:Role2
Person3:Role3

To: INVOLVEDPEOPLE
Role1:Person1;Role2:Person2;Role3:Person3;

Action: Format value
Field: INVOLVEDPEOPLE
Formatstring:

$regexp(%COMMENT%$char(13),'(.+?):(.+?)[\r\n]+','$2:$1;')

Example 2
From: COMMENT
Person1:Role1
Person2: Role2a,Role2b
Person3 : Role3a, Role3b
Person4: Role4a & Role4b, Role4c

To: INVOLVEDPEOPLE
Role1:Person1;Role2a,Role2b:Person2;Role3a,Role3b:Person3;Role4a & Role4b,Role4c:Person4;

Action: Format value
Field: INVOLVEDPEOPLE
Formatstring:

$regexp($regexp(%COMMENT%$char(13),'(.+?)\s*:\s*(.+?)[\r\n]+','$2:$1;'),'\s*,\s*',',')
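
For testing outside Mp3tag, the nested $regexp calls can be mimicked in Python roughly like this. The function name `to_involved_people` is made up, and I am assuming the COMMENT lines are separated by CarriageReturn/LineFeed.

```python
import re

# Inner $regexp: "Person : Roles<line end>" becomes "Roles:Person;".
PAIR_RE = re.compile(r"(.+?)\s*:\s*(.+?)[\r\n]+")
# Outer $regexp: remove the blanks around commas.
COMMA_RE = re.compile(r"\s*,\s*")


def to_involved_people(comment):
    # %COMMENT%$char(13): append one CarriageReturn as a terminator.
    swapped = PAIR_RE.sub(r"\2:\1;", comment + "\r")
    return COMMA_RE.sub(",", swapped)


comment = "\r\n".join([
    "Person1:Role1",
    "Person2: Role2a,Role2b",
    "Person3 : Role3a, Role3b",
    "Person4: Role4a & Role4b, Role4c",
])
print(to_involved_people(comment))
```

This prints the INVOLVEDPEOPLE string shown in Example 2.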

DD.20110824.1757.CEST

What is $char(13) ? And why is it needed here?

$char(13) is the "CarriageReturn" control character.
It is appended here on the fly to the COMMENT string as a helper, just to make sure, that there is at least one "CarriageReturn" character at the end of the COMMENT string, in order to let the RegExp work correctly, even for the case, when the original COMMENT string has no trailing CarriageReturn/LineFeed sequence.

DD.20110824.1750.CEST

If you want to add TEXT to the existing tag:

Regular expression: (.*)
Replace with: TEXT \1 \0

I found that if you don't put in the \0 at the end it will repeat TEXT twice. For example:

Title: Texty text
Becomes: TEXT Texty textTEXT

I'm using v2.48. This might have been fixed in later versions, but it's the version I've been using, and I refuse to update because I've never had any other issues with it except for this one bit of annoyance.

What seems to be happening is the "end of line" is being matched a second time for some reason.

Using 2.48, I duplicated your bug but also tweaked the RegEx to work properly.

Regular expression: ^(.*)$
Replace with: TEXT $1

That explicitly tells the engine not to include the newline in the match.

It's not a bug.
It happens because global matching is activated by default.
That means the engine tries to match the regex pattern as many times as possible.

With (.*) the whole text is matched at first. Then the engine stands at the end of the text and tries to match the pattern again.
And it succeeds, because .* also matches "nothing".
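
The second, empty match can be made visible with a few lines of Python, whose regex engine also matches globally here. This is only an illustration of the principle, not of Mp3tag's own engine.

```python
import re

title = "Texty text"

# (.*) matches twice: the whole text, then "nothing" at the very end.
matches = re.findall(r"(.*)", title)
print(matches)  # ['Texty text', '']

# Anchored at the start, the pattern cannot match again at the end of the text.
anchored = re.sub(r"^(.*)$", r"TEXT \1", title)
print(anchored)  # TEXT Texty text
```

Each match gets its own replacement, which is why TEXT can show up a second time at the end when the pattern is not anchored.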

But I'd prefer a simple "Format value" action for this task anyway.

Can you help me to convert 1 Feb 2009 HH:MM to 2009-02-01 HH:MM?

Read there ...
Regular Expressions

DD.20140310.2238.CET

A very nice regex. I changed it a little bit, to this one:

Regex:

((\s*0*)*(\d+)(\s*0*)*)*(.*(\s*0*)*\d+(\s*0*)*)*

Replace with:

$3

because it didn't work for me with these extreme examples:

01/100
0 01/100 0
0 01/ 100
0 01 / 0 0100
0 0 00001 / 0 0 0 100 0 0 0
0 0 20 0 / 0 200 0
0 0 2 0 0 / 0 200 0
/ 0 200 0
/ 0 0 20 0 0