Regular Expressions


#21

I dislike it when a leading zero appears in the track number of the track tag (or when the total track count is added after a '/' char).

[Aside: It seems like a bad idea to pollute a nice simple tag like the track with extra information like this. My intuition is that the information in each tag should be kept as simple and pure as possible. In databases there is a similar concept: you usually want to achieve what is called a normalized design. But if someone has a good reason for doing otherwise, please educate me!

Note that I only have a problem with leading zeroes in the track TAGS. In contrast, when I use tracks in filenames, I DO like a leading zero if there are > 9 tracks on the CD, because that makes the filenames sort lexicographically, which is critical for proper sorting by your file system and also looks better when filenames are displayed in a list.]

Earlier in this thread, phoenixdarkdirk pointed out how to remove any '/' char and following digits.

Here is how to remove any leading zeroes (as well as trimmable whitespace):

Regex:

^\s*0+(\d+)\s*

Replace with:
$1


#22

Actually, here is a single regex that does everything that I want:

Regex:

\s*0?(\d+)(\s*/\s*\d+)?\s*

Replace with:

$1


This will remove any trimmable whitespace around the number (which will be the track tag in my case), remove any leading zero from the number, and remove any suffix after the number that starts with a '/' char (e.g. the total track count).
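For anyone who wants to try the combined pattern outside Mp3tag, here is a minimal Python sketch. The function name clean_track is made up for illustration, and Python's re module is not the engine Mp3tag uses, so treat it as an approximation rather than a reference.

```python
import re

# The combined pattern from the post above, unchanged.
TRACK_RE = re.compile(r"\s*0?(\d+)(\s*/\s*\d+)?\s*")

def clean_track(value):
    """Hypothetical helper: keep only the bare track number."""
    # count=1 because a track tag holds a single number.
    return TRACK_RE.sub(r"\1", value, count=1)

print(clean_track(" 03/12 "))  # 3
print(clean_track("7 / 9"))    # 7
print(clean_track("10"))       # 10
```

Note that 0? strips only one leading zero, so a value like "007" would come out as "07".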


#23

I've just spent a while figuring this out myself, and came on here to post it only to find someone else already has :slight_smile:
Mine is quite similar:
Field: ARTIST
RegExp: ^(.+),\s+(\w*)
Replace With: $2 $1

Seems to work the same


#24

Thank you so much for this! You saved me a LOT of time! :smiley:


#25

I've split some off-topic posts to Problem with Actions and Regular expressions.


#26

This is not a regular expression for mp3tag, but a link to a page which contains lots of information about creating and using regular expressions.

In my opinion everybody who uses regexp should have read it :wink:
www.regular-expressions.info

kind regards,
Matthias


#27
Switches first and last names

Thank you!
I developed it a little.

Variant A

It leaves artists alone where the conversion has already been done (the artist contains a comma).
Example: "Brel, Jacques William" remains "Brel, Jacques William"

Action: regular expression
Field: ARTIST
Regular expression: ^([^,]+)\s([^,]+)$
Replace with: $2, $1

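A quick way to check Variant A before running it on real tags is a small Python sketch like the one below. The helper name surname_first is hypothetical, and Python's re module is only a stand-in for Mp3tag's own regex engine.

```python
import re

# Variant A pattern: two comma-free parts separated by whitespace.
NAME_RE = re.compile(r"^([^,]+)\s([^,]+)$")

def surname_first(artist):
    """Hypothetical helper: 'First Last' -> 'Last, First'."""
    # Values that already contain a comma do not match and stay as they are.
    return NAME_RE.sub(r"\2, \1", artist)

print(surname_first("Jacques Brel"))             # Brel, Jacques
print(surname_first("Gabriele Susanne Kerner"))  # Kerner, Gabriele Susanne
print(surname_first("Brel, Jacques William"))    # Brel, Jacques William
```

Because the first group is greedy, only the last word is treated as the surname.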
Variant B

It handles multiple artists in the WMP format ("artist1/artist2/artist3") and an optional nickname in square brackets at the end, and it trims unnecessary white space.
Example:
"Elvis Presley /Brel, Jacques/ Gabriele Susanne Kerner [Nena ]" becomes "Presley, Elvis/Brel, Jacques/Kerner, Gabriele Susanne [Nena]"

Action group

1. action: regular expression (trim white space)
Field: ARTIST
Regular expression: ^\s+|\s+$
Replace with: (nothing)

2. action: regular expression (replace multiple white spaces with one)
Field: ARTIST
Regular expression: \s{2,}
Replace with: " " (without quotation marks)

3. action: replace (delete white space after opening bracket)
Field: ARTIST
Replace: "[ " (without quotation marks)
Replace with: [

4. action: replace (delete white space before closing bracket)
Field: ARTIST
Replace: " ]" (without quotation marks)
Replace with: ]

5. action: split fields by separator (for the next actions)
Field: ARTIST
Separator: /

6. action: regular expression (switches names without nickname)
Field: ARTIST
Regular expression: ^([^,[]+)\s([^,[]+)$
Replace with: $2, $1

7. action: regular expression (switches names with nickname)
Field: ARTIST
Regular expression: ^([^,[]+)\s([^,[]+)\s(\[.+\])$
Replace with: $2, $1 $3

8. action: merge duplicate fields (...back)
Field: ARTIST
Separator: /


#28

These are my regular expressions. Some may duplicate what has previously been posted, and others contain small changes that fix the many errors the posted expressions create. I have done a lot of testing to make sure these do what they say they do and nothing else. I hope this also gives me a place to keep a backup.
Thank you! Updated: (June/10/2011) Mp3tag 2.49

Action Group:Aa

RE: Remove track number from title, ex."1 - Title" < "Title".
Field:TITLE
re:^\s*\d+\s*-\s*
Nothing:

RE: Remove "The" from artist, ex."The Artist" < "Artist".
Field:ARTIST
re:^The\s+
Nothing:

Case conversion:
Field:_TAG
Case conversion:Mixed Case
/[({"-_

RE: Capitalize Roman Numerals up to 399 (trimmed down from 3999 to reduce false positives). - http://bit.ly/lZdZsj
Field:_Tag
re:(?<!')\b(?=[CLXVI])((C{0,3})?((X[LC])|(L?X{0,3})|L)?((I[VX])|(V?(I{0,3}))|V)?)\b
$upper($0)

RE: Capitalize Zero stop Acronyms and Initialisms. - http://bit.ly/jADbI6
Field:_Tag
re:(?:(?<=[^\w\']|\_)|(?<=^))(ac|ad|afi|aol|asap|atm|bbc|bc|bce|blt|btw|cc|cia|crc|cst|csv|dc|dfa|dj|dmv|doa|dst|eod|ep|est|et|faq|fbi|fm|gi|glc|gmo|imo|imho|iq|ira|jc|irs|krs|la|lp|mc|mst|mtd|nasa|oj|pc|pi|pj|pm|ps|qed|rv|sos|ssr|usa|ussr|tba|tbd|teotwawki|tlc|tv|ufo)(?=[^\w\']|\_|$)(\.*)
$upper($1)

RE: Capitalize the Letter After a Period. A.B.C.
Field:_Tag
re:(?<=\.)([^\W\d\_])
$upper($1)

RE: Capitalize the Letter After a Space and Apostrophe.
Field:_Tag
re:(?<=\s')([^\W\d\_])
$upper($1)

RE: Lower Case Prepositions, Articles, and Coordinating Conjunctions.
Field:_Tag
re:(?<=\w\s)(a|as|at|an|about|above|across|after|against|along|alongside|although|among|and|around|as|at|because|before|behind|below|beneath|beside|between|beyond|but|by|de|despite|down|during|even|except|excepting|for|from|if|in|inside|into|like|near|next|nor|of|off|on|onto|or|out|outside|over|past|regarding|round|since|so|than|the|through|throughout|till|to|toward|under|underneath|unlike|until|up|upon|von|when|while|with|within|without|yet)(?=\s\w)
$lower($0)

RE: Lower Case Abbreviations, Add Stop.
Field:_Tag
re:(?<=[^\w\']|\_)(alt|ave|capt|cent|corp|div|ed|eg|etc|fag|feat|gen|hr|ie|inc|inst|lb|ltd|min|mt|op|pl|pop|pseud|pt|pub|rev|sec|ser|sgt|st|univ|vs|vol)(?=[^\w\']|\_)(\.*)
$lower($1).

RE: Add Space Before, & ( { [ + =
Field:_Tag
re:([^\W\_])([&\(\{\[\+\=])
$1 $2

RE:Add Space After, & ) } ] ; : , ! + =
Field:_Tag
re:([&\)\}\]\;\:\,\!\+\=])([^\W\_])
$1 $2

RE: Add Space After, . - http://bit.ly/jADbI6
Field:_Tag
re:(?<!^)(?<!\d|\s|\.)(\.)([^\W\d\_])(?!\.|\s|$)
$1 $upper($2)

RE: Add Space After, " "
Field:_Tag
re:(".*?")([^\W\_])(?!$)
$1 $2

RE: Add Space Before, " "
Field:_Tag
re:([^\W\_])(".*?")
$1 $2

RE: Remove Spaces After, ( [ { Before, ] } ) ? : ; , ! .
Field:_Tag
re:([\(\[\{])\s+|\s+([\]\}\)\?\:\;\,\!\.])
$1$2

RE: Remove Spaces Before / After.
Field:_Tag
re:\s+(\/)\s+
$1

RE: Remove Spaces inside, " "
Field:_Tag
re:"\s*(.*?)\s*"
"$1"

RE: Remove Spaces Before and After String.
Field:_Tag
re:^\s+|\s+$
Nothing:

RE: Remove all Double+ Spacing.
Field:_Tag
re:\s{2,}
One Space:

RE: Add Apostrophe to Are Contractions.
Field:_Tag
re:\b(how|they|what|when|where|why|you)re(?=[^\w\']|\_|\$)
$1're

RE: Add Apostrophe to Had/Would Contractions.
Field:_Tag
re:\b(he|how|i|it|she|they|we|what|where|who|why|you)d(?=[^\w\']|\_|\$)
$1'd

RE: Add Apostrophe to Have Contractions.
Field:_Tag
re:\b(could|how|i|might|must|should|we|what|when|where|would|you)ve(?=[^\w\']|\_|\$)
$1've

RE: Add Apostrophe to Is Contractions.
Field:_Tag
re:\b(he|here|how|it|let|she|that|there|two|what|when|where|who|why)s(?=[^\w\']|\_|\$)
$1's

RE: Add Apostrophe to Not Contractions.
Field:_Tag
re:\b(ain|aren|can|couldn|didn|doesn|don|hadn|hasn|haven|isn|mightn|mustn|shouldn|wasn|weren|won|wouldn)t(?=[^\w\']|\_|\$)
$1't

RE: Add Apostrophe to Will Contractions.
Field:_Tag
re:\b(how|i|it|she|that|there|they|what|when|where|who|why|you)ll(?=[^\w\']|\_|\$)
$1'll

RE: Add Apostrophe to Am Contractions.
Field:_Tag
re:\b(i)m(?=[^\w\']|\_|\$)
$1'm

RE: Add Apostrophe to Do Ya Contraction.
Field:_Tag
re:\b(D)ya(?=[^\w\']|\_|\$)
$1'ya

RE: Add Apostrophe to Do You Contraction.
Field:_Tag
re:\b(D)you(?=[^\w\']|\_|\$)
$1'you

RE: CamelCase Mc Words.
Field:_Tag
re:\bMc(.)
Mc$upper($1)

RE:CamelCase O' Words.
Field:_Tag
re:\bO'([^\W\d\_e])
O'$upper($1)

mp3tagRegExpressions.txt (6.23 KB)
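To show how a few of the rules above chain together, here is a rough Python sketch covering only the trim, double-space, and "Are" contraction rules. The function name tidy is invented, the re.IGNORECASE flag is my assumption about the case-sensitivity setting, and the lookahead uses a plain $ for end of string instead of the \$ written in the list.

```python
import re

# "Add Apostrophe to Are Contractions" rule from the list above.
ARE_RE = re.compile(
    r"\b(how|they|what|when|where|why|you)re(?=[^\w']|_|$)",
    re.IGNORECASE,  # assumption: the action runs case-insensitively
)

def tidy(value):
    """Hypothetical helper applying three of the rules in order."""
    value = re.sub(r"^\s+|\s+$", "", value)  # remove spaces before and after
    value = re.sub(r"\s{2,}", " ", value)    # collapse double+ spacing
    return ARE_RE.sub(r"\1're", value)       # youre -> you're

print(tidy("  Youre   the one "))  # You're the one
print(tidy("where are you"))       # where are you
```

The order matters in the same way as in the action group: whitespace is cleaned up first so the later rules see single spaces.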


#29

My title and/or filename looks like:

Gospel&spiritual 1 - Track 03.[128kb 44khz 2'34]

I want to delete the suffix: '.[128kb 44khz 2'34]'.

I do it this way:

Field: TITLE
RegExp: (.*)\.\[[0-9]+kb\s[0-9]+khz.*\]
Replace with: $1

Just change that to the field _FILENAME if you want to do it for the filename.

If you want an easier solution (for example, delete everything after the first dot, ...) see the next posts from DetlevD or RevRagnarok.

Update: Added a '\' before the dot to make sure that only a real dot before the '[' is matched, thanks to RevRagnarok for the hint.


#30

This is a good example of a situation where you do not really need to learn the regular expression language and can use a simple action instead.

Actiontype 7: Import tag fields (guess values)
Source format: %TITLE%
Guessing pattern: %TITLE%.%DUMMY%
From:
Gospel&spiritual 1 - Track 03.[128kb 44khz 2'34]
To:
Gospel&spiritual 1 - Track 03

DD.20100816.1822.CEST


#31

FYI, that is slightly incorrect. It will actually remove 1 character from before the '[' which in your example is '.' - you need to escape the '.' to ensure it is a '.':
Field: TITLE
RegExp: (.*)\.\[[0-9]+kb\s[0-9]+khz.*\]
Replace with: $1

Another simpler option would be to just grab everything before the first dot:
Field: TITLE
RegExp: ^([^.]*).*$
Replace with: $1

Or if you want everything up to the first '[' but no period, why bother matching what is in the brackets unless you planned on parsing it for something else?

Field: TITLE
RegExp: (.*)\.\[.*
Replace with: $1

I am at work, so this is all untested RE code.
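Since the expressions here are untested, a small Python sketch can serve as a sanity check. It assumes the fully escaped form of the pattern, (.*)\.\[[0-9]+kb\s[0-9]+khz.*\], the helper name strip_bitrate_suffix is hypothetical, and Python's re module may differ from Mp3tag's engine in details.

```python
import re

# Escaped pattern: a literal dot, a literal '[', bitrate, sample rate, ']'.
SUFFIX_RE = re.compile(r"(.*)\.\[[0-9]+kb\s[0-9]+khz.*\]")

def strip_bitrate_suffix(title):
    """Hypothetical helper: drop a trailing '.[128kb 44khz ...]' block."""
    return SUFFIX_RE.sub(r"\1", title)

print(strip_bitrate_suffix("Gospel&spiritual 1 - Track 03.[128kb 44khz 2'34]"))
# Gospel&spiritual 1 - Track 03
```

Titles without such a suffix do not match and are returned unchanged.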


#32

Roman numerals in uppercase

This "Regular Expression" converts Roman numerals to uppercase.
Valid range of numbers: "I" to "MMMCMXCIX" (decimal: 1-3999).


Example
From:
"ab i ab ii ab iii iv v vv vi vii viii ix abc x mcmliv ll cmm mmix-ix-xi"
To:
"ab I ab II ab III IV V vv VI VII VIII IX abc X MCMLIV ll cmm MMIX-IX-XI"

$regexp(%TITLE%,'\b(?i:(?=[MDCLXVI])((M{0,3})((C[DM])|(D?C{0,3}))?((X[LC])|(L?X{0,3})|L)?((I[VX])|(V?(I{0,3}))|V)?))\b','\U$0')

Alternative:
(using the 'ignore case' parameter of the Mp3tag $regexp function instead of regex modifier)

$regexp(%TITLE%,'\b(?=[MDCLXVI])((M{0,3})((C[DM])|(D?C{0,3}))?((X[LC])|(L?X{0,3})|L)?((I[VX])|(V?(I{0,3}))|V)?)\b','\U$0',1)
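For testing the Roman numeral pattern outside Mp3tag, the following Python sketch uses the same expression with the scoped (?i:...) modifier. Python has no \U replacement escape, so a callback that uppercases the match stands in for '\U$0'. The name upcase_roman is made up for this example.

```python
import re

# Same pattern as in the first $regexp call above.
ROMAN_RE = re.compile(
    r"\b(?i:(?=[MDCLXVI])((M{0,3})((C[DM])|(D?C{0,3}))?"
    r"((X[LC])|(L?X{0,3})|L)?((I[VX])|(V?(I{0,3}))|V)?))\b"
)

def upcase_roman(text):
    """Hypothetical helper: uppercase every valid Roman numeral word."""
    # The callback replaces the '\U$0' replacement string used in Mp3tag.
    return ROMAN_RE.sub(lambda m: m.group(0).upper(), text)

sample = "ab i ab ii ab iii iv v vv vi vii viii ix abc x mcmliv ll cmm mmix-ix-xi"
print(upcase_roman(sample))
# ab I ab II ab III IV V vv VI VII VIII IX abc X MCMLIV ll cmm MMIX-IX-XI
```

Invalid sequences such as "vv", "ll" and "cmm" only produce empty matches, so they stay in lowercase.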

Attached is an Mp3tag mte export script that visualizes the results of three attempts at upcasing Roman numerals with regular expressions, each of a different quality.
Test.RomanNum.Upcase.zip (1.95 KB)

DD.20100831.1133.CEST
Edit. Spelling error in RegEx corrected and zip file attached.
DD.20110320.1518.CET



#33

Splitting an "Upper Camel Case" string

The following "Regular Expression" splits an "Upper Camel Case" string into components by inserting a space character before any word that starts with a capital letter or digit.


Example
From:
"ThisIsThe2ndSongFromD.D.'sFirstAlbum30YearsAgo."
To:
"This Is The 2nd Song From D.D.'s First Album 30 Years Ago."

$regexp(%_FILENAME%,'(?<!^)(\u\l|(?<=\l)[\u\d])',' $1')
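Here is a rough Python equivalent for experimenting. Python's re module does not know the \u and \l character classes, so this sketch swaps in the ASCII classes [A-Z] and [a-z], which means accented letters are not handled. The name split_camel_case is hypothetical.

```python
import re

# ASCII-only approximation: \u -> [A-Z], \l -> [a-z].
CAMEL_RE = re.compile(r"(?<!^)([A-Z][a-z]|(?<=[a-z])[A-Z0-9])")

def split_camel_case(text):
    """Hypothetical helper: insert a space before each new word."""
    return CAMEL_RE.sub(r" \1", text)

print(split_camel_case("ThisIsThe2ndSongFromD.D.'sFirstAlbum30YearsAgo."))
# This Is The 2nd Song From D.D.'s First Album 30 Years Ago.
```

The (?<!^) lookbehind keeps a space from being inserted in front of the very first word.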

DD.20100917.1902.CEST


Edit.DD.20110816.1848.CEST



#34

This didn't work for me. I tried to make my own and came up with this simple RegEx to get the cases right in my tags:

REGULAR EXPRESSION:
\b(?<!')(\w)
REPLACE WITH:
\u\1

This works for me. You can apply this to the _ALL field but be aware that this capitalizes your extensions too. If this looks somehow annoying to you (as it does to me), simply apply this to the _FILENAME field afterwards:

REGULAR EXPRESSION:
.([^.]+)$
REPLACE WITH:
.\L\1\E

These are both pretty simple, but I spent quite some time figuring out how to make them work with Unicode. I finally found out that Mp3tag's RegEx engine has Unicode support built in. Heh. :smiley:

Edit: Here's some improvement. The last one only replaced the first letter of the word and put it into upper case disregarding the rest of the word. This one also puts those that follow the first letter into lower case ("DAItro" becomes "Daitro"):

REGULAR EXPRESSION:
\b(?<!')([a-zA-Z])([^']*?)\b
REPLACE WITH:
\u\1\L\2\E

This next one is basically the same but takes note of the French article L' and puts the letter following the article into upper case. So, for example, "L'eau" becomes "L'Eau".

REGULAR EXPRESSION:
\b(?<!(?<!\s[Ll])')([a-zA-Z])([^']*?)\b
REPLACE WITH:
\u\1\L\2\E


#35

Script to fix capitalization in tags according to English rules only. This is my n-th attempt at this, but I think it works quite nicely... Of course some human work is still needed, because it has no corpus for checking each word's part of speech (POS).

[#0]
T=1
F=_TAG
1=1
2=

[#1]
T=4
F=_TAG
1=\\b(A|An|The|And|But|Or|So|After|Before|Out|When|While|Since|Until|Although|Even If|Because|About|Above|Across|Against|Along|Alongside|As|At|Below|By|During|For|From|In|Into|Of|Off|On|Onto|Over|Than|Through|Till|To|Under|Up|With|Within|Without)\\b
2=$lower($1)
3=0

[#2]
T=4
F=_TAG
1=^\\s*(\\w+)
2=$caps($1)
3=0

[#3]
T=4
F=_TAG
1=(\\w+)\\s*$
2=$caps($1)
3=0

Or directly in mp3tag:
Case conversion
Field _TAG
Case conversion Mixed Case
Words begin...

Replace with regular expression
Field _TAG
Regular expression \b(A|An|The|And|But|Or|So|After|Before|Out|When|While|Since|Until|Although|Even If|Because|About|Above|Across|Against|Along|Alongside|As|At|Below|By|During|For|From|In|Into|Of|Off|On|Onto|Over|Than|Through|Till|To|Under|Up|With|Within|Without)\b
Replace matches with $lower($1)

Replace with regular expression
Field _TAG
Regular expression ^\s*(\w+)
Replace matches with $caps($1)

Replace with regular expression
Field _TAG
Regular expression (\w+)\s*$
Replace matches with $caps($1)
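The four steps above can be sketched in Python for testing word lists offline. This is only an approximation: string.capwords stands in for the Mixed Case conversion, str.capitalize stands in for $caps (assumed to uppercase the first letter and lowercase the rest), and the name english_title_case is invented for this example.

```python
import re
from string import capwords

# Word list copied from the regular expression in the post.
SMALL_WORDS = (
    "A|An|The|And|But|Or|So|After|Before|Out|When|While|Since|Until|"
    "Although|Even If|Because|About|Above|Across|Against|Along|Alongside|"
    "As|At|Below|By|During|For|From|In|Into|Of|Off|On|Onto|Over|Than|"
    "Through|Till|To|Under|Up|With|Within|Without"
)
SMALL_RE = re.compile(r"\b(" + SMALL_WORDS + r")\b")

def english_title_case(text):
    """Hypothetical helper mirroring the four actions in order."""
    text = capwords(text)                                    # Mixed Case
    text = SMALL_RE.sub(lambda m: m.group(1).lower(), text)  # small words down
    text = re.sub(r"^\s*(\w+)", lambda m: m.group(1).capitalize(), text)  # first word
    text = re.sub(r"(\w+)\s*$", lambda m: m.group(1).capitalize(), text)  # last word
    return text

print(english_title_case("the lord of the rings"))     # The Lord of the Rings
print(english_title_case("what are you waiting for"))  # What Are You Waiting For
```

The last two steps are what keep a small word capitalized when it opens or closes the title.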


#36

Regular Expression Tutorial

Beginner or Professional!
Please take a few minutes to look at this presentation (slideshow or PDF):

Andrei’s Regex Clinic

This is an outstanding work that visualises the world of Regular Expressions in a broad way.
The tutorial can help to open your mind.

http://zmievski.org/c/dl.php?file=talks/co...egex-clinic.pdf
http://www.slideshare.net/andreizm/andreis-regex-clinic
http://zmievski.org/2010/05/regex-clinic-on-slideshare

DD.20110110.1912.CET


#37

Are all the things described there useable for mp3tag?


#38

Probably not, one must abstract.

DD.20110110.2111.CET


#39

To Juozas V:
Thank you very much for this. It does the best English title case conversion that I have seen here.

However, I have one minor quibble. Your word list includes some words and phrases that I see more often used in song titles as subordinate conjunctions than as prepositions. For example "after", "because", and "although". From my reading on capitalization in titles, subordinate conjunctions should always be capitalized. Since it's not practical to use script to detect how a word is used, I chose to remove those words from your Reg Ex on the basis that there should be fewer errors without them than with them. The words that I removed are:

After, As, Although, Because, Even If, Since, Till, Until, When, and While.

Here is my revised word list (resorted alphabetically):

A|About|Above|Across|Against|Along|Alongside|An|And|As|At|Before|Below|But|By|During|For|From|In|Into|Nor|Of|Off|On|Onto|Or|Out|Over|So|Than|The|Through|To|Under|Up|With|Within|Without

Note that I also added the coordinating conjunction "Nor" to your list.

Best regards,
Doug M. in NJ


#40
Convert MP3tag's date to ISO date

Converts (example) 18.04.2011 to 2011-04-18.

format tag field
Field: date added
Format string: %_date%

replace with regular expression
Field: date added
RegExp: ^(\d+)\.(\d+)\.(\d+)$
Replace with: $3-$2-$1
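The same reordering can be tried in Python as a quick check. This sketch assumes the dots are meant as literal separators (written as \.), and the function name to_iso_date is made up for illustration.

```python
import re

# Day, month, year separated by literal dots.
DATE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

def to_iso_date(date_text):
    """Hypothetical helper: 'DD.MM.YYYY' -> 'YYYY-MM-DD'."""
    return DATE_RE.sub(r"\3-\2-\1", date_text)

print(to_iso_date("18.04.2011"))  # 2011-04-18
```

Values that do not have the dotted shape are returned unchanged.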

