Too complicated to accomplish in Mp3tag?

I'd like to do the following in Mp3tag using regexp for filenames only.

  1. Split (tokenize) on the following characters: ~ ( ) { } [ ]
  2. Ignore leading and trailing spaces for each token, but don't remove them
  3. Always capitalize the first and last word unless it's one of some specific words (tweaker, ohGr, etc.)
  4. Always capitalize the remaining words except for articles (a, an, the), conjunctions (and, or, but) and small prepositions (in, out, on, of, to, at)
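To check the intended logic outside Mp3tag, the four rules can be sketched as a small Python function. This is not an Mp3tag action; the word lists and the function names (fix_word, fix_token, title_case_filename) are made up for illustration, and the sketch assumes the filename is passed without its extension.

```python
import re

# Point 1: split on these characters; the capturing group keeps them in the result.
SEPARATORS = r"([~(){}\[\]])"

# Point 4: words kept in lower case unless they are the first or last word of a token.
SMALL_WORDS = {"a", "an", "the", "and", "or", "but", "in", "out", "on", "of", "to", "at"}

# Point 3: words with a fixed spelling (hypothetical list built from the examples).
SPECIAL_WORDS = {"tweaker": "tweaker", "ohgr": "ohGr"}


def fix_word(word: str, is_edge: bool) -> str:
    """Apply points 3 and 4 to a single word."""
    key = word.lower()
    if key in SPECIAL_WORDS:
        return SPECIAL_WORDS[key]
    if not is_edge and key in SMALL_WORDS:
        return key
    # Only the first letter is touched, so "McLachlan" stays "McLachlan".
    return word[:1].upper() + word[1:]


def fix_token(token: str) -> str:
    """Title-case one token but keep its leading/trailing spaces (point 2)."""
    core = token.strip(" ")
    if not core:
        return token  # empty token, e.g. between "}" and "]"
    lead = token[: len(token) - len(token.lstrip(" "))]
    trail = token[len(token.rstrip(" ")):]
    words = core.split(" ")
    last = len(words) - 1
    fixed = [fix_word(w, i in (0, last)) for i, w in enumerate(words)]
    return lead + " ".join(fixed) + trail


def title_case_filename(name: str) -> str:
    # re.split alternates token, separator, token, ...; odd indices are separators.
    parts = re.split(SEPARATORS, name)
    return "".join(p if i % 2 else fix_token(p) for i, p in enumerate(parts))


print(title_case_filename(
    "aps ~ Reggae (2005) Matisyahu [Live At Stubb's{01}] Matisyahu ~ Sea To Sea"))
# Aps ~ Reggae (2005) Matisyahu [Live at Stubb's{01}] Matisyahu ~ Sea to Sea
```

Treating each token on its own is what lets "At" in "Where It's At (" count as a last word, which a single regex over the whole filename has a hard time doing.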

In Mp3tag I was picturing creating a new action of type "Replace with regular expression". The action would apply to Field _FILENAME and the Regular Expression would return all the words that needed to be capitalized. "Replace matches with" would simply be the same expression with the first character capitalized.

I'm using The Regulator as a guide to step my way through this process. I just want to make sure I'm not on a fool's errand. Is this something that can be accomplished, or is it too involved?

Do you have an example for points 1 and 2?
I'm sure it can be done, but there's probably more than 1 action needed.

So here's an example filename:

aps ~ Reggae (2005) Matisyahu [Live At Stubb's{01}] Matisyahu ~ Sea To Sea

I want to treat each token as its own "title".

So if I had that filename, it would break into the following segments:

Live At Stubb's

 Sea To Sea

So, it's split on those characters, and because of the wonders of HTML you can't see that there are spaces, both leading and trailing, that I'd like to ignore but not remove. As I'm typing this I'm also realizing that there will be an empty token because of the Track # being included in the CD Name brackets. I need to make sure that's not a problem as well.
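Points 1 and 2 can be checked with Python's re.split: a capturing group around the separator class makes the separators part of the result, so the spaces stay inside the tokens and the filename can be rebuilt exactly. This is only an illustration in Python, not Mp3tag syntax.

```python
import re

name = "aps ~ Reggae (2005) Matisyahu [Live At Stubb's{01}] Matisyahu ~ Sea To Sea"

# With a capturing group, re.split also returns the separators, so nothing is lost.
parts = re.split(r"([~(){}\[\]])", name)
tokens = parts[0::2]      # text between separators, spaces untouched
separators = parts[1::2]  # "~", "(", ")", "[", "{", "}", "]", "~"

print(tokens)
# ['aps ', ' Reggae ', '2005', ' Matisyahu ', "Live At Stubb's", '01', '', ' Matisyahu ', ' Sea To Sea']

# Joining the pieces gives back the original filename exactly.
assert "".join(parts) == name
```

The empty string between "}" and "]" is the empty token from the track number; it is harmless as long as the capitalization step simply skips empty tokens.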

So ultimately, I'd like to end up with the filename:

Aps ~ Reggae (2005) Matisyahu [Live at Stubb's{01}] Matisyahu ~ Sea to Sea

Namely, do not remove any of the spaces, and capitalize and lowercase (the first letter of each word only) as appropriate. I understand the naming convention will seem awkward to most, but this is used before doing my Filename To Tag conversion.

If you could help me out dano, that would be great :smiley:

Here are two actions:

First puts everything in "first letter upper case, rest lower case" form. It uses space, ( [ { to define the first letter of a word.

The second action puts your prepositions in lower case if they are surrounded by spaces. (If you want e.g. [At to become [at as well, you can add that by using (\s|\[).)
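The idea behind these two actions can be approximated in Python to see what each step does to a filename. This is a sketch of the described behaviour, not the content of the attached .mta file; the function names are hypothetical.

```python
import re

SMALL_WORDS = r"(a|an|the|and|or|but|in|out|on|of|to|at)"


def action1_capitalize(name: str) -> str:
    """First letter upper case, rest lower case; a word starts at the beginning
    of the name or after a space, "(", "[" or "{"."""
    lowered = name.lower()
    return re.sub(r"(^|[\s(\[{])(\w)",
                  lambda m: m.group(1) + m.group(2).upper(), lowered)


def action2_small_words(name: str) -> str:
    """Lower-case the small words only when they are surrounded by spaces."""
    return re.sub(r"(?<=\s)" + SMALL_WORDS + r"(?=\s)",
                  lambda m: m.group(1).lower(), name, flags=re.IGNORECASE)


name = "dl ~ Reggae (2005) Matisyahu [Live At Stubb's{01}] Matisyahu ~ Sea To Sea"
print(action2_small_words(action1_capitalize(name)))
# Dl ~ Reggae (2005) Matisyahu [Live at Stubb's{01}] Matisyahu ~ Sea to Sea
```

Changing (?<=\s) to (?<=[\s\[]) in the second function would also lower-case a small word right after "[", which is the "[At" case.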

Then you just need a third action for point 3 to put your special words in your desired spelling.
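A third step for point 3 amounts to a case-insensitive, whole-word replacement for each special word. A minimal Python sketch; the word list is a hypothetical example built from the words mentioned in the first post.

```python
import re

# Hypothetical list of words that must keep a fixed spelling (point 3).
SPECIAL_SPELLINGS = ["tweaker", "ohGr"]


def action3_special_words(name: str) -> str:
    for spelling in SPECIAL_SPELLINGS:
        # \b restricts the match to whole words; IGNORECASE also finds "Ohgr" or "OHGR".
        pattern = r"\b" + re.escape(spelling) + r"\b"
        name = re.sub(pattern, lambda _m: spelling, name, flags=re.IGNORECASE)
    return name


print(action3_special_words("Aps ~ Industrial Ohgr ~ Tweaker Song"))
# Aps ~ Industrial ohGr ~ tweaker Song
```

Running it last means it overrides whatever the capitalization steps did to those words.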

user_beaker.mta (161 Bytes)


For this file: dl ~ Reggae (2005) Matisyahu [Live At Stubb's{01}] Matisyahu ~ Sea To Sea.mp3 I get this error

When I look at the new filename, I see it has capitalized it to: Dl ~ Reggae (2005) Matisyahu [Live At Stubb's{01}] Matisyahu ~ Sea To Sea.mp3

[EDIT] Please don't take my disappointment to mean I'm not appreciative. Thank you!! :wink:

OK, you have an older Mp3tag version.
The regular expression must be:
(replace \u003d with =)

That fixed the error :smiley: Thank you!

A few more follow-up questions: Can this be updated to act on the first letter only? Meaning "McLachlan, Sarah" wouldn't get changed to "Mclachlan, Sarah". My big concern is that "Is" and "Be" are capitalized while "in" is not.

Here is an example of why I was thinking looking at phrases and not words (this isn't the real name of the song, just used as an illustration):

aps ~ Alternative (1996) Beck [Odelay{08}] Beck ~ Where It's At (Two Turntables and a Microphone).mp3

This would have the word "At" incorrectly lowercased. I was hoping the action would treat a space next to any of these characters ()[]{} (in either order) the same as being the first or last word. This, I think, is where the real trouble lies...

It seems like we're almost there. Thanks again for all your help.

I'm just realizing that maybe I'm making this too hard. Basically all of these [SPACE]Someword[SPACE] should be capitalized.
All of these [SPACE]someshortpreposition[SPACE] should be lowercased unless preceded or followed by any of these: ()[]{}. Would this be easier?

Next one:

I slightly modified it to account for ~)]}: \s(a|an|the|and|or|but|in|out|on|of|to|at)(?=\s)(?!\s[~()\[\]{}])
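The negative lookahead can be tried in Python's re module to see which cases it fixes and which it does not. The helper name is hypothetical; the regex is the one from this post with the brackets inside the character class escaped.

```python
import re

# \s + small word, followed by a space (?=\s) that is NOT followed by a
# separator (?!\s[~()\[\]{}]): a word right before "(" or "~" counts as a last word.
SMALL_WORD_RE = re.compile(
    r"\s(a|an|the|and|or|but|in|out|on|of|to|at)(?=\s)(?!\s[~()\[\]{}])",
    re.IGNORECASE,
)


def lower_small_words(name: str) -> str:
    # group(0)[0] is the whitespace consumed by \s; only the word itself is lowered.
    return SMALL_WORD_RE.sub(lambda m: m.group(0)[0] + m.group(1).lower(), name)


print(lower_small_words("Beck ~ Where It's At (Two Turntables And A Microphone)"))
# Beck ~ Where It's At (Two Turntables and a Microphone)
print(lower_small_words("Tool ~ The Patient"))
# Tool ~ the Patient
```

The lookahead only looks at what follows the word, so a small word at the start of a token, right after "~ ", is still lowercased.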

I also modified the first part to use $caps2, which doesn't lowercase subsequent uppercase letters, so "McLachlan, Sarah" is no longer renamed to "Mclachlan, Sarah".
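The difference between a "lower-case the rest" step and a $caps2-style step can be sketched in Python. The function only imitates the behaviour described in this thread; it is not Mp3tag's implementation, and the word-start rule (start, space, "(", "[" or "{") is an assumption carried over from the first action.

```python
import re

# A word starts at the beginning of the text or after a space, "(", "[" or "{".
WORD_START_RE = re.compile(r"(^|[\s(\[{])(\w)")


def upper_first_letters(text: str, lower_rest: bool) -> str:
    if lower_rest:
        text = text.lower()  # like the first action: everything else becomes lower case
    # With lower_rest=False this behaves like $caps2: existing capitals are left alone.
    return WORD_START_RE.sub(lambda m: m.group(1) + m.group(2).upper(), text)


print(upper_first_letters("McLachlan, Sarah", lower_rest=True))   # Mclachlan, Sarah
print(upper_first_letters("McLachlan, Sarah", lower_rest=False))  # McLachlan, Sarah
```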

The only remaining problem I'm having is when a song/artist/whatever begins with one of these words. aps ~ Alternative Metal (2001) Tool [Lateralus{03}] Tool ~ The Patient is being changed to Aps ~ Alternative Metal (2001) Tool [Lateralus{03}] Tool ~ the Patient. I see the problem in The Regulator, but I'm not sure how to add another condition to the regexp. Checking to make sure there's no ~()[]{}[SPACE] before the word would take care of the problem.

It's hard to believe how much logic these regexps can account for. All the tutorials I've found online only say what each character means, i.e. you already need to know what "negative lookahead" means. Is there a site that you recommend?

OK. I found a good site that explains things a bit better for me. From what I've read, the following should work: (?<![~()\[\]{}])\s(a|an|the|and|or|but|in|out|on|of|to|at)(?=\s)(?!\s[~()\[\]{}]) This does work in The Regulator. Unfortunately, I'm getting this error again
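The combined lookbehind/lookahead version behaves as intended in Python's re module, which supports fixed-width lookbehind; whether a given Mp3tag build accepts it depends on its regex engine. The helper name is hypothetical.

```python
import re

# (?<![~()\[\]{}]) : the space must not come right after a separator, so the
#                    first word of a token ("~ The Patient") is skipped.
# (?!\s[~()\[\]{}]) : the word must not be followed by " (", " ~" etc., so the
#                    last word of a token ("It's At (") is skipped.
SMALL_WORD_RE = re.compile(
    r"(?<![~()\[\]{}])\s(a|an|the|and|or|but|in|out|on|of|to|at)(?=\s)(?!\s[~()\[\]{}])",
    re.IGNORECASE,
)


def lower_small_words(name: str) -> str:
    # Keep the whitespace consumed by \s and lower-case only the word.
    return SMALL_WORD_RE.sub(lambda m: m.group(0)[0] + m.group(1).lower(), name)


print(lower_small_words("Tool [Lateralus{03}] Tool ~ The Patient"))
# Tool [Lateralus{03}] Tool ~ The Patient
print(lower_small_words("Beck ~ Where It's At (Two Turntables And A Microphone)"))
# Beck ~ Where It's At (Two Turntables and a Microphone)
```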

I can get the check for ~ to work, but not the check for a list of characters :frowning: Any thoughts?

[EDIT]copy/paste error

You could add (?<![~()\[\]{}]) to the beginning of the regex, but I don't know if it is supported in your version. is a nice site.


Does it work in your version?

Yes, it works. There was an upgrade to the regex engine some time ago (probably with the new Unicode build).

It's the Unicode changes that are scaring me away from the new versions. If I don't want to write Unicode (because of my hardware mp3 players), I need to tell it to write ASCII. Unfortunately, if I write ASCII I lose special characters like Æ in Ænema and such. I understand Florian's need to implement it; lots of people were asking for it. I'm sure I'm in the minority on this issue, just letting you know my reasoning.

I've come up with a workaround; I stole a page from your book, dano :slight_smile: I look for [SPACE]~ and replace it with ¤~, then I replace ~[SPACE] with ~¤. I do the same for all the separators (){}[]. Then I run this regexp: \s(a|an|the|and|or|but|in|out|on|of|to|at)(?=\s)
This of course is the first one you posted (with the fix for my version of mp3tag). After all is said and done, I replace ¤ with a space.
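The placeholder workaround can be sketched in Python to show why it works: once every space next to a separator has become "¤", the simple \s(...)(?=\s) regex can no longer see the first or last word of a token. The function name is hypothetical, and the sketch assumes "¤" never occurs in a real filename.

```python
import re

PLACEHOLDER = "\u00a4"  # "¤", assumed never to appear in a real filename
SEPARATORS = "~(){}[]"
SMALL_WORD_RE = re.compile(r"\s(a|an|the|and|or|but|in|out|on|of|to|at)(?=\s)", re.IGNORECASE)


def lower_small_words_with_placeholder(name: str) -> str:
    # Replace "[SPACE]~" with "¤~" and "~[SPACE]" with "~¤", likewise for every separator.
    for sep in SEPARATORS:
        name = name.replace(" " + sep, PLACEHOLDER + sep)
        name = name.replace(sep + " ", sep + PLACEHOLDER)
    # The simple regex now only sees spaces between ordinary words ("¤" is not \s).
    name = SMALL_WORD_RE.sub(lambda m: m.group(0)[0] + m.group(1).lower(), name)
    # Finally turn every placeholder back into a space.
    return name.replace(PLACEHOLDER, " ")


print(lower_small_words_with_placeholder("Tool ~ The Patient"))
# Tool ~ The Patient
print(lower_small_words_with_placeholder("Beck ~ Where It's At (Two Turntables And A Microphone)"))
# Beck ~ Where It's At (Two Turntables and a Microphone)
```

It needs more steps than the lookbehind version but only relies on the plain lookahead, which older regex engines also support.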

Works like a champ, just more steps. :w00t:

Thanks again for all your help, dano. :smiley: I appreciate your patience.

Nice to have a workaround :slight_smile:

Just FYI, it isn't ASCII that's used but ISO-8859-1.
See here for all possible characters:

Æ is included for example
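The difference is easy to check with Python's standard codecs. This only illustrates the two character sets; it says nothing about how Mp3tag writes tags internally.

```python
# "Æ" (U+00C6) has a one-byte code in ISO-8859-1 (Latin-1) but none in 7-bit ASCII.
title = "Ænema"
print(title.encode("iso-8859-1"))  # b'\xc6nema'

try:
    title.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII cannot encode it:", err.reason)  # ordinal not in range(128)
```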

Always good to have a workaround :wink:

Interesting. I should see what (if anything) would change if I were to convert. Thank you for the clarification.

Beaker, I don't suppose you'd be willing to share the full Action Set you ultimately wound up with, would you? I feel like I'm trying to re-invent the wheel here, but I'm not as good at figuring it out as you seem to have been. If you've managed to solve this logic then it sure would be nice to have it made part of the standard actions included with the program (with your blessing, of course).

It probably doesn't help that I'm deeply rooted in older-style RegEx syntax and really miss the \< and \> (beginning and end of word) operators...
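In Perl-style engines such as Python's re module, \b matches at either edge of a word, and separate start-of-word and end-of-word checks can be written with lookarounds. A small sketch; WORD_START and WORD_END are names made up here, and whether Mp3tag's own engine also accepts \< and \> is not covered in this thread.

```python
import re

# Lookaround equivalents of the classic \< (start of word) and \> (end of word).
WORD_START = r"(?<!\w)(?=\w)"
WORD_END = r"(?<=\w)(?!\w)"

text = "Where It's At (Two Turntables at Home)"
print(re.findall(WORD_START + "at" + WORD_END, text, flags=re.IGNORECASE))  # ['At', 'at']
print(re.findall(r"\bat\b", text, flags=re.IGNORECASE))                     # ['At', 'at']
```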

Any help would be much appreciated!

I just sent you an email, but I also wanted to post here that I'd be more than happy to help. Sorry it's taken so long for me to see that there was a reply to this thread. Let me know if you're still needing assistance.

I found this thread after searching around for a while (I realize it's old) but think this is exactly what I'm looking to do. If you still have this info available, I'm definitely interested in what you used as well since it seems to fulfill my needs. :slight_smile: