RegEx and Back References


#1

I am a novice at regular expressions, and MP3Tag is where I am cutting my teeth. I still find that I am wrong many more times than I am right and I don't want to use this forum to teach me how to create them. I am hoping that I have found an issue and it's not my misunderstanding.

I am attempting to parse a Comment field with the text "Before T3 After". However "T3" may be T1, T2, T3, T4, or T5. I figure that in RegEx the way you best describe that is "T[1-5]". Assuming I am correct I would then think that

$RegExp(%Comment%,^(.*)T([1-5])(.*)$,$1/$2/$3)
would return
"Before /3/ After"
but instead I get
"Before //3 After"
QUESTION #1: Why didn't that example work? Do character classes not return references?

Over the next hour, throwing everything I could find at the problem (and there are so many things in RegEx), I tried

$RegExp(%Comment%,^(.*)T([1-5]+)(.*)$,$1/$2/$3)
which I still think should work (though I might be wrong about that). Instead it returns an error (or two):
REGEXP ERROR: Regular expression
Invalid preceding regular expression
QUESTION #2: Why did that expression return an error?

Finally, I came up with

$RegExp(%Comment%,^(.*)T(\d)(.*)$,$1/$2/$3)
which works well enough for now: there will never be a T0 or T6-T9, and if one did show up, matching it would be fine.

QUESTION #3: Is there a better way to do what I want restricting to T[1-5]?


#2

You must write '[' and ']' when you use scripting.


#3

I don't understand.


#4

$RegExp(%Comment%,^(.*)T('['1-5']')(.*)$,$1/$2/$3)
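
A possible explanation for the two results in post #1, sketched in Python's re module rather than in MP3Tag itself. The assumption (not confirmed anywhere in this thread) is that the scripting layer drops an unquoted [1-5] block before the regex engine ever sees it, so the second group arrives empty. The variable names are only for illustration.

```python
import re

comment = "Before T3 After"

# The pattern as intended, with the character class intact.
intended = re.sub(r"^(.*)T([1-5])(.*)$", r"\1/\2/\3", comment)

# Assumption: the unquoted [1-5] was removed by the scripting layer,
# so the regex engine received an empty second group.
stripped = re.sub(r"^(.*)T()(.*)$", r"\1/\2/\3", comment)

# Same assumption applied to ([1-5]+): the group becomes (+),
# a quantifier with nothing in front of it, which is a syntax error.
try:
    re.compile(r"^(.*)T(+)(.*)$")
    plus_error = None
except re.error as exc:
    plus_error = str(exc)

print(intended)    # Before /3/ After
print(stripped)    # Before //3 After
print(plus_error)  # the engine's complaint about the dangling quantifier
```

Under that assumption the stripped pattern reproduces the "Before //3 After" output from question #1, and the dangling + would account for the "Invalid preceding regular expression" error in question #2.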


#5

I am back, and now I believe I might have the opposite problem. This time I need to find the square bracket literals ("[" and "]") in a string (actually isolate what's between square brackets). So, I would think that

$regexp(%Title%,^(.*)[(.*)],$2)
would do the trick, but that just returns an error. I assume it's something funny about the square bracket character again. Is that right?

Also, are there any other differences between vanilla Regular Expressions and the implementation within the $RegExp function?


#6

The problem is that we are using two different languages within the $regexp function - title formatting language and regular expressions. Unfortunately, I have no means of seamless language composition at the moment.

The square brackets are special characters in the title formatting language and you have to quote them with ' if you want them to be displayed. As a consequence the call to the $regexp function should look like this

$regexp(%Title%,^(.*)\'['(.*)\']',$2)
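
For comparison, here is roughly what the regex engine should receive once the title formatting quotes are removed, sketched in Python's re module. The helper name between_brackets and the sample title are made up for illustration, and Python's engine is not necessarily the one behind $regexp.

```python
import re

def between_brackets(title: str) -> str:
    # \[ and \] are literal square brackets at the regex level;
    # group 2 captures whatever sits between them.
    return re.sub(r"^(.*)\[(.*)\]", r"\2", title)

print(between_brackets("Song Title [Live]"))  # Live
print(between_brackets("No brackets here"))   # No brackets here
```

A title without brackets comes back unchanged, because the pattern never matches and nothing is replaced.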

#7

I see how it works! Thanks, that finally makes sense:

'[' starts a character class definition
\'[' is the literal open square bracket character

I actually solved the problem using escape sequences (\x5b for '[' and \x5d for ']')
$regexp(%Title%,^(.+)(\x5b)(.+)(\x5d),$3)
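
The hex escape route can be checked the same way outside MP3Tag. This is only a sketch in Python's re module with a made-up sample title; it shows that \x5b and \x5d behave as literal brackets in a regex, which is why no quoting is needed on the scripting side.

```python
import re

# \x5b is "[" and \x5d is "]", written as hex escapes so that no
# literal bracket characters appear in the pattern text.
HEX_PATTERN = r"^(.+)(\x5b)(.+)(\x5d)"

title = "Song Title [Live]"
result = re.sub(HEX_PATTERN, r"\3", title)

print(result)  # Live
```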

Is the square bracket the only issue within the RegExp function?


#8

To my current knowledge: yes.