Using $regexp() to isolate parts of a large tag, like a cuesheet

MacOS 26.3.1, MP3tag 1.11.0, Patterns 1.3

Forgive me, all my scripts got lost when my previous comp died last month, and I have to relearn a lot of what I wrote.

General question: Is there a site or app where I can test regexes in such a way that they “translate” properly to MP3tag? I use Patterns with it set to “Perl (PCRE)”, but I’m unsure whether that matches what’s stated in the documentation.

Specific Question: I’m trying to isolate and extract info from Cue sheets, and just cannot remember how I got there for my old script. When I try to grab INDEXes with Patterns, it works fine:

and if I were to try and “translate” that to MP3tag, I’d want it to read more like:

^.+?\[\r\n\] TRACK %TRACK% AUDIO.+?((?:\[\r\n\] INDEX \d\d \d\d:\d\d:\d\d)+)(?:.+|$)

But when I try something like:

$regexp(%CUESHEET%,'^.+?\[\r\n\] TRACK '%TRACK%' AUDIO.+?((?:\[\r\n\] INDEX \d\d \d\d:\d\d:\d\d)+)(?:.+|$)',$1)

It doesn’t work, and I don’t know where to start with troubleshooting, because what I’m trying to do works with Patterns. I’m guessing this may be a mistake in my approach.

… thank you kindly in advance for your time, and getting this far. I had a really wonderful action groups .json file with all of these written out, thanks to the great update however long ago, and *poof*, one of the few things I lost and it’s one of the more frustrating.

Peace 8^)

I think you can test the regular expression in Convert>Tag-Tag and its Preview

Try out regex101.com, it’s great for breaking down each step of the process and is pretty close to boost.regex/ICU in its default settings. I think as long as you remember to remove/put back those ' around the pattern and escape any characters that conflict between regex and format strings, it’s a good enough workflow.

You can also try out your $regexp() in a Column, but there might be slight differences in how Columns and Convert/Actions process tags, so definitely try it for real on a live example once you’ve got a formula looking good.

Not sure where to start with a cuesheet, sorry. :sweat_smile: Have you got a sample bit of text we could use so we can see how your formula reacts?

Thank you kindly for the responses 8^)

Those are good ideas, thank you. I used to have a test column up, but never thought to use it for regexes.

My comp exploded last month so I’m literally starting from scratch, however my old script extracted the INDEXes and then built a brand new cuesheet to my specs. That way I can also use Cuesheets to import my preferred tags, in case I have to rebuild a library due to whatever reason.

So with the example cuesheet I have in the picture:

REM ARTIST "Example"
FILE "Example.flac" WAVE
  TRACK 01 AUDIO
    TITLE "One"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    TITLE "Two"
    INDEX 00 02:54:60
    INDEX 01 02:57:57
  TRACK 03 AUDIO
    TITLE "Three"
    INDEX 00 05:41:67
    INDEX 01 05:45:05

What I tried was:

Format Tag Field

Field: CUESHEET2
Format: $regexp(%CUESHEET%,'^.+?\[\r\n\] TRACK '%TRACK%' AUDIO.+?((?:\[\r\n\] INDEX \d\d \d\d:\d\d:\d\d)+)(?:.+|$)',$1)

It didn’t work.

I’ll try regex101 again as well, thank you. A few decades ago I learned via JGSoft’s EditPadPro, but not having that program is the punishment for my transgression of switching to Macs, I guess 8^)

My preview shows the original text unchanged, which indicates that the pattern does not match the data.
One possible source of problems: the cuesheet shows TRACK 01 and you use TRACK %track%.
Does %track% really hold a 2-digit number?

E.g. this expression
$regexp(%cuesheet%,'^.+?\s*TRACK 03 AUDIO.+?((?:\s*INDEX \d\d \d\d:\d\d:\d\d)+)(?:.+|$)',$1)
Produces

    INDEX 00 05:41:67
    INDEX 01 05:45:05

For me, yes. I normally use %BASETRACK% (minimum 2 digits) and build %TRACK% from that plus %SUBTRACK% and %MEDIASIDE%, but I didn’t want my tagging to further confuse anyone looking at this.

I made some amendments between both of your patterns:

$regexp(%cuesheet%,'^.*?\s*TRACK '$num(%track%,2)' AUDIO\s.*?[\n]((?:\s*INDEX \d\d \d\d:\d\d:\d\d)+).*',$1)

The apostrophes around '$num(%track%,2)' are back, splitting the regex into 3 sections so that %track% can be used. $num() is just a guarantee that a single-digit %track%/%basetrack% will have a leading zero.

I had a lot of bother getting it to work on regex101 as it turns out that:

...AUDIO\s.+?[\n]((?...

was needed, because Mp3tag lets .+? match across newlines, while regex101 and a fair number of other testing sites do not with their default settings. That could be why Patterns was giving you grief too. This should also remove the extra line above your result. Either .*? or .+? works here.
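For anyone testing outside Mp3tag, here is a minimal Python sketch of that dot-versus-newline difference. Python's re module is not the engine Mp3tag uses, so this only illustrates the setting itself; the text and pattern values are made up for the example.

```python
import re

# Two lines of a cuesheet, joined by a newline (sample data for illustration).
text = 'TRACK 02 AUDIO\n    INDEX 01 02:57:57'

# The lazy ".+?" has to cross the newline to get from AUDIO to INDEX.
pattern = r'AUDIO.+?INDEX'

# Default mode: "." stops at the newline, so there is no match.
default_match = re.search(pattern, text)

# DOTALL ("single line") mode: "." also matches the newline.
dotall_match = re.search(pattern, text, flags=re.DOTALL)

print(default_match)             # None
print(dotall_match is not None)  # True
```

If a tester behaves like the first call and Mp3tag behaves like the second, the same pattern will pass in one and fail in the other.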

You can always use another regex afterwards to remove the leading spaces, if required.
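As a rough idea of what that second pass could look like, here is a small Python sketch. It assumes the extracted block is already in a string; the extracted value is sample data, and whether Mp3tag's $regexp() treats (?m) the same way is something to verify in the preview.

```python
import re

# Sample result of the first extraction, still carrying the indentation.
extracted = '    INDEX 00 02:54:60\n    INDEX 01 02:57:57'

# (?m) makes ^ match at the start of every line, not only at the start
# of the whole string, so the indentation is stripped from each line.
cleaned = re.sub(r'(?m)^[ \t]+', '', extracted)

print(cleaned)
```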

… on a slightly off-topic note, I have no idea why regex101 outputs the remaining text without the match/group when not substituted with $1, yet Mp3Tag actually outputs the matched group alone. :smiling_face_with_tear: Hoping someone could enlighten me as to the differences in how outputs are dealt with. Back on topic, another thing to look out for when testing. :wink:

So, interesting development. I tested this out a few more times in a few more ways:

And it just didn’t work. Then I started testing with new-line turned off on Patterns, and this worked on MP3tag:

$regexp(%CUESHEET%,'^(?:.|\s)+?TRACK '$num(%TRACK%,2)' AUDIO(?:.|\s)+?((?:\s*INDEX \d\d \d\d:\d\d:\d\d)+)(?:.|\s)*$',$1)

So then while I was getting there, I realized why I was having so much trouble. I remembered my last script, which took all \n\r and replaced it with #!#. It made traversing the code easier for me at that time. Funny enough, this new one seems better.

I'm going to keep going, but thank you for your help 8^)

I just checked both formulas in iOS Shortcuts and you're right, didn't work :face_with_bags_under_eyes: Must be a difference within ICU regex that I missed, apologies. Yours is working great, only other amendment is:

$regexp(%CUESHEET%,'^(?:.|\s)+?TRACK '$num(%TRACK%,2)' AUDIO(?:.|\s)+?\n((?:\s*INDEX \d\d \d\d:\d\d:\d\d)+)(?:.|\s)*$',$1)

I put the \n back which should remove the top newline for you to leave only the lines you need.

Sorry, back again! :grimacing:

I found out ICU is stricter about single/multi-line matching but you can use (?s) to fix that:

$regexp(%CUESHEET%,'(?s)^.*?TRACK '$num(%TRACK%,2)' AUDIO.*?\n((?:\s*INDEX \d{2} \d{2}:\d{2}:\d{2})+).*',$1)

That should hopefully allow the previous formula to work in both boost.regex and ICU.
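For checking the logic outside Mp3tag, the same pattern can be sketched in Python against the example cuesheet from earlier in the thread. This is only an approximation: Python's re is neither boost.regex nor ICU, and extract_indexes is a hypothetical helper name, with an f-string standing in for $num(%TRACK%,2).

```python
import re

CUESHEET = '''REM ARTIST "Example"
FILE "Example.flac" WAVE
  TRACK 01 AUDIO
    TITLE "One"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    TITLE "Two"
    INDEX 00 02:54:60
    INDEX 01 02:57:57
  TRACK 03 AUDIO
    TITLE "Three"
    INDEX 00 05:41:67
    INDEX 01 05:45:05
'''


def extract_indexes(cuesheet: str, track: int) -> str:
    """Return the INDEX lines of one track, or '' if the track is absent."""
    pattern = (
        r'(?s)^.*?TRACK ' + f'{track:02d}' +  # zero-pad like $num(%TRACK%,2)
        r' AUDIO.*?\n((?:\s*INDEX \d{2} \d{2}:\d{2}:\d{2})+).*'
    )
    match = re.match(pattern, cuesheet)
    return match.group(1) if match else ''


print(extract_indexes(CUESHEET, 2))
```

For track 2 this returns the two indented INDEX lines without the newline above them, and a track number that is not in the cuesheet returns an empty string here rather than the original text.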

I tested ^(?:.|\s)+ further and found it suffers from catastrophic backtracking, which shouldn't be an issue as long as TRACK matches a value in your CUESHEET. When there wasn't a match, it made my Shortcuts app hang for a while :face_with_open_eyes_and_hand_over_mouth:

Good timing, I was just coming back here. First off:

So, this has directly taught me through context the differences between \s and \n, and that's a big help, thank you. Ironically, I use that top newline in my script, haha. But for this thread's purposes, that works great and I can use the logic for later.

Excellent, thank you kindly.

Quick question. \d\d and \d{2} are equivalent, right? I only ask because I start using {} at 3, because \d\d is one char shorter.

... I noticed that, too, unfortunately. That's been the case with my cuesheet script from the start, that MP3tag not lagging to a halt is contingent on the cuesheets being formatted correctly, *and* me keeping to my personal tagging standard. I think that's due to knowing regex well enough to play with it, but not optimize it. So things like (?s) never would've crossed my mind (I needed the gui checkbox!).

Thank you once again, kindly.

Yep, they're equivalent. \d{3} would look for exactly three instances of \d and \d{0,2} would look for between zero and two instances. But you’re right, \d\d is fewer characters, which is an optimisation nonetheless :face_with_tongue:
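A quick way to convince yourself is to run both spellings over a handful of strings. This is a small Python sketch with made-up sample values; the quantifier syntax itself is common to Perl-style engines, but Python's re is only a stand-in here.

```python
import re

# Sample strings: two digits, one digit, three digits, letters, empty.
samples = ['00', '57', '7', '123', 'ab', '']

# For each sample, record whether each spelling matches the whole string.
results = [
    (s, bool(re.fullmatch(r'\d\d', s)), bool(re.fullmatch(r'\d{2}', s)))
    for s in samples
]

for s, doubled, braced in results:
    print(repr(s), doubled, braced)
```

Both columns agree on every sample: only the two-digit strings match.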

No problem, it was fun to work on!

(and making me despairingly wary that my other scripts might need a revamp to work with ICU :melting_face: )