sayregexp - how to use it?


#1

Hi everyone,

I noticed that some web sources scripts use the sayregexp command with two or three parameters, but I could not find any documentation for it. What do the parameters mean?

Thanks.


Script for ripping genre, year & record label from beatport.com
#2

It's a currently undocumented function which takes 3 parameters:

  • a regular expression that is used on the current line from the current position,
  • a separator string that is used to separate matches,
  • an optional string that marks the end of the inspected region
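
Since the function is undocumented, the following Python sketch only emulates the behaviour described in the list above; it is not Mp3tag's implementation. The name say_regexp, its pos argument, the sample line, and the choice to return an empty string when the end marker is missing are all assumptions made for illustration.

```python
import re

# Pattern taken from one of the forum examples: a date between
# '<event date="' and '"/>'.
PATTERN = r'(?<=<event date=")[-0-9]{4,10}(?="/>)'


def say_regexp(line, pattern, separator, end_marker="", pos=0):
    """Hypothetical emulation: join all matches found from pos onwards."""
    region = line[pos:]
    if end_marker:
        cut = region.find(end_marker)
        if cut == -1:
            # Assumption: nothing is output when the marker is absent.
            return ""
        # Only inspect the text in front of the end marker.
        region = region[:cut]
    return separator.join(m.group(0) for m in re.finditer(pattern, region))


# Hypothetical sample line with two matches.
line = '<event date="2011-04-01"/><event date="2011-04-02"/>'
result = say_regexp(line, PATTERN, ", ")
print(result)

# With an end marker, matches after the marker are not reported.
longer = line + '</list><event date="1999-01-01"/>'
limited = say_regexp(longer, PATTERN, ", ", "</list>")
print(limited)
```

The second call shows the effect of the third parameter as described above: the match behind the marker is left out.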

#3

Still not 100% clear how it works.

Online help:
SayRegexp S s s
Outputs all matches of the regular expression in the first parameter separated by the string in the second parameter and ignores the line if the string in the third parameter cannot be found.

From the forum:
sayregexp = Say Regex
Line: this text
Example: sayregexp "(?<=)[^<]+(?=)" ", " ""
Result: this text

Some examples from the forum:

SayRegExp "(?<=)[^<]+(?=)" ", " ""
SayRegExp "(?<=<event date=")[-0-9]{4,10}(?="/>)" ", " ""
SayRegExp "(?<=)[^<]+(?=)" ", " ""
SayRegExp "(?<=<release id=")[-0-9a-y]{34,40}(?=" )" ", " ""
SayRegExp "(?<=)[-0-9]{4,10}(?=)" ", "

What I'm wondering about is the (?<=xxx) and (?=yyy) part in the first parameter.

It looks like

SayRegExp "(?<=xxx).+?(?=yyy)"

corresponds to

$regexp(xxxsourcetextyyy,xxx(.+?)yyy,$1)

This works in my tests. Can this be extended? Is there a way to use more backreferences, corresponding to $2, $3, ...?
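
The correspondence can be checked outside Mp3tag. The Python sketch below uses Python's re module as a stand-in (an assumption; Mp3tag's regex engine may differ in details) to compare the lookaround form with the capture-group form. The last lines show several capture groups on a hypothetical date string; whether sayregexp itself can output $2, $3 and so on is not answered in this thread.

```python
import re

source = "xxxsourcetextyyy"

# Lookaround form: the whole match is already the wanted text.
lookaround = re.search(r"(?<=xxx).+?(?=yyy)", source).group(0)

# Capture-group form: the wanted text is group 1 (the $1 in $regexp).
captured = re.search(r"xxx(.+?)yyy", source).group(1)

print(lookaround, captured)

# Several groups in one pattern (hypothetical sample string).
year, month, day = re.search(r"(\d{4})-(\d{2})-(\d{2})", "2011-04-01").groups()
print(year, month, day)
```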


#4

Is there a way to use a line break as the separator string in the second parameter?

sayregexp "(?<=start).+?(?=end)" "\r\n" does not work


#5

Maybe try $char(13)$char(10)

DD.20110401.2025.CEST


#6

No, that doesn't work either.


#7
  • Hmm ... try ""
    ... might be a joke on April 1st.
  • Hmm ... try \d\a and other such things.
  • Hmm ... replace all CRLF in the file with a character of your own choosing.

Is it a text file with CRLF line delimiter sequence?
Or maybe only LF line delimiter?
Or maybe only CR line delimiter?

CR means "carriage return" ... $char(13) ... ASCII 13
LF means "line feed"... $char(10) ... ASCII 10
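
To find out which of the three cases applies to a file, something like the following Python sketch could be used. The function name detect_line_ending and the sample strings are made up for illustration; chr(13) and chr(10) are the same characters as $char(13) and $char(10).

```python
CR = chr(13)  # carriage return, "\r"
LF = chr(10)  # line feed, "\n"


def detect_line_ending(text):
    """Return which line delimiter a text uses (hypothetical helper)."""
    if CR + LF in text:
        return "CRLF"
    if LF in text:
        return "LF"
    if CR in text:
        return "CR"
    return "none"


print(detect_line_ending("first" + CR + LF + "second"))
print(detect_line_ending("first" + LF + "second"))
print(detect_line_ending("first" + CR + "second"))
```

When reading a real file for this check, it should be opened in binary mode or with newline="" so that Python does not translate the line endings first.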

DD.20110401.2148.CEST


#8

Haha, you got me. I was already experimenting with "" and "" before you edited your post.

I don't understand CR and LF.

It's for this:
Barcode automatisch einfügen (insert barcode automatically)

instead of ", " in
sayregexp "(?<=<span class="type">Barcode:).*?(?=)" ", "
I would like to have a line break as the separator.

But it looks like the separator can't be a regular expression or an Mp3tag scripting $function.
"\\" would work to create a multivalue field, but I think there is no code that creates a line break the way \\ creates a "multivalue tag field break".


#9

Hmm ... I try to understand ...
... it seems that control characters are not allowed in the "say buffer".
... try TAB ... $char(9) ... ASCII 9
... try Form Feed ... $char(12) ... ASCII 12
... try Bell ... $char(7) ... ASCII 7

DD.20110401.2159.CEST


#10

The $char() function does not work here; it gives me the function string, not the result of the function.


#11

The first construct, (?<=xxx), is called a 'zero-width positive lookbehind' in regex terms. A handy explanation can be found here. The second one, (?=yyy), is called a 'zero-width positive lookahead'. Both constructs are standard regexp features. Zero-width means that you can use them to ensure that a given pattern exists in the specified place (before or after the text you're actually about to extract) without it showing up in your output.
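
A small Python sketch of what zero-width means in practice, using Python's re module as a stand-in and a made-up sample string: the lookarounds are checked but not consumed, so the match covers only the digits, while the same pattern without lookarounds also swallows the surrounding text.

```python
import re

text = "start123end"

# With lookarounds: "start" and "end" must be present but are not matched.
zero_width = re.search(r"(?<=start)\d+(?=end)", text)
print(zero_width.group(0), zero_width.span())

# Without lookarounds: the surrounding text becomes part of the match.
consuming = re.search(r"start\d+end", text)
print(consuming.group(0), consuming.span())
```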

-u302320


#12

Thank you very much. Very useful link.