Unexpected regexp behaviour in "Format value" action


#1

I have a file named test.mp3. My action group consists of a single "Format Value" action:

Field: _FILENAME
Format String: $regexp(%_filename%,(.*),$1 oops).mp3

After executing the action !once!, my file is renamed to
test oops oops.mp3

Bug? Feature? What's happening here?
-u302320


#2

Even more fun ...

_filename <== 'test'

$regexp(%_filename%,'.',' oops') ==> ' oops oops oops oops'
... this seems to be ok.

$regexp(%_filename%,'()',' - - - ') ==> ' - - - t - - - e - - - s - - - t - - - '
... but this one I have trouble understanding.
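
For comparison, the same two substitutions can be tried in another engine. This is only a sketch in Python (assuming Python 3.7 or later and its `re` module, which is not the engine Mp3tag uses), but it shows where the empty group finds its matches:

```python
import re

name = "test"

# '.' consumes one character per match, so there are four replacements.
dots = re.sub(r".", " oops", name)

# '()' never consumes anything: it matches the empty string at every
# position, including the one after the last character.
positions = [m.start() for m in re.finditer(r"()", name)]
dashes = re.sub(r"()", " - - - ", name)

print(repr(dots))    # ' oops oops oops oops'
print(positions)     # [0, 1, 2, 3, 4]
print(repr(dashes))  # ' - - - t - - - e - - - s - - - t - - - '
```

Five empty matches for a four-character name account for the five ' - - - ' groups in the result.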

DD.20110426.1805.CEST


#3

A workaround seems to be

Field: _FILENAME
Format String: $regexp(%_filename%,(.+),$1 oops).mp3

The + requires the part inside the subexpression to contain at least one character, thereby suppressing the second, zero-width match.
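
The difference between the two quantifiers can be reproduced outside Mp3tag. A rough sketch in Python (assumptions: Python 3.7 or later, whose `re.sub` also replaces an empty match that directly follows a non-empty one; `\1` plays the role of `$1`):

```python
import re

name = "test"

# '*' allows a second, empty match at the end of the string.
star = re.sub(r"(.*)", r"\1 oops", name) + ".mp3"

# '+' needs at least one character, so the empty match cannot happen.
plus = re.sub(r"(.+)", r"\1 oops", name) + ".mp3"

print(star)  # test oops oops.mp3
print(plus)  # test oops.mp3
```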


#4

This is not a bug. This is one of the quirks of regex engines. I've mentioned it before in another bug report.

$regexp(test,(.*),$1 oops)

  1. .* matches the whole string (start of the match at offset 0).
  2. Match ("test") is captured into first backreference.
  3. Match is replaced with $1 oops so that "test" becomes "test oops".
  4. .* matches the empty string at the end of the input (start of the match at offset 4, zero-width match). The star makes the dot optional, so the pattern can match here!
  5. Match ("") is captured into first backreference. That's right! The first backreference exists and simply holds nothingness.
  6. Match is replaced with '$1 oops' so that "" becomes " oops"

In the end, our input string "test" becomes "test oops oops".
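
The steps can be traced by listing every match a regex engine finds. This is a sketch in Python rather than in Mp3tag's scripting language, so take it as an illustration of the general behaviour, not of Mp3tag's internals; the variable names are made up for the example:

```python
import re

text = "test"
pattern = re.compile(r"(.*)")

matches = []
pieces = []
for m in pattern.finditer(text):
    # group(1) is the first backreference; for the second match it is
    # the empty string, but it still exists.
    matches.append((m.start(), m.end(), m.group(1)))
    pieces.append(m.group(1) + " oops")

# Joining the replacements is enough here because the two matches
# together cover the whole input, leaving no unmatched text between them.
result = "".join(pieces)

print(matches)  # [(0, 4, 'test'), (4, 4, '')]
print(result)   # test oops oops
```

The second tuple is the zero-width match: it starts and ends right after the last character, and its captured group is empty.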