I decided to bring the topic back to life.
As I explained in the first post, regex engines do not only try to match at each character offset (in our one-character example, 'offset 0'). They also try to match the pattern at the offset just past the last character, i.e. against the void at the end of the string (in our case, 'offset 1').
Put the equivalent of $regexp(a,.*?,+) (string a, pattern .*?, replacement +) in any regex tool and you get +a+
The two zero-width matches (at the offsets described above) are each replaced with the + character. That's 100% correct! Yet Mp3tag outputs +++
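To make the offset-by-offset reasoning concrete, here is a minimal Python sketch of that model. It is only an illustration of the behaviour described above, not Mp3tag's code and not any particular engine's implementation. The function name replace_all is made up for this example, and the replacement is inserted literally (no backreferences). The loop is written by hand rather than with a built-in replace-all, so the offsets are visible and nothing depends on one library's handling of empty matches.

```python
import re

def replace_all(pattern, repl, text):
    """Replace matches found by trying the pattern at every offset,
    including the offset just past the last character."""
    regex = re.compile(pattern)
    out = []
    pos = 0
    while pos <= len(text):
        m = regex.match(text, pos)  # anchored attempt at this offset only
        if m is not None and m.end() > pos:
            # Non-empty match: replace it and continue right after it.
            out.append(repl)
            pos = m.end()
            continue
        if m is not None:
            # Zero-width match: insert the replacement, consume nothing.
            out.append(repl)
        # Copy the character at this offset (if any) and move on.
        out.append(text[pos:pos + 1])
        pos += 1
    return "".join(out)

print(replace_all(r".*?", "+", "a"))  # +a+
```

The lazy star matches zero characters at 'offset 0' and again at 'offset 1', so a + is inserted on each side of the untouched a.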
Match 1: a
Match 2: {void at the end of the string}
This regex should produce +a
Explanation: there is only one match, a zero-width one at 'offset 0' (zero-width because the star is lazy). The match at 'offset 1' fails because of the ^ anchor.
It matches the same way as 1. did. The $ anchor doesn't change anything here.
Again, same matches as in 1. and 3.
The .*? token is forced by the $ anchor to expand its match to the letter a. After that, there's another match at the end of the string.
Match 1: a
A single one-character-long match at 'offset 0'.
The $ anchor makes the .*? token expand its match to cover the whole string.
The ^ anchor ensures that the pattern cannot be matched after the string ('offset 1').
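The effect of the anchors can be checked with a small Python sketch that lists the match spans found when the pattern is tried at every offset. The three patterns below are my own reading of the anchors discussed above (lazy star with ^, with $, and with both); they are assumptions for illustration, not patterns quoted from the post, and match_spans is a made-up helper name.

```python
import re

def match_spans(pattern, text):
    """Return (start, end) spans found by trying every offset in turn,
    including the offset just past the last character."""
    regex = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(text):
        m = regex.match(text, pos)  # anchored attempt at this offset only
        if m is None:
            pos += 1
        else:
            spans.append(m.span())
            # Step past a non-empty match, or one character past an empty one.
            pos = max(m.end(), pos + 1)
    return spans

# Assumed patterns, chosen to mirror the anchors described in the text.
for pattern in (r"^.*?", r".*?$", r"^.*?$"):
    print(pattern, match_spans(pattern, "a"))
# ^.*? [(0, 0)]
# .*?$ [(0, 1), (1, 1)]
# ^.*?$ [(0, 1)]
```

A span like (0, 0) is a zero-width match at 'offset 0', (0, 1) covers the letter a, and (1, 1) is the match against the void at the end of the string.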
What we learn from the above:
- The problem occurs only when using the *? (lazy star) token.
- Using the $ anchor prevents the bug from appearing.
Also:
- The bug shows up when the regex uses a character token that matches the character in the string (in our case, the dot matching 'a').
Other examples:
$regexp(b,b*?,+)
$regexp(9,\d*?,+)
etc.
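For reference, this is what the offset-by-offset model predicts for these examples. The Python sketch below is deliberately narrow: it only handles zero-width matches, which is all an unanchored lazy star ever produces when nothing follows it in the pattern. The helper name insert_at_empty_matches is made up for this illustration, and the result is the expected output under that model, not something taken from Mp3tag.

```python
import re

def insert_at_empty_matches(text, pattern, repl="+"):
    """Insert repl at every offset where the pattern matches zero
    characters, including the offset past the last character.
    Non-empty matches are deliberately left untouched."""
    regex = re.compile(pattern)
    out = []
    for pos in range(len(text) + 1):
        m = regex.match(text, pos)  # anchored attempt at this offset only
        if m is not None and m.end() == pos:
            out.append(repl)
        out.append(text[pos:pos + 1])  # empty slice at the final offset
    return "".join(out)

for text, pattern in (("a", r".*?"), ("b", r"b*?"), ("9", r"\d*?")):
    print(text, pattern, insert_at_empty_matches(text, pattern))
# a .*? +a+
# b b*? +b+
# 9 \d*? +9+
```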