Strange regex engine behavior with a lazy star token

I decided to bring the topic back to life.

As I explained in the first post, regex engines don't only try to match at each character offset (in our one-character example, 'offset 0'). They also try to match the pattern at the offset just past the last character, i.e. against the void at the end of the string (in our case, 'offset 1').

Put $regexp(a,.*?,+) in any regex tool and you get +a+

The two zero-width matches (at the offsets described above) are each replaced with a + character. That's 100% correct! Yet Mp3tag outputs +++
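To make the two offsets concrete, here is a minimal Python sketch of the scanning model described above. It assumes the engine tries the pattern at every offset from 0 up to and including the end of the string, and steps one character forward after a zero-width match. scan_matches is a hypothetical helper written for this post, not Mp3tag's (or any real engine's) actual implementation.

```python
import re

def scan_matches(pattern, text):
    """Yield (start, end) spans found by trying the pattern at each offset.

    Assumed model: offsets 0..len(text) are all tried, and after a
    zero-width match the scan moves one character forward.
    """
    rx = re.compile(pattern)
    pos = 0
    while pos <= len(text):              # <= so the offset past the last char is tried too
        m = rx.match(text, pos)
        if m is not None and m.end() > pos:
            yield (pos, m.end())         # non-empty match: continue after it
            pos = m.end()
        else:
            if m is not None:
                yield (pos, pos)         # zero-width match at this offset
            pos += 1

print(list(scan_matches(r".*?", "a")))   # [(0, 0), (1, 1)]
```

The two spans are the zero-width matches at 'offset 0' and 'offset 1'. Replacing each with + and leaving the unmatched a in place gives +a+.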

Correct!

Match 1: a
Match 2: {void at the end of the string}

Wrong!

This regex should produce +a

Explanation: There is only one match, a zero-width one (due to the laziness of the star), at 'offset 0'. The match at 'offset 1' fails (due to the ^ anchor).

Correct!

It matches the same way as 1. did. The $ anchor doesn't change anything here.

Correct!

Again, same matches as in 1. and 3.
The .*? token will be forced by the $ anchor to expand its match to the letter a. After that, there's another match at the end of the string.

Correct!

Match 1: a

Single one-character long match at 'offset 0'.
The $ anchor makes the .*? token expand its match to cover the whole string.
The ^ anchor ensures that the pattern cannot be matched after the string ('offset 1').
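The effect of the anchors can be checked with a small Python sketch of the same offset-by-offset model. Everything here is an assumption made for illustration: replace_scan is a hypothetical helper (not how Mp3tag or any particular engine is implemented), and the three anchored patterns are my stand-ins for the ^, $ and ^...$ cases explained above.

```python
import re

def replace_scan(pattern, text, repl):
    """Replace every match found by trying the pattern at offsets 0..len(text).

    Assumed model: after a zero-width match (or no match) the current
    character is copied to the output and the scan moves one step forward.
    """
    rx = re.compile(pattern)
    out, pos = [], 0
    while pos <= len(text):
        m = rx.match(text, pos)          # ^ only matches at the real start of the string
        if m is not None:
            out.append(repl)
        if m is not None and m.end() > pos:
            pos = m.end()                # matched text is consumed
        else:
            out.append(text[pos:pos + 1])  # copy the unmatched char ('' past the end)
            pos += 1
    return "".join(out)

print(replace_scan(r".*?", "a", "+"))    # +a+
print(replace_scan(r"^.*?", "a", "+"))   # +a
print(replace_scan(r".*?$", "a", "+"))   # ++
print(replace_scan(r"^.*?$", "a", "+"))  # +
```

With ^ the attempt at 'offset 1' fails, with $ the lazy star has to expand over a, and with both anchors a single one-character match remains.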

What we learn from the above:

  • The problem occurs only when using the *? (lazy star) token.
  • Using the $ anchor prevents the bug from appearing.

Also:

  • The bug shows up when the regex contains a character token that matches the character in the string (in our case, the dot matching 'a').

Other examples:
$regexp(b,b*?,+)
$regexp(9,\d*?,+)
etc.