Regex look ahead/behind


#1

Hello. I'm trying to modify a regex script to add an exception. Currently, the script adds a space after any of a defined set of punctuation marks when that mark is immediately followed by another alphanumeric character (e.g. "There,There" converts to "There, There"). Here's the script:

Action type: Replace with regular expression
Field: _TAG
Regular expression: ([&)}\];,+])([^\s.:;([/])
Replace matches with: $1 $2
[ ] case-sensitive comparison

The problem with this script is that large numbers become fragmented. For instance, "10,000" gains an unwanted space after the comma, so I end up with "10, 000". My initial idea was to use a negative lookbehind and a negative lookahead for digits (i.e. if there is a digit before the comma AND three digits after it, then do not add a space). I had something like this in mind: (?<!\d),(?!\d{3})

Now, as you cannot use groups within a character class, I would have to change the regex to this:

Action type: Replace with regular expression
Field: _TAG
Regular expression: (&|\)|}|\]|;|(?<!\d),(?!\d{3})|\+)([^\s.:;([/])
Replace matches with: $1 $2
[ ] case-sensitive comparison

Unfortunately, for reasons that I don't quite understand, I cannot get this to work. Using just the lookahead or just the lookbehind works, but not both together. I'm stumped on this one. I'm sure there's a simple solution, but I just can't see it. Does anyone here know how to fix this? Cheers.


#2

It is not exactly the High School of Look Over and Look Aside, but it seems to work. What do you think?

$regexp($replace('Huma8028,1234,567,3006,Test,4711,0',',',', '),'\b(\d+),\s(\d+)\b','$1,$2')

From:
'Huma8028,1234,567,3006,Test,4711,0'
To:
'Huma8028, 1234,567, 3006, Test, 4711,0'

DD.20110501.2045.CEST


#3

The problem with correcting after the fact is that, as with your solution, numbers with more than one comma (e.g. 1,000,000) still get split. It's much safer to just add an exception. Thanks for trying, though.
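
For anyone who wants to see that failure case, here is a rough Python sketch of the two-pass idea from post #2. The function name two_pass is made up for illustration, and Python's str.replace and re.sub are only stand-ins for $replace and $regexp, so the tagger's own engine may differ in details. The second comma survives because the digits between the two commas are consumed by the first match and cannot start another one.

```python
import re

def two_pass(text):
    # Pass 1: put a space after every comma (stand-in for $replace).
    spaced = text.replace(",", ", ")
    # Pass 2: rejoin digit groups that pass 1 pulled apart (stand-in for $regexp).
    return re.sub(r"\b(\d+),\s(\d+)\b", r"\1,\2", spaced)

print(two_pass("10,000"))     # 10,000
print(two_pass("1,000,000"))  # 1,000, 000
```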


#4

OK, I've figured it out. I simply split the lookahead and lookbehind with a pipe operator. :rolleyes:

(&|\)|}|\]|;|(?<!\d),|,(?!\d{3})|\+)([^\s.:;([/])
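
A likely explanation for why the pipe helps: the exception is "a digit before the comma AND three digits after it", so the rule for adding a space is the negation, "no digit before OR not three digits after". Putting both lookarounds around one comma demands both conditions at once, which also skips cases like "5,Live". Below is a minimal Python sketch comparing the two forms. The names TAIL, BOTH, EITHER and add_spaces are hypothetical, the special characters are escaped so the patterns compile in Python's re module, and the replacement is written \1 \2 instead of $1 $2.

```python
import re

# Second group from the thread; "[" is escaped only to avoid a Python warning.
TAIL = r"([^\s.:;(\[/])"

# Both lookarounds on one comma: no digit before AND not three digits after.
BOTH = re.compile(r"(&|\)|}|\]|;|(?<!\d),(?!\d{3})|\+)" + TAIL)

# Lookarounds split by a pipe: no digit before OR not three digits after.
EITHER = re.compile(r"(&|\)|}|\]|;|(?<!\d),|,(?!\d{3})|\+)" + TAIL)

def add_spaces(pattern, text):
    # Insert a space between the punctuation mark and the character after it.
    return pattern.sub(r"\1 \2", text)

for sample in ["There,There", "10,000", "1,000,000", "Track 5,Live"]:
    print(sample, "|", add_spaces(BOTH, sample), "|", add_spaces(EITHER, sample))
```

With these patterns, both forms leave "10,000" and "1,000,000" alone, but only the piped form turns "Track 5,Live" into "Track 5, Live".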