help with a "found" regex

howdy y'all,

i found this post ...
Feature Request - Title Casing

... containing this regex ...
\s(The|Tha|Da|To|Of|By|For|From|A|An|As|And|At)(?= )

i THINK it matches any of the words between the 1st set of () that are capitalized and have both ONE leading and ONE trailing space.

is that correct? i've currently got a passel of "replace" commands in an action that this would appear to allow me to replace with one line.

take care,
lee

howdy y'all,

found another that confuses me. [grin] this one does "O'Brien" style names.

(^|\s|"|\(|\[|/)O'([^e]{1})
$1O'$upper($2)

i THINK (^|\s|"|\(|\[|/) means ...
anything before >O'< that is found at the beginning of field or after a space, a double quote, "(", "[" or "/". ends up in the variable $1.

i don't understand what the ([^e]{1}) part does tho. it looks like "the 1st non-letter-e after the >O'<". the result ends up in variable $2.

help, please?

take care,
lee

Learn from there ...
http://www.regular-expressions.info/charclass.html

DD.20101109.0650.CET

howdy DetlevD,

i followed one of your earlier links to that site. thanks! [grin]

still, i am confused at what ([^e]{1}) does. i THINK it matches the 1st non-letter-e after the >O'<, but i aint sure of that. could you / would you elucidate?

i finally found out about "lookaround" stuff - (?= ) - at that site so i don't need clarification on the 1st post in this thread.

take care,
lee

To check something for correctness, especially a regular expression, you should always create some test cases with positive and negative results, to make sure that the process works as you think and expect.

Look at the following expression. You can put it into the "Format string" edit field of the "Tag - Filename" dialog, and the result is shown in the "Preview" area.

$regexp('The Letter A At The Word And Is Often An Error From Da Writing And Should Be Replaced By Tha Letter E','\s(The|Tha|Da|To|Of|By|For|From|A|An|As|And|At)(?=\s)','\L $1')

Result:
The Letter a at the Word and Is Often an Error from da Writing and Should Be Replaced by tha Letter E

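It is obvious what the regular expression does and what it does not: the leading "The" stays capitalized because no whitespace precedes it, and the final "E" stays because it is not in the word list.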
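
For anyone who wants to try the same test cases outside Mp3tag, here is a minimal sketch in Python. It assumes Python's re module treats this pattern the same way Mp3tag's engine does, which should hold for plain alternation and a lookahead. Python has no \L in replacement strings, so a small function does the lowercasing; the name lower_small_words is made up for this example.

```python
import re

# Same word list as the pattern above. (?=\s) is a lookahead: the trailing
# whitespace must be there, but it is not consumed, so it can still serve
# as the leading whitespace of the next match.
SMALL_WORDS = re.compile(r"\s(The|Tha|Da|To|Of|By|For|From|A|An|As|And|At)(?=\s)")


def lower_small_words(title: str) -> str:
    # Stand-in for the Mp3tag replacement '\L $1':
    # one space followed by the matched word in lower case.
    return SMALL_WORDS.sub(lambda m: " " + m.group(1).lower(), title)


# Positive cases: listed words with whitespace on both sides.
print(lower_small_words("The Letter A At The Word"))
# Negative cases: "By" at the very end has no trailing whitespace,
# and "Andy" only starts with a listed word.
print(lower_small_words("Stand By"))
print(lower_small_words("Meet Andy Now"))
```

The first call prints "The Letter a at the Word"; the other two come back unchanged. Because the trailing space is only looked at and not consumed, two listed words in a row ("A At") are both caught.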

DD.20101109.1023.CET

It excludes the case "O'er", which would otherwise become "O'Er".
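
A rough sketch of the same idea in Python, for testing outside Mp3tag. It assumes the "(" and "[" in the pattern are meant as escaped literals, and it replaces $upper($2) with Python's str.upper(); the function name fix_o_names is made up for this example. Note that {1} means "exactly once", which is already the default, so ([^e]{1}) and ([^e]) behave the same.

```python
import re

# Group 1: start of field, whitespace, double quote, "(", "[" or "/".
# Group 2: exactly one character after O' that is not a lowercase "e".
O_NAME = re.compile(r"""(^|\s|"|\(|\[|/)O'([^e])""")


def fix_o_names(text: str) -> str:
    # Stand-in for the Mp3tag replacement $1O'$upper($2):
    # keep group 1 as it is and upper-case the single character in group 2.
    return O_NAME.sub(lambda m: m.group(1) + "O'" + m.group(2).upper(), text)


print(fix_o_names("O'brien"))          # positive: start of field
print(fix_o_names("Sinead O'connor"))  # positive: after a space
print(fix_o_names("O'er The Hills"))   # negative: "e" follows O'
```

This prints "O'Brien", "Sinead O'Connor" and an unchanged "O'er The Hills". The price of the [^e] exclusion is that a surname starting with a lowercase "e" after the O' would be skipped as well.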

DD.20101109.1156.CET

howdy DetlevD,

thanks! so i got the ([^e]{1}) part correct. it helped to see what it was testing for. too bad my imagination simply never went that far.

"obvious" is relative. to me, your examples take several run-thrus to figure out and i'm often left wondering if i really got it.

again, thanks for your help!

take care,
lee