Help with Long RegEx

I used a lengthy RegEx to find all of the permutations of the word CLIMATE in the New York Times Crossword database. It works, but may be too long:

88 results for regular expression ([CLIMATE])(?!\1)([CLIMATE])(?!\1|\2)([CLIMATE])(?!\1|\2|\3)([CLIMATE])(?!\1|\2|\3|\4)([CLIMATE])(?!\1|\2|\3|\4|\5)([CLIMATE])(?!\1|\2|\3|\4|\5|\6)[CLIMATE]
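For anyone who wants to reproduce the long expression rather than type it by hand, here is a minimal sketch in Python. The helper name `build_pattern` is made up for illustration, and the sketch assumes the letters are all distinct and contain no regex metacharacters. Python's standard `re` module handles back-references inside negative lookaheads, so the generated pattern can be tried locally.

```python
import re


def build_pattern(letters: str) -> str:
    """Build the 'no repeated letter' pattern for a word with distinct letters.

    Each captured letter is followed by a negative lookahead that forbids
    the next character from equalling any letter captured so far.
    """
    cls = "[" + letters + "]"
    parts = []
    for i in range(1, len(letters)):
        parts.append("(" + cls + ")")
        refs = "|".join("\\" + str(k) for k in range(1, i + 1))
        parts.append("(?!" + refs + ")")
    parts.append(cls)  # the last letter needs no capture group
    return "".join(parts)


# Compiled form of the expression quoted above.
PATTERN = re.compile(build_pattern("CLIMATE"))
```

For example, `PATTERN.search("ACCLIMATE")` should find the run `CLIMATE`, while `PATTERN.search("CLIMATIC")` should find nothing because the letter I repeats.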


I believe the problem is that you can back reference the result of a character class match, but not its terms. I'd love to be proven wrong, BTW. You can see the entire result here:

Note: In a regular expression the term [CLIMATE] is a set of seven letters.
A permutation is a rearrangement of a given ordered set of elements into a different order, which may give the new sequence a different meaning.

Given a word of 7 letters, the new word also has 7 letters: the same letters, distributed in a different order.


I'm not just looking for seven-letter words, but also seven-letter sequences in longer words. My RegEx does that. I've tried to shorten it this way with subroutine calls, but I can't get it to work:

Error: parsing "([CLIMATE])(?!\1)((?1))(?!\1|\2)((?1))(?!\1|\2|\3)((?1))(?!\1|\2|\3|\4)((?1))(?!\1|\2|\3|\4|\5)((?1))(?!\1|\2|\3|\4|\5|\6)(?1)" - Unrecognized grouping construct.

Do the (?1) subroutine calls not support another set of parens so that their results can be back-referenced?

If you are looking for a string constant, why not use a simple search?
If you do not use "word only", you should find all entries that are or contain the string constant.

At the risk of having misunderstood your problem, ...
if you have a list of words and you want to know whether a word contains a predefined set of letters, ...
then you may do something like this:

  • get a word from the list of words (dictionary database);
  • remove from the word all letters that belong to the predefined set of letters;
  • measure the length of the resulting word.
    If the result is shorter than the unchanged word
    by at least the number of predefined letters, ...
    then add this word to the result list.
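The steps above can be sketched roughly as follows in Python. The function name and the default letter set are assumptions made for illustration. Note that this only counts how many characters of the word belong to the letter set; it does not check that they are consecutive or that each appears once.

```python
def contains_enough_letters(word: str, letters: str = "CLIMATE") -> bool:
    """Return True if removing the set's letters shortens the word
    by at least len(letters) characters."""
    letter_set = set(letters.upper())
    # Keep only the characters that are NOT in the predefined set.
    remaining = [ch for ch in word.upper() if ch not in letter_set]
    removed = len(word) - len(remaining)
    return removed >= len(letters)
```

With this sketch, `contains_enough_letters("CLIMATE")` is true, but so is `contains_enough_letters("CALCULATE")`, because eight of its nine characters come from the set even though no anagram of CLIMATE occurs in it.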


There's nothing simple about it. The search is for anagrams of CLIMATE within words of 7+ length. As you can see, I have a working RegEx in the original post that I just want to shorten. I've tried (?1) and \g<1> as subroutine calls without success.

That won't work. The seven letters of CLIMATE must be consecutive within a word with no repeats or intervening letters. As I said, the long RegEx works. Note that you will find an anagram of CLIMATE within each hit, if not the word itself.
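To make the requirement concrete without any RegEx, here is a small sliding-window sketch in Python. It is not what the crossword site does; the function name `find_anagram_window` is hypothetical. It checks every run of seven consecutive letters and compares its sorted letters with the sorted letters of CLIMATE.

```python
from typing import Optional

TARGET = "CLIMATE"


def find_anagram_window(word: str, target: str = TARGET) -> Optional[str]:
    """Return the first run of consecutive letters in `word` that is an
    anagram of `target`, or None if there is no such run."""
    n = len(target)
    key = sorted(target.upper())
    upper = word.upper()
    for start in range(len(upper) - n + 1):
        window = upper[start:start + n]
        # Same multiset of letters means the window is a permutation.
        if sorted(window) == key:
            return window
    return None
```

For example, `find_anagram_window("ACCLIMATE")` returns `"CLIMATE"`, and `find_anagram_window("CALCULATE")` returns `None`.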

Ok, as you said, your solution works, then use it.
What is the benefit of all this effort?
Is there any prize money?


Assuming some regexp dialects do not support recursion, ...
maybe there is a way to recode the recursive expression into a linear expression ...
see there ...


I was on that page yesterday trying to solve the problem. No, there is no prize money for shortening the working RegEx. I'm just frustrated that I can't eliminate the multiple occurrences of the [CLIMATE] search character class. I'll have to find out exactly which dialect of RegEx the site is using.

I found out why my shortened RegEx won't work. The site uses Microsoft's .NET, which is considered less full-featured than the more standard PERL, PCRE or PHP. .NET doesn't support what I am trying to do at all.

I'm hoping Mp3Tag uses one of the PERL-based versions. Does anyone here know exactly which version is used?
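The same kind of dialect check can be done locally. As a rough sketch, Python's standard `re` module is another engine that, as far as I know, does not accept `(?1)` subroutine calls, so compiling the shortened expression there fails with an error in much the same way. The helper name `compiles_with_stdlib_re` is made up for this example.

```python
import re

# The shortened expression from earlier in the thread, split over two lines.
SHORT_PATTERN = (
    r"([CLIMATE])(?!\1)((?1))(?!\1|\2)((?1))(?!\1|\2|\3)((?1))"
    r"(?!\1|\2|\3|\4)((?1))(?!\1|\2|\3|\4|\5)((?1))(?!\1|\2|\3|\4|\5|\6)(?1)"
)


def compiles_with_stdlib_re(pattern: str) -> bool:
    """Return True if Python's built-in `re` engine accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        # Raised for constructs this engine does not understand.
        return False
    return True
```

Calling `compiles_with_stdlib_re(SHORT_PATTERN)` should return `False`, whereas a pattern without subroutine calls, such as `r"([CLIMATE])(?!\1)[CLIMATE]"`, compiles fine.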

see this thread: /t/6109/1

Doesn't tell me much about the Perl version, but I assume that Florian keeps it up to date. My failed shortened RegEx works fine on this PHP-based website that the failed .NET site uses as a tutorial!



And yes, it also works in Mp3Tag!