Help With a Regular Expression please...


#1

Hi guys,

Could you please help me with a regular expression? I'm trying to pull the list of producers from IMDB's combined page (see below for an example):

http://www.imdb.com/title/tt0126029/combined

I have this so far, but it's not working:

findline "Produced by" 1 1
unspace
if "


"
findinline "Produced by"
outputto "PRODUCERS"
joinuntil ""
sayregexp "/">[^<]+(?=<)" "@@" ""

Thanks in advance folks!


#2

It's working fine, but you don't have an endif at the end.


#3

edrikk, I rather dislike this sort of crippled webscript language, so I can't offer much help with your request.

Below is the complete TABLE structure that contains the producer names.
How would you get all the names from the first TD cell of each TR using the webscript language?

<table border="0" cellpadding="1" cellspacing="1">
<tr>
<td colspan="3" align="left"><h5><a class="glossary" name="producers" href="/glossary/P#producer">Produced by</a></h5></td>
</tr>
<tr>
<td valign="top"><a href="/name/nm0254645/">Ted Elliott</a></td>
<td valign="top" nowrap="1"> .... </td>
<td valign="top"><a href="http://www.imdb.com/glossary/C#co-producer">co-producer</a>  </td>
</tr>
<tr>
<td valign="top"><a href="/name/nm0277896/">Penney Finkelman Cox</a></td>
<td valign="top" nowrap="1"> .... </td>
<td valign="top"><a href="http://www.imdb.com/glossary/E#executive_producer">executive producer</a>  </td>
</tr>
<tr> <td valign="top"><a href="/name/nm0367286/">Jane Hartwell</a></td>
<td valign="top" nowrap="1"> .... </td>
<td valign="top"><a href="http://www.imdb.com/glossary/A#assoc_producer">associate producer</a>  </td>
</tr>
<tr>
<td valign="top"><a href="/name/nm0005076/">Jeffrey Katzenberg</a></td>
<td valign="top" nowrap="1"> .... </td>
<td valign="top"><a href="http://www.imdb.com/glossary/P#producer">producer</a>  </td>
</tr>
<tr>
<td valign="top"><a href="/name/nm0513502/">David Lipman</a></td>
<td valign="top" nowrap="1"> .... </td>
<td valign="top">co-executive producer  </td>
</tr>
<tr>
<td valign="top"><a href="/name/nm0704968/">Sandra Rabins</a></td>
<td valign="top" nowrap="1"> .... </td>
<td valign="top"><a href="http://www.imdb.com/glossary/E#executive_producer">executive producer</a>  </td>
</tr>
<tr>
<td valign="top"><a href="/name/nm0744429/">Terry Rossio</a></td>
<td valign="top" nowrap="1"> .... </td>
<td valign="top"><a href="http://www.imdb.com/glossary/C#co-producer">co-producer</a>  </td>
</tr>
<tr>
<td valign="top"><a href="/name/nm0912403/">Aron Warner</a></td>
<td valign="top" nowrap="1"> .... </td>
<td valign="top"><a href="http://www.imdb.com/glossary/P#producer">producer</a>  </td>
</tr>
<tr>
<td valign="top"><a href="/name/nm0930964/">John H. Williams</a></td>
<td valign="top" nowrap="1"> .... </td>
<td valign="top"><a href="http://www.imdb.com/glossary/P#producer">producer</a>  </td>
</tr>
<tr>
<td valign="top"><a href="/name/nm1306049/">Linda Olszewski</a></td>
<td valign="top" nowrap="1"> .... </td>
<td valign="top">assistant producer (uncredited) </td>
</tr>
<tr>
<td valign="top"><a href="/name/nm0000229/">Steven Spielberg</a></td>
<td valign="top" nowrap="1"> .... </td>
<td valign="top"><a href="http://www.imdb.com/glossary/E#executive_producer">executive producer</a> (uncredited) </td>
</tr>
<tr>
<td colspan="4">&nbsp;</td>
</tr>
</table>

DD.20100831.2215.CEST


#4

Thanks Dano,
Sorry, the endif was missed in a copy-and-paste.

It works fine, EXCEPT each producer has an extra /"> before their name.

/">Ted Elliott@@/">Penney Finkelman Cox@@/">Jane Hartwell@@/">Jeffrey Katzenberg@@/">David Lipman@@/">Sandra Rabins@@/">Terry Rossio@@/">Aron Warner@@/">John H. Williams@@/">Linda Olszewski@@/">Steven Spielberg

My error is in the first portion of my regular expression (the leading /"> part), which is what I'm seeking help with:

sayregexp "/">[^<]+(?=<)" "@@" ""


#5

I seem to have fixed it...

(?<=/">)[^<]+(?=<)
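For anyone wondering why this helps: the original pattern consumes the /"> prefix as part of the match, while a lookbehind only asserts that the prefix is there. Here is a minimal sketch in Python rather than the webscript language. The sample string is cut down from the table in post #3, and the "@@" join is only my assumption of how sayregexp combines its matches.

```python
import re

# Two cells cut down from the producers table quoted earlier in the thread.
html = (
    '<td valign="top"><a href="/name/nm0254645/">Ted Elliott</a></td>'
    '<td valign="top"><a href="/name/nm0277896/">Penney Finkelman Cox</a></td>'
)

# Original pattern: the /"> prefix is consumed, so it ends up in every match.
with_prefix = re.findall(r'/">[^<]+(?=<)', html)

# Fixed pattern: the lookbehind only checks for /"> and keeps it out of the match.
names = re.findall(r'(?<=/">)[^<]+(?=<)', html)

print("@@".join(with_prefix))
print("@@".join(names))
```

The lookbehind works here because /"> has a fixed length. Some regex engines reject variable-length lookbehinds.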


#6

This should also work to detect all HTML tags:
</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>
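A quick way to check a tag pattern like this is to run it over one of the rows from post #3. This is a minimal sketch in Python, assuming the pattern carries a * quantifier after the name class and after [^>]. Without them it would only match tags with a two-character name followed by exactly one more character. The variable names are just for illustration.

```python
import re

# Tag pattern with * quantifiers; group 1 captures the tag name.
TAG = re.compile(r'</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>')

row = '<td valign="top"><a href="/name/nm0254645/">Ted Elliott</a></td>'

# Names of all opening and closing tags, in document order.
tag_names = [m.group(1) for m in TAG.finditer(row)]

# Removing every tag leaves only the text content.
text_only = TAG.sub('', row)

print(tag_names)
print(text_only)
```

Note that a pattern like this will stop early if an attribute value itself contains a > character, so it is fine for simple pages like this one but is not a general HTML parser.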

DD.20100831.2321.CEST