Extract multiple IDs from different URLs with regexp


#1

Hi,

I'm new here, and first I want to say thanks to the developer for this great program and to the forum members for the awesome support.

Now here is my problem:

Over the years I have stored two kinds of URLs, among others, in the %www% field of my mp3 collection.

For example, it looks like this:

http://www.mute.com/ http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde http://coverparadise.to/index.php?Module=ViewEntry&ID=7735 http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e http://coverparadise.to/index.php?Module=ViewEntry&ID=69885 http://www.emimusic.de

My goal is to extract all IDs from the URLs into user-defined fields with multiple values, like this:

Field: ALLMUSICID
Value: 0cfqxq80ldde\\3ifoxzysldde\\gcfqxq9jld0e

Field: COVERPARADISEID
Value: 7735\\69885

My first attempt to solve this problem used the following regular expressions combined with functions:

$replace($trim($regexp($regexp(%www%,'http://www\.allmusic\.com/cg/amg\.dll\?p=amg&sql=10:\w+\s?',),'http://coverparadise\.to/index\.php\?Module=ViewEntry&ID=',)), ,\\)

This is for the field COVERPARADISEID. It first removes all instances of the allmusic URL including the ID. Then it deletes all instances of the coverparadise URL, leaving only the ID. After that it trims whitespace, and finally all spaces are replaced by \\.

Analogously, for the field ALLMUSICID:

$replace($trim($regexp($regexp(%www%,'http://coverparadise\.to/index\.php\?Module=ViewEntry&ID=\d+\s?',),'http://www\.allmusic\.com/cg/amg\.dll\?p=amg&sql=10:',)), ,\\)

The disadvantage of this method is that it only works correctly if only allmusic and coverparadise URLs are stored in the %www% field. If other types of URLs exist, they are added to the user-defined fields:

Field: ALLMUSICID
Value: http://www.mute.com/\\0cfqxq80ldde\\3ifoxzysldde\\gcfqxq9jld0e\\http://www.emimusic.de

Field: COVERPARADISEID
Value: http://www.mute.com/\\7735\\69885\\http://www.emimusic.de

A better solution would be an expression that matches every instance of one kind of URL regardless of its position and returns the IDs separated by a space or \\. So I tried again and found this one for the allmusic field:

$regexp(%www%,'((http://www\.allmusic\.com/cg/amg\.dll\?p=amg&sql=10:)(\w+)\s?)+\s?.*','$3')

This expression matches only the ID of the first instance of the address and ignores the following ones.

Field: ALLMUSICID
Value: 0cfqxq80ldde

It would be great if someone could help me figure out how to modify the expression so that it matches all IDs from allmusic (or coverparadise).

Kind regards

Knabbakeks


#2

Hi,

I haven't played with Mp3tag's RegEx incarnation, and I assume you may be storing more than one URL in a field:

http:.*(&sql=10:|&ID=)(\w+)

You could, of course, shorten the expression, but it is more readable to me.
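
A quick way to see how this pattern behaves on a field holding several URLs is to try it outside Mp3tag. The following is a minimal sketch in Python, assuming that Python's `re` module treats the greedy `.*` the same way Mp3tag's engine does (not verified here). The second pattern, without the leading `http:.*`, is a hypothetical variant and not part of the post above.

```python
import re

# The example field content from the first post.
WWW = (
    "http://www.mute.com/ "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde "
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=7735 "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e "
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=69885 "
    "http://www.emimusic.de"
)

# The pattern as posted: the greedy ".*" runs to the end of the field and
# backtracks, so group 2 holds only the last ID in the field.
greedy = re.search(r"http:.*(&sql=10:|&ID=)(\w+)", WWW)
last_id = greedy.group(2)

# Hypothetical variant without the leading "http:.*": findall returns the
# ID after every marker, in the order the URLs appear.
all_ids = re.findall(r"(?:&sql=10:|&ID=)(\w+)", WWW)

print(last_id)
print(all_ids)
```

Note that the variant mixes allmusic and coverparadise IDs in one list, so it would still need one pattern per target field.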

Daz


#3
QUOTE (Knabbakeks @ Sep 23 2010, 15:37)
... My goal is to extract all IDs from the URLs into user-defined fields with multiple values, like this:
Field: ALLMUSICID
Value: 0cfqxq80ldde\\3ifoxzysldde\\gcfqxq9jld0e
Field: COVERPARADISEID
Value: 7735\\69885

...
This is for the field COVERPARADISEID. It first removes all instances of the allmusic URL including the ID. Then it deletes all instances of the coverparadise URL, leaving only the ID. After that it trims whitespace, and finally all spaces are replaced by \\.
...


So far there are some questions.

What happens with such a tag-field ...
Field: COVERPARADISEID
Value: 7735\\69885
... when the file has been saved?
Is this what you want?

Do you know whether it is possible to create a tag-field by using the text content from another tag-field?
Example:
There is a tag-field ...
FIELD1="coverparadise"
... which can be modified to ...
FIELD1=$upper(%FIELD1%)'ID'
... giving ...
FIELD1=COVERPARADISEID
There is a tag-field ...
FIELD2="7735".

The goal is to create a new tag-field this way ...
Field: %FIELD1%
Value: %FIELD2%

If both questions can be answered satisfactorily, there might be a way to automate your request.
Otherwise a sort of lookup table has to be built (and maintained manually) to provide the relation between URL and tag-field name. Once such a lookup table is in place, it should be possible to write multiple values into the corresponding ID tag-field.

But all of this needs a full understanding and some tricky use of the Mp3tag features.
It is a challenge mainly because there is no way to loop over a list of items within a tag-field, and other item-related string functions are missing.

What I really suggest is that you export your WWW field content into a text file, let the "dictionary work" (multiple values to one key) be done by another "full-blown programming language" (or simply use a text editor, ideally one that provides macros), and import the result back into the file.
But even this approach requires manually hard-coding the receiving tag-field names (like "COVERPARADISEID") into a format string when importing via "Textfile to Tag", and the import process needs human interaction too.
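
The "dictionary work" on exported WWW content could look roughly like the following. This is a minimal sketch in Python, not a tested workflow: the lookup table `ID_PATTERNS`, the function `extract_ids`, and the two patterns are assumptions built from the URLs in this thread, and the export and re-import steps are left out.

```python
import re

# Hypothetical lookup table: tag-field name -> pattern whose group 1 is the ID.
# Field names and patterns are assumptions based on the URLs in this thread.
ID_PATTERNS = {
    "ALLMUSICID": re.compile(r"allmusic\.com/cg/amg\.dll\?p=amg&sql=10:(\w+)"),
    "COVERPARADISEID": re.compile(
        r"coverparadise\.to/index\.php\?Module=ViewEntry&ID=(\d+)"
    ),
}


def extract_ids(www):
    """Return {field name: IDs joined by the two-backslash separator}."""
    result = {}
    for field, pattern in ID_PATTERNS.items():
        ids = pattern.findall(www)  # all IDs for this field, in order
        if ids:
            result[field] = "\\\\".join(ids)  # Python literal for \\
    return result


# The example field content from the first post.
www = (
    "http://www.mute.com/ "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde "
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=7735 "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e "
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=69885 "
    "http://www.emimusic.de"
)

fields = extract_ids(www)
for name, value in fields.items():
    print(name + "=" + value)
```

URLs that match no entry in the table (here mute.com and emimusic.de) are simply ignored, which avoids the leftover-URL problem of the first attempt. Adding another site means adding one line to the table.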

DD.20100924.1621.CEST


#4

Thanks for your attention and reply!

The field becomes a tag-field with multiple values. This is exactly what I want. First, I aim to clean up and shorten the WWW field. Second, I intend to use this for tools to browse all IDs with one click. I changed the field names ALLMUSICID to ALLMUSIC_ID and COVERPARADISEID to COVERPARADISE_ID.

This is the tool for Coverparadise:

$if(%coverparadise_id%,"$replace($trim($replace( $meta_sep(coverparadise_id, ), , http://coverparadise.to/?Module=ViewEntry&ID=)), ," ")",http://coverparadise.to/index.php?Module=ExtendedSearch&SearchString=$replace($if2(%band%,%artist%) $regexp(%album%,'\s*(\(|\[|\{).+?(\)|\]|\})',), ,+,&,%%26))

and this is for Allmusic:

$if(%allmusic_id%,"$replace($trim($replace( $meta_sep(allmusic_id, ), , http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:)), ," ")",http://www.allmusic.com/cg/amg.dll?p=amg&opt1=2&sql=$replace($regexp($regexp(%album%,'\s*(\(|\[|\{).+?(\)|\]|\})',),'^(The|A|An)\s(.*)$',$2), ,+,&,%%26))

I use this already for Amazon:

$if(%asin%,"$replace($trim($replace( $meta_sep(asin, ), , http://www.amazon.$if($neql($regexp(%country%,(?i)germany,),%country%),de,com)/exec/obidos/ASIN/)), ," ")",http://www.amazon.$if($neql($regexp(%country%,(?i)germany,),%country%),de,com)/gp/search?ie=UTF8&index=music&keywords=$replace($if2(%band%,%artist%) \"$regexp(%album%,'\s*(\(|\[|\{).+?(\)|\]|\})',), ,+,&,%%26)\")

I have figured this out. It's possible with "Format Values" ("Tag-Felder formatieren" in German). But I don't know why or how this is required for automation.

I guess this is also required for full automation. I don't think this is really necessary for my needs. I can manually filter the relevant files and then apply the expression in a custom column to check the output with a preview first. After that I can write an action to do the work.

This is an alternative solution to the problem, but I think it is too extensive. Why use a third-party tool if Mp3tag can do the work? In the meantime I have figured out an expression that does exactly what I want.

Yes, I've shortened it a little, but it's still a bit long. Here it is:

Allmusic_ID:

$replace($trim($regexp($regexp(%www%,\r\n, ),'(?<=amg&sql=10:)(\w+\s?)\s*|.+?\s?',$1)), ,\\)

Coverparadise_ID:

$replace($trim($regexp($regexp(%www%,\r\n, ),'(?<=ViewEntry&ID=)(\d+\s?)\s*|.+?\s?',$1)), ,\\)

The first regexp changes line breaks to spaces and the second extracts the IDs from the URLs. After trimming and replacing spaces with \\, the values can be written to the desired fields Allmusic_ID and Coverparadise_ID with a custom column or an action.
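
To make the mechanics of these two expressions easier to follow, here is a minimal sketch of the same steps in Python. It assumes that Python's `re` module handles the lookbehind and the alternation like Mp3tag's engine does (not verified here); the function name `extract` and the constant names are made up for illustration.

```python
import re

# Same patterns as above. Left branch: an ID directly after the marker is
# captured in group 1. Right branch: any other single character (plus an
# optional following space) is matched without a capture.
ALLMUSIC = r"(?<=amg&sql=10:)(\w+\s?)\s*|.+?\s?"
COVERPARADISE = r"(?<=ViewEntry&ID=)(\d+\s?)\s*|.+?\s?"


def extract(www, pattern):
    flat = www.replace("\r\n", " ")      # line breaks to spaces
    # Every match is replaced by group 1. For the right branch group 1 is
    # unset and expands to an empty string (Python 3.5 or later).
    kept = re.sub(pattern, r"\1", flat)
    return kept.strip().replace(" ", "\\\\")  # trim, then spaces to \\


# The example field content from the first post.
WWW = (
    "http://www.mute.com/ "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde "
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=7735 "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e "
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=69885 "
    "http://www.emimusic.de"
)

print(extract(WWW, ALLMUSIC))
print(extract(WWW, COVERPARADISE))
```

The right branch is what removes the unrelated URLs: everything that is not an ID after the marker is consumed piece by piece and replaced with nothing.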

Surely there is room for improvement. It is certainly possible to find a single expression that does the job without the trim function. It might also be possible to generate the field names automatically from the URLs. Furthermore, this could be combined with an action that automatically removes the related URLs from the WWW field. I'm still doing this manually. I know it's not perfect, but it works for all the tags I want to change.

Greetings!

Knabbakeks


#5

OK, I was not quite sure whether you were aware of the meaning of two backslashes as a surrogate for the multi-value delimiter.

OK.

Mp3tag cannot fully automate the process you need. You still have to do the main work with your own hands and eyes.

I think it is not possible to create a new tag-field from a data value out of another tag-field.
That is the caveat which breaks a possible automation process.

In my study of your problem I got these intermediate results. ...
Input field with items ordered by line (blank replaced with newline) ...

http://coverparadise.to/index.php?Module=ViewEntry&ID=69885
http://coverparadise.to/index.php?Module=ViewEntry&ID=7735
http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde
http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde
http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e
http://www.emimusic.de
http://www.mute.com/

^http://(?:www\.)?(.+?)\..+?.+$

coverparadise
coverparadise
allmusic
allmusic
allmusic

^?(?:?)(?:.+[=:])(.+)$

69885
7735
0cfqxq80ldde
3ifoxzysldde
gcfqxq9jld0e

That is the pure data which is needed in the process: the tag-field names and the values. Now the challenge is to order the n:m relation of multiple values to unique field names.
But I could not find a tool in the Mp3tag scripting language to handle this task.
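
Outside Mp3tag, grouping the values under field names derived from the URLs is a small task. The following is a minimal sketch in Python, not something Mp3tag itself can run: the pattern `URL_WITH_ID` and the function `group_ids` are assumptions written for the URLs in this thread, and the naming rule (upper-cased host name plus "ID") follows the $upper(%FIELD1%)'ID' idea from earlier in the thread.

```python
import re
from collections import defaultdict

# Assumed pattern: group 1 is the host name without "www." and without the
# top-level domain, group 2 is what follows the last "=" or ":" of the query.
URL_WITH_ID = re.compile(r"^http://(?:www\.)?([^./]+)\.[^?\s]*\?.*[=:](\w+)$")


def group_ids(www):
    """Collect the IDs of all URLs under a field name derived from the host."""
    grouped = defaultdict(list)
    for url in www.split():
        match = URL_WITH_ID.match(url)
        if match:  # URLs without a query string are skipped
            field = match.group(1).upper() + "ID"
            grouped[field].append(match.group(2))
    # Join the values per field with the two-backslash separator.
    return {field: "\\\\".join(ids) for field, ids in grouped.items()}


# The example field content from the first post.
www = (
    "http://www.mute.com/ "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde "
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=7735 "
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e "
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=69885 "
    "http://www.emimusic.de"
)

grouped = group_ids(www)
for name, value in grouped.items():
    print(name + "=" + value)
```

The dictionary of lists does the ordering of multiple values to unique field names; getting the result back into the tags would still be a separate import step.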

DD.20100927.0634.CEST