Help to create an action to remove all advertising web-links


#1

Hello, I am trying to remove all http, www, .com/.org/.net, etc. from all of my tracks. I have tried to create an action with wildcards, but I think I am missing some information. :rolleyes: The addresses are located randomly in the ID tag fields. I tried Replace with wildcards but couldn't succeed. Any help is appreciated.

Regards


#2

There is no action for replacing with wildcards. The only one that comes close is "Replace with regular expressions" ...
Load all the files that you want to modify into mp3tag.

If you are not sure which field contains the unwanted string, you could first apply a filter.
Press F3 to show the filter input box at the bottom of the window.
Enter
www
as the filter.
This filter checks whether any field contains www.
If you want to narrow it down to the field ID, then enter
ID HAS www
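
To make the filter logic concrete, here is a rough Python sketch of what such a "contains" filter does. It is not Mp3tag code: the `matches_filter` helper, the sample tags, and the case-insensitive comparison are assumptions made purely for illustration.

```python
def matches_filter(tags, needle, field=None):
    """Return True if needle occurs in one field, or in any field when field is None.

    Hypothetical helper; the comparison is assumed to be case-insensitive.
    """
    needle = needle.lower()
    if field is not None:
        return needle in tags.get(field, "").lower()
    return any(needle in value.lower() for value in tags.values())


# Made-up sample data standing in for loaded files.
tracks = [
    {"TITLE": "Song A", "COMMENT": "ripped by www.example.com"},
    {"TITLE": "Song B", "COMMENT": ""},
]

# Equivalent of typing "www" into the filter box: search every field.
hits = [t["TITLE"] for t in tracks if matches_filter(t, "www")]
print(hits)
```

Passing `field="COMMENT"` corresponds to the narrowed form of the filter, which only looks at one field.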

Select all the files (or just one if you want to be careful)
Press Alt-T to open the extended tags dialogue.
Check if ID contains www.
Select the field and press the delete button to remove it completely.
If you select more than one file and the contents of ID are not the same, you will see a placeholder for differing values as the default entry. If you want to get rid of the field anyway, delete it.

For a replace action you would need to analyze more deeply what comes before and after the URL.


#3

I am trying a couple of variants of the replace expression, one that finds http:// and another that finds www. I am trying the commands for regular expressions. I used "^http://|\b.com/", but now I need to find out how to remove the remainder. This expression turned http://blah.blogspot.com/ into blah.blogspot. I think I may find a solution that eases my work. Thanks.

Regards


#4

You do not mention whether the fields contain only the URL or whether the URL follows proper tag values.
Do the URLs end after .com, .org, etc., or are they longer?

Maybe this helps:

Action: Replace with regular expression
Field: _ALL
Regular Expression: (http://|www\.).+?\.(com|org|net)
Replace Matches with:
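
As a quick way to see what this expression catches, here is a small Python sketch. It uses Python's `re` module rather than Mp3tag's regex engine, writes the dots escaped so they match literal periods, and the `strip_urls` name is made up for the example.

```python
import re

# Same idea as the action above: match from http:// or www. up to the
# first .com/.org/.net and replace the match with nothing.
URL = re.compile(r"(http://|www\.).+?\.(com|org|net)")


def strip_urls(value):
    """Remove the matched URL part from a tag value (illustrative helper)."""
    return URL.sub("", value)


print(repr(strip_urls("Artist www.example.com")))
print(repr(strip_urls("http://blah.blogspot.com/")))
```

The second call leaves the trailing slash behind, which is why it matters whether the URLs end right after .com or continue with a path.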


#5

OK, I did it: ^http://.*|\b.com/ removes any http-related info from all fields. I think replacing http with www, and com with net, org, etc., will do the trick. :slight_smile:
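
A small Python sketch of how this expression behaves may be useful before running it on real tags. It uses Python's `re` module, not Mp3tag itself, and the `wipe` helper and sample strings are invented for the illustration.

```python
import re

# The expression from the post, unchanged: the dot before "com" is not
# escaped, so it matches any single character, not only a period.
PATTERN = re.compile(r"^http://.*|\b.com/")


def wipe(value):
    """Apply the expression with an empty replacement (illustrative helper)."""
    return PATTERN.sub("", value)


print(repr(wipe("http://blah.blogspot.com/ Great Song")))
print(repr(wipe("Great Song www.site.com/")))
print(repr(wipe("a-com/b")))
```

Because `.*` runs to the end of the value, a field that starts with http:// is emptied completely, including any proper text after the URL. The last call shows the unescaped dot removing "-com/" as well.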


#6

As a starting point you can learn from there ...
http://regexlib.com/REDetails.aspx?regexp_id=96
http://regexlib.com/REDetails.aspx?regexp_id=1051

DD.20120531.1553.CEST


#7

Here's the one I use. *NOTE: I created separate ones for TITLE, ALBUM, ARTIST, etc. DO NOT use on _TAG because it will strip your WWW field.

It's quite tricky-looking, but it works for most URLs I can think of :slight_smile:

It replaces:
DUMMY www.youtube.com DUMMY www.youtube.ie

With:
DUMMY DUMMY

It replaces:
DUMMY www.youtube.com

With:
DUMMY

Action type: Replace with regular expression
Field: TITLE
Regular expression: (?i)\s*(http:|[Www.|(www.|www.)[^\s]+.([a-z]{1,4}|[a-z]{1,4}))(\s*)
Replace matches with: $3

[ ] case-sensitive comparison

#8

wow :w00t: that is a nice string of commands. Thanks a lot everybody for your help.
I was previously doing the replacement with the URL addresses only, which was consuming too much time, but this is very nice. Thanks again.


#9

I've actually taken the time to improve it slightly. It has a function to turn case sensitivity on and off at certain parts so that it doesn't pick up the "Dance" part of www.youtube.comDance, for example. The regexp stops when it sees a capital letter, such as 'D' in this case.

Changes:
DUMMY www.youtube.com DUMMY www.youtube.ie DUMMY
To:
DUMMY DUMMY DUMMY

Changes:
DUMMYwww.youtube.comDUMMYwww.youtube.ieDUMMY
To:
DUMMY DUMMY DUMMY

Changes:
DUMMY www.youtube.com
To:
DUMMY_ (_ represents a whitespace character)

Changes:
www.youtube.com DUMMY
To:
_DUMMY (_ represents a whitespace character)

SO HERE IS THE IMPROVED VERSION

Action type: Replace with regular expression
Field: TITLE
Regular expression: (?i)(\s*)(\s*|\[|\()(http:|www\.|www\.)(?-i)[^\sA-Z]+\.([a-z]{1,4})(\s*|\[|\()(\s*)
Replace matches with: _ (Again _ represents one whitespace character)

[ ] case-sensitive comparison
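
For anyone who wants to try the idea outside Mp3tag, here is a rough Python port. It is a sketch under assumptions: Python's `re` has no mid-pattern `(?-i)` switch, so the case-insensitive prefix is wrapped in a scoped `(?i:...)` group instead, the duplicated `www\.` alternative is dropped, and `remove_urls` is a made-up name.

```python
import re

# Prefix (whitespace, optional bracket, http: or www.) is case-insensitive;
# the host and TLD part is case-sensitive, so matching stops at a capital.
URL_RE = re.compile(
    r"(?i:(\s*)(\s*|\[|\()(http:|www\.))"
    r"[^\sA-Z]+\.([a-z]{1,4})(\s*|\[|\()(\s*)"
)


def remove_urls(title):
    """Replace each matched URL with one space, like the action above."""
    return URL_RE.sub(" ", title)


print(repr(remove_urls("DUMMYwww.youtube.comDUMMYwww.youtube.ieDUMMY")))
print(repr(remove_urls("DUMMY www.youtube.com")))
print(repr(remove_urls("www.Test.com")))
```

In this port, a host that starts with a capital letter (www.Test.com) is left alone, and a URL at the start or end of the value leaves one leading or trailing space.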

If you find unwanted trailing/leading spaces that this action cannot avoid in some cases, place the action below at the very bottom of your action groups so that it runs last.

Action type: Replace with regular expression
Field: _TAG
Regular expression: ^[\s\t]+|[\s\t]+$
Replace matches with:

[ ] case-sensitive comparison

OR any double spaces, which my URL regexp does not leave, but it might be handy for your other needs. Again, DO NOT use this on _TAG as it will get rid of any multi-lines you may have in UNSYNCEDLYRICS and so on.

Action type: Replace with regular expression
Field: TITLE
Regular expression: \s{2,}
Replace matches with: _ (_ represents one whitespace character)

[ ] case-sensitive comparison
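
The two clean-up expressions can be tried out in Python as follows. This is only a sketch: the `tidy` helper is hypothetical, and it assumes the double-space action replaces each run of whitespace with a single space.

```python
import re


def tidy(value):
    """Trim leading/trailing whitespace, then collapse whitespace runs."""
    value = re.sub(r"^[\s\t]+|[\s\t]+$", "", value)
    return re.sub(r"\s{2,}", " ", value)


print(repr(tidy("  DUMMY   DUMMY ")))
# Line breaks count as whitespace, so a blank line is collapsed too.
print(repr(tidy("line one\n\nline two")))
```

The second call shows why the double-space action should stay away from multi-line fields such as UNSYNCEDLYRICS.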

Here is an example action >>>>>> _Script_Test_TEST.mta (232 Bytes)


#10

Very impressive :sunglasses: I will give it a try thank you again :smiley:


#11

Thanks for the code! I was just wondering if you could edit the code to include websites like test.com and www.test.com/test

Edit: www.Test.com doesn't get deleted and (www.test.com) leaves behind ")" :frowning: How would you fix it?


#12

There was a slight fault with the regexp used which is now fixed:

I've placed the fixed regexp along with the regexp for removing unwanted whitespaces into a 'Format' action to fix all in one go.

Action #1:
Action type: Format value
Field: TITLE
Formatstring: $regexp($regexp($regexp(%title%,'(?i)(\s*)(\s*|\[|\()(http:|www\.|www\.)(?-i)[^\sA-Z]+\.([a-z]{1,4})(\s*)[^\sA-Z]+(\s*)', ),'^[\s\t]+|[\s\t]+$',),'\s{2,}', )

This changes:
Dummywww.test.com
Dummywww.test.comDummy
www.test.com/testDummy
www.test.com/test/Dummy
www.test.com/test/test/testDummy

To:
Dummy
Dummy Dummy
Dummy
Dummy
Dummy

& Changes:
DUMMY www.youtube.com DUMMY www.youtube.ie DUMMY
DUMMYwww.youtube.comDUMMYwww.youtube.ieDUMMY
DUMMY www.youtube.com
www.youtube.com DUMMY
DUMMY www.youtube.com
(www.youtube.com)DUMMY
DUMMY[www.youtube.com]DUMMY

To:
DUMMY DUMMY DUMMY
DUMMY DUMMY DUMMY
DUMMY
DUMMY
DUMMY
DUMMY
DUMMY DUMMY
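
The three nested $regexp calls of Action #1 can be sketched in Python as three substitutions applied in order. This is an approximation, not Mp3tag code: the case-insensitive prefix uses a scoped `(?i:...)` group, the dots are written escaped, the replacement is assumed to be one space, and `clean_title` is a made-up name.

```python
import re

URL_RE = re.compile(
    r"(?i:(\s*)(\s*|\[|\()(http:|www\.))"
    r"[^\sA-Z]+\.([a-z]{1,4})(\s*)[^\sA-Z]+(\s*)"
)


def clean_title(title):
    """Innermost step removes URLs; the outer two steps tidy whitespace."""
    title = URL_RE.sub(" ", title)                   # URL -> one space
    title = re.sub(r"^[\s\t]+|[\s\t]+$", "", title)  # trim both ends
    return re.sub(r"\s{2,}", " ", title)             # collapse runs


print(clean_title("www.test.com/test/test/testDummy"))
print(clean_title("DUMMY[www.youtube.com]DUMMY"))
print(clean_title("Dummy www.test.com remix"))
```

One thing worth knowing: in this port the part after the domain consumes any following characters that are not uppercase or whitespace, so a lowercase word right after the URL (the "remix" in the last call) is removed as well. Capitalised text is kept.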

Action #2:
Action type: Format value
Field: TITLE
Formatstring: $regexp($regexp($regexp(%title%,'(?-i)\s+[^\sA-Z]+\.([a-z]{1,4})(\]|\)|\s*)(\s*)', ),'^[\s\t]+|[\s\t]+$',),'\s{2,}', )

Changes:
Dummy test.comDummy
Dummy test.com Dummy
Dummytest.com Dummy
Dummytest.comDummy
DUMMYtest.comDummy

To:
Dummy Dummy
Dummy Dummy
Dummytest.com Dummy
Dummytest.comDummy
DUMMYtest.comDummy
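
Action #2 can be sketched the same way in Python. Again this is an approximation under assumptions: the expression is case-sensitive throughout (no flag is set), the dot is escaped, the replacement is one space, and `strip_bare_domains` is a hypothetical name.

```python
import re

# Whitespace, then a lowercase-only token containing a dot and a short TLD,
# optionally followed by a closing bracket or whitespace.
BARE_RE = re.compile(r"\s+[^\sA-Z]+\.([a-z]{1,4})(\]|\)|\s*)(\s*)")


def strip_bare_domains(title):
    """Remove 'test.com' style tokens that follow a space, then tidy."""
    title = BARE_RE.sub(" ", title)
    title = re.sub(r"^[\s\t]+|[\s\t]+$", "", title)
    return re.sub(r"\s{2,}", " ", title)


print(strip_bare_domains("Dummy test.comDummy"))
print(strip_bare_domains("Dummytest.com Dummy"))
print(strip_bare_domains("Live vol.two"))
```

The last call is a reminder to check results on a few files first: any lowercase "word.word" token after a space is treated like a domain, so "vol.two" is removed too.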

Please note that action #2 ignores anything without a space before 'test'. It's just for safety. The group action is ATTACHED HERE >>> _Script_Test_TEST.mta (360 Bytes)

Once you have run this group action, you can however hit F3 and paste %title% HAS .com (or any other URL ending you may have instead), OR %title% MATCHES (?-i)\.\l which will find a '.' followed by a lowercase character.

Hope this helps :wink:


#13

stevehero, Thanks for the help. Can I use the _All tag with your code? I have websites in every field including track number lol, so it'll take too long if I fill out 2 or 3 codes for each field. P.s. a lot of the websites are DJWhatever.com so how do you change the code to include it? Thanks, p.s. do you know any good guides to learn regex, I just found out about it.


#14

Use this but ONLY AFTER you run the first two actions I gave above. Also use the filter as I've described before and run on a dozen or so files at once.

ONLY use this for stubborn '.com's'

Action type: Format value
Field: TITLE
Formatstring: $regexp($regexp($regexp(%title%,'^[^\s]+\.([a-z]{1,4})(\s*)[^\sA-Z]+(\s*)', ),'^[\s\t]+|[\s\t]+$',),'\s{2,}', )

I bought a program called RegexBuddy because I was eager to learn. (see http://www.regexbuddy.com/democreate.html)

But other resources are:
http://journalxtra.com/webmasterresources/...ick-guide-3436/
http://www.regular-expressions.info/tutorialcnt.html
http://www.regexpstudio.com/TRegExpr/Help/RegExp_Syntax.html

You are going to have to do this for every tag as Format does not allow _TAG but that should not be a big task.

The only things you need to change are the field names, i.e. the Field value and the %title% placeholder (highlighted PURPLE in the original post)
Action type: Format value
Field: TITLE
Formatstring: $regexp($regexp($regexp(%title%,'^[^\s]+\.([a-z]{1,4})(\s*)[^\sA-Z]+(\s*)', ),'^[\s\t]+|[\s\t]+$',),'\s{2,}', )

Just hit the duplicate button to copy the action within a group action and then change the values to the FIELD you want to affect.
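
The "one copy of the action per field" approach corresponds to a simple loop. The Python sketch below illustrates it with the "stubborn .com" expression; the field list, the sample tags, and the helper names are assumptions, and the dot is written escaped.

```python
import re

# Matches a token at the very start of the value that ends in a short TLD.
LEADING_RE = re.compile(r"^[^\s]+\.([a-z]{1,4})(\s*)[^\sA-Z]+(\s*)")

# Hypothetical list: one entry per duplicated action. WWW is left out on
# purpose so the real website field is not touched.
FIELDS = ("TITLE", "ARTIST", "ALBUM")


def clean_value(value):
    value = LEADING_RE.sub(" ", value)
    value = re.sub(r"^[\s\t]+|[\s\t]+$", "", value)
    return re.sub(r"\s{2,}", " ", value)


def clean_tags(tags, fields=FIELDS):
    """Clean only the listed fields and copy all other fields unchanged."""
    return {k: clean_value(v) if k in fields else v for k, v in tags.items()}


sample = {"TITLE": "DJWhatever.com Song", "ARTIST": "DJWhatever.com",
          "WWW": "www.artist.com"}
print(clean_tags(sample))
```

A field that holds nothing but the site name ends up empty, which is one more reason to filter first and run this on a few files at a time.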


#16

Just a thought, there has been a lot of talk about removing "advertising" web-links but what if I wish to keep the legitimate ones like the artist bandcamp link but remove any others?
Is there a way to exclude these from removal?


#17

I agree. The legitimate ones should stay.


#18

How do you find out it is a legitimate one?


#19

Personally, I would like to keep any Bandcamp links but am looking for something where I can indicate what ones are to remain...and the remainder can be deleted.
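
One possible way to think about this, sketched in Python rather than as an Mp3tag action: match every URL, then decide per match whether it is on a keep list. Everything here is an assumption for illustration, including the simplified URL pattern, the `KEEP` list, and the `strip_unlisted_urls` name.

```python
import re

# Simplified URL pattern: http://, https:// or www. followed by non-spaces.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Hypothetical keep list; any URL containing one of these strings stays.
KEEP = ("bandcamp.com",)


def strip_unlisted_urls(value, keep=KEEP):
    """Remove URLs unless they contain a domain from the keep list."""
    def decide(match):
        url = match.group(0)
        return url if any(domain in url.lower() for domain in keep) else ""

    cleaned = URL_RE.sub(decide, value)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(strip_unlisted_urls("Song www.spam.com http://artist.bandcamp.com"))
print(strip_unlisted_urls("www.spam.net Intro"))
```

The substring check is deliberately naive, and the pattern assumes URLs are separated from other text by spaces, so it would need tightening before being trusted on real tags.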