Conditional copy part of tag to another tag

ForSSound · June 17, 2013, 8:11am

Hi There,

I would like to make an action that copies part of a tag to another tag:
The conditions are: UPPERCASE words and words between ","

examples:
%comment% = "LC 13357ATMOSPHERE ETHNIC POP, UP, adventure, RAPIDE, aventure,-----copyright bla bla"
%comment% = "LC 14578TRADITIONAL FRENCH POP, MID, bewitching, sneaky, MEDIUM, ravisant,----copyright bla bla"

One may notice that this tag has English and French inside (UP,MID versus RAPIDE,MEDIUM)
One may notice that this tag has %genre% %mood% info in it

The purpose after the retagging is that
%mood% = adventure and %genre% = ETHNIC POP
%mood% = bewitching sneaky and %genre% = FRENCH POP

Below is how the %comment% field looks in the lower left

LC 15786 TRADITIONNAL FRENCH - TRADITIONNEL FRANCAIS, UP, bouncy, galoubet, tambourin, flute RAPIDE, sautillant, galoubet, tambourin, flute, --------------------------------------------- ----------------- Parispar copyright etc

I initially thought of tackling this one with

Cut the first 8 characters, then cut the last 240 characters, then put the first 2 words in Genre, then put all words that are not written in capitals and that sit between the first 2 fully capitalized words into %mood%.

What is the best way to handle this in a flexible way?

Kind Regards
Guy Forssman

ohrenkino · June 18, 2013, 4:00pm

I do not understand the examples.
Why does ETHNIC POP stay the same whereas FRENCH POP becomes only FRENCH?

Also: some words in the list are separated by others are separated by .

So far I cannot see any underlying pattern, sorry.

ForSSound · June 18, 2013, 4:06pm

These are stupid errors on my part. I manually copied this over from another PC.... I will correct these now in the original post.

Thank you for pointing this out to me
Guy Forssman

ohrenkino · June 18, 2013, 4:52pm

I take it that you are familiar with the fact that you can create an action group that works as some kind of batch job as it processes a set of actions in the given sequence.
What I would do, is some kind of brute force approach as it produces copies of the data first and then strips it away again.
First fill the MOOD field with the complete contents of the source field (that is comment, right?).
Then fill the GENRE field with the complete contents of the source field.
Then strip the leading number with a "Replace with regular expression": lc \d+\w+\b ((?-i)[A-Z].*)
Replace with $1.
This should leave
ETHNIC POP,UP, adventure,RAPIDE, aventure,-----copyright
then format the field with a $left(%genre%,$sub($strchr(%genre%,','),1))
this should leave ETHNIC POP.

to find the lowercase words use .,((?-i)[a-z]),
and replace it with $1
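
For anyone who wants to try the two genre steps outside Mp3tag, here is a minimal Python sketch of the same idea. It is not Mp3tag scripting: `split_genre` is a hypothetical helper name, Python's `re` stands in for Mp3tag's regex engine, and the case-insensitive `lc` is written out as `[Ll][Cc]`. It uses the single-line examples from the first post.

```python
import re

def split_genre(comment):
    """Mimic the two steps above: strip the 'LC <number><word> ' prefix,
    then keep everything to the left of the first comma."""
    # Step 1: same shape as the "Replace with regular expression" action.
    stripped = re.sub(
        r"^[Ll][Cc] \d+\w+\b ([A-Z].*)", r"\1", comment, flags=re.DOTALL
    )
    # Step 2: same effect as $left(..., position of first comma minus 1).
    comma_pos = stripped.find(",")
    return stripped if comma_pos == -1 else stripped[:comma_pos]

example_1 = "LC 13357ATMOSPHERE ETHNIC POP, UP, adventure, RAPIDE, aventure,-----copyright bla bla"
example_2 = "LC 14578TRADITIONAL FRENCH POP, MID, bewitching, sneaky, MEDIUM, ravisant,----copyright bla bla"

print(split_genre(example_1))
print(split_genre(example_2))
```

With these two inputs the sketch should print ETHNIC POP and FRENCH POP. It only covers comments where everything sits on one line.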

ForSSound · June 19, 2013, 9:24am

Thanks for this input..

At first I couldn't understand why your expression didn't work, but it seems that %comment%, which is indeed the source field, already has everything grouped with a return.

So I did a LC \d+\w+\b\r\n((?-i)[A-Z].*) result= ETHNIC POP,UP, adventure,RAPIDE, aventure,-----copyright

Can You elaborate what the purpose of the (?-i) is?

Anyway thanks a lot already.
Your example surely helped me to better understand regexp

Kind Regards
Guy Forssman

ohrenkino · June 19, 2013, 10:28am

Ah - probably this is only for filters: it makes the statement case-sensitive ... and the filter is not.
So perhaps the [A-Z] should already limit the selection to capitals. But apparently the (?-i) does not do any more harm than to make the statement a little more complicated to read.
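
To illustrate what switching case-insensitivity off does, here is a small sketch in Python rather than Mp3tag. Python's `re` module writes the switch in the scoped form `(?-i:...)`, so the syntax differs slightly from the `(?-i)` used above, and whether a given Mp3tag action or filter is case-insensitive by default is left open here.

```python
import re

text = "UP, adventure, RAPIDE, aventure"

# Case-insensitive search: [A-Z]+ also matches the lowercase words.
insensitive = re.findall(r"[A-Z]+", text, flags=re.IGNORECASE)

# Same flag, but (?-i:...) turns case-insensitivity off inside the group,
# so only the words written in capitals are found.
sensitive = re.findall(r"(?-i:[A-Z]+)", text, flags=re.IGNORECASE)

print(insensitive)  # ['UP', 'adventure', 'RAPIDE', 'aventure']
print(sensitive)    # ['UP', 'RAPIDE']
```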

ForSSound · June 19, 2013, 12:25pm

Thanks for this information. I'm very new to regexp and searched for some online testers, but which one agrees with Mp3tag?
Furthermore, when I use ([a-z]*) it only seems to look in the first line.
I tried \n\n([a-z]*), but the only thing that changes is that the return symbols are removed.
Also, there is 1 word written with a capital.... Can one search for complete words?
So I hope you can point me one more time in the right direction.

LC 13373
MILITARY, MILITAIRE,
MID, war, battle, strict, French, military drums,
MEDIUM, guerre, baitaille, strict, Francais, tambour,
---------------------------------------------------------------------------
All rights reserved
Producer publisher
PARSIPARLA
3 rue des Aulnes
03000 AVERMES
FRANCE
tel fax 33 (0)4 70 20 65 10
info@parsiparla.com
www.parsiparla.com

ohrenkino · June 19, 2013, 3:36pm

^.*[A-Z]* ([a-z]+),.*
will reduce the whole string to tambour,
as this is the "last" lowercase word that is followed by a comma.
(The rest of the translation:
Take any characters (.*) from the beginning (^), followed by a sequence of upper case letters ([A-Z]*) and a blank, then pick out a set of lower case letters followed by a comma for the replace action, and then the rest.)
Which word were you trying to get?
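
The "last word wins" behaviour can be reproduced in Python. This is only a sketch under two assumptions: that the pattern reads `^.*[A-Z]* ([a-z]+),.*` (with the `*` quantifiers that the description implies), and that `.` also matches line breaks, which is emulated here with `re.DOTALL`. The comment text is a shortened copy of the sample above.

```python
import re

COMMENT = (
    "LC 13373\n"
    "MILITARY, MILITAIRE,\n"
    "MID, war, battle, strict, French, military drums,\n"
    "MEDIUM, guerre, baitaille, strict, Francais, tambour,\n"
    "---------------------------------------------------------------------------\n"
    "All rights reserved\n"
    "Producer publisher\n"
)

# Greedy leading .* swallows as much as it can, so the group ends up on the
# LAST " lowercaseword," in the text.
greedy_result = re.sub(r"^.*[A-Z]* ([a-z]+),.*", r"\1", COMMENT, flags=re.DOTALL)

# A lazy leading .*? stops as early as it can, so the group ends up on the
# FIRST " lowercaseword," instead.
lazy_result = re.sub(r"^.*?[A-Z]* ([a-z]+),.*", r"\1", COMMENT, flags=re.DOTALL)

print(greedy_result)  # tambour
print(lazy_result)    # war
```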

ForSSound · June 19, 2013, 6:07pm

The end result should be ideally "war, battle, strict, French, military drums"
The second line is the same but in French

Does regexp have the possibility to look only in the first or nth line of this example for lowercase?
I thought that \n\n([a-z]*) would search for lowercase letters on the 3rd line. Obviously I would have missed the F from French.

Thanks for helping me. When Finished I guess I can add this to the regexp examples as this is a more elaborate example and great for learning by example.

Guy Forssman

ohrenkino · June 19, 2013, 6:20pm

This regexp
^.*\n.*\n[A-Z]*, (.*)\n.*\n---.*
replaced with
$1
will produce

ForSSound · June 26, 2013, 12:31pm

Hi Back from a 4 day trip to Sweden...

In fact thanks a lot for this, it works, but sometimes the %comment% field isn't properly formatted and thus the regexp doesn't work

^.*\n

from the beginning of the text, match every character up to the end of the line

.*\n

match every character up to the end of the next line

[A-Z]*, (.*)\n

any run of capital letters A-Z, then a comma and a space, followed by everything else on that line, which is put in group $1
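
The same breakdown can be written as a commented pattern in Python. This is a sketch, not Mp3tag itself: it assumes the expression is `^.*\n.*\n[A-Z]*, (.*)\n.*\n---.*` (with the `*` quantifiers), uses `re.VERBOSE` so each piece can carry a comment, and uses `re.DOTALL` on the assumption that `.` may also match line breaks in Mp3tag. The two comments are shortened copies of samples from this thread.

```python
import re

LINE_BY_LINE = re.compile(
    r"""
    ^.*\n        # line 1: the "LC 13373" line
    .*\n         # line 2: the genre line
    [A-Z]*,[ ]   # line 3 starts with a capitalized tempo word, comma, blank
    (.*)\n       # group 1: the rest of line 3 (the English mood words)
    .*\n         # line 4: the French line
    ---.*        # the dashes and everything after them
    """,
    re.VERBOSE | re.DOTALL,
)

FOOTER = "-" * 75 + "\nAll rights reserved\nProducer publisher\n"

four_lines = (
    "LC 13373\n"
    "MILITARY, MILITAIRE,\n"
    "MID, war, battle, strict, French, military drums,\n"
    "MEDIUM, guerre, baitaille, strict, Francais, tambour,\n" + FOOTER
)
three_lines = "LC 13373\nUP, race, clarinet\nRAPIDE, course, clarinette,\n" + FOOTER

hit = LINE_BY_LINE.search(four_lines)
miss = LINE_BY_LINE.search(three_lines)

print(hit.group(1))  # war, battle, strict, French, military drums,
print(miss)          # None: the pattern insists on four lines before the dashes
```

Because the pattern counts lines, a comment with no separate genre line has one line too few and does not match at all.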

I thought I could change [A-Z]*, (.*)\n to ([a-z]*)\n to take just every non-capital character; alas, that doesn't work either.

Your code works on the first example but not on the second.

LC 13373
CLASSIC, CLASSIQUE,
SLOW, melancholy, lanscape, violin
LENT, melancolie, paysage, violon, 
---------------------------------------------------------------------------
All rights reserved
Producer publisher
PARSIPARLA
3 rue des Aulnes
03000 AVERMES
FRANCE
tel fax 33 (0)4 70 20 65 10
info@parsiparla.com
www.parsiparla.com 

LC 13373
UP, race, clarinet
RAPIDE, course, clarinette,
---------------------------------------------------------------------------
All rights reserved
Producer publisher
PARSIPARLA
3 rue des Aulnes
03000 AVERMES
FRANCE
tel fax 33 (0)4 70 20 65 10
info@parsiparla.com
www.parsiparla.com

dano · June 27, 2013, 9:51am

This should work for both examples:
^.*\n[A-Z]+, (.+)\n.*\n---.*

ForSSound · June 27, 2013, 12:01pm

Indeed, this produced a lot more hits; unfortunately it didn't solve every entry.

Can somebody explain to me how to choose only the NON capital words?

I guess it's the part between ^.*\n and \n.*\n---.* that should be changed.

What is wrong in my logic with ([a-z]*)\n?

I could give another example which didn't work, but I think it's better to search for a more flexible approach.

Anyway thanks a lot for the support
Guy Forssman

dano · June 27, 2013, 12:30pm

If you just want all lower case words from this kind of text

LC 13373
UP, race, clarinet
RAPIDE, course, clarinette,
---------------------------------------------------------------------------
All rights reserved
Producer publisher
PARSIPARLA
3 rue des Aulnes
03000 AVERMES
FRANCE
tel fax 33 (0)4 70 20 65 10
info@parsiparla.com
www.parsiparla.com

it would give more results, also
rights reserved
publisher
...

So please give a more exact description. Do you mean all lc words until the ----- line?

ForSSound · June 28, 2013, 7:07am

Thanks for the reply. I only want the lowercase words on the second line, so in this example it would be race, clarinet.

dano · June 28, 2013, 11:53am

Another try, this will find the first line with lower case words:
^[^a-z]+[ ,]([a-z][a-z, ]+)[\r\n]+.+?-------.+
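
A quick way to see what this expression captures is to run it in Python. This is a sketch under assumptions: Python's `re` is used instead of Mp3tag, `re.DOTALL` stands in for `.` matching line breaks, and matching is case-sensitive. `first_lowercase_run` is a hypothetical helper name.

```python
import re

COMMENT = (
    "LC 13373\n"
    "UP, race, clarinet\n"
    "RAPIDE, course, clarinette,\n"
    "---------------------------------------------------------------------------\n"
    "All rights reserved\n"
    "Producer publisher\n"
)

FIRST_LOWERCASE_LINE = re.compile(
    r"^[^a-z]+[ ,]([a-z][a-z, ]+)[\r\n]+.+?-------.+", re.DOTALL
)

def first_lowercase_run(comment):
    """Return the first run of lowercase words, commas and blanks, or None."""
    match = FIRST_LOWERCASE_LINE.search(comment)
    return match.group(1) if match else None

mood = first_lowercase_run(COMMENT)
print(mood)  # race, clarinet

# Why [a-z, ] matters: [a-z] alone stops at every comma and blank,
# while [a-z, ] lets one match run across the whole list.
letters_only = re.findall(r"[a-z]+", "race, clarinet")
letters_commas_blanks = re.findall(r"[a-z, ]+", "race, clarinet")
print(letters_only)           # ['race', 'clarinet']
print(letters_commas_blanks)  # ['race, clarinet']
```

The last two lines also hint at why ([a-z]*)\n on its own cannot pick up a comma-separated list: commas and blanks are not part of [a-z].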

ForSSound · August 27, 2013, 9:01am

Hi There,

It seems this regexp doesn't work for me.

I have used ^.*\n[A-Z]+, (.+)\n.*\n---.*
which gave me the following results for the 4 different cases below:
CASE1 =
CASE2 =adventure
CASE3 =bouncy, race, galoubet, tambourin, flute
CASE4 =LC 13373 POP ROCK, CORPORATE, UP,adventure, walk, RAPIDE, aventure, marche, --------------------------------------------------------------------------- All rights reserved Producer publisher

Unfortunately there are a lot of wrongly formatted original tags like CASE4, too many to fix manually.
Sometimes the mood is on the second line, sometimes on the third. It is, however, always non-capital.

1 LC 13373

REGGAE
MID, 
MEDIUM,
---------------------------------------------------------------------------
All rights reserved
Producer publisher
PARSIPARLA
3 rue des Aulnes
03000 AVERMES
FRANCE
tel fax 33 (0)4 70 20 65 10
info@parsiparla.com
www.parsiparla.com

2 LC 13373

TECHNO ,
UP, adventure,
RAPIDE, aventure,
---------------------------------------------------------------------------
All rights reserved
Producer publisher
PARSIPARLA
3 rue des Aulnes
03000 AVERMES
FRANCE
tel fax 33 (0)4 70 20 65 10
info@parsiparla.com
www.parsiparla.com

3 LC 13373

TRADITIONNAL FRENCH - TRADITIONNEL FRANCAIS,
UP, bouncy, race,  galoubet, tambourin, flute
RAPIDE, sautillant, course, galoubet, tambourin, flute,
---------------------------------------------------------------------------
All rights reserved
Producer publisher
PARSIPARLA
3 rue des Aulnes
03000 AVERMES
FRANCE
tel fax 33 (0)4 70 20 65 10
info@parsiparla.com
www.parsiparla.com

4 LC 13373

POP ROCK, CORPORATE,
UP,adventure, walk,
RAPIDE, aventure, marche,
---------------------------------------------------------------------------
All rights reserved
Producer publisher
PARSIPARLA
3 rue des Aulnes
03000 AVERMES
FRANCE
tel fax 33 (0)4 70 20 65 10
info@parsiparla.com
www.parsiparla.com

ohrenkino · August 29, 2013, 7:27am

The expression
^.*\n[A-Z]+.*\n([a-z][a-z]+.*)\n.*\n---.*

leads to
UP, adventure, walk,

in case 4.
Perhaps you can go on from there.

One footnote: it is very hard to get a testbed for these rather special cases - so the effort you would have to invest to do it manually is required by those who try to copy your environment. This may be a reason why you do not get that many answers.

So, you can test your regular expression with the filter and the MATCHES keyword.
Enter the expression without any round brackets and see if you get any hits.
Then try the expression with brackets in a single action on a single file.

ForSSound · September 6, 2013, 5:39am

Hi There,

I would first like to thank everybody who has even just read this.
I must be doing something wrong because the latest 2 attempts didn't work for me.
Here are 5 small mp3 test files with the different cases.
You can find how I applied the rule in the screenshot.

Thanks in advance
Guy Forssman

dano · September 6, 2013, 8:12am

I've fixed my old regex:
^[^a-z]+[ ,]([a-z][a-z, ]+)[\r\n]+.+?-------.+

Also check [x] case-sensitive comparison

or try
^.+?\n[A-Z]+[, ]+([a-z][a-z, ]+).+?----------.+
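
Here is a rough Python harness for checking the second expression against the four cases posted earlier. It is a sketch, not a reproduction of Mp3tag: `re.DOTALL` is assumed to mirror how `.` behaves there, matching is case-sensitive (as with the checkbox mentioned above), the blank line after "LC 13373" and the long address footer are left out, and `extract_mood` is a hypothetical helper that also trims the trailing comma.

```python
import re

FOOTER = "-" * 75 + "\nAll rights reserved\nProducer publisher\n"

CASES = {
    1: "LC 13373\nREGGAE\nMID, \nMEDIUM,\n" + FOOTER,
    2: "LC 13373\nTECHNO ,\nUP, adventure,\nRAPIDE, aventure,\n" + FOOTER,
    3: (
        "LC 13373\n"
        "TRADITIONNAL FRENCH - TRADITIONNEL FRANCAIS,\n"
        "UP, bouncy, race,  galoubet, tambourin, flute\n"
        "RAPIDE, sautillant, course, galoubet, tambourin, flute,\n" + FOOTER
    ),
    4: "LC 13373\nPOP ROCK, CORPORATE,\nUP,adventure, walk,\nRAPIDE, aventure, marche,\n" + FOOTER,
}

SECOND_PATTERN = re.compile(
    r"^.+?\n[A-Z]+[, ]+([a-z][a-z, ]+).+?----------.+", re.DOTALL
)

def extract_mood(comment):
    """Return the captured lowercase run without trailing commas or blanks,
    or an empty string when the pattern does not match."""
    match = SECOND_PATTERN.search(comment)
    if match is None:
        return ""
    return match.group(1).strip(", ")

moods = {number: extract_mood(comment) for number, comment in CASES.items()}
for number, mood in moods.items():
    print(number, repr(mood))
```

Under these assumptions case 1 yields an empty mood (there are no lowercase words before the dashes), and cases 2 to 4 yield the lowercase words of the first tempo line, whether that line starts with "UP, " or "UP,".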