Copy whole uppercase words

confucius · June 3, 2023, 9:09pm

Hello, I'm stuck. I'm wanting to copy all the words in upper case & any spaces between them.

e.g. "4 MY FILENAME McWILLIAM Is Just This" or "5 MY FILENAME McWILLIAM A Is Wrong" This can be in any field as I'll be deleting it when finished with, see bottom.

I want to copy "MY FILENAME McWILLIAM" to another field. Or just strip out all the other letters & numbers, leaving "MY FILENAME McWILLIAM".

Full story, I have a text file with multiple lines as my example above. Ideally I would like to just import the upper case words including "McWILLIAM" from the text file to the artist field.

I currently use this, which works most of the time. I import the text file to the comment field, then...

Format value "COMMENT":$regexp(%comment%,(.*?)\s(\u\l.*),$1 ~ $2)
Guess values "Comment":%track% %artist% ~ %title%
Format value "ARTIST":$caps(%artist%)
Replace "Title" "~" ""
Remove fields "Comment"
Format value "TRACK":$num(%track%,2)
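
For readers without Mp3tag to hand, the first steps of the action group above can be approximated in Python. This is only a sketch under stated assumptions: the pattern is read as (.*?)\s(\u\l.*), Mp3tag's \u and \l are approximated by the ASCII classes [A-Z] and [a-z], and the "Guess values" mask is imitated with a second regular expression. The names SPLIT, GUESS and parse_comment are made up for the illustration, and the $caps and "Remove fields" steps are left out.

```python
import re

# Stand-in for the Mp3tag pattern, read here as (.*?)\s(\u\l.*).
# Assumption: \u and \l are approximated by ASCII [A-Z] and [a-z].
SPLIT = re.compile(r"(.*?)\s([A-Z][a-z].*)")

# Rough stand-in for the Guess values mask "%track% %artist% ~ %title%".
GUESS = re.compile(r"(\S+) (.*?) ~ (.*)")


def parse_comment(comment: str) -> dict:
    """Mimic the Format value, Guess values and $num steps on one line."""
    # Put " ~ " in front of the first word that starts Upper + lower.
    marked = SPLIT.sub(r"\1 ~ \2", comment)
    match = GUESS.fullmatch(marked)
    if match is None:
        return {}
    track, artist, title = match.groups()
    # zfill(2) plays the role of $num(%track%,2).
    return {"track": track.zfill(2), "artist": artist, "title": title}


print(parse_comment("4 MY FILENAME MCWILLIAM Is Just This"))
print(parse_comment("4 MY FILENAME McWILLIAM Is Just This"))
```

With the all-caps name the split lands before "Is", so the whole upper-case run ends up as the artist. With "McWILLIAM" the split lands one word early, because "Mc" is itself an upper-case letter followed by a lower-case one.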

LyricsLover · June 3, 2023, 9:50pm

How do you recognize that "MY FILENAME McWILLIAM" is the part in whole uppercase (if the "c" in McWILLIAM doesn't follow your rule)?

To strip out the part "MY FILENAME McWILLIAM" you need a unique separator.
How would you define such a unique separator for the part before "MY FILENAME McWILLIAM" and the part after "MY FILENAME McWILLIAM"?

Maybe we can help better if you show us some complete lines from your source text file.

confucius · June 3, 2023, 10:12pm

Sorry, but I'm not sure what to say really, other than that it is a complete line, as is the next example above. Where it falls over is when it copies the McWILLIAM part. It copies it, but it also copies "~", giving me "~ MY FILENAME MCWILLIAM". Easy enough to correct afterwards though.

ohrenkino · June 4, 2023, 5:10am

Which function do you use?
Which copy instruction do you use?
What is the source?
What should be the target field?

confucius · June 4, 2023, 3:02pm

Which function do you use? Text file to tag plus those listed above.

Which copy instruction do you use? Format value "COMMENT":$regexp(%comment%,(.*?)\s(\u\l.*),$1 ~ $2)

What is the source? 42 years' worth of field recordings made by my father, in various formats (mp3, wave, flac, m4a and ape). I'm about 8,000 into the first batch of 250,000 recordings. The filing system Dad used has been a nightmare to get to grips with; it's not consistent in any way that I could fathom. Some of the tags are in the wrong place, but I can move these easily. I'm removing the artwork as some of the pictures are very large, and they also give me some more details I can use. Some of the files do not reduce in size when the artwork is removed. I can remove all the tags, then undo this, and the size is reset. I use the flac program to deal with the flac files though. Like I said above, it mostly works. It fails on those Mac and Mc names where there is mixed case, and on any single upper-case letter (usually an "A") that follows the part I want. I can deal with these manually.

What should be the target field? What is contained in the file will determine where I copy/move it to. I currently adjust the above to do this.

I've sussed it out now (only took a few months for the light bulb to light up!). All I need to do is change any mixed case mac or mc names to upper case and all the above works, apart from those odd "A"s, which I can deal with manually.
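
The workaround described here (upper-casing mixed-case Mac and Mc names before running the other steps) could look roughly like this outside Mp3tag. It is a hypothetical helper, not something taken from the thread: the name upper_mc_names and the rule "Mc or Mac directly followed by at least two capitals" are assumptions chosen so that ordinary words such as "Machine" are left alone.

```python
import re

# Hypothetical rule: "Mc" or "Mac" at the start of a word, directly
# followed by two or more capitals that run to the end of the word.
MC_PREFIX = re.compile(r"\b(Mc|Mac)(?=[A-Z]{2,}\b)")


def upper_mc_names(line: str) -> str:
    """Upper-case the Mc/Mac prefix of otherwise all-caps names."""
    return MC_PREFIX.sub(lambda m: m.group(1).upper(), line)


print(upper_mc_names("4 MY FILENAME McWILLIAM Is Just This"))
# 4 MY FILENAME MCWILLIAM Is Just This
```

The stray single "A" after the name is not touched by this, which matches the remark that those cases still need manual attention.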

ohrenkino · June 4, 2023, 3:22pm

A suggestion:

$regexp($replace('4 MY FILENAME McWILLIAM Is Just This',MY FILENAME,),'(.)\s+(\u.*\u\u)\s* (.*)',$1--$2--$3)
Leads to:
4--McWILLIAM--Is Just This
This should be a good string to be used in "Guess value"
If the string that you now described as "MY FILENAME" is really the filename, then you can cut it from the string so that it does not collide with the rest.
And if the whole string can be found in COMMENT, then the expression would become:
$regexp($replace(%comment%,%_filename%,),'(.)\s+(\u.*\u\u)\s* (.*)',$1--$2--$3)

The other idea is then: get the first character and any amount of spaces, and then a string part that starts with an uppercase character and ends in 2 uppercase characters - and then the rest.
Does this match your data?
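
For anyone who wants to try the idea outside Mp3tag, here is a minimal Python sketch of the suggestion above. It assumes that \u can be approximated by the ASCII class [A-Z], and it uses str.replace in place of $replace; the names PATTERN and split_line are made up for the illustration.

```python
import re

# Stand-in for '(.)\s+(\u.*\u\u)\s* (.*)'; \u is approximated by [A-Z].
PATTERN = re.compile(r"(.)\s+([A-Z].*[A-Z][A-Z])\s* (.*)")


def split_line(line: str, constant: str = "MY FILENAME") -> str:
    """Drop the constant text, then mark the three parts with '--'."""
    without_constant = line.replace(constant, "")  # like $replace(x,y,)
    return PATTERN.sub(r"\1--\2--\3", without_constant)


print(split_line("4 MY FILENAME McWILLIAM Is Just This"))
# 4--McWILLIAM--Is Just This
print(split_line("5 MY FILENAME McWILLIAM A Is Wrong"))
# 5--McWILLIAM--A Is Wrong
```

Because the middle group has to end in two consecutive capitals, the single "A" in the second sample falls into the last part rather than into the name.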

confucius · June 5, 2023, 11:34am

I will give them a look when I'm back home. Oh, and yes, these are the file names. For example I have 21 that with the four, but they do not have an extension, so I'm thinking whatever he used to write them to CD truncated them.

Then there's the hand writing.....

confucius · June 6, 2023, 12:19am

The first suggestion works on a file with that name. The second doesn't give any results at all.

So I'm giving up...

Time to change our (great niece and I) approach to this. It's been 8 months on & off so far. I'll have a chat with my great niece, who is the owner of all this stuff. Maybe I'll go back to the way I started before being convinced this was a better way, which was not to bother with these file names. They are documented anyway. And go for something simple like cd1 - track 1, cd2 - track 1 etc., so long as I keep a note of which CD/DVD is in which folder. Then load the lot up in Mp3tag and use filters to show them in batches and rename from there. I don't have the CDs/DVDs here, they are down in NZ; I'm using a back-up of a back-up, just in case I muck something up or lose my sanity.

Anyway, thanks for the input.

ohrenkino · June 6, 2023, 5:22am

If that does not give any results then the string that you showed us does not match the real data.
The regular expression and the replace function are case sensitive. So if only 1 letter in the filename has a different case than the string in the text file then it does not work.
So far we have not seen any real data but had to rely on your word, and as you see if I take your sample string
4 MY FILENAME McWILLIAM Is Just This
then

You can test an expression in Convert>Tag-Tag
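
The case-sensitivity point can also be checked outside Mp3tag. The snippet below is only an analogy: Python's str.replace happens to be case sensitive as well, and strip_constant is a made-up name. The second sample string, with "My Filename" in mixed case, is invented for the comparison.

```python
def strip_constant(comment: str, constant: str) -> str:
    """Remove every exact, case-sensitive occurrence of constant."""
    return comment.replace(constant, "")


# Exact match: the constant is removed (two spaces are left behind).
print(repr(strip_constant("4 MY FILENAME McWILLIAM", "MY FILENAME")))
# Different case: nothing is removed, the string comes back unchanged.
print(repr(strip_constant("4 My Filename McWILLIAM", "MY FILENAME")))
```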

confucius · June 6, 2023, 3:01pm

doc.pdf (699.0 KB)
So the goal is to retrieve the upper case words and save them to tag.
The first page has what I get currently.
The second page is the new one, but without the final formatting procedures.
The third page is the text file.

ohrenkino · June 6, 2023, 3:14pm

I see - so "MY FILENAME" is actually the string in the expression.
I thought this was a variable for an uppercase string of the real filename.
So,
$regexp($replace(%comment%,%_filename%,),'(.)\s+(\u.*\u\u)\s* (.*)',$1--$2--$3)
should be:
$regexp($replace(%comment%,MY FILENAME ,),'(.)\s+(.*\u\u)\s* (.*)',$1--$2--$3)
This will also cater for
6 18 FIELD MICE GOLD kike's recording
You would have to find out if there are any other constant strings like "MY FILENAME" that should be ignored - which means: treated by the $replace() function.
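
If more constant strings turn up, the same idea extends to a list. The following Python sketch assumes \u is approximated by [A-Z]; the list CONSTANTS and the function split_line are hypothetical, and only "MY FILENAME " comes from this thread.

```python
import re

# Stand-in for '(.)\s+(.*\u\u)\s* (.*)'; \u is approximated by [A-Z].
PATTERN = re.compile(r"(.)\s+(.*[A-Z][A-Z])\s* (.*)")

# Hypothetical list of constant strings to ignore; extend as needed.
CONSTANTS = ["MY FILENAME "]


def split_line(comment: str, constants=CONSTANTS) -> str:
    """Strip every known constant, then mark the parts with '--'."""
    for constant in constants:
        comment = comment.replace(constant, "")
    return PATTERN.sub(r"\1--\2--\3", comment)


print(split_line("4 MY FILENAME McWILLIAM Is Just This"))
# 4--McWILLIAM--Is Just This
print(split_line("6 18 FIELD MICE GOLD kike's recording"))
# 6--18 FIELD MICE GOLD--kike's recording
```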

confucius · June 6, 2023, 4:55pm

We believe that "MY FILENAME" refers to something in one of his documents, and there are hundreds of them. But matching these two things isn't going well. Just wish the "mad professor", as the grandchildren called him, knew the benefits of a filing system and not just piling stuff on yards of shelving. He was still adding to it in his 90s! Still had a sharp mind, but the handwriting was unreadable by most humans.

Anyway, what we'll do now is finish getting the stuff off the CDs and DVDs (about 100 to go), then load it all up in Mp3tag and export every file with every tag to a text file for later use in Excel. Then cut this down into sections of work instead of trying to do it all at once. I much prefer "do this", "verify", "do next", "verify", etc. So no finished product until the last verify is completed; slow, but it works for me.

Thanks for your input, it has made me rethink this headache.

ohrenkino · June 6, 2023, 5:03pm

It would have been quite nice if you had tested my idea.

What benefit would you get from Excel? I hardly see any as long as you have a program that reads the tag data. And that program will allow much more direct access to the file itself than any Excel sheet.

confucius · June 6, 2023, 6:32pm

For cataloguing. And the grand niece's uni want it, if they take the recordings. They are sitting on the fence in this regard. We also need to keep track of what data we put on these files. We need something we can use on both sides of the globe. We can create our own text files (better than we have already) from the spreadsheet, with the wanted tags, and import them.

Searching through all his documents and extracting the useful stuff as we work through them. We need to have one master document, where we can collate it all.

So a radical rethink of how we do this, in the next few weeks. We'll sort something out between us so we don't duplicate or lose work. So, sort out the data first then rename and tag the files.

confucius · June 6, 2023, 6:42pm

Ooops, forgot to say the latest version works on all the files, but can you keep the "MY FILENAME" part too?

Thanks

confucius · June 6, 2023, 7:14pm

Do I just remove "MY FILENAME ," or "MY FILENAME " or "MY FILENAME"?

ohrenkino · June 6, 2023, 7:16pm

You remove the whole $replace() and use only %comment% as the source string in the regular expression:
$regexp(%comment%,'(.)\s+(.*\u\u)\s*(.*)',$1--$2--$3)
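
A quick way to see what this final expression produces, and how the "--" markers could then be split into fields, is the Python sketch below. The assumptions: \u is approximated by [A-Z], and the field names track, artist and title are borrowed from the action group at the top of the thread rather than stated for this expression. The function name to_fields is made up.

```python
import re

# Stand-in for '(.)\s+(.*\u\u)\s*(.*)'; \u is approximated by [A-Z].
PATTERN = re.compile(r"(.)\s+(.*[A-Z][A-Z])\s*(.*)")


def to_fields(comment: str) -> dict:
    """Mark the three parts with '--', then split them into fields."""
    marked = PATTERN.sub(r"\1--\2--\3", comment)
    parts = marked.split("--", 2)
    if len(parts) != 3:
        return {}
    track, artist, title = parts
    return {"track": track, "artist": artist, "title": title}


print(to_fields("4 MY FILENAME McWILLIAM Is Just This"))
print(to_fields("5 MY FILENAME McWILLIAM A Is Wrong"))
```

In both samples the middle part keeps "MY FILENAME McWILLIAM" intact, and the single "A" of the second sample ends up at the start of the last part.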

confucius · June 6, 2023, 7:29pm

Thank you and good night.