What is Best Practice to Fix A Lot Of Files

Hello everyone, I must admit that I am starting to despair of ever cleaning up this MP3 collection. Maybe I have to give up and throw it all in the trash.

I am making a comparison between two hard drives

Let's say HDA represents the original hard drive
Let's say that HDB represents the hard drive on which the processing takes place

We can see that there are about 400 more songs on HDA, which is explained by a first attempt to remove duplicates on the other disc.

I found that some tags are wrong on HDA.

I found that on HDB, after automated processing, some tags that were correct have become wrong.

I found that on HDB, after automated processing, some tags that were wrong have become correct.

Is there really no way other than the human ear to confirm that the metadata is correct?

Are there any tips the more experienced can share?

I tend to think that HDA, the pre-2014 source, has more bad tags than HDB, which went through automated processing in 2021, but I am not certain, and I cannot say what the error percentage is on either drive.

I have 50,000 files being processed and doing it by hand takes a VERY long time, which is why I wanted to rely on an automated process, but it doesn't seem like the right solution ...

I am a little lost. I had set aside the summer for this operation, and I feel I cannot manage it any more. I cannot imagine checking, one by one, more than 30,000 files that have undergone modifications during automated processing.

This is also why one can appreciate MP3tag, which offers semi-automated means so that you always keep a form of control.

So I suggest that the experts start a sort of Best Practices guide, so that those who want to get started have a good starting point with the right tools and the right processes.

For example, is there a way to compare two MP3 music file collections (library, storage, playlist, ...) by fingerprint or in some other way?

  1. Define one (1!) Source for your files
  2. Create at least one (better two or more) backups of your complete Source
  3. Modify your files at your source only
  4. Check your changes with a set of possible use cases (like special characters in filenames, special artist names like AC/DC or A★Teens, upper/lower case spelling for different languages and so on)
  5. Repeat step 2 after every checked modification step
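One possible way to check that a backup from step 2 really matches the source is a checksum manifest. This is only a minimal sketch in Python, not something the steps above prescribe: the function names `build_manifest` and `compare_manifests` are made up for illustration, and it assumes source and backup use the same folder layout.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(source, backup):
    """Return (missing, extra, changed) lists of relative paths.

    missing: in the source but not in the backup
    extra:   in the backup but not in the source
    changed: in both, but with different content (tags count as content)
    """
    missing = sorted(source.keys() - backup.keys())
    extra = sorted(backup.keys() - source.keys())
    changed = sorted(
        p for p in source.keys() & backup.keys() if source[p] != backup[p]
    )
    return missing, extra, changed
```

Because tags are stored inside the MP3 file, any tag edit changes the checksum, so `changed` right after a fresh backup should be empty, and after a modification step it lists exactly the files that were touched.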

For your current problem with the different files with different tags and amount of files in different locations I can't tell you an automated way to solve it.

Yes, you can compare two MP3 music file collections with fingerprints. But this will not help you decide which copy of a song contains the correct metadata. It only helps you recognize whether a song is "Waterloo" by ABBA or "Hells Bells" by AC/DC.
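To illustrate what such a comparison can and cannot tell you, here is a rough Python sketch. It assumes you already have, for each collection, a mapping from some stable audio identifier to the tags of that file. How you obtain the identifier is up to you; note that exact equality of raw fingerprint strings is itself an assumption that holds for byte-identical audio but not necessarily for re-encoded copies. The function name and the field names are hypothetical.

```python
def compare_by_fingerprint(collection_a, collection_b,
                           fields=("artist", "title", "album")):
    """Compare two {fingerprint: {tag_name: value}} mappings.

    Returns (only_a, only_b, conflicts), where conflicts maps a fingerprint
    to {field: (value_in_a, value_in_b)} for every field that differs.
    """
    only_a = sorted(collection_a.keys() - collection_b.keys())
    only_b = sorted(collection_b.keys() - collection_a.keys())
    conflicts = {}
    for fp in collection_a.keys() & collection_b.keys():
        tags_a, tags_b = collection_a[fp], collection_b[fp]
        diffs = {
            f: (tags_a.get(f), tags_b.get(f))
            for f in fields
            if tags_a.get(f) != tags_b.get(f)
        }
        if diffs:
            conflicts[fp] = diffs
    return only_a, only_b, conflicts
```

The `conflicts` result is only a to-do list: it shows where the two copies disagree, not which side is right. That decision still needs a person or a source you trust.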

First I will try to export to Excel, do a VLOOKUP on the fingerprint, and make a gap analysis. I would like to find a tool to automate this; maybe I will test a tool I found that works exclusively with Discogs. For me MusicBrainz is ABSOLUTELY DANGEROUS for our MP3 collections and produces a messy collection. I did it with SongKong. It's an amazing product, but it works with MusicBrainz, which is not reliable at all.
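The same lookup-and-gap-analysis idea can also be done outside Excel. This is just a sketch with Python's standard `csv` module, assuming each drive has been exported to a CSV file with a header row. The column names `fingerprint`, `artist` and `title` and the status labels are assumptions for illustration, not the output format of any particular tool.

```python
import csv


def load_export(handle, key="fingerprint"):
    """Index the rows of a CSV export by the key column.

    Rows with an empty key are skipped; if a key occurs twice,
    the later row replaces the earlier one.
    """
    return {row[key]: row for row in csv.DictReader(handle) if row.get(key)}


def gap_analysis(hda, hdb, fields=("artist", "title")):
    """Return one {"fingerprint", "status"} row per fingerprint seen."""
    rows = []
    for fp in sorted(hda.keys() | hdb.keys()):
        a, b = hda.get(fp), hdb.get(fp)
        if b is None:
            status = "only_on_HDA"
        elif a is None:
            status = "only_on_HDB"
        elif any(a.get(f) != b.get(f) for f in fields):
            status = "tags_differ"
        else:
            status = "same"
        rows.append({"fingerprint": fp, "status": status})
    return rows


def write_report(rows, handle):
    """Write the gap analysis as CSV so it can be opened in a spreadsheet."""
    writer = csv.DictWriter(handle, fieldnames=["fingerprint", "status"])
    writer.writeheader()
    writer.writerows(rows)
```

With real files you would pass handles opened with `open(path, newline="", encoding="utf-8")`. Filtering the report on `tags_differ` gives the list of files that need a manual decision.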

I don’t quite understand this part. What parts would have changed from correct to false after you made corrections? Unless the processing was done incorrectly this shouldn’t happen arbitrarily.

Hi. In fact that depends on the consistency of MusicBrainz. Now that I understand the MusicBrainz process, the fact is that some AcoustID fingerprints are not correctly assigned in the MusicBrainz database. So after processing a batch of 50,000 files, I have some good results and some bad ones, because on HDA it was already like this: some files are tagged well and some badly. The fact is we can't trust MusicBrainz. It can be fine for ten songs, where a human can judge whether the result is correct or not, but processing 50,000 files with this database is clearly not a good idea. Now I am trying to find a way to repair all of this, but yes, it's 50,000 files!

Unfortunately in this case with 50k files, it is going to take some time to make updates. I would try to break these into smaller batches to try to simplify the process. You can try doing all compilations as one batch (or Various Artists as the AlbumArtist tag), then Artists or Albums starting with A-E, F-J, etc., then last go to any albums that are the most unusual and work on these individually. But as mentioned by @LyricsLover I strongly recommend making sure you have a complete backup.
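As a rough illustration of that batching idea, here is a small Python sketch. It assumes each track is a dictionary with an `albumartist` entry (a hypothetical layout). Only "A-E" and "F-J" come from the suggestion above; the remaining ranges are my own guess at what "etc." would cover.

```python
from collections import defaultdict

# "A-E" and "F-J" are from the suggestion; the other ranges are assumed.
RANGES = ["A-E", "F-J", "K-O", "P-T", "U-Z"]


def batch_name(album_artist):
    """Pick a work batch for one album artist value."""
    name = (album_artist or "").strip()
    if name.casefold() == "various artists":
        return "compilations"
    first = name[:1].upper()
    for label in RANGES:
        low, high = label.split("-")
        if low <= first <= high:
            return label
    # Digits, symbols, accented initials and empty values land here
    # and can be handled individually at the end.
    return "other"


def split_into_batches(tracks):
    """Group track dictionaries by the batch of their 'albumartist' value."""
    batches = defaultdict(list)
    for track in tracks:
        batches[batch_name(track.get("albumartist"))].append(track)
    return dict(batches)
```

Each batch can then be reviewed and backed up on its own before moving to the next one.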

I’m sure you didn’t collect these tracks in one sitting, so don’t expect to correct the tags 100% to your liking in one go either. I have 25k+ tracks and have been working on improving the metadata tags for well over a decade after my initial start using an early version of iTunes and lower quality mp3 (192!?!) bit rates at the time. I have since ripped all again but at lossless, and copied over all of the tags I had previously corrected using mp3tag. But even now I notice the odd typo or incorrect tag that will pop up when playing music in random mode. A quick edit to fix that is all it takes now.


If I understood the core of your problem correctly then my answer would be:

And that is why I do not do automated tagging and luckily have never used external databases as a literal source of data for tag fields. What is more, I always wipe all tag fields of every file added to the collection, just to be sure that no unapproved info slips by. Such a modus operandi of course takes more time to maintain my database, but then again any issues that arise are only down to me making typos [which, when spotted, I can easily correct].

[My answer is not a remedy to your current problem, but a suggestion for the future]

Hi Motley, compilations are the worst case, because most of the time automatic processing retags a song from a compilation as if it came from a single or from the original album. Honestly, I tried to do it the way you describe with an Excel tracker, beginning with the most important artists in the collection, but the situation became complicated, especially with Various Artists compilations.

Rome wasn’t built in a day. Your 50k files will take some time to go through and clean up. I’m sure the end result will be worth the effort though.


  1. I would suggest checking that each file has a fingerprint and/or an ISRC ID. It is very important that each file has an ID based on the signal frequency of the song

For... what purpose exactly in my modus operandi?

What would I gain by adding this info to them? And what happens if I cut off the second half of some song because I do not like it, e.g. because it has too many high frequencies for my mellow, oversensitive ears, and so keep only its first 2 minutes out of 4?