Trouble with data with 'invalid' characters

MP3TAG has saved the day, but there's more. My collection has reached 70,000 tracks, a mixture of m4a ( :angry: I converted all 20k+ of these to mp3), mp3, wma files, over 500gb total. A retirement project is to organize and document my library into a full function database using Microsoft SQL Server. (They have a pretty nice free version with limitations which are probably not a problem). Since I have collected data from a number of sources, radio station digital library, foreign CD products, and several shared personal colletions, ( keep the originals in seperate directories!) I found the 'average' condition of tag data is pathetic, probably because various ripping software introduces lots of problems and differences, invalid data, data in incorrect tag fields and such. The largest problem has been data with characters MSFT determines to be invalid or control characters. MP3TAG handles the data well for repairs, but import functions to my SQL Server database fail or completey distort the data.

Here's my solution:

  1. export MP3TAG info via a custom script to a bar-delimited text file containing all tag fields.

  2. Then import the bar-delimited file into Excel first, being sure to specify the bar delimiter. Also, override the Excel default 'general' field format to 'text' for all fields. Save this as a .xlsx file, and import this into the SQL Server database specifying Excel as the flat file source with all fields defined NVARCHAR (255). The resulting table ends up with 255-character fields, and does not fail on invalid data or delimiters.

  3. If you are somewhat proficient in SQL, you may now query any and all data for such things as duplicates, misspelled data, missing info, by manipulating the results with such features as 'SELECT DISTINCT', 'ORDER BY', 'COUNT', 'GROUP BY' etc. And you may easily build and populate lookup tables for structuring your custom database as you like. You can create and save 'stored procedures' for standardizing and validating your data so it is a repeatable process. This is good for validating newly imported data against YOUR standards for such things as Genre, Artists, Composers, and the like.

  4. Be sure to NOT allow any software to automatically alter your scrubbed data by downloading 'corrections' from internet sites as this can undo all your hard work.

  5. I have not yet had the need to re-export the data, since it is quick and easy to fix the issues in MP3TAG and repeat the import process. I do apply my own corrections to the segregated individual source directories so they are not lost. MP3TAG makes this very easy.