Questions regarding FLAC metadata (UTF-8 compliant)

d2b · November 19, 2016, 11:30pm

When managing a large library of music encoded as FLAC files, we frequently need to add fields or "fix" tag values. When you have 150,000 files derived from, say, 12,000 CD albums, maintenance becomes a significant issue. I've mentioned before that we need a good way to compare the metadata of two files, one of which has supposedly been updated with new fields and/or field values, when the two are otherwise supposed to be identical.

While we currently use Beyond Compare (Scooter Software) to manage file updates and additions, it is not well suited to comparing the metadata of two files that are otherwise identical but have different 'date modified' timestamps or different metadata content. Two problems occur when using the plug-in furnished by Scooter Software: 1) the fields in two versions of the same file may be in a different order in the Vorbis container, which makes it extremely difficult to compare the field values, and 2) the plug-in is not fully compliant when interpreting UTF-8 encoded text. For example, the letter "e" with an acute accent, é (U+00E9), is displayed as a comma (,).
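To make the two problems concrete, here is a minimal Python sketch of the kind of order-independent, UTF-8 aware comparison I have in mind. It assumes the raw NAME=value entries have already been pulled out of each file as byte strings. The function names (normalize, diff_comments) and the sample tags are made up for illustration, and this is not a description of how the Beyond Compare plug-in works.

```python
from collections import Counter


def normalize(raw_comments):
    """Decode raw b'NAME=value' entries as UTF-8 and count (NAME, value) pairs.

    Field names are upper-cased because Vorbis field names are
    case-insensitive; values are left exactly as stored.
    """
    pairs = Counter()
    for raw in raw_comments:
        text = raw.decode("utf-8")
        name, _, value = text.partition("=")
        pairs[(name.upper(), value)] += 1
    return pairs


def diff_comments(old_raw, new_raw):
    """Return (removed, added) lists of (NAME, value), ignoring field order."""
    old, new = normalize(old_raw), normalize(new_raw)
    removed = sorted((old - new).elements())
    added = sorted((new - old).elements())
    return removed, added


# Hypothetical sample data: same tags in a different order, plus one new field.
old_tags = [b"ARTIST=Caf\xc3\xa9 Example", b"TITLE=Some Song"]
new_tags = [b"title=Some Song", b"GENRE=Pop", b"ARTIST=Caf\xc3\xa9 Example"]

removed, added = diff_comments(old_tags, new_tags)
print("removed:", removed)
print("added:  ", added)
```

With the entries decoded and sorted this way, a reordered but otherwise unchanged file shows no differences, and the é survives because the bytes C3 A9 are decoded as UTF-8 rather than shown one byte at a time.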

This long-winded introduction is only here to explain why I'm posting this question. In an attempt to understand what's going on, I've chosen to open files with a hex or text editor so that I can actually "see" what's encoded in the Vorbis container. I tried using HxD, a free hex editor, but it doesn't support UTF-8. I also tried Notepad++, a highly recommended substitute for Windows Notepad. Opening the file with Notepad++, and even with HxD in ASCII mode, leads to the results which are the basis of my question. Please bear with me; I'm trying to understand, and hopefully believe in, what I see when I peer into the Vorbis container. So, here's the simple question, one whose answer may be simple. I just don't know.

Mp3tag ALWAYS reads the metadata correctly, just the way it was entered. HOWEVER, when I look at the tags in Notepad++ and (usually) with HxD, I often see one or more ASCII characters (therefore UTF-8 compliant as I'm led to believe) following the actual field value. For example, in the hex/text views, I might see this: ARTIST=Barbara Streisand# as human-readable text, but the correct value of the Artist field is of course just Barbara Streisand and NOT Barbara Streisand#. Mp3tag does the right thing and does not display the "#" symbol. Hence the question below:

Is it possible to explain to me how Mp3tag and other programs like JRiver Media Center read the tag correctly by ignoring the last character? (The other characters between field entries are control characters, but the # sign is not.)
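One possible explanation, offered as a sketch and not a confirmed answer: as I read the Vorbis comment specification, each entry is stored as a 32-bit little-endian byte count followed by the UTF-8 text, with no terminator character. If that is right, the byte right after one value is the first byte of the next entry's length, and a length of 35 (0x23) happens to display as "#". The Python below builds a small block by hand to illustrate this; the helper names (build_block, parse_block), the vendor string, and the second COMMENT entry are invented for the example, and it does not read a real FLAC file.

```python
import struct


def build_block(vendor, comments):
    """Assemble a Vorbis comment block body: length-prefixed vendor and entries."""
    out = struct.pack("<I", len(vendor)) + vendor
    out += struct.pack("<I", len(comments))
    for entry in comments:
        out += struct.pack("<I", len(entry)) + entry
    return out


def parse_block(data):
    """Read the block back using only the length prefixes, as a tag reader would."""
    pos = 0
    (vendor_len,) = struct.unpack_from("<I", data, pos)
    pos += 4
    vendor = data[pos:pos + vendor_len].decode("utf-8")
    pos += vendor_len
    (count,) = struct.unpack_from("<I", data, pos)
    pos += 4
    comments = []
    for _ in range(count):
        (entry_len,) = struct.unpack_from("<I", data, pos)
        pos += 4
        comments.append(data[pos:pos + entry_len].decode("utf-8"))
        pos += entry_len
    return vendor, comments


first = b"ARTIST=Barbara Streisand"
second = b"COMMENT=" + b"x" * 27  # made-up entry, exactly 35 bytes long
block = build_block(b"example vendor", [first, second])

end_of_first = block.index(first) + len(first)
print(chr(block[end_of_first]))                  # first byte of the next length
print(block[end_of_first + 1:end_of_first + 4])  # remaining three length bytes
print(parse_block(block)[1][0])                  # value as a length-aware reader sees it
```

Under this reading, a program that honors the length prefix never treats the "#" as part of the ARTIST value, while a text or hex viewer shows it as a printable character followed by three NUL bytes, which would match the control characters seen between entries.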

Trust me, I'm not sleeping well at night trying to figure out which application(s) we can ultimately rely on to efficiently compare two versions of the same file with different file modification dates and presumably differing only in the order and content of the Vorbis container entries. I hope one of you can help set me at ease. In the meantime, I'm going to go back to Scooter Software and beg for a better comparison plug-in that sorts the metadata by field name and is fully UTF-8 compliant.

Thanks in advance for your patience and (hopefully) any help you can provide.

Dennis.... aka "d2b"