Mp3Tag-macOS writes malformed text field (wrong string length)

While exploring the ID3v2.3 TLAN (Language) field by writing to it from Mp3Tag-macOS, I discovered that the value-string for that field (a three-letter language code) was recognised by some apps (e.g. foobar2000-macOS, MediaInfo-macOS) but not by Jaikoz-macOS, which merely displayed a blank field (i.e. field name but no value).

Initially imagining that this was an issue with Jaikoz, I reached out to its support forum, at that point naively suspecting an un-read end-string null-character delimiter (later I was informed that no such delimiter would be expected, according to the standard). On request, I sent the developer (Paul Taylor) an example Mp3 file that had a fresh ID3v2.3 tag with (only) a TLAN frame written to it by Mp3Tag-macOS.

However, it seemed, from Paul's analysis, and as I was subsequently able to confirm via a smart hex editor (Synalyze It, with an MP3/ID3 grammar template) for that same example mp3 file, that the "unread value-string" experience in Jaikoz was instead rooted in the way in which Mp3Tag-macOS had written the string to the TLAN frame. Additionally, based on its grammar template, Synalyze It reported a string-length error, as:

ERROR: String size set to zero - skip element 'Text'

The value-string in question was "fra" (the ISO 639-2 code for Français, i.e. the French language). The string was in UTF-16 encoding, delimited not by an end-of-string character (e.g. null) but implicitly by what I would call a frame-body length value (in the four bytes just after the frame identifier, in this case "TLAN"). This value was two bytes too big (11, where it should have been 9). I wonder if that was because Mp3Tag had included the two BOM bytes in the length-count.
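To make the byte counts concrete, here is a minimal Python sketch that builds an ID3v2.3 TLAN frame by hand. The function name `tlan_frame` and its `terminate` flag are made up for illustration and are not taken from any app mentioned here. It assumes the ID3v2.3 layout of a 4-byte frame ID, a 4-byte big-endian size that excludes the 10-byte header, and 2 flag bytes, followed by one encoding byte and the string.

```python
import struct

def tlan_frame(text, terminate=False):
    """Build a TLAN frame (hypothetical helper, for illustration only)."""
    # Body: encoding byte $01 (Unicode), little-endian BOM, then UTF-16 text.
    body = b"\x01" + b"\xff\xfe" + text.encode("utf-16-le")
    if terminate:
        body += b"\x00\x00"  # optional two-byte UTF-16 terminating null
    # Header: frame ID, big-endian body size, two flag bytes.
    header = b"TLAN" + struct.pack(">I", len(body)) + b"\x00\x00"
    return header + body

size_plain = struct.unpack(">I", tlan_frame("fra")[4:8])[0]
size_terminated = struct.unpack(">I", tlan_frame("fra", terminate=True)[4:8])[0]
print(size_plain, size_terminated)  # 9 11
```

Under these assumptions, 9 already counts the encoding byte and the two BOM bytes (1 + 2 + 6), so a size of 11 would be consistent with a two-byte terminating null being counted, rather than with the BOM being counted twice.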

I understand that this thread in this community may be relevant.

Unfortunately the ID3.org server appears down at the moment (has been for at least a few days). However at this site is what appears to be a specification or description, saying:

Unicode strings must begin with the Unicode BOM ($FF FE or $FE FF) to identify the byte order.
...a wording which, to my naive eye, gives the impression that the BOM is part of the string, whereas I wonder if the author's intention was for the BOM (in this context) to act as a kind of string header, preceding its body. In which case the real root of the problem might arguably be in the spec's wording. But that is just my spec. speculation, and my superficial-subject-appreciation neurons are beginning to get tangled...

Regardless, a Jaikoz update was released yesterday to work around this (excessive string length value) situation, and Jaikoz now displays the Mp3Tag-written TLAN value as expected. From what I understand (from the developer's explanation), the workaround, which I assume is in Jaikoz's string-value-reader function, was essentially to treat null in this context as an informal end-of-string delimiter, even though in reality any such nulls encountered would (I imagine) only constitute padding nulls just after the end of the frame itself. The idea is that if, while attempting to "obey" the given string length, the string-reader function detected a null (implying that it had "run past the end" of the real string), it would stop and return the string up to just before that null.
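As I understand the idea, a defensive reader of this kind could look roughly like the Python sketch below. This is not Jaikoz's actual code; the function name `read_text` is hypothetical, and only the two ID3v2.3 text encodings ($00 ISO-8859-1, $01 UTF-16 with BOM) are handled.

```python
def read_text(body):
    """Decode a text frame body, stopping at the first null if one is present."""
    encoding, data = body[0], body[1:]
    if encoding == 1:
        # Python's "utf-16" codec consumes the BOM and uses it for byte order.
        text = data.decode("utf-16")
    else:
        text = data.decode("latin-1")
    # Treat a null as an informal end-of-string marker.
    return text.split("\x00", 1)[0]

plain = b"\x01\xff\xfe" + "fra".encode("utf-16-le")
terminated = plain + b"\x00\x00"
print(read_text(plain), read_text(terminated))
```

Both frame bodies decode to the same three-letter value, which is the behaviour the workaround is meant to achieve.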

I wonder if the various other apps I mentioned (other than Jaikoz and Synalyze It), which appeared not to have any issues reading the same value, employ some similar "defensive" tactic (not necessarily with Mp3Tag in particular in mind), against a de facto "meandering away" by some apps from the intention of the official format spec. I appreciate that conceiving/implementing defensiveness against de facto meanderings is a standard part of the pragmatics of programming real-life end-user apps.

Assuming the stated diagnosis to be correct, should a fix be made to the "incorrect string length" issue in Mp3Tag? Or is there some other relevant factor?

I think I can't add anything new to what I've written here, explaining the reasoning behind how and why Mp3tag writes a terminating null character for text fields.

Many thanks to Paul, for releasing a version of Jaikoz that reads both ways of string representation.

Wandering away from Mp3Tag specifically, to the more general issue (that could affect Mp3Tag)...

I can imagine a potential pitfall, where null is used as (unofficial) delimiter in a (unofficially written or assumed on reading) null-terminated value-string in a frame in an ID3v2.3 tag. The clash of the unofficials!

To distinguish terminal nulls from multi-value delimiting nulls, I imagine it would be necessary for an algorithm to look more than one character ahead, e.g. for "null followed by not-null" or maybe "null followed by printable character". Does that seem reasonable?
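A simple way to express that "null followed by not-null" rule, sketched in Python: strip any nulls at the very end of the decoded string first, then treat the remaining nulls as value separators. The function name `split_values` is hypothetical, and this assumes the string has already been decoded to text.

```python
def split_values(text):
    """Split a decoded text field on nulls, ignoring trailing nulls."""
    # Nulls at the end can only be a terminator or padding, never a separator.
    trimmed = text.rstrip("\x00")
    # Any null left over is followed by a non-null character, so it separates values.
    return trimmed.split("\x00") if trimmed else []

print(split_values("fra\x00eng\x00"))
```

This avoids an explicit look-ahead loop, at the cost of not being able to represent an intentionally empty final value.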

This is of interest to me, not just academically, but also because I use a smart hex editor (Synalyze It) to help understand what exactly gets stored (by any app) in the tags in mp3 files. The MP3+ID3v2 grammar template available for Synalyze appears to "trip up" on "null terminates string" (not your fault!), thereby regarding only the first of null-delimited multiple values as a value, the remainder being interpreted as "padding bytes" (regardless of their byte-values). I take this to be a consequence of null-delimited multi-value strings not being officially supported in ID3v2.3, and also (I presume) of the Synalyze grammar-template's respect for the de facto "null terminator" practice. The simple fix to all of this confusion would be (I guess) to ensure all ID3v2.3 writers and readers (apps) made correct use of the 4-byte length value (following the frame identifier), so there would be no need to entertain null terminator as a "string-reading safety net".

Or have I got something wrong here?

It's complicated, and I'm not sure if I've taken it all in correctly.

FYI: As an alternative to ID3.org, which is unavailable at the moment, here's the latest proper copy on the Wayback Machine

Also this clarification of one type of frame (Chapters), with nice diagram of frame header etc., again on Wayback, but some bits apply to frames in general.

And this discussion thread on StackOverflow about the interpretation of ID3.org's "Informal Standard" (in which it seems some parts e.g. for ID3v2.3 have been clarified for ID3v2.4, and apply equally to ID3v2.3).

There is another alternative to ID3.org's ID3v2.3 specs, though of 1999 vintage, on SourceForge: ID3v2 Developers Information

Yes, what makes this a bit easier, is that the ID3v2 frame header already contains the full size of the frame.
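For reference, a reader that relies only on that header size could be sketched in Python as below. The function name `iter_frames` is hypothetical; the sketch assumes ID3v2.3 frame headers (4-byte ID, 4-byte big-endian size, 2 flag bytes) and that the input is the tag contents after the 10-byte tag header, with no unsynchronisation or extended header.

```python
import struct

def iter_frames(tag_body):
    """Yield (frame_id, body) pairs using only the size in each frame header."""
    pos = 0
    while pos + 10 <= len(tag_body):
        frame_id = tag_body[pos:pos + 4]
        if frame_id[0] == 0:
            break  # a null where a frame ID should start means padding
        size, _flags = struct.unpack(">IH", tag_body[pos + 4:pos + 10])
        yield frame_id.decode("latin-1"), tag_body[pos + 10:pos + 10 + size]
        pos += 10 + size

body = b"\x01\xff\xfe" + "fra".encode("utf-16-le")
tag = b"TLAN" + struct.pack(">I", len(body)) + b"\x00\x00" + body + b"\x00" * 16
frames = list(iter_frames(tag))
print(frames)
```

Because each frame body is sliced by its declared size, the reader never needs to scan for a terminating null to find where a frame ends.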

Regarding your other follow-up posts and links: is there anything that is unanswered for you or are you adding those just for reference?

Yes thanks, just for reference - for myself and also anybody else "missing" good old ID3.org

...while I slowly continue (when time permits) to dredge through deeper details of ID3v2.3 than I have done before, looking things up, analysing example files and writing-up for my own learning/reference, to make sure I understand the precise situation here.
