[X] LF and CR control codes in Vorbis Comment values

I don't know if this is a bug or not, but I would appreciate comments on this possible problem. For reasons that I cannot explain, another person on our project managed to insert LF and CR control codes in Vorbis comments (FLAC files) in the following format:

PERFORMER=[LF][CR]Value

In Windows XP systems, when viewing tags using the ALT-T interface, the presence of these two characters is indicated by two "empty box" symbols after the "equal" character and before the proper value associated with the field name. If you open the tag to edit it, you will see that the text comprising the value is on the second line in the edit field.

This anomaly is not easily noticed in Windows 7 systems. The tag shows up like this:

PERFORMER=Value

even though the two control codes precede the text comprising the value. We verified that they are there using HxD, a hex editor application, to look at the file contents.
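For anyone who wants to check a value without a hex editor, here is a minimal Python sketch. It assumes the comment has already been read into a string of the form NAME=value by some tag library; the function name leading_control_codes is made up for this illustration and is not part of Mp3tag or any FLAC tool.

```python
def leading_control_codes(comment):
    """Return the names of CR/LF characters found directly after the '='."""
    names = {"\n": "LF", "\r": "CR"}
    _, sep, value = comment.partition("=")
    found = []
    if not sep:
        # No '=' at all, so this is not a NAME=value comment.
        return found
    for ch in value:
        if ch not in names:
            break  # first ordinary character: stop scanning
        found.append(names[ch])
    return found


print(leading_control_codes("PERFORMER=\n\rValue"))
print(leading_control_codes("PERFORMER=Value"))
```

The first call should report LF followed by CR, the second an empty list.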

We know that the value is supposed to comprise only "text" characters from a UTF-8 compliant character set, not control codes. We are also under the impression that the characters comprising the value are supposed to immediately follow the equal sign. Perhaps this is not a requirement.

We are concerned that some applications designed to read metadata in FLAC files may not read them properly if there are unwanted control codes between the equal sign and the value proper.

The presence of the "empty box" characters as seen in XP-based systems is desirable in our opinion because it indicates the presence of these unwanted intruders. In Windows 7 systems, they are not readily detected. Whether or not they are a problem is not clear to us.

Thanks in advance for your comments and recommendations. Is it OK to ignore these?

"d2b"

Different computer systems use different line break control characters:
Unix/Linux = LF, Apple (classic Mac OS) = CR, Windows = CR LF
LF = Line Feed = hex 0A = decimal 10 = \n
CR = Carriage Return = hex 0D = decimal 13 = \r

You have detected an unusual sequence of line break control characters: LF CR.
Mp3tag on Win XP will show two graphical boxes.
As you said, Mp3tag on Win 7 does suppress the bad character sequence.
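To show why LF CR stands out, here is a small Python sketch that compares a sequence against the three common line break conventions. The table and the function name classify_line_break are only for illustration and are not taken from Mp3tag.

```python
# Common line break conventions, keyed by the exact character sequence.
LINE_BREAKS = {
    "\n": "Unix/Linux (LF)",
    "\r": "Apple, classic Mac OS (CR)",
    "\r\n": "Windows (CR LF)",
}


def classify_line_break(seq):
    """Name the convention a sequence belongs to, or flag it as non-standard."""
    return LINE_BREAKS.get(seq, "non-standard")


print(classify_line_break("\r\n"))
print(classify_line_break("\n\r"))
```

The reversed order LF CR matches none of the three entries, so it is reported as non-standard.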

You can test the behaviour in the Convert "Tag - Tag" preview:

$regexp($char(10)$char(13)'some text','^[\r\n]+',) ==> 'some text'

If you want to remove only this one leading bad sequence LF CR, you can do it like this:

$regexp($char(10)$char(13)'some text','^\n\r',) ==> 'some text'

You can remove any leading line break control character using an Mp3tag action.
Action "Format value"
Field: PERFORMER
Formatstring: $regexp(%PERFORMER%,'^[\r\n]+',)
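
If you ever need the same clean-up outside Mp3tag, for example in a post-rip script, the two patterns above can be tried with Python's re module. This is only a sketch of the idea, not what Mp3tag does internally, and the function names are made up for this example.

```python
import re


def strip_leading_line_breaks(value):
    """Remove any run of CR/LF characters at the start of the value."""
    return re.sub(r"^[\r\n]+", "", value)


def strip_leading_lf_cr(value):
    """Remove only one leading LF CR pair, in exactly that order."""
    return re.sub(r"^\n\r", "", value)


print(repr(strip_leading_line_breaks("\n\rsome text")))
print(repr(strip_leading_lf_cr("\n\rsome text")))
print(repr(strip_leading_lf_cr("\r\nsome text")))
```

The first pattern is the more forgiving one, because it also catches CR LF, a lone LF, or several line breaks in a row. The second leaves anything other than a leading LF CR untouched.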

DD.20141013.0936.CEST

Thank you for your detailed response. For now, I have fixed the problem manually, but I will add your action to our "post-rip" clean-up list.

Dennis, aka "d2b"
