Timed Text Markup Language

https://help.apple.com/itc/videoaudioassetguide/#/itc0f14fecdd

Apple Music uses TTML for synced lyrics.

Is there an equivalent M4A or ID3 tag for TTML?

For ID3v2, it's the SYLT frame, which Mp3tag currently doesn't support. The main reason is that it would require a dedicated, ID3v2-only editor for adding timestamps and the accompanying lyrics fragments.

I don't know if there is an M4A/MP4 equivalent at the moment. It seems that many people simply use *.lrc files to store the lyrics.
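If you go the .lrc route, the format is simple enough to read by hand. A minimal Python sketch, assuming the common LRC conventions: [mm:ss.xx] time tags at the start of a line (possibly several per line) and metadata tags such as [ar:...] that carry no time. It is not tied to any particular player.

```python
import re

# [mm:ss], [mm:ss.x], [mm:ss.xx] or [mm:ss.xxx]; some files use ':' before the fraction.
TIME_TAG = re.compile(r"\[(\d+):(\d{2})(?:[.:](\d{1,3}))?\]")

def parse_lrc(text):
    """Return (milliseconds, lyric line) pairs sorted by time.

    Lines may start with one or more time tags; metadata tags like [ar:...]
    don't match TIME_TAG, so they produce no entries.
    """
    entries = []
    for line in text.splitlines():
        pos, times = 0, []
        while (m := TIME_TAG.match(line, pos)):
            minutes, seconds, frac = m.groups()
            frac = (frac or "0").ljust(3, "0")  # "34" -> 340 ms, "5" -> 500 ms
            times.append(int(minutes) * 60000 + int(seconds) * 1000 + int(frac))
            pos = m.end()
        lyric = line[pos:].strip()
        entries.extend((t, lyric) for t in times)
    return sorted(entries)
```

A line like [00:15.00][01:02.5]Exchanging glances yields two entries, one per time tag, which is how LRC files repeat a chorus without duplicating the text.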


From id3v2.4.0-frames.txt: 4.9. Synchronised lyrics/text

This is another way of incorporating the words, said or sung lyrics,
in the audio file as text, this time, however, in sync with the
audio. It might also be used to describe events e.g. occurring on a
stage or on the screen in sync with the audio. The header includes a
content descriptor, represented as a terminated text string. If no
descriptor is entered, 'Content descriptor' is $00 (00) only.

 <Header for 'Synchronised lyrics/text', ID: "SYLT">
 Text encoding        $xx
 Language             $xx xx xx
 Time stamp format    $xx
 Content type         $xx
 Content descriptor   <text string according to encoding> $00 (00)
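The fields above can be assembled with a few lines of Python. This is a sketch, not Mp3tag code: it only produces the SYLT-specific fields listed above (not the generic ID3v2 frame header with frame ID, size and flags), and it only supports the ISO-8859-1 ($00) and UTF-8 ($03) encodings, because the UTF-16 encodings need a $00 00 terminator. The constant and function names are made up for illustration.

```python
import struct

# Text encoding bytes handled by this sketch (both use a single $00 terminator).
ISO_8859_1 = 0x00
UTF_8 = 0x03

LYRICS = 0x01        # content type $01
MILLISECONDS = 0x02  # time stamp format $02

def sylt_header(language, descriptor="", encoding=UTF_8,
                timestamp_format=MILLISECONDS, content_type=LYRICS):
    """Build: text encoding, language, time stamp format, content type, descriptor."""
    if encoding not in (ISO_8859_1, UTF_8):
        raise ValueError("this sketch only supports encodings $00 and $03")
    lang = language.encode("ascii")
    if len(lang) != 3:
        raise ValueError("language must be a 3-letter ISO 639-2 code")
    codec = "latin-1" if encoding == ISO_8859_1 else "utf-8"
    fixed = struct.pack(">B3sBB", encoding, lang, timestamp_format, content_type)
    # An empty descriptor still gets its terminator, i.e. just $00.
    return fixed + descriptor.encode(codec) + b"\x00"
```

For example, sylt_header("eng") returns b"\x03eng\x02\x01\x00": UTF-8, English, millisecond stamps, lyrics, and an empty descriptor.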

Content type:

 $00  is other
 $01  is lyrics
 $02  is text transcription
 $03  is movement/part name (e.g. "Adagio")
 $04  is events (e.g. "Don Quijote enters the stage")
 $05  is chord (e.g. "Bb F Fsus")
 $06  is trivia/'pop up' information
 $07  is URLs to webpages
 $08  is URLs to images

Time stamp format:

 $01  Absolute time, 32 bit sized, using MPEG [MPEG] frames as unit
 $02  Absolute time, 32 bit sized, using milliseconds as unit

Absolute time means that every stamp contains the time from the
beginning of the file.
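Both formats store a 32-bit absolute time. A Python sketch, assuming big-endian byte order (ID3v2 stores multibyte numbers most significant byte first) and, for the MPEG-frame variant, 1152 samples per frame. That holds for MPEG-1 Layer III; MPEG-2/2.5 Layer III uses 576. The helper names are hypothetical.

```python
import struct

def pack_timestamp(value):
    """Encode an absolute time stamp (ms or MPEG frames) as 32-bit big-endian."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("time stamp must fit in 32 bits")
    return struct.pack(">I", value)

def mpeg_frames_to_ms(frames, sample_rate, samples_per_frame=1152):
    """Convert a format-$01 stamp (MPEG frames) to a format-$02 stamp (ms).

    samples_per_frame=1152 assumes MPEG-1 Layer III; pass 576 for MPEG-2/2.5 Layer III.
    """
    return round(frames * samples_per_frame * 1000 / sample_rate)
```

For a 44.1 kHz MPEG-1 Layer III file, frame 100 corresponds to about 2612 ms, which pack_timestamp stores as $00 00 0A 34.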

The text that follows the frame header differs from that of the
unsynchronised lyrics/text transcription in one major way. Each
syllable (or whatever size of text is considered to be convenient by
the encoder) is a null terminated string followed by a time stamp
denoting where in the sound file it belongs. Each sync thus has the
following structure:

 Terminated text to be synced (typically a syllable)
 Sync identifier (terminator to above string)   $00 (00)
 Time stamp                                     $xx (xx ...)

The 'time stamp' is set to zero or the whole sync is omitted if
located directly at the beginning of the sound. All time stamps
should be sorted in chronological order. The sync can be considered
as a validator of the subsequent string.
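The sync structure can be sketched as a pair of Python functions that write and read (text, milliseconds) entries. Assumptions: time stamp format $02 (milliseconds), a single-byte $00 terminator (so ISO-8859-1 or UTF-8, not UTF-16), and 32-bit big-endian stamps. Every entry is written, including one at time 0, rather than omitting a leading sync.

```python
import struct

def encode_syncs(entries, encoding="utf-8"):
    """entries: list of (text, ms) pairs, ms measured from the start of the file."""
    times = [ms for _, ms in entries]
    if times != sorted(times):
        raise ValueError("time stamps should be in chronological order")
    out = bytearray()
    for text, ms in entries:
        out += text.encode(encoding) + b"\x00"  # sync identifier terminates the text
        out += struct.pack(">I", ms)            # 32-bit absolute time stamp
    return bytes(out)

def decode_syncs(data, encoding="utf-8"):
    """Inverse of encode_syncs: split the data back into (text, ms) pairs."""
    entries, pos = [], 0
    while pos < len(data):
        end = data.index(b"\x00", pos)          # find the terminator
        (ms,) = struct.unpack_from(">I", data, end + 1)
        entries.append((data[pos:end].decode(encoding), ms))
        pos = end + 1 + 4                       # skip terminator and time stamp
    return entries
```

Searching for the terminator always starts after the previous time stamp, so $00 bytes inside a stamp are never mistaken for the end of a string.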

Newline characters are allowed in all "SYLT" frames and MUST be used
after every entry (name, event etc.) in a frame with the content type
$03 - $04.

A few considerations regarding whitespace characters: Whitespace
separating words should mark the beginning of a new word, thus
occurring in front of the first syllable of a new word. This is also
valid for new line characters. A syllable followed by a comma should
not be broken apart with a sync (both the syllable and the comma
should be before the sync).
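The whitespace rule can be applied mechanically once the syllable breaks are known. In this sketch the breaks are marked with | in the input, a convention invented here. Whitespace, including newlines, is moved to the front of the next word's first syllable. A trailing comma stays attached to its syllable because it belongs to the same word. Applied literally, the rule yields " glan" with a leading space, whereas the spec's example listing shows "glan".

```python
import re

def sync_fragments(lyrics, mark="|"):
    """Split lyrics into SYLT sync fragments following the whitespace rule.

    `mark` separates syllables inside a word (hypothetical input convention).
    """
    fragments = []
    # Each match is the whitespace before a word plus the word itself.
    for m in re.finditer(r"(\s*)(\S+)", lyrics):
        lead, word = m.groups()
        syllables = word.split(mark)
        syllables[0] = lead + syllables[0]  # whitespace goes in front of the first syllable
        fragments.extend(syllables)
    return fragments
```

Joining the fragments gives back the unsynchronised text with the marks removed, which is a cheap consistency check.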

An example: The "USLT" passage

 "Strangers in the night" $0A "Exchanging glances"

would be "SYLT" encoded as:

 "Strang" $00 xx xx "ers" $00 xx xx " in" $00 xx xx " the" $00 xx xx
 " night" $00 xx xx 0A "Ex" $00 xx xx "chang" $00 xx xx "ing" $00 xx
 xx "glan" $00 xx xx "ces" $00 xx xx
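A quick round-trip check of this example in Python: encoding the fragments and then dropping every terminator and time stamp should give back the USLT text. The millisecond values are invented, the stamps are written as 32-bit big-endian integers (format $02) instead of the two-byte placeholders in the listing, and " glan" carries a leading space so that the words don't run together.

```python
import struct

# Fragments from the example; times in ms are made up for illustration.
example = [("Strang", 0), ("ers", 400), (" in", 700), (" the", 900),
           (" night", 1100), ("\nEx", 2600), ("chang", 2900), ("ing", 3200),
           (" glan", 3500), ("ces", 3800)]

def build_sylt_text(entries):
    """Concatenate syncs: text, $00 terminator, 32-bit big-endian time stamp."""
    return b"".join(t.encode("latin-1") + b"\x00" + struct.pack(">I", ms)
                    for t, ms in entries)

def to_uslt(data):
    """Recover the unsynchronised text by dropping terminators and time stamps."""
    parts, pos = [], 0
    while pos < len(data):
        end = data.index(b"\x00", pos)
        parts.append(data[pos:end].decode("latin-1"))
        pos = end + 1 + 4  # skip the terminator and the time stamp
    return "".join(parts)

data = build_sylt_text(example)
print(data.hex(" "))
print(repr(to_uslt(data)))
```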

There may be more than one "SYLT" frame in each tag, but only one
with the same language and content descriptor.
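Checking that constraint takes only a Counter. A hedged sketch: each SYLT frame is represented here simply as a (language, content descriptor) tuple, a hypothetical input shape rather than how any tagging library exposes frames.

```python
from collections import Counter

def duplicate_sylt_keys(frames):
    """Return (language, descriptor) pairs that occur in more than one SYLT frame."""
    counts = Counter(frames)
    return [key for key, n in counts.items() if n > 1]
```

An English lyrics frame and an English "Chords" frame can coexist; two English frames with the same empty descriptor cannot.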

Apple uses TTML in Apple Music for synced lyrics.

I have not tried saving an external file with a .ttml extension to see if iTunes picks up the lyrics.

Or maybe there is an embedded M4A tag; I don't know.

From what I understand, TTML is used to submit lyrics to Apple Music when adding music to their catalog. I don't see any specification on how this is embedded in, e.g., an M4A file.
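For comparison, a TTML lyrics document is ordinary XML, so timed lines can be pulled out with the standard library. This sketch assumes the TTML namespace http://www.w3.org/ns/ttml, begin/end attributes directly on each <p>, and time expressions of the form hh:mm:ss.fff, mm:ss.fff, or offsets ending in s or ms. Apple's exact TTML profile isn't documented in this thread, so treat it as an illustration only.

```python
import xml.etree.ElementTree as ET

TT_NS = "{http://www.w3.org/ns/ttml}"

def clock_to_ms(value):
    """Convert '00:01:02.500', '01:02.5', '62.5s' or '62500ms' to milliseconds."""
    if value.endswith("ms"):          # check "ms" before "s"
        return round(float(value[:-2]))
    if value.endswith("s"):
        return round(float(value[:-1]) * 1000)
    seconds = 0.0
    for part in value.split(":"):     # hours/minutes/seconds, most significant first
        seconds = seconds * 60 + float(part)
    return round(seconds * 1000)

def ttml_lines(xml_text):
    """Return (begin_ms, end_ms, text) for every timed <p> element."""
    root = ET.fromstring(xml_text)
    out = []
    for p in root.iter(TT_NS + "p"):
        begin, end = p.get("begin"), p.get("end")
        if begin is None or end is None:
            continue  # inherited timing or 'dur' isn't handled in this sketch
        text = "".join(p.itertext()).strip()
        out.append((clock_to_ms(begin), clock_to_ms(end), text))
    return out
```

The resulting begin times map directly onto SYLT millisecond stamps or LRC tags; the end times have no counterpart in either format.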

If you happen to have such a file that shows synced lyrics, you could try analyzing it to see whether and where the lyrics are embedded.
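One way to do that analysis is to walk the MP4 box (atom) tree and look for anything lyrics-related. This sketch follows the basic box layout (32-bit size, 4-character type, optional 64-bit size). The set of container boxes is a hand-picked assumption, and 'meta' is assumed to carry 4 bytes of version/flags, as in iTunes-style files.

```python
import struct

# Boxes treated as plain containers in this sketch (an assumption, not a full list).
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"udta", b"meta", b"ilst"}

def walk_boxes(data, start=0, end=None, depth=0):
    """Yield (depth, box_type, offset, size) for each box, descending into CONTAINERS."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:      # 64-bit "largesize" follows the type field
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:    # box runs to the end of the enclosing data
            size = end - pos
        if size < header or pos + size > end:
            break          # malformed or truncated; stop instead of misreading
        yield depth, box_type, pos, size
        if box_type in CONTAINERS:
            # Assumption: 'meta' has 4 bytes of version/flags before its children.
            child = pos + header + (4 if box_type == b"meta" else 0)
            yield from walk_boxes(data, child, pos + size, depth + 1)
        pos += size
```

To inspect a real file, pass it open(path, "rb").read() and print each entry indented by its depth. Plain (unsynced) iTunes lyrics live in the ©lyr item under moov/udta/meta/ilst; anything unfamiliar next to it would be worth a closer look.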


I will explore and let you know.

FLAC uses .lrc files for synced lyrics.
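Going the other way, (milliseconds, text) pairs such as those stored in SYLT can be written out as LRC lines. A sketch assuming the common [mm:ss.xx] tag with centisecond precision; milliseconds are truncated, not rounded.

```python
def format_lrc(entries):
    """Turn (milliseconds, text) pairs into LRC lines with [mm:ss.xx] tags."""
    lines = []
    for ms, text in sorted(entries):
        minutes, rest = divmod(ms, 60_000)
        seconds, millis = divmod(rest, 1000)
        lines.append(f"[{minutes:02d}:{seconds:02d}.{millis // 10:02d}]{text}")
    return "\n".join(lines)
```

For example, (62500, "Exchanging glances") becomes [01:02.50]Exchanging glances.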

Apple could support something similar, but they don't document it well.