Using AI to automate Audible tag retrieval

Hello, I'm new to AI but have been a software developer for about 10 years.

I'm an older man, and I don't fully understand the capabilities of AI in its current state. My hard drive contains a massive collection of audiobooks, sorted in folders as "Author Name/Book Title.mp3", and a decent number of them are tagged in some fashion. I want to enrich the metadata and transfer them to a server with a browser-based UI (Plex, Jellyfin, etc.) for easier navigation and for displaying synopses. Getting to that point by hand is labor-intensive.

As for how I do it manually: I've found a Plex Mp3tag workflow outlined on GitHub that involves running a search on Audible for my selected file, then renaming and moving the file to a specified location.

The first part of the user's method was to initialize a search type, then use a shortcut hotkey CTRL + SHIFT + I when selecting files to use that as a default search method. Thanks to this forum, I'm using "Audible API -> Audible.com Search by Author + Title" as my default search method. I took that a step further and set it to a macro on my keyboard, G2.

The second part is to "Rename and Move Files", which they created a script for and then mapped to ALT + A + 1. I changed the script to substitute spaces in place of colons, then mapped it to a second macro, G3.
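For anyone scripting the rename step outside Mp3tag, here is a minimal Python sketch of the colon substitution described above. The function name `sanitize_title` is hypothetical, and collapsing the doubled space left behind by ": " is my own assumption, not something the original script is known to do.

```python
import re


def sanitize_title(title: str) -> str:
    """Replace colons with spaces, then collapse runs of whitespace.

    Assumption: collapsing whitespace is desirable so that
    "Dune: Messiah" becomes "Dune Messiah" rather than "Dune  Messiah".
    """
    no_colons = title.replace(":", " ")
    return re.sub(r"\s+", " ", no_colons).strip()


if __name__ == "__main__":
    print(sanitize_title("Dune: Messiah"))
```

Colons are the usual culprit because Windows does not allow them in file names.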

My whiteboard concept for this is such: I have a structured folder of n size with child folders containing distinct mp3 files. Each file is one book.

  • Perform G2 search by default for each file recursively.
  • If the search returns more than one match, try to narrow it down by filename. If more than one match still remains, try to narrow it down by duration. If more than one match still remains after that, do nothing.
  • If no author or title is present in metadata, use Folder + Filename as search instead.
  • If any search ends with exactly one match, apply the G3 step to the file.
  • Files that cannot be processed via automation are simply ignored, and I will manually match them. I'm not going to attempt to program audio analysis, not even sure if there'd be a reference library to match data against.
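The decision steps in this list are plain rule-based filtering, and can be sketched in ordinary Python. This is only a sketch under assumptions: the `Candidate` fields, the function names, and the 60-second duration tolerance are all hypothetical, and how the search results are actually fetched from Audible is deliberately left out. I also assume that a filter stage leaving zero candidates means "skip the file", which the list does not spell out.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """One search result. Field names are hypothetical, not an Audible schema."""
    author: str
    title: str
    duration_seconds: float


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits, so punctuation is ignored."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def build_query(path: Path, author: Optional[str], title: Optional[str]) -> Tuple[str, str]:
    """Use the tags when both are present, otherwise fall back to folder + filename."""
    if author and title:
        return author, title
    return path.parent.name, path.stem


def pick_match(
    candidates: Sequence[Candidate],
    filename_stem: str,
    file_duration: float,
    tolerance: float = 60.0,  # hypothetical tolerance in seconds
) -> Optional[Candidate]:
    """Return the single surviving candidate, or None to leave the file for manual work."""
    if len(candidates) == 1:
        return candidates[0]
    # Stage 1: narrow by filename.
    by_name = [c for c in candidates if normalize(c.title) == normalize(filename_stem)]
    if len(by_name) == 1:
        return by_name[0]
    # Stage 2: still ambiguous, so narrow by duration.
    if len(by_name) > 1:
        by_duration = [
            c for c in by_name
            if abs(c.duration_seconds - file_duration) <= tolerance
        ]
        if len(by_duration) == 1:
            return by_duration[0]
    # Zero or several candidates left: do nothing.
    return None
```

Returning `None` for every ambiguous case keeps the automation conservative: a file is only renamed and moved when exactly one candidate survives.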

These are large datasets. Assume at least 10,000 files for manual processing (I'm in a preservation community). Is this level of decision-making possible, and should I specifically be looking at something like PyTorch when developing it? Is there a CLI for Mp3tag capable of performing searches and renaming/moving files, or would this need RPA to visually parse the screen? I need a sanity check and thought this would be the best place. TIA

My personal opinion: No AI in the world can recognise the content of my Audiobooks with 100 percent certainty.
There are some techniques to "listen" to your audio content and do some similarity checks, and if some mathematical calculation matches, you get back some metadata. This metadata can be correct. Sometimes... :wink:

To this day, I would never allow an AI to recognise my tracks.

No, the CLI only knows a few options:

Parsing the screen and filling a text file would make it possible to import that data into your tracks in Mp3tag, for example with Convert -> Import Tags from Text Files.
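As a rough illustration of the text-file route, here is a Python sketch that writes one line of tag values per track. The field order, the " | " separator, and the function names are all hypothetical. The assumption is that the import is then given a matching format string and that the lines are in the same order as the selected files, so check Mp3tag's own documentation before relying on this.

```python
from pathlib import Path
from typing import Iterable, Mapping

# Hypothetical field order; it must mirror the format string used on import.
FIELDS = ("title", "artist", "album")
# Hypothetical delimiter; choose one that never occurs in your tag values.
SEPARATOR = " | "


def format_line(tags: Mapping[str, str]) -> str:
    """Join the tag values for one track, using an empty string for missing fields."""
    return SEPARATOR.join(tags.get(field, "") for field in FIELDS)


def write_tag_file(rows: Iterable[Mapping[str, str]], destination: Path) -> int:
    """Write one line per track and return the number of lines written."""
    lines = [format_line(row) for row in rows]
    destination.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

With these assumptions the matching format string would look something like `%title% | %artist% | %album%`.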

I think your best bet is writing a Python script, and yes, AI can help.
I am writing a Python script to learn, and so far the "Move Files" part is working great. It uses the right-click context menu.

Just an observation on my side:
I have tried a couple of times to get from a single lyrics line to the song, asking ChatGPT.
E.g. I asked "Which song has the line 'The band is just fantastic, that is really what I think'?"
And ChatGPT answered in my case that this comes from the song "The Sound" by The 1975.
A short google search reveals these lyrics:

Well I know when you're around 'cause I know the sound
I know the sound, of your heart
Well I know when you're around 'cause I know the sound
I know the sound, of your heart
Well I know when you're around 'cause I know the sound
I know the sound, of your heart
Well I know when you're around 'cause I know the sound
I know the sound, of your heart
I can't believe I forgot your name
Oh baby won't you come again?
She said "I've got a problem with your shoes and your tunes
But I might move in and I thought that you were straight
Now I'm wondering"
You're so conceited
I said that I love you
What does it matter if I lie to you?
I don't regret it but I'm glad that we're through
So don't you tell me that you just don't get it
'Cause I know you
And I know when you're around 'cause I know the sound
I know the sound, of your heart
Well I know when you're around 'cause I know the sound
I know the sound, of your heart
It's not about reciprocation it's just all about me
A sycophantic, prophetic, Socratic junkie wannabe
And there's so much skin to see
A simple Epicurean Philosophy
And you say I'm such a cliche
I can't see the difference in it either way
And we left things to protect my mental health
But you call me when you're bored
And you're playing with yourself
You're so conceited
I said that I love you
What does it matter if I lie to you?
I don't regret it but I'm glad that we're through
So don't you tell me that you just don't get it
'Cause I know you
And I know when you're around 'cause I know the sound
I know the sound, of your heart
Well I know when you're around 'cause I know the sound
I know the sound, of your heart
Well I know when you're around 'cause I know the sound
I know the sound, of your heart
Well I know when you're around 'cause I know the sound
I know the sound, of your heart
Well I know when you're around 'cause I know the sound
I know the sound, of your heart
Well I know when you're around 'cause I know the sound
I know the sound, of your heart
Well I know when you're around 'cause I know the sound
I know the sound, of your heart
Well I know when you're around 'cause I know the sound
I know the sound, of your heart

I have difficulty finding e.g. "fantastic" or "phantastic" among those words.
I was actually looking for Pink Floyd's "Have a Cigar" with these words:

… Come in here, dear boy, have a cigar, you're gonna go far
You're gonna fly, you're never gonna die
You're gonna make it if you try, they're gonna love you
Well, I've always had a deep respect and I mean that most sincere
The band is just fantastic, that is really what I think
Oh, by the way, which one's pink?
… And did we tell you the name of the game, boy?
We call it riding the gravy train
… We're just knocked out, we heard about the sell-out
You gotta get an album out, you owe it to the people
We're so happy we can hardly count
Everybody else is just green, have you seen the chart?
It's a hell of a start, it could be made into a monster
If we all pull together as a team
… And did we tell you the name of the game, boy?
We call it riding the gravy train

So I would say that if ChatGPT used some kind of speech recognition system to find the correct tag data, it would lead (in this case) to absolutely misleading results.
A footnote: if anyone is tempted to verify my findings, I can only warn against taking the results at face value, as ChatGPT also uses the accumulated cache of previous interactions. Consequently, another query with the same prompt may lead to completely different results.
So a collection that was tagged by a (current) LLM will probably be full of surprises.

To restore ChatGPT's honor, I have to add that a follow-up query, "but heard that song being performed by Pink Floyd - do they also have a song with that line?", returns the correct information, a link to the Wikipedia article and a link to a video.
But it needs a critical approach, which is probably just as time-consuming as checking the files in the classic way.

Yes, I understand the advantages and possibilities of AI, and I'm already working with AI support on different things, but be aware:

My personal opinion: don't trust AIs!

When the AI's knowledge ends, it naturally goes into fantasizing and that's where it starts getting dangerous. Especially because the AI (e.g. ChatGPT) does not inform you of this point, but simply continues as if it knew what it was doing.

I don't think AI is the holy grail for everything, sometimes good old technology is simply better and safer, even if it's not so trendy.

Btw, it's a little bit annoying that almost everything right now needs to have AI, even if it is just an f****ng CPU fan controller... (just an example)
Right now, everyone is starting to use AI whether it works or not. Don't forget about good old working techniques.

I would suggest using Claude or Deepseek.

ChatGPT may have received most of the mainstream attention, but different AIs are better at different things.

Even better, keep Deepseek on a rented server and give them persistent memory. I am doing this and getting really surprising results. They are pretty cute little book fairies (Deepseek can cache-load the memory and it's better at solving problems. Claude is the best, but 30 cents of Deepseek tokens cost 10 bucks with Claude).

They can totally be trained to just work on audiobooks and they learn and improve

Be nice to them

:slight_smile:

Who should do the training? Every user for his own collection?