Accessing new acoustic fingerprinting db with mp3tag

MP3Freak_Peter · February 2, 2021, 9:15pm

You'll need

mp3tag
tagrequester (api module from tagcomplete)
ffmpeg (to process any audio/video format)

First step is to download and install the TagRequester plugin/API
from http://www.peter-ebe.com
After the installation has finished you need to start TagRequester. It keeps running as a tray application, waiting to process your local requests.
(The title recognition is done locally; only metadata is synchronized with the central database.
The necessary part of the database is downloaded automatically. A full download is only necessary the first time it is started. After that, only new parts will be downloaded.)

Second step is to configure mp3tag tools
Click options > Tools > New


```
Name:      Autotag
Path:      C:\Program Files\TagRequester\TagRequesterClientGUI.exe
Parameter: "%_path%"
```
"for all selected files" can be checked to run the tool on multiple files.

How to use?
Right-click files in mp3tag -> Tools > Autotag
A request window will appear. You can copy single tags or apply the full tag to the media file using the "Apply" button.

Optional:
Download ffmpeg from the FFmpeg download page
and configure TagRequester. You can use the preset button and adjust the path to ffmpeg if you need to process FLAC, Ogg Vorbis or video files. It would be a good idea to do so.

Have fun

MP3Freak_Peter · February 2, 2021, 9:55pm

Another client that ships with TagRequester is "TagRequesterCMD", which can be used for batch processing from scripts or third-party software.

Call:
TagRequesterClientGUI absolute_path_to_file\file (UNC or drive:path)

Return:
response
or
response_fehler

Third-party software can even access TagRequester's API directly; have a look at the documentation on the German page for more details.
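For calling the command line client from a script, a minimal Python sketch could look like the following. It rests on assumptions: the executable name `TagRequesterClientCMD.exe` (the name used further down in this thread) located in the install folder from the mp3tag tool settings, and the result being printed to standard output. Neither is confirmed in the posts above. The helper names are made up for this example. The file path is always made absolute, since the call above expects an absolute path.

```python
import subprocess
from pathlib import Path

# Assumption: install folder from the mp3tag tool settings, executable
# name as used later in this thread. Adjust to your installation.
CLIENT = Path(r"C:\Program Files\TagRequester\TagRequesterClientCMD.exe")


def build_command(media_file, client=CLIENT):
    """Return the argument list for one lookup, with an absolute file path."""
    absolute = Path(media_file).resolve()
    return [str(client), str(absolute)]


def lookup(media_file, client=CLIENT, timeout=60):
    """Run the client once and return whatever it wrote to stdout.

    Assumption: the client reports its result on stdout.
    """
    completed = subprocess.run(
        build_command(media_file, client),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return completed.stdout
```

Passing the arguments as a list avoids quoting problems with spaces in file names.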

phw · February 3, 2021, 4:11pm

As I'm always interested in stuff like this I gave this a little test run. To make it convenient for me to use and to understand the working better I cobbled together a small plugin for Picard.

First off: It worked really well so far, good work. I have not yet done any extensive tests on many different files, but the few files I tested all were found by TagRequester, which impressed me. I will do a more extensive test later.

From the description it is not quite clear to me whether the process is completely offline or not. The description says it is doing the recognition locally, but it also talks about the need to submit fingerprints and metadata to the server.

It also talks about a community maintained database, but how does one contribute to this database?

Maybe just a few random notes from my experience playing with this so far:

The downloaded database is 2.4 GB in size. Not sure how many fingerprints this contains or how this would scale in the future.
This seems to assume that e.g. there is only a single possible ALBUM for a given fingerprint. Not sure how this would deal with different data submitted for the exact same audio.
I found it to be comparatively slow, at least compared to e.g. AcoustID. While I get a response from AcoustID in under a second, TagRequester needs several seconds for a single file. I had expected a pure local lookup may be faster than a web request. But of course AcoustID for sure is very much optimized to handle larger loads. Overall I think lookup speed is fine for this use case.
One seems to be able to mess up the identified metadata with local data. I had two copies of the same file: One with correct metadata, one with wrong metadata (I have plenty of those for testing). I first ran TagRequester on the one with the wrong title in its metadata. The result showed the wrong TITLE. Then I ran it on the file with the correct title. The first time I did this it again showed me the wrong title, as if I had somehow updated the database with this title. Later on, after having used the correct file for more testing, it went back to showing me the correct title for both files. I could not reproduce this later on; if I can, I'll let you know.
The command line client has trouble resolving relative file paths, thus running e.g. TagRequesterClientCMD .\myfile.mp3 fails. You always need to specify full path.
I had trouble setting up ffmpeg, until I realized I need to restart TagRequester after having configured ffmpeg as decoder. Maybe a small hint or auto restarting the service would help here.
This being available only as a rather opaque binary blob for Windows pretty much limits the usefulness of this for me personally.

MP3Freak_Peter · February 3, 2021, 9:07pm

Hi,
thanks for feedback and picard plugin - cool!
I'll try to give some answers.

Local part: Decoding, acoustic fingerprinting algorithm, identification of fingerprint in local database to file identifier.
Remote part: metadata and identifier exchanged with remote database, example from my postman:
(no personal information, just metadata)

New replication & data compression are planned for upcoming versions.
Speed fluctuates; the bottleneck is decoding. Tested from a local SSD, it takes around 1-2 sec per file. It may be slower from a mechanical disk or over a network, e.g. Wi-Fi. edit: tested on my notebook, almost the same speed locally as over a (fast) network, but energy management slows down processing a lot in battery mode. But thanks for the note, I'll run some tests with different file formats /edit
edit: If the bottleneck is single-thread performance, you can process several files in parallel. TagComplete does the same to increase performance, processing multiple files per second /edit
The intention is to do as much as possible locally in order to achieve high performance, relieve the backend and ensure privacy.
The backend is still learning.
Yea, changing the configuration requires a restart.
It's a closed-source project, but I may add a debug view to show the communication with the backend (see the example above). Anyone interested can also analyse it using a simple proxy.
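The suggestion to process several files in parallel can be sketched in Python with a small thread pool. This is only an illustration: `lookup` stands for any function that queries TagRequester for one file (a hypothetical callable supplied by the caller), and whether parallel requests actually speed things up depends on TagRequester and the machine.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_all(files, lookup, max_workers=4):
    """Run lookup(file) for every file on a small thread pool.

    Returns a dict mapping each file to its result, or to the exception
    raised for that file, so one failure does not abort the batch.
    """
    def safe_lookup(path):
        try:
            return lookup(path)
        except Exception as exc:  # keep going; report the error per file
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe_lookup, files))
    return dict(zip(files, results))
```

Threads are enough here because the Python side mostly waits for the external service; the decoding itself happens inside TagRequester.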

phw · February 4, 2021, 4:18pm

Thanks for the responses. So my biggest question mark is still that I don't understand how I can contribute to the database. Let's say I have a file and run the fingerprinter on it: it will generate the fingerprint and search for matches in the local DB, right? Will it always also submit metadata to the database? What if it does not find any match, will it submit the metadata of the file (which probably is bad, otherwise I would not have run the fingerprinter)?

I also tried this now on my proper test environment, but I had less success with good matches this time. Got the following message for a lot of files:

An unknown error has occurred: File does not exist or file type unknown C:\Users\Developer\....mp3 Webserviceaufruf

First I thought my path was wrong, but I actually think this message just tells me there was no match? Some other very similar files (sometimes from the same album) then returned proper data. So I think the error message is a bit misleading here. Also, again I did not have much test data available and a lot of what I threw at TagRequester (roughly 100 files) was a bit obscure, so I did not really expect TagRequester to find much.

I also got some bad but interesting mismatches. E.g. this song 歸兮 / Return Journey | ZURIAAKE | Pest Productions got identified as Elton John "Easier To Walk Away". Other songs on the same album got attributed to Motörhead. Can't tell whether this is because of bad metadata in the file or maybe there is indeed a fingerprint collision.

Maybe I just don't know much about Windows named pipes, or this is Python specific. But when I actually had multiple threads in parallel I got errors, so I made it query one file after another in the same thread. My main goal was to get this plugin working quickly so I could make use of it. Actually I think it works well like this for now and could be improved later.

Tried with Fiddler, without success. But it has been over 10 years since I last used it. Maybe I should give Postman a try.

MP3Freak_Peter · February 5, 2021, 9:43am

Yea, local decoding, fingerprinting and identification. Metadata is exchanged with the remote database.
"The system learns with each user who scans their holdings."
It is not described exactly how it works under every condition, but it's working.

For now the "public" version does not allow manual editing/correction of data. This will be added, but in TagComplete first.

It says that decoding and analysing the file has failed.
Can you offer a (small) sample file where that error message occurs?

Well, Elton John and Motörhead? Did you listen to both songs?
Probably rather bad metadata, but it would be interesting to check if you can offer a sample.

Pipes allow multithreaded use if the pipe server does (TagRequester does).
You can run multiple clients (mp3tag tool checkbox "for all selected files").

But you need to read on MSDN how to use them. Check the result after connecting to the pipe. You may get a short "busy" if you try to connect too fast, because the pipe server may need a millisecond or two to set up the next I/O handler instance. Something like: while busy and not timed out do result = tryconnectpipe sleep(1)
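The retry loop described above can be written out as a small Python helper. This is a sketch under assumptions: `try_connect` is a hypothetical caller-supplied function that returns an open pipe handle or raises `OSError` while the pipe server is busy, and the `sleep(1)` above is read as roughly one millisecond. The actual pipe name and protocol are not given in this thread, so they are left out.

```python
import time


def connect_with_retry(try_connect, timeout=5.0, delay=0.001):
    """Call try_connect() until it succeeds or the timeout expires.

    try_connect is a caller-supplied function (hypothetical here) that
    returns a connected pipe handle or raises OSError while the server
    is busy.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return try_connect()
        except OSError:
            if time.monotonic() >= deadline:
                raise  # still busy after the timeout: give up
            time.sleep(delay)
```

Each worker thread would call this helper before sending its request and close the handle when done, so that no pipe instance stays open longer than needed.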

The backend API wouldn't be usable for you, so don't spend too much time on it. I already posted a sample screenshot above of what is transferred.

phw · February 5, 2021, 5:00pm

Ok, yes. Noticed that because today I got some results for stuff that failed yesterday.

Do I understand correctly? If I analyze a file, the file's fingerprint and existing data would immediately contribute to the library? And then (probably server side) the system tries to learn and come up with the best / most common metadata? In general I like this approach.

My concern is that people probably run this fingerprinting mostly on badly tagged files, thus the initial submission would usually be bad data. I even noticed that: Some of the "bad" files I toyed around with yesterday today showed up when I ran this on proper files. E.g. I had one test file that was named "Enchantmentx", the "x" I added at some point when testing something about changing the title. In my analysis today I ran this across properly tagged files, and the "Enchantment" song was identified as "Enchantmentx", clearly my bad data I contributed.

I guess it will likely now be correct again, as I ran the fingerprinting on the proper file again. But in conclusion this means users need to be motivated to not only run this on their bad tags, but also again on the files that got proper tags. But of course that second time they don't want to get their tags changed again. IMHO there should be a distinction between analyzing files to get metadata and submitting fingerprints + metadata.

I am actually pretty sure this error is triggered if there was just no match, or it's a bug. I get this error for the majority of files (mostly MP3), currently running this over 12k already tagged files. If it happens for one file it happens consistently, retrying does not help. But other files from the same album, that had been encoded and tagged at the same time, then work. Also files that yesterday threw this error work today.

Again I get different results today, but here is the example from yesterday.

Not sure which feedback on data quality is useful for you, but I'm currently running this on ~13k files, and from the results I can already see the following cases:

1. No match (see the error message above). Most of the files, but that's actually expected for a new service.
2. Good matches. I would actually count the majority of matches returned into this. It returned the correct artist, album and title without mistakes (not counting minor spelling differences, or alternative spellings such as "EAV" vs. "Erste Allgemeine Verunsicherung").
3. Generally correct matches with errors. It identified the correct artist, title and album, but the tag quality just is bad (all lowercase, spelling errors).
4. Really bad data quality. Things like artist being set to "16Die Ärzte13", "" or title "cd02-love is strong", "AudioTrack 08", "no title". Not that many. Simply bad data in users' tags that should eventually even out with enough data, I think.
5. Different "album". This is something I think could be a problem, especially for very popular stuff. E.g. I had an ABBA album with songs being identified correctly, but the album tag was different. For some songs it was the actual original album, others were some "Best of" or "Bravo Hits".
6. Completely random mismatches. Actually not that many, except for some notorious cases (see below). But a few interesting ones, such as 歸兮 / Return Journey | ZURIAAKE | Pest Productions being identified as https://www.youtube.com/watch?v=CQyfppzKgNY

7. "ÿ" returned as title, album, artist and genre. So far I got 50 of those, all over the place. The response is similar to this:

<?xml version="1.0" encoding="UTF-8"?>
<clientapi><FILENAME>C:\Users\Developer\Music\Library\Zuriaake\奕秋 Afterimage of Autumn\03 冥江 _ River Metempsychosis.mp3</FILENAME><TITLE>ÿ</TITLE><ARTIST>ÿ</ARTIST><ALBUM>ÿ</ALBUM><TRACK>11</TRACK><YEAR>2011</YEAR><GENRE>ÿ</GENRE><BPM>123</BPM><SYNPOS></SYNPOS></clientapi>

Track number and year differ; BPM differs too but is often 123. This looks like test data to me.

8. Stuff getting attributed to "Better Off Dead" by Motörhead. Yes, this warrants its own category, as different songs got attributed to this. All are metal songs, so not sure if someone was just submitting a lot of wrong metadata or if the fingerprinting has trouble dealing with this. A few examples:

This is the Motörhead song: https://www.youtube.com/watch?v=x3pJF8iDM5c (I don't have it here)
The following songs got matched to it:
All heavy music, but otherwise also pretty different. There were a few more (around 30 in total), but I think the above covers it.

Overall rather good results for the matches. Case 8 could be some issue with the fingerprinting itself, maybe worth for you to investigate. Case 7 looks strange to me, but could be just bad tags.
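For anyone consuming the `<clientapi>` response shown above from a script, a minimal Python sketch for parsing it and spotting the all-"ÿ" results (case 7) might look like this. The element names are taken from the sample response; treating "ÿ" or empty text in all four text fields as a placeholder is an assumption made for this example, not documented behaviour.

```python
import xml.etree.ElementTree as ET

# Text fields that came back as "ÿ" in the sample response.
TEXT_FIELDS = ("TITLE", "ARTIST", "ALBUM", "GENRE")
PLACEHOLDER = "\u00ff"  # the character "ÿ"


def parse_response(xml_text):
    """Turn a <clientapi> response into a dict of tag name -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}


def looks_like_placeholder(tags):
    """True if every text field is empty or just the character "ÿ"."""
    return all(
        tags.get(name, "").strip() in ("", PLACEHOLDER) for name in TEXT_FIELDS
    )
```

A client could skip applying such results instead of overwriting existing tags with them.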

Bad data quality (e.g. 3 and 4) is something that would concern me. I think your hope is that this will even out, but I stated my fear that it will primarily be users running this on bad quality tags.

Not sure how to handle the album name issue (case 5) with your model, which assumes a single album title; the differing titles are probably different compilations. Maybe, in addition to the metadata already used by TagRequester, you should also store identifiers for external metadata sources, such as the MusicBrainz recording ID. That way data could be cross-linked and tagging software could actually query that source for fitting albums.

Actually I think this was what I experienced. Need to check again, but thanks for the details.

Sorry for the lot of text, I hope this is valuable. If not let me know

phw · February 5, 2021, 5:11pm

Noticed something: If I have a file which gets identified with different tag data than what's in the file already, I can just run Tag Requester multiple times on this file. At some point I can convince TagRequester that my local tag data is the data it should use. This confirms how I think the service works, but I'm not sure this is really intended behavior.

I tried to be a good citizen and fix the bad submission I made while testing this by submitting the correct data again afterwards, until Tag Requester switched back to using the correct data.

MP3Freak_Peter · February 5, 2021, 8:24pm

Thanks for the comprehensive feedback.

Yes, about that: the server side is still under development.

It's not triggered when there is no match; I'll check whether this is a bug.

"ÿ" and such nonsense should be filtered, I'll check.

The Tag Complete database should stay an independent database. It will be a hard road, but I am confident that it is possible.

Don't try to bombard the service with wrong data, that's not useful for the community ;)

phw · February 5, 2021, 9:53pm

That's why I ran this on 13k well tagged files. That should be useful.

But honestly, running this on wrong/bad data is exactly what I would expect most users to do. Why else should I run such a service on my files if not when I have really bad or even no tags at all? Maybe it takes me some time to identify that "Track 1" and I even run Tag Requester more than once.

And why should I run it again after I have cleaned up my tags? I'm pretty sure far less users will do this, only a few who understand how the system works and who want to contribute back.

I'd really consider splitting pure identification from submission. Have one action that just fingerprints and gets tags, and another that fingerprints and submits. Or if you still want to get all data submitted, maybe have two confidence levels: Normal is low, but when run in submission mode the data is treated with higher confidence.
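To make the two-confidence-level idea concrete, here is a toy Python sketch of confidence-weighted voting. It is purely illustrative and says nothing about how the TagRequester backend works: the function name, the weights 0.2 and 1.0, and the idea of summing weights per title are all assumptions made for this example.

```python
from collections import defaultdict


def pick_title(submissions):
    """Return the title with the highest summed weight.

    submissions is a list of (title, weight) pairs; the weights are
    hypothetical confidence values, not anything TagRequester exposes.
    """
    totals = defaultdict(float)
    for title, weight in submissions:
        totals[title] += weight
    return max(totals, key=totals.get)


# Three plain lookups on a badly tagged file (low weight) against one
# deliberate submission of a cleaned-up file (high weight).
lookups = [("Enchantmentx", 0.2)] * 3
confirmed = [("Enchantment", 1.0)]
print(pick_title(lookups + confirmed))
```

With equal weights the three bad lookups would outvote the single good submission; with the lower lookup weight the deliberate submission wins.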

MP3Freak_Peter · February 5, 2021, 10:19pm

The internal rating system is still under construction; there are several internal rating mechanisms that are still being improved. Who knows whether someone is more trustworthy than another just because he switches into a submission mode?
The main application for collecting data with higher confidence is still TagComplete. For example, I can use the user's choice in TagComplete to raise or lower the rating.
TagRequester is just for external access. Maybe it should be set to lower confidence.

phw · February 6, 2021, 12:36am

Yes, because Tag Requester as I understand is for integration of third party components, and those can make the distinction.

You say "don't try to bombard the service with wrong data". That currently makes Tag Requester only useful for one use case: Submitting well tagged files to the database. If I use it I must ask users to only run it on good data. And the reason to do this would be to contribute to a database they otherwise do not benefit from.

But for users the most interesting use case is to identify their unknown songs and get tags for them. So they naturally want to run this on all their "Track 1" of album "unknown" files.

If there would be a pure lookup mode I could e.g. in the Picard plugin make the distinction of offering the pure lookup on not already tagged files, but on files the user has already tagged in Picard it would run in submission mode.

The lookup mode would either do no submission at all or at least with very low confidence.

MP3Freak_Peter · February 7, 2021, 3:44pm

Yea, the backend filter can handle that. In the first instance there is a blacklist with keywords, but it's a little more complicated to find out whether it's just a bad word within a title or a bad/fake title.
"Rank 1 - Passage To The Unknown" is ok, "Rank 1 - Unknown artist" is not, "title unknown" is not.
Yea, it is possible to find out, but I need to keep working on it.
("EAV - Jonny porno" is a valid title even though "porn" would be on the filter list, and so on...)
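The difference between a bad word inside a title and a bad/fake title can be illustrated with a small Python sketch that matches placeholder phrases against the whole field instead of searching for substrings. The phrase list and the track-number pattern are hypothetical examples built from titles mentioned in this thread; the real backend blacklist is not public and is surely more involved.

```python
import re

# Hypothetical placeholder phrases; the real backend blacklist is not public.
PLACEHOLDERS = {"unknown", "unknown artist", "title unknown", "no title"}
# Matches fields such as "Track 1" or "AudioTrack 08" (after lowercasing).
TRACK_PATTERN = re.compile(r"^(audio)?\s*track\s*\d+$")


def is_placeholder(field):
    """True if the whole field is a placeholder, not merely contains one."""
    normalized = " ".join(field.lower().split())
    return normalized in PLACEHOLDERS or bool(TRACK_PATTERN.match(normalized))
```

Under these rules "Passage To The Unknown" passes, while "Unknown artist" and "title unknown" are rejected.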

Btw: I've released a little TagRequester update for the most important findings.

fixed incorrectly returned error message
improved error handling in general, more detailed error messages
warn when closing TagRequester if local client connections are still open
(helps you to see whether you are really closing all pipes when connecting to TagRequester. The pipe count is only limited by the system, but TagRequester could become unstable if you never close a pipe)
Configuration: corrected decoder link and messages on configuration change
Installer: removes the Explorer "Send to" link when uninstalling

[edit]:

The backend also got an update today in the filter rules and behaviour to further increase the quality of the metadata.

phw · February 8, 2021, 8:17am

I tested the update. The error messages are now gone, and invalid files that cannot be decoded (I have a couple of zero-length files I commonly use for testing when I care only about metadata handling) generate a separate but proper error message.

I now also better understand your concept of always submitting metadata but filtering out the real bad metadata. I still think dealing with the metadata submitted on first runs on badly tagged files will cause problems, but I'm happy to be proven wrong

The other bigger concern I have from a technical point of view is the scalability of the local database once people actually submit data, but I think that is an issue that can be dealt with more or less easily if it becomes problematic (e.g. only a partial DB locally with online lookup if no match is found, or complete online lookup).

I'll see that I can get the Picard plugin to handle parallel access to the pipe properly.

phw · February 8, 2021, 8:38am

Forgot to ask earlier, but what is the SYNPOS tag the service supports?

MP3Freak_Peter · February 8, 2021, 9:38am

Sure, there will be some more or less difficult problems to solve in future, but I'm still optimistic.

"Public" user rating is planned for upcoming versions of TagComplete, better than 1000 monkeys in a cellar.

SYNPOS "Position synchronisation" should be the POSS frame (id3v2.3.0 - ID3.org).
Used in MediaArchive, formerly mp3find, for crossfading synchronisation.

phw · February 8, 2021, 4:58pm

I updated the TagRequester Picard plugin. It now handles the parallel access to the pipe from multiple threads as you had suggested. Retrying the connection indeed did the trick. It works pretty well so far.