Ok, yes. Noticed that because today I got some results for stuff that failed yesterday.
Do I understand correctly? If I analyze a file, the file's fingerprint and existing data immediately contribute to the library? And then the system (probably server side) tries to learn and come up with the best / most common metadata? In general I like this approach.
My concern is that people probably run this fingerprinting mostly on badly tagged files, so the initial submission would usually be bad data. I even noticed this myself: some of the "bad" files I toyed around with yesterday showed up today when I ran this on proper files. E.g. I had one test file that was named "Enchantmentx"; I had added the "x" at some point when testing something about changing the title. In today's analysis I ran this across properly tagged files, and the "Enchantment" song was identified as "Enchantmentx", clearly the bad data I had contributed.
I guess it will likely now be correct again, as I ran the fingerprinting on the proper file again. But in conclusion this means users need to be motivated to not only run this on their bad tags, but also again on the files that got proper tags. But of course that second time they don't want to get their tags changed again. IMHO there should be a distinction between analyzing files to get metadata and submitting fingerprints + metadata.
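To illustrate the concern, here is a minimal sketch of how I imagine a "most common value wins" approach behaves. This is purely my assumption about the server side, not how the service actually works; `consensus_title` and the sample submissions are made up for the example.

```python
from collections import Counter


def consensus_title(submissions):
    """Return the most frequently submitted title, or None if there are none.

    Hypothetical model of the server-side logic; the real service may
    work quite differently.
    """
    if not submissions:
        return None
    # most_common(1) returns a list holding the single (value, count) pair.
    title, _count = Counter(submissions).most_common(1)[0]
    return title


# Only the badly tagged test file has been submitted so far.
after_first_day = ["Enchantmentx"]

# Later the properly tagged file gets analyzed as well, more than once.
after_second_day = after_first_day + ["Enchantment", "Enchantment"]
```

As long as the bad submission is the only one, it wins. It only gets outvoted once enough good submissions arrive, which is why it matters whether users also run this on their properly tagged files.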
I am actually pretty sure this error is triggered if there was just no match, or it's a bug. I get this error for the majority of files (mostly MP3), currently running this over 12k already tagged files. If it happens for one file it happens consistently, retrying does not help. But other files from the same album, that had been encoded and tagged at the same time, then work. Also files that yesterday threw this error work today.
Again I get different results today, but here is the example from yesterday.
Not sure which feedback on data quality is useful for you, but I'm currently running this on ~13k files, and from the results I can already see the following cases:
1. No match (see the error message above). This is most of the files, but that's actually expected for a new service.
2. Good matches. I would actually count the majority of the returned matches here. They had the correct artist, album and title without mistakes (not counting minor spelling differences or alternative spellings such as "EAV" vs. "Erste Allgemeine Verunsicherung").
3. Generally correct matches with errors. It identified the correct artist, title and album, but the tag quality is just bad (all lowercase, spelling errors).
4. Really bad data quality. Things like the artist being set to "16Die Ärzte13" or "", or the title to "cd02-love is strong", "AudioTrack 08" or "no title". Not that many. Simply bad data in users' tags that should eventually even out with enough data, I think.
5. Different "album". This is something I think could be a problem, especially for very popular stuff. E.g. I had an ABBA album with songs being identified correctly, but the album tag was different. For some songs it was the actual original album, for others it was some "Best of" or "Bravo Hits".
6. Completely random mismatches. Actually not that many, except for some notorious cases (see below). But a few interesting ones, such as 歸兮 / Return Journey | ZURIAAKE | Pest Productions being identified as Elton John - Easier to Walk Away (1990) With Lyrics! - YouTube
7. "ÿ" returned as title, album, artist and genre. So far I got 50 of those, all over the place. The response is similar to this:
<?xml version="1.0" encoding="UTF-8"?>
<clientapi><FILENAME>C:\Users\Developer\Music\Library\Zuriaake\奕秋 Afterimage of Autumn\03 冥江 _ River Metempsychosis.mp3</FILENAME><TITLE>ÿ</TITLE><ARTIST>ÿ</ARTIST><ALBUM>ÿ</ALBUM><TRACK>11</TRACK><YEAR>2011</YEAR><GENRE>ÿ</GENRE><BPM>123</BPM><SYNPOS></SYNPOS></clientapi>
Track number and year differ; BPM also differs but is often 123. This looks like test data to me.
8. Stuff getting attributed to "Better Off Dead" by Motörhead. Yes, this warrants its own category, as different songs got attributed to it. All are metal songs, so I'm not sure if someone just submitted a lot of wrong metadata or if the fingerprinting has trouble dealing with this. A few examples:
This is the Motörhead song: Motörhead - Better Off Dead - YouTube (I don't have it here)
The following songs got matched to it:
All heavy music, but otherwise also pretty different. There were a few more (around 30 in total), but I think the above covers it.
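For the "ÿ" responses (case 7), a client could at least detect and skip them. A rough sketch, assuming the response always has the `clientapi` layout shown above; the function name `is_placeholder_response` and the shortened sample are my own, not part of any existing client.

```python
import xml.etree.ElementTree as ET

# Fields that came back as "ÿ" in the responses I saw.
TEXT_FIELDS = ("TITLE", "ARTIST", "ALBUM", "GENRE")


def is_placeholder_response(xml_text):
    """Return True if no text field carries anything besides 'ÿ'."""
    root = ET.fromstring(xml_text)
    for name in TEXT_FIELDS:
        # findtext() returns None for a missing element; treat it as empty.
        value = (root.findtext(name) or "").strip()
        if value.strip("ÿ"):
            return False  # something other than "ÿ" is left: real data
    return True


# Shortened sample modeled on the response above (file name is made up).
sample = (
    "<clientapi><FILENAME>example.mp3</FILENAME>"
    "<TITLE>ÿ</TITLE><ARTIST>ÿ</ARTIST><ALBUM>ÿ</ALBUM>"
    "<TRACK>11</TRACK><YEAR>2011</YEAR><GENRE>ÿ</GENRE>"
    "<BPM>123</BPM><SYNPOS></SYNPOS></clientapi>"
)
```

This only hides the symptom on the client side, of course. Where the "ÿ" entries come from is still something to look into on the server.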
Overall rather good results for the matches. Case 8 could be an issue with the fingerprinting itself, maybe worth investigating. Case 7 looks strange to me, but could just be bad tags.
Bad data quality (e.g. cases 3 and 4) is something that would concern me. I think your hope is that this will even out, but as stated above, my fear is that it will primarily be users running this on badly tagged files.
Not sure how to handle the album name issue (case 5) with your model, which assumes a single album title, while the same song probably appears on different compilations. Maybe, in addition to the metadata already used by TagRequester, you should also store identifiers for external metadata sources, such as the MusicBrainz recording ID. That way data could be cross-linked and tagging software could actually query that source for fitting albums.
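Something along these lines is what I have in mind. It is only a sketch of a possible data model: `FingerprintEntry`, `pick_album` and all field names are hypothetical and not taken from TagRequester or from your service.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FingerprintEntry:
    """Hypothetical server-side record for one fingerprint."""
    title: str
    artist: str
    # Every album the song was submitted with, instead of a single title.
    albums: set = field(default_factory=set)
    # Optional link to an external source, e.g. a MusicBrainz recording ID.
    musicbrainz_recording_id: Optional[str] = None


def pick_album(entry, current_album):
    """Keep the file's current album if it is a known one, else None."""
    if current_album in entry.albums:
        return current_album
    # Unknown album: the tagger could ask the external source instead.
    return None


# Placeholder data; the album names stand in for real release titles.
entry = FingerprintEntry(
    title="Some Song",
    artist="Some Artist",
    albums={"Original Album", "Best of", "Bravo Hits"},
)
```

With this, a file already tagged with "Best of" would keep that album instead of being rewritten to whichever album happens to be the most common one.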
Actually I think this was what I experienced. Need to check again, but thanks for the details.
Sorry for the wall of text, I hope this is valuable. If not, let me know.