Lack of global ASCII / filename sanitisation in Mp3tag

Hello,

I am writing to raise a design issue with Mp3tag’s filename handling that has become increasingly problematic in modern usage.

At present, Mp3tag has no global option to enforce safe, ASCII-only filenames or to automatically sanitise Unicode and illegal characters when filenames are created or updated. Instead, users are expected to anticipate every problematic character in advance and manually encode replacement logic into Tag → Filename format strings using chained $replace() calls.

This approach does not scale and is fundamentally brittle.

Unicode punctuation, symbols such as ×, smart quotes, language-specific characters, and other non-ASCII glyphs routinely appear in tags sourced from streaming services, Bandcamp, Discogs, and user submissions. Requiring users to foresee all of these cases is unrealistic, especially when Windows compatibility, DJ software, archival workflows, and cross-platform transfers are involved.

While $validate() removes Windows-illegal characters, it does not address non-ASCII characters, and there is currently no built-in function or preference to enforce ASCII-only filenames, nor a simple “normalise to safe filename” option.

In practical terms, this means:
• Filenames can silently break downstream tools despite appearing valid in Mp3tag
• Users must repeatedly refine ad hoc replacement logic
• The burden of filename safety is placed entirely on the user rather than the application

What is missing is a simple, global, opt-in mechanism such as:
• “Enforce ASCII-only filenames”
• “Automatically normalise Unicode punctuation to ASCII”
• Or a single function that safely transliterates or strips non-ASCII characters without manual enumeration

This is not an edge case. It is now the default reality of modern metadata.

Mp3tag is otherwise an excellent and careful tool, which makes this omission more conspicuous. Filename safety should not depend on the user knowing in advance which characters might appear next month, next year, or in another language.

I am raising this not as a support question, but as a feature gap that directly affects reliability and long-term archival correctness.

Thank you for taking the time to read this. I hope you will consider addressing this at the application level rather than leaving it to increasingly fragile user-side workarounds.

Regards,

Fezz

I moved this to General Discussion as that is the section for feature requests.

It is already possible to delete every character outside a small ASCII whitelist from a filename, using a regular expression like this:
$regexp(%_filename%,'[^A-Za-z0-9 _\-.]',)

This would leave only uppercase and lowercase letters from A to Z,
numbers from 0 to 9,
a space,
an underscore,
a minus
and a dot.
All other characters, meaning every non-ASCII character including emojis, but also ASCII punctuation such as parentheses or commas, would be replaced with "nothing" = deleted.

If you would rather replace the rejected characters with something like an underscore, you can put it after the last comma like this:
$regexp(%_filename%,'[^A-Za-z0-9 _\-.]',_)
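For anyone who wants to see what this whitelist does before running it on real files, here is a minimal Python sketch of the same character class. It runs outside Mp3tag, the helper name strip_to_whitelist is made up for illustration, and it assumes Python's regex engine treats this simple character class the same way Mp3tag's does.

```python
import re

# Same character class as in the $regexp() call above: anything that is NOT
# an ASCII letter, a digit, a space, an underscore, a minus or a dot.
NOT_WHITELISTED = re.compile(r"[^A-Za-z0-9 _\-.]")


def strip_to_whitelist(name: str, replacement: str = "") -> str:
    """Hypothetical helper: replace every non-whitelisted character."""
    return NOT_WHITELISTED.sub(replacement, name)


print(strip_to_whitelist("Café × Bar"))       # accented letter and × are deleted
print(strip_to_whitelist("Café × Bar", "_"))  # same, but replaced with underscores
print(strip_to_whitelist("AC/DC (Live)"))     # ASCII punctuation is deleted too
```

Note that the accented letter is dropped rather than converted to a plain "e", and that the slash and parentheses disappear as well, so the result can differ quite a bit from the tag.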

I agree, and there are several topics related to the subject:

Strip all accents from accented characters - Support - Mp3tag Community

Regular expression to filter path names with diacritics - Support - Mp3tag Community

Feature Request: Filter option matching diacritics or not - General Discussion - Mp3tag Community

Unicode to ASCII - General Discussion - Mp3tag Community

This would be great, as would your other suggestions; the same thing packed into a Quick Action would be simpler for those not familiar with coding.

Even a simple conversion of diacritics or symbols to a similar matching A-Z format would suffice. There is the matter of other languages though: does Japanese get translated into romaji? Does everything get anglicised?
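To make the "similar matching A-Z" idea concrete, here is a rough Python sketch of one way such a conversion could work. It is not how Mp3tag does or would do it. The function name to_ascii and the small punctuation table are assumptions for illustration; only the unicodedata calls are standard library.

```python
import unicodedata

# Hypothetical lookup table for symbols that Unicode decomposition leaves alone.
PUNCTUATION = str.maketrans({
    "×": "x",
    "‘": "'", "’": "'",
    "“": '"', "”": '"',
    "–": "-",
})


def to_ascii(text: str) -> str:
    """Sketch: map known symbols, strip accents, then drop what is still non-ASCII."""
    text = text.translate(PUNCTUATION)
    # NFKD splits "é" into "e" plus a combining accent mark.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Anything with no ASCII form (e.g. Japanese) is silently dropped here.
    return without_marks.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Motörhead × Björk"))
print(to_ascii("日本語"))
```

This handles accented Latin letters, but letters such as ø or ß do not decompose and are dropped, and Japanese text comes out empty rather than as romaji. That is exactly the open question above: a real solution needs a policy per script, not just accent stripping.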

That being said, most operating systems surely support these characters by now; it’s reserved characters like ? \ : / that seem to be the issue. @LyricsLover’s got the right answer currently, but the amount of research and the number of $replace() functions needed to accommodate every potential character is insane.

I made an Action for myself that does a fair plonk of these conversions, applies it to SORT tags then uses them to construct a filename based on media type. It’s not going to be suited for everyone but it might help you in the meantime.

Fix Tags.mta (23.5 KB)

I understand the request, but there is an opposing argument as well. If these changes happened automatically, there would be an equal number of users posting here to complain that Mp3tag was dropping or changing characters from their expected naming.

At least in the current way, there are solutions that can be integrated into an Action. This can be applied at any time. But having it as a default mode would be more difficult to "undo" for the users that don't want it for whatever reason.

Making it an option is something to consider. But its function would have to be very clear, and off by default.

This is very important, and it is why I think any (semi-)automatic defaults can be disastrous. Simply deleting the characters can end up with filenames looking like .mp3, i.e. just the file extension, or maybe just underscores if one chooses that route. But Mp3tag could possibly include more functions than the $ansi() conversion, by exposing more of the OS libraries that deal with various Unicode characters. Or, for some users, a hash function could be used for the filename, since most software will identify the file by its tag data anyway. Whichever way is chosen, it must be optional and not the default.
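As a rough illustration of both points (names collapsing to nothing, and a hash as a fallback), here is a small Python sketch. It is only a thought experiment outside Mp3tag: the function name safe_stem, the "track-" prefix and the choice of SHA-1 truncated to 12 hex digits are all assumptions for illustration.

```python
import hashlib
import re


def safe_stem(title: str) -> str:
    """Sketch: whitelist-strip a title, falling back to a hash if nothing is left."""
    cleaned = re.sub(r"[^A-Za-z0-9 _\-.]", "", title).strip(" .")
    if cleaned:
        return cleaned
    # Nothing survived the whitelist, so derive a stable name from the title.
    digest = hashlib.sha1(title.encode("utf-8")).hexdigest()[:12]
    return f"track-{digest}"


print(safe_stem("Café"))    # something survives, so the stripped name is used
print(safe_stem("日本語"))  # nothing survives, so a hash-based name is used
```

The hash keeps files distinct and the result is the same on every run, but the filename no longer tells a human anything, which is one more reason for this to be opt-in.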