Feature Request: Unicode Support for Export to Batch Files

Could we please have a $filename("path",unicode) export encoding?

When creating Batch Files, none of ansi, utf-8, or utf-16 works when exporting a file path, as in:

$filename("test.bat")$loop(%_path%)echo "%_path%"

For a real path of
M:\MP3\Tagged\Köllen, Achim\Köllen, Achim - Eine ganze Nacht mal (nicht vernünftig sein).mp3
this will display
M:\MP3\Tagged\K÷llen, Achim\K÷llen, Achim - Eine ganze Nacht mal (nicht vern³nftig sein).mp3

Even worse (because »č« isn’t in the ANSI character set for Germany (CP-1252)):
The real file path
M:\MP3\Tagged\Robič, Ivo\Robič, Ivo - Mit 17 fängt das Leben erst an.mp3
will be written as
M:\MP3\Tagged\Robic, Ivo\Robic, Ivo - Mit 17 fängt das Leben erst an.mp3
in the export file, which in turn will display
M:\MP3\Tagged\Robic, Ivo\Robic, Ivo - Mit 17 fõngt das Leben erst an.mp3
in the »DOS Box«!

Of course this is due to the ANSI character set not being translated into the OEM character set needed by DOS Batch Files. One could now try to convert into the OEM character set (which will fail for example 2), but why not use the command processor’s ability to handle what they call »Unicode«?
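
For anyone who wants to reproduce the garbling without a command prompt, here is a minimal Python sketch. It assumes the export is written in CP-1252 (the ANSI codepage mentioned above) and that the console reads it as CP-850, which is my assumption for the OEM codepage on a German Windows. The function name is made up for illustration.

```python
def ansi_bytes_shown_as_oem(text, ansi="cp1252", oem="cp850"):
    """Encode text in the ANSI codepage, then misread the bytes as OEM.

    Both codepage defaults are assumptions; adjust them for other locales.
    """
    return text.encode(ansi).decode(oem)


garbled = ansi_bytes_shown_as_oem("Köllen")  # "K÷llen"
```

Note that a »č« never gets this far: it has no CP-1252 byte at all, so `"č".encode("cp1252")` raises a `UnicodeEncodeError` in Python, whereas the export silently falls back to »c«.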

I believe this to be actually a UTF-16 LE (no BOM) format, but I’m not 100% certain.

So I kindly ask if we could please have something that will write out in a (Unicode) format compatible to the system’s batch file processor?

Sometimes it seems unavoidable to come up with fancy workarounds for some things—in this case, a function like this would be a great help! (See an example here.)

Okay, forget it …

I just did some debugging and found that Windows’ command processor CMD.EXE will under no circumstances accept Unicode or UTF format Batch Files. Interestingly enough, the TYPE command as well as direct command input will allow Unicode (UTF-16 LE with BOM)! Seems that commandline Unicode support is rather limited … :angry:

In order to work around the Batch File workaround, I have come up with a little »trick«—actually the »best« solution for creating batch files we can get at the moment:

It appears that TYPEing a »UTF-16 LE with BOM« encoded textfile is possible. All Unicode characters that are in the current system’s codepage will be converted, and some others will be approximated (e.g., a »č«, which is not in CP-1252, will be converted to a »c«).
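
To illustrate the kind of approximation described here, the following Python sketch keeps every character the target codepage can represent and otherwise falls back to the base letter. This is only an imitation: it uses Unicode decomposition rather than the actual Windows best-fit tables, and both the function name and the CP-850 default are assumptions.

```python
import unicodedata


def approximate_for_codepage(text, codepage="cp850"):
    """Keep encodable characters; otherwise try the unaccented base letter."""
    out = []
    for ch in text:
        try:
            ch.encode(codepage)
            out.append(ch)  # character exists in the codepage
            continue
        except UnicodeEncodeError:
            pass
        # Decompose (e.g. "č" -> "c" + combining caron) and drop the marks.
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        try:
            base.encode(codepage)
            out.append(base)
        except UnicodeEncodeError:
            out.append("?")  # nothing sensible left
    return "".join(out)


approximated = approximate_for_codepage("Robič, Ivo")  # "Robic, Ivo"
```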

So we can use MP3Tag to export into a UTF-16+BOM text file and afterwards use a little Batch File to have it converted and actually executed!

Let’s say we use $filename("C:\Programme\MusicIP\MusicIP Mixer\mmm-data.txt",utf-16) in our Export configuration and set [x] Write BOM in the dialogue, then perform the export.

We then create a little Batch File called mmm.bat in the same directory and place a shortcut to it on our Desktop for further use:

@echo off
type mmm-data.txt > mmm-data.bat
cmd /u /c mmm-data.bat
del mmm-data.bat

This will TYPE a copy of mmm-data.txt (the above-generated MP3Tag Export File) into a new file called mmm-data.bat, thus doing any possible conversions and creating the actual batch file containing the commands you specified in your Export Configuration.

We then invoke a command processor to execute whatever commands are in the Batch File, and afterwards delete the mmm-data.bat file we just created.
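
If you would rather not rely on TYPE for the conversion step, the same idea can be sketched in Python. This is a rough equivalent under stated assumptions, not what CMD.EXE does internally: it expects UTF-16 LE input (with or without BOM), targets CP-850 by default (my assumption for the OEM codepage), and replaces unmappable characters with »?« instead of approximating them. The function names are hypothetical.

```python
import codecs


def utf16_export_to_oem(raw, oem="cp850"):
    """Convert the bytes of a UTF-16 LE export into OEM-encoded bytes."""
    # Strip the BOM that Mp3tag writes when [x] Write BOM is set.
    if raw.startswith(codecs.BOM_UTF16_LE):
        raw = raw[len(codecs.BOM_UTF16_LE):]
    text = raw.decode("utf-16-le")
    # Characters missing from the OEM codepage become "?".
    return text.encode(oem, errors="replace")


def convert_file(src_path, dst_path, oem="cp850"):
    """Read the export file and write the batch file next to it."""
    with open(src_path, "rb") as src:
        converted = utf16_export_to_oem(src.read(), oem)
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Calling `convert_file("mmm-data.txt", "mmm-data.bat")` would then take the place of the `type` line in mmm.bat; running and deleting the batch file stays the same.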

A little awkward just to overcome yet another Windows problem, but alas!

:smiley: You are the Workaround King.
I gave up using Unicode characters at the very beginning of my Retag-Your-Music-Library-Project (mainly because of hardware problems, I mean portable players). But now I am thinking of a script that replaces strings with their Unicode versions, artists' names at least. The workarounds you've come up with could be useful, thanks for sharing. Anyway, I think I won't use Unicode characters in filenames and paths.

Hi Moonbase, my MySpace friend, how are you?

Yes, the TYPE command can type unicode textfiles.
What the other commands understand and can work with is written on another (code)page altogether.

Clever approach.
Is there a way back too?

Have you ever tried this tool?

Microsoft AppLocale Utility

I haven't gotten that far myself, but I will when I have the time. In the meantime, maybe you could try this tool and report back to us?

It seems that always sticking to the international standard is at odds with using local characters.


Working with about a zillion different tools and databases plus having to see that stream titles are shown correctly even for listeners out there with older players, I found that—sadly enough—there isn’t yet enough »working« Unicode support around to be able to actually use it »straight through«.

So, for all production purposes, I have set my »standards« as (good quality) MP3s using solely ID3v2.3 and ISO-8859-1 encoding. For me, this is currently the best compromise to be compatible with »almost everything«, be it old broadcasting software, stream titles or just portable players. Here in Germany, this is a thing »you can live with« since most European characters can be displayed using ISO-8859-1. If one had more Cyrillic, Japanese or Chinese titles, this would still be a problem, of course.

So I use kind of a »minimal subset«—the transition upwards (converting ISO to Unicode) later on will be a snap—just resave everything. Whenever there is full support in all the »other« software, that is.

Well, let’s hope for more UTF-16 and UTF-8 (ID3v2.4!) support all over …

This I can’t avoid since I rely on using MusicBrainz which in turn uses »real«, i.e. correctly spelled, artist names (»Robič, Ivo« instead of »Robic, Ivo«) which eventually end up as part of the path names here. And to be honest, I didn’t care until I found out about this batch file stuff, because all my filesystems can handle Unicode (both on the Windows and Linux machinery).

@DetlevD: wink

No time to go back to »old codepage times«, if I can possibly avoid it … All this »finding out about errors other companies made« for free takes up enough time already … sigh So no more stuff like »App Localizer« here. I simply expect a modern OS to handle Unicode.

And no, there can’t be a safe »way back« since recoding from a »larger« character set into a »smaller« will always be a one way. (How would it know if »c« meant »c«, »č«, »ç«, »ĉ«, »ċ«, or »ć«?)
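
A quick way to see the one-way street is a small Python sketch that strips the diacritics from all of these letters and looks at what is left. The helper name is made up, and stripping via Unicode decomposition is just one possible way to model the »smaller« character set.

```python
import unicodedata


def strip_marks(text):
    """Decompose each character and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


variants = ["c", "č", "ç", "ĉ", "ċ", "ć"]
collapsed = {strip_marks(v) for v in variants}  # {"c"}
```

Six different inputs, one output: once the file says »c«, nothing in it tells you which of the six it used to be.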

Yes, there is no problem using the files on my computer (however, my system uses ISO-8859-2/Windows-1250, so I have to be careful here in Central Europe; for example, some French characters sometimes cause problems). The weak point is the hardware again. I don't like underscores or spaces in my displays. I agree with you: I use ID3v2.3 and ISO-8859-1 in tags, because that's the only type that is supported by ALL of my hardware. The reason I plan to upgrade to Unicode in tags is that I archive my files almost exclusively in FLAC, and I have no equipment supporting it. So I might as well use Unicode in them. :slight_smile:

Me too. :smiley:

For info:

If you want to export "CSV" files in Unicode format and open them with MS EXCEL, you have to write the file in UTF-16 (little endian) including the BOM header, and the value separator must be a tab (values can be enclosed in quotes).

MS EXCEL calls this "Unicode Text" (and not "CSV") and saves it with the file extension ".txt" by default (see "Save As"), which isn't really good because ".txt" is associated with my editor program and not with MS EXCEL. :frowning:

When you save it as "Unicode Text" but with the file extension ".csv" - you can double click on the file and MS EXCEL is intelligent enough and detects this "Unicode CSV Format" and opens the file correctly.
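
For anyone generating such a file outside Mp3tag, here is a short Python sketch of the format described above: UTF-16 little endian with BOM, tab as value separator, values enclosed in quotes. The function name and the sample rows are only illustrations, and I have not checked the result against any particular Excel version.

```python
import codecs
import csv
import io


def unicode_text_bytes(rows):
    """Build tab-separated, fully quoted text as UTF-16 LE with BOM."""
    buf = io.StringIO(newline="")
    writer = csv.writer(
        buf, delimiter="\t", quoting=csv.QUOTE_ALL, lineterminator="\r\n"
    )
    writer.writerows(rows)
    # Write the BOM explicitly so the byte order is always little endian.
    return codecs.BOM_UTF16_LE + buf.getvalue().encode("utf-16-le")


data = unicode_text_bytes([
    ["Artist", "Title"],
    ["Robič, Ivo", "Mit 17 fängt das Leben erst an"],
])
```

Writing `data` to a file opened in binary mode and named with the ".csv" extension gives the "Unicode CSV" variant mentioned above.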


PS: Tested with MS EXCEL 2003 (maybe 2007 can do more).

MS EXCEL uses as value separator the regional settings from the system which is different in all countries.

E.g., when you receive a .csv file from the USA (where "," is the default separator), you cannot open it correctly with a double click in MS EXCEL, but when you change the regional settings, it works.

I've just played with the csv export under mp3tag and noticed that it is written as UTF16LE+BOM using the value separator ";" (taken from the regional settings or hardcoded in mp3tag?).

It would be much better if mp3tag used the tab as value separator when writing UTF16LE+BOM files, and quotes as value delimiter. This would work with MS EXCEL (see previous posting).


PS: How to set the tabulator as value separator in the regional settings? :wink: