convert charset encoding

d_g_4 · January 17, 2013, 7:43pm

Hey all,

Is there any way to convert the encoding of the charset? Right now it appears as gibberish, but it's supposed to be Cyrillic.

Here's an example:

The album tag shows: Ôîðòåïèàííûå ìèíèàòþðû
but it should read: Фортепианные миниатюры
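To see where the gibberish comes from, here is a minimal Python sketch. It assumes the tag stores the title as Windows-1251 bytes and that the program displays those bytes as ISO-8859-1 (Latin-1). The same bytes read with the wrong table give exactly the string above.

```python
# Reproduce the mojibake: Cyrillic text stored as Windows-1251 bytes,
# then misread as ISO-8859-1 (Latin-1). Both encodings are assumptions.
original = "Фортепианные миниатюры"
raw = original.encode("cp1251")     # the bytes as they (presumably) sit in the tag
garbled = raw.decode("latin-1")     # the same bytes shown with a Western table
print(garbled)                      # Ôîðòåïèàííûå ìèíèàòþðû
```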

Thanks,

Dan

d_g_4 · January 17, 2013, 8:11pm

Solved it with a bit of manual work and the Internet Explorer browser. Here's roughly how:

export to csv file
open csv file in IE
change encoding in IE to Cyrillic
copy back to a CSV file (requires some jumping through hoops to maintain the structure)
prepare the file for import, e.g. remove unnecessary first and last lines, change semicolon to forward slash (not sure it was needed)
use the Text file - Tag function
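The "open in IE and switch to Cyrillic" step amounts to decoding the exported file's bytes as Windows-1251. A minimal Python sketch, assuming the export was written as single-byte "ANSI" text (so the original Cyrillic bytes survived unchanged) and uses ";" as the field delimiter; the function name is hypothetical.

```python
import csv
import io

def read_exported_csv(raw: bytes, codepage: str = "cp1251",
                      delimiter: str = ";") -> list[list[str]]:
    """Re-read an exported CSV using the codepage the bytes really use.

    Assumes a single-byte "ANSI" export, so decoding with the right
    codepage does the same job as switching IE's encoding to Cyrillic.
    """
    text = raw.decode(codepage)
    # newline="" leaves line endings untouched; the csv reader handles them
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
```

For example, `rows = read_exported_csv(Path("export.csv").read_bytes())` (with `pathlib.Path`); the rows could then be written back as UTF-8 with `csv.writer` before the import step.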

dano · January 17, 2013, 8:13pm

Make a new action and choose "Convert codepage"
Select codepage 1251
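The effect of such a conversion can be sketched in Python. This is not Mp3tag's implementation: the function name is hypothetical, and it assumes the garbled text arose from reading the bytes as ISO-8859-1, so the original bytes can be recovered by encoding back to Latin-1 and then read with the chosen codepage.

```python
def convert_codepage(text: str, codepage: int) -> str:
    """Reinterpret mis-decoded tag text using a Windows codepage number.

    Assumes the text was produced by reading the raw bytes as ISO-8859-1
    (Latin-1); characters outside Latin-1 raise UnicodeEncodeError.
    Only numbers Python knows as "cpNNNN" (e.g. 1250-1258, 866) work.
    """
    raw = text.encode("latin-1")          # recover the original bytes
    return raw.decode(f"cp{codepage}")    # read them with the right table
```

With codepage 1251, `convert_codepage("Ôîðòåïèàííûå ìèíèàòþðû", 1251)` returns `"Фортепианные миниатюры"`.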

d_g_4 · January 17, 2013, 8:44pm

Holy crap! How did I miss that!

Works perfectly! Thanks!

i_edgars · April 3, 2015, 5:51pm

How can I see which encoding (UTF-8, UTF-16, Windows-1251, etc.) the tags were written with [in your program]? I mean before any changes, i.e. before any codepage conversion.

ohrenkino · April 3, 2015, 6:41pm

There is a variable that shows the encoding:
%_id3v2_character_encoding%

i_edgars · April 4, 2015, 5:38am

%_id3v2_character_encoding% shows only UTF-8, UTF-16 or ISO
http://content31-foto.inbox.lv/albums/i/i.../Dazadi/ID3.jpg
but not Windows-1251, Windows-1257, ..., KOI8-U or other "European" codepages.

How can I see the other European character encodings, including the old codepages?
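One reason only UTF-8, UTF-16 or ISO show up: in an ID3v2 text frame, the first byte of the frame body names the encoding, and the ID3v2.4 specification defines only four values ($00 ISO-8859-1, $01 UTF-16 with BOM, $02 UTF-16BE, $03 UTF-8; ID3v2.3 has only the first two). There is no value for Windows-1251, so a tagger that wrote Windows-1251 bytes into an ID3v2 frame can only have marked them as ISO-8859-1. A minimal Python sketch of reading that byte; the names are hypothetical.

```python
# ID3v2 text-encoding byte -> Python codec name
ID3V2_TEXT_ENCODINGS = {
    0: "iso-8859-1",  # $00
    1: "utf-16",      # $01, UTF-16 with byte order mark
    2: "utf-16-be",   # $02, ID3v2.4 only
    3: "utf-8",       # $03, ID3v2.4 only
}

def decode_text_frame(body: bytes) -> tuple[str, str]:
    """Return (declared encoding, text) for the body of a text frame such as TALB."""
    codec = ID3V2_TEXT_ENCODINGS[body[0]]
    text = body[1:].decode(codec).rstrip("\x00")  # drop an optional terminator
    return codec, text
```

Windows-1251 bytes declared as $00 come back as `("iso-8859-1", "Ôîðòåïèàííûå")`: the frame itself carries no hint of the real codepage.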

ohrenkino · April 4, 2015, 6:16am

AFAIK this is the problem with the old code pages: there is a limited set of character codes, and each code page maps them to particular characters.
So the text stores the same character codes, but how they are displayed depends on the code page. Therefore you probably cannot deduce the code page from the character codes alone.
BTW: the code pages are not a Windows invention.

DetlevD · April 4, 2015, 7:59am

As a starting point ...
http://en.wikipedia.org/wiki/Charset_detection
http://en.wikipedia.org/wiki/Language_detection
http://en.wikipedia.org/wiki/Character_set#cite_note-7

... further on ... maybe ...
http://www.codeproject.com/Articles/17201/...d-Outgoing-Text
http://stackoverflow.com/questions/90838/h...-of-a-text-file
http://www.joelonsoftware.com/articles/Unicode.html
https://msdn.microsoft.com/en-us/library/ie...5(v=vs.85).aspx

... some of them suggest using Notepad++ to detect the codepage from a text sample.

DD.20150404.1205.CEST

i_edgars · April 4, 2015, 12:35pm

ohrenkino, DetlevD, thank you very much.

DetlevD · April 5, 2015, 2:38am

This thread is a bit old, but I think the problem keeps coming back from time to time.
Example:
There is some tag-field having this text: Ôîðòåïèàííûå ìèíèàòþðû
Apparently this is not the correct character encoding.

Mp3tag has the action "Convert codepage", ...
but how would someone know which code page is the right one?
Sometimes one can derive the answer from the context of the media files, from the album title or from other evidence.
But Mp3tag itself offers no technical assistance here; maybe other software does?

Hmm, sure, because it is a rather old question: it goes back to the Stone Age of DOS and the implementation of the so-called code pages.
Nowadays, there exist some really full-fledged software for the detection of languages.
But I thought, keep it simple for Mp3tag, and I was looking for a brute force method, ...
Then I found something that may help.

The idea is to convert the unknown text "Ôîðòåïèàííûå ìèíèàòþðû" into several other languages, using known codepage conversions.
After the conversion, the user has to decide which result has the correct encoding, ... and this way ...
the correct code page number can be found after a short round of automated experimentation.
So this is a semi-automatic process of trial and error.

I wrote a DOS batch command script that uses the well-known command-line tool "iconv.exe".
The tool receives the unknown string and converts it using one specific code page.
This is repeated several times, with different code pages.
At the end there are several results, ... and the user has to decide which one is right, based on which result looks most plausible.
Once the right code page number is known, it is easy to tell Mp3tag to apply it.
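The brute-force idea can be sketched in Python. This swaps iconv for Python's built-in codecs, assumes the garbled text came from reading the bytes as Latin-1, and the candidate list is only an example; the names are hypothetical.

```python
CANDIDATES = ["cp1250", "cp1251", "cp1253", "cp1257",
              "cp866", "koi8_r", "koi8_u", "iso8859_5"]

def candidate_table(garbled: str, candidates: list[str] = CANDIDATES) -> dict[str, str]:
    """Decode the recovered bytes with every candidate codepage."""
    raw = garbled.encode("latin-1")                  # assumed original bytes
    return {cp: raw.decode(cp, errors="replace")    # undefined bytes -> U+FFFD
            for cp in candidates}

if __name__ == "__main__":
    for cp, text in candidate_table("Ôîðòåïèàííûå ìèíèàòþðû").items():
        print(f"{cp:10} {text}")   # the user picks the line that reads naturally
```

As with the batch script, the program only lists the possibilities; the decision which line is "the" text stays with the user.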

It is a 3-step process.

Put the unknown text into a text file.
Run the batch command script.
Look at the result page, and find the correct target codepage number.

Then this codepage number can be used in the Mp3tag action "Convert codepage" to perform the correct character conversion.

"Ôîðòåïèàííûå ìèíèàòþðû" ==> CP1251 ==> "Фортепианные миниатюры"

Installer "CheckCP.exe"
http://1drv.ms/1IftYrq

Have fun!
DD.20150405.0638.CEST

If someone needs this procedure (steps 1 to 3) more often, the whole process can be summarized and prepared within an Mp3tag export script.
The export script takes the text value from the tag field in question, ...
then generates the batch script, ...
which starts automatically ...
and delivers the desired result, ...
that is, the text file with the comparison of the code pages.

All this could be done natively by Mp3tag, ...
of course with some more programming effort, ...
so Mp3tag could calculate the correct answer immediately, ...
without the need of a list of different codepages, ...
which have to be examined by user interaction.
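A rough idea of how such an automatic answer might look, as a Python sketch. The scoring rule is a deliberately crude, hypothetical heuristic (real charset detectors rely on language statistics): it penalises symbols inside words, accented non-Cyrillic letters and lower-to-upper case jumps, and it assumes a Latin-1 mis-decoding. It would misjudge, for example, genuine French or Czech titles.

```python
import unicodedata

def implausibility(text: str) -> int:
    """Crude, hypothetical score: count features rare in real titles."""
    bad = 0
    for word in text.split():
        prev_lower = False
        for ch in word:
            if not ch.isalnum():
                bad += 1                 # symbols, box drawing, U+FFFD inside a word
                prev_lower = False
                continue
            cyrillic = unicodedata.name(ch, "").startswith("CYRILLIC")
            if not cyrillic and len(unicodedata.normalize("NFD", ch)) > 1:
                bad += 1                 # accented non-Cyrillic letter such as "Ô"
            if ch.isupper() and prev_lower:
                bad += 1                 # lower-to-upper jump inside a word, e.g. "тН"
            prev_lower = ch.islower()
    return bad

def best_guess(garbled: str, candidates: list[str]) -> str:
    """Pick the candidate codepage whose decoding scores as most plausible."""
    raw = garbled.encode("latin-1")      # assumes a Latin-1 mis-decoding
    scores = {cp: implausibility(raw.decode(cp, errors="replace"))
              for cp in candidates}
    return min(scores, key=scores.get)
```

For the album title from this thread and the candidates cp1250, cp1251, cp1253, cp1257, cp866, koi8_r and iso8859_5, only the cp1251 reading scores zero, so `best_guess` returns `"cp1251"`.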

DD.20150405.2216.CEST

There is a Mp3tag export script, ...
which starts a codepage check for the tag-field ALBUM ...
The cmd batch script has a for loop that uses !variable! instead of %variable%, ...
and the script should work on a Win XP machine too, ...
but "EnableDelayedExpansion" is disabled by default on Win XP.
It is also disabled by default on later Windows versions, unless it has been switched on in the registry.
To be sure, the missing batch code (to enable delayed expansion) has been added now.
"EnableDelayedExpansion" may also be enabled by starting CMD with the /V:ON switch.
"EnableDelayedExpansion" can also be set in the registry under HKLM or HKCU:
[HKEY_CURRENT_USER\Software\Microsoft\Command Processor]
"DelayedExpansion"= (REG_DWORD)
1=enabled 0=disabled
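To illustrate why the !variable! form matters, here is a hypothetical Python helper that writes such a .cmd file. It is not DetlevD's script: the file names, the codepage list and the use of iconv's -f/-t (source/target encoding) options are assumptions. Inside the parenthesised for body, %n% would be expanded once, when the whole block is parsed, so a counter updated in the loop is only visible per iteration as !n! with delayed expansion enabled.

```python
CODEPAGES = ["CP1250", "CP1251", "CP1257", "CP866", "KOI8-R", "KOI8-U"]

def make_check_script(codepages: list[str] = CODEPAGES,
                      infile: str = "input.txt",
                      outfile: str = "result.txt") -> str:
    """Return the text of a .cmd file that runs iconv once per codepage."""
    lines = [
        "@echo off",
        "setlocal EnableDelayedExpansion",
        "set n=0",
        f'if exist "{outfile}" del "{outfile}"',
        f"for %%c in ({' '.join(codepages)}) do (",
        "  set /a n+=1",
        # !n! is read at run time; %n% would stay 0 for the whole loop
        f'  echo [!n!] %%c >> "{outfile}"',
        f'  iconv -f %%c -t UTF-8 "{infile}" >> "{outfile}"',
        f'  echo.>> "{outfile}"',
        ")",
        "endlocal",
    ]
    return "\r\n".join(lines) + "\r\n"   # CRLF line endings for cmd.exe
```

Write it with `Path("check.cmd").write_bytes(make_check_script().encode("ascii"))`; writing in text mode on Windows would turn each `\r\n` into `\r\r\n`. The input file must hold the original bytes, e.g. the text saved as "ANSI" in Notepad on a Western system.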


Export.CMD.20150406.CheckCodepage.ALBUM.mte (2.79 KB)
DD.20150406.1747.CEST, DD.20150407.1451.CEST


i_edgars · July 4, 2015, 6:19am

%_id3v2_character_encoding% only works for MP3 tags!
http://content31-foto.inbox.lv/albums/i/i.../Dazadi/ID3.jpg
How can I see the character encoding (codepage) of the other lossless and lossy formats (flac, ape, wv, alac, wma, ogg)?

DetlevD · July 4, 2015, 7:45am

You may start playing with this Mp3tag script code ...

DD.20150704.1187.CEST

i_edgars · July 4, 2015, 3:47pm

i_edgars · July 5, 2015, 3:17pm

Is the FLAC format's codepage always UTF-8?

DetlevD · July 5, 2015, 3:37pm

I do not know for sure ... I derived the combination of FLAC with UTF-8 ...
from the experience that FLAC files are tagged only with Vorbis comments, ...
which are always UTF-8, right?
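The Vorbis comment specification does require UTF-8: each field is stored as a 32-bit little-endian length followed by UTF-8 bytes, with no encoding marker at all. A minimal Python sketch that parses such a block, as found in a FLAC VORBIS_COMMENT metadata block (without the Ogg framing bit); the function name is hypothetical.

```python
import struct

def parse_vorbis_comment(block: bytes) -> tuple[str, list[tuple[str, str]]]:
    """Parse a Vorbis comment block; every string is UTF-8 by definition."""
    pos = 0

    def read_string() -> str:
        nonlocal pos
        (length,) = struct.unpack_from("<I", block, pos)   # 32-bit little-endian
        pos += 4
        value = block[pos:pos + length].decode("utf-8")    # no codepage choice here
        pos += length
        return value

    vendor = read_string()
    (count,) = struct.unpack_from("<I", block, pos)
    pos += 4
    fields = []
    for _ in range(count):
        name, _, value = read_string().partition("=")
        fields.append((name.upper(), value))   # field names are case-insensitive
    return vendor, fields
```

Since the format leaves no room for another encoding, there is nothing like an ID3v2 encoding byte to display or change for FLAC or Ogg files.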

DD.20150705.1936.CEST

i_edgars · July 8, 2015, 10:28am

I have some "test" files:
07 - Ace Of Base - All That She Wants - (Singles Of The 90s _ CD) - 1999
flac, ape, wv, m4a, ogg, wma, mp3.
In Mp3tag Options > Tags > Mpeg I selected ID3v2.3 UTF-16
and saved all files, but only the MP3 file's tag encoding changed!
Is this normal? Do the other (non-MP3) files have to stay at UTF-8?

DetlevD · July 8, 2015, 10:59am

Do not confuse the terms "character encoding" and "codepage".
Sorry, I do not understand what the basis and topic of your question is.
The values in the picture's listview column named "Codepage" seem to be correct when we are speaking of character encoding.
The column is calculated by the format string we discussed before, right?

See also ...
https://en.wikipedia.org/wiki/Code_page
https://en.wikipedia.org/wiki/Character_encoding

DD.20150708.1524.CEST

i_edgars · July 8, 2015, 1:09pm

Sorry, I'm not an IT professional, so I don't know the specific terms and mix them up.

So with Mp3tag, the tag character encoding (is this the correct term??) [the values in the picture's ListView column named "codepage"] can be changed from UTF-8 to UTF-16 only in *.mp3 files?
And in *.flac, *.wv, *.ape, *.m4a, *.ogg files the tag character encoding cannot be changed from UTF-8 to UTF-16?

DetlevD · July 8, 2015, 1:20pm

Don't worry, but keep it in mind.

As far as I know, these media file types, or rather the text in their tags, are bound to UTF-8 character encoding.

You have to look into Tools\Mp3tag\Options\Tags, ...
there you can change the behaviour of tag writing, ...
especially for the media type MP3 there are different options available.
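MP3 is the odd one out because ID3v2.3 can store text only as ISO-8859-1 ($00) or UTF-16 ($01), while UTF-8 ($03) needs ID3v2.4. A hypothetical Python sketch of building a text-frame body under those rules (not Mp3tag's code; the optional terminator is omitted):

```python
ENCODING_BYTES = {"iso-8859-1": 0, "utf-16": 1, "utf-16-be": 2, "utf-8": 3}
ID3V23_ENCODINGS = {"iso-8859-1", "utf-16"}

def encode_text_frame(text: str, encoding: str, version: int = 3) -> bytes:
    """Build the body of an ID3v2 text frame: encoding byte + encoded text."""
    if version == 3 and encoding not in ID3V23_ENCODINGS:
        raise ValueError(f"ID3v2.3 cannot store text as {encoding}")
    # text.encode raises UnicodeEncodeError if, e.g., Cyrillic is forced into ISO-8859-1
    return bytes([ENCODING_BYTES[encoding]]) + text.encode(encoding)
```

The failing ISO-8859-1 case is how old taggers ended up writing Windows-1251 bytes instead: the encoding byte said $00, but the bytes belonged to another codepage.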

DD.20150708.1720.CEST