Unicode to ASCII

I have created six action groups for transcoding UTF-8 Unicode characters into 7-bit ASCII characters (see the attached MTA files).

Each action group covers a specific set of Unicode characters, so you are free to combine whichever replacements you need.

The transcoding scheme is derived from the script 'uni2ascii.c', version 2008-08-30T20:27:37, by William J. Poser (billposer@alum.mit.edu), including some small changes by me.

I do not know whether the MTA files work error-free in all cases; checking that is up to you.
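
For anyone who wants to compare results outside Mp3tag, here is a rough Python sketch of the same idea for the diacritics case. It is not the replacement table used in the MTA files or in uni2ascii.c; it swaps in Unicode NFKD decomposition from Python's standard unicodedata module, and the function names strip_diacritics and to_ascii are made up for illustration.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_ascii(text, placeholder="?"):
    """Strip diacritics, then replace anything still outside 7-bit ASCII."""
    stripped = strip_diacritics(text)
    return "".join(ch if ord(ch) < 128 else placeholder for ch in stripped)


print(strip_diacritics("Dvořák"))
print(to_ascii("Ærø"))
```

Characters that have no decomposition, such as Æ or ø, come out as the placeholder here, which is why explicit replacement tables are still needed for them.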

UniToAsc.Diacritics.mta (8.64 KB)
UniToAsc.Enclosed.mta (3.52 KB)
UniToAsc.Equiv.mta (1.22 KB)
UniToAsc.Expand.mta (2.21 KB)
UniToAsc.Style.mta (12.5 KB)
UniToAsc.Symbols.mta (1.53 KB)
20090807.Transcoding_scheme_from_Unicode_to_ASCII.pdf.zip (153 KB)
If there is a quirk in the transcoding scheme, if you have found a neater scheme, or if something is missing, feel free to come back and report it.

DD.20081003.1700.CEST
Fixed a few quirks.
Added rtf document for quick overview of character replacements.
DD.20081005.2237.CEST
Changed rtf document for quick overview of character replacements.
DD.20081008.1317.CEST
Changed rtf document to zipped pdf document.
DD.20081008.1557.CEST
Fixed one error regarding character "a" in UniToAsc.Style.mta and the related pdf document.
History:
UniToAsc.Style.mta ( 12.45K ) Number of downloads: 35
Transcoding_scheme_from_Unicode_to_ASCII.pdf.zip ( 151.59K ) Number of downloads: 139

Thank you very much. Great job.

Thank you, Stevest.

See updated post #1 at top of the thread.

DD.20081005.2245.CEST

Thank you again, you made my life easier. The new included rtf is very very useful to see the differences between the scripts.

The transcoding overview now uses a proper Unicode font and has been changed from an rtf document to a zipped pdf document.

DD.20081008.1604.CEST

Just downloaded this and tried it (diacritics). It doesn't seem to work?

Could you specify what does not work?
Have you filled the field uni_to_asc?

I’ve imported the .mta file into my actions folder, then triggered the action.

It runs but doesn’t do anything.

This does not answer the question.

What did you expect to happen?
It would be worthwhile to look at the action and see what it does, and then either adapt it to your needs or adapt your workflow to the action.
In this case, the action operates on a field called uni_to_asc. Either you fill that field or you change the field name in the action.

Thanks - I understand now. I’ll point the action at %_filename% instead and try that.

All good.

Works perfectly!

UniToAsc2.Style.mta (19.3 KB)

Hi,
when I ran the action UniToAsc.Style, it stopped with plenty of errors. Looking at the file UniToAsc.Style.mta, I noticed that some of the replaced characters are written as if the encoding were UTF-32. I think that was the cause of the errors. I made my own action that should do the same for UTF-16 encoding, using a Python program that generates it from the UniToAsc.Style.mta file. For those who are interested, the source code is:

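# Names are Czech: cislo = number, radky = lines, kody = codes.
# Offset between a code point in U+1D400..U+1D7FF and its UTF-16 low surrogate
# (the high surrogate for that whole range is D835).
cislo=int("1d400",16)-int("dc00",16)
with open("c:\\Users\\honza\\AppData\\Roaming\\Mp3tag\\data\\actions\\UniToAsc.Style.mta","r",encoding="ascii") as ss:
    radky=ss.readlines()
    # Insert "(?#E)" after the first two characters of the line at index 249.
    radky=radky[:249]+[radky[249][:2]+"(?#E)"+radky[249][2:]]+radky[250:]
    # Starting at index 225, rewrite every sixth line.
    i=225
    while i<len(radky):
        kody=[]
        # Skip the 9-character groups that end in "}".
        z=16
        while radky[i][z]=="}":
            z=z+9
        # Read a 5-digit hex code point every 10 characters and
        # emit it as a D835 + low-surrogate pair.
        pr=z-4
        while pr<len(radky[i]):
            kody.append("(\\\\x{D835}\\\\x{"+hex(int(radky[i][pr:pr+5],16)-cislo)[2:].upper()+"})")
            pr=pr+10
        # Keep the start of the line, append "]" and the pairs as "|"-separated alternatives.
        radky[i]=radky[i][:z-8]+"]"+"|"+"|".join(kody)+"\n"
        i=i+6

# Write the converted action to a new file.
with open("c:\\Users\\honza\\AppData\\Roaming\\Mp3tag\\data\\actions\\UniToAsc2.Style.mta","w",encoding="ascii") as ns:
    ns.write("".join(radky))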
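
In case the constant cislo in the script looks like magic: it is a shortcut for the general UTF-16 surrogate-pair calculation, which happens to collapse to a single subtraction for code points in U+1D400..U+1D7FF. A small sketch of the general rule follows; the helper names utf16_surrogates and as_regex_escape are only for illustration and are not part of the script or of Mp3tag.

```python
def utf16_surrogates(code_point):
    """Return the (high, low) UTF-16 surrogate pair for a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the Basic Multilingual Plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def as_regex_escape(code_point):
    """Format the pair as two \\x{....} escapes (single backslashes)."""
    high, low = utf16_surrogates(code_point)
    return "\\x{%04X}\\x{%04X}" % (high, low)


# The shortcut from the script: valid only while the high surrogate stays D835.
cislo = 0x1D400 - 0xDC00
for cp in (0x1D400, 0x1D41A, 0x1D7FF):
    high, low = utf16_surrogates(cp)
    assert high == 0xD835 and low == cp - cislo

print(as_regex_escape(0x1D400))
```

Note that the script writes the backslashes doubled into the .mta file, whereas this sketch prints them once.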
