Filter Expression For Non-Ascii Characters

Is it possible to search for all file names or folder names that contain non-ascii characters in them? Currently, I right-click a folder and get a command-prompt. Then I run...

powershell -Command "gci -recurse . | where {$_.Name -match '[^\u0000-\u007F]'}"

and this works. However, it would be nice to be able to get this info in MP3tag if possible.

see e.g. here:

Thanks for this. One example hard codes a bunch of characters to search for. Not what I am looking for. But this example, ARTIST MATCHES "[\xc0-\xff]"... is this doing the same thing as the powershell command?

Have you tried that expression?

Yes, but it is not finding all the non-ASCII characters, for example ö. Substituting [^\u0000-\u007F] does not yield the correct results either.

Did you try this syntax:

ARTIST MATCHES [\x{0000}-\x{007F}]

(I don't know what the range \u0000-\u007F is supposed to include. Could you explain what you would like to filter? Do you mean all of these characters,

including all the control characters?
Or do you mean filtering out everything except "a-zA-Z0-9äöüàéèÄÖÜ"?)


I have an mp3 player that cannot display certain non-ASCII characters. I want to filter for anything containing characters that are NOT in the range 0000-007F. Once I can identify which file names have these, I can replace them with the ASCII equivalent. For example, I had a folder named Eydie Gormé, and I had to change it to Eydie Gorme. So the filter would just show me all the songs with non-ASCII characters. The example ARTIST MATCHES "[\xc0-\xff]" found Eydie Gormé, but it did not find another folder name with ö in it.
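The replacement step described above (Gormé to Gorme) could be sketched outside Mp3tag roughly as follows. This is only an illustration in Python, not an Mp3tag feature; the function name strip_accents is hypothetical, and it assumes that dropping the combining accent after Unicode decomposition is an acceptable "ASCII equivalent". Characters that have no decomposition are left unchanged and would still need manual replacement.

```python
import unicodedata

def strip_accents(name: str) -> str:
    """Hypothetical helper: remove accents by decomposing characters.

    NFKD splits an accented letter into a base letter plus combining
    marks; the combining marks are then dropped.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# \u00e9 is é, \u00f6 is ö (written as escapes to avoid encoding ambiguity)
print(strip_accents("Eydie Gorm\u00e9"))
print(strip_accents("Bj\u00f6rk"))
```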

Or see this thread:

NOT %path% MATCHES ^[a-zA-Z0-9\W]*$ is giving me the same results as ARTIST MATCHES "[\xc0-\xff]"... but searching the entire path.

It seems that ö is being found with "[\xc0-\xff]". It's just that I had that character in some TITLES so I had to change it to TITLE MATCHES "[\xc0-\xff]".

What is still not being detected is the apostrophe character. The single quote ' is in the Basic Latin block (0000-007F); the apostrophe isn't. I need the search to find these as well, so that I can replace them with a single quote.
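A likely reason the apostrophes are missed is that the class [\xc0-\xff] only covers code points 00C0 to 00FF, while a typographic apostrophe lies far above that range. The sketch below uses Python's re module as a stand-in for Mp3tag's filter engine (the two may differ in details), and it assumes the apostrophe in question is U+2019; the sample titles are made up for illustration.

```python
import re

# Same class as in ARTIST MATCHES "[\xc0-\xff]": only U+00C0..U+00FF
latin1_letters = re.compile(r"[\xc0-\xff]")
# Negated class: any character outside U+0000..U+007F
non_ascii = re.compile(r"[^\x00-\x7f]")

samples = [
    "Eydie Gorm\u00e9",    # é is U+00E9, inside C0-FF
    "Don\u2019t Stop",     # assumed typographic apostrophe U+2019
    "Don't Stop",          # plain single quote U+0027
]

# For each sample: (found by [\xc0-\xff], found by [^\x00-\x7f])
hits = {
    s: (bool(latin1_letters.search(s)), bool(non_ascii.search(s)))
    for s in samples
}

for text, (narrow, broad) in hits.items():
    print(f"{text!r}: C0-FF class={narrow}, non-ASCII class={broad}")
```

Under these assumptions, only the negated class flags the title containing U+2019, and neither class flags the title that uses the plain single quote.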

* MATCHES [^\x{0000}-\x{007F}]

To check in path and all tags.

Are you sure that it is the genuine apostrophe that you are looking for?
See e.g. here:

Yes, the apostrophe is what I meant. Some of them use the accent in words like "wasn't" or "don't", and these don't work. The only acceptable characters are the ones that fall within the range \u0000-\u007F.

It looks like we have a winner with dano's expression! * MATCHES [^\x{0000}-\x{007F}] is working the way I want. I don't know why I couldn't get the results with LyricsLover's expression... it appears to use the same range syntax. Thanks everyone!

I use various filters frequently. Pulling them from history is okay until they roll off. Is there any way to store filters that are used a lot so they will always be there?

I assume the Unicode range itself is not what causes the problem.
With * in front of MATCHES you are looking in ALL tags.
My suggestion was only looking in the ARTIST tag.

So it seems that your apostrophe is somewhere other than in the ARTIST tag.

Update: (18:50 CET)
After reading the documentation again, I think the first ^ character in
[^\x{0000}-\x{007F}]
makes the difference, because it indicates a Negation.
Definition of Negation:

If the bracket-expression begins with the ^ character, then it matches the complement* of the characters it contains, for example [^a-c] matches any character that is not in the range a-c.

*As a German speaker, I had to look up the meaning of "complement".
It means something like "anything other than".
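The effect of the leading ^ can be illustrated with a small sketch. It uses Python's re module rather than Mp3tag itself, so it only approximates how the filter behaves, and the helper name contains_match is made up for this example. The point is that the class without ^ matches any value containing at least one ASCII character, which is nearly every tag, whereas the negated class matches only values that contain something outside the range.

```python
import re

in_range = re.compile(r"[\x00-\x7f]")       # one character inside 00-7F
not_in_range = re.compile(r"[^\x00-\x7f]")  # one character outside 00-7F

def contains_match(pattern: re.Pattern, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None

plain = "Eydie Gorme"
accented = "Eydie Gorm\u00e9"  # \u00e9 is é

# Without ^, both values match because both contain ASCII letters.
print(contains_match(in_range, plain), contains_match(in_range, accented))
# With ^, only the value containing é matches.
print(contains_match(not_in_range, plain), contains_match(not_in_range, accented))
```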

Yes, you can click on the small right-pointing arrow at the right end of the filter input box and then on "Manage History".

