XY, character sets, and cmd.exe

Dustydog
Posts: 321
Joined: 13 Jun 2016 04:19


Post by Dustydog »

I'm an American and know too little about character sets and command line programs - especially where I don't need to see the output at a command prompt, just let a utility talk with XY.

My main reason for this is my love of classical music and weird results I get between various programs.

After some research, it seems like the following things are true. I'd appreciate some corrections and additions.

- cmd.exe understands more about different character sets than it can actually display (at least without jumping through some hoops first), regardless of what region is set, as long as you have a current version of Win 10 due to improvements MS has made.

- Apparently, UTF-8 is still a problem because the current command line natively uses a 16-bit little-endian encoding (UTF-16LE)? Is this just a problem with East Asian languages, or is it a problem with Western European ones as well?

- Regardless of Windows, the various programs still need to support the command line correctly. Apps like MediaInfo, which is UTF-8 friendly and perhaps more Linux-oriented, seem in particular to have additional problems?

- The root of the problem is actually quite a good thing: Backwards compatibility for Windows programs all the way into DOS days in some cases.

Now for some specific weirdness I've encountered:

For some Vorbis Comments, and some odd formats, MediaInfo is more specific and thorough than XY. I've largely managed to work around my problems between the two, but I still have some issues, particularly if I call it directly through 'runret' via the command line.

With MediaInfo command line, diacritical marks from Vorbis Comments are passed through the DOS 437 character set, which really surprised me. So, for example, when the metadata is for: Eugène Gigout, it comes out as: Eug├⌐ne Gigout.
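The garbling described above is the pattern you get when UTF-8 bytes are decoded as code page 437: the two bytes that encode é in UTF-8 (C3 A9) are the characters ├ and ⌐ in that code page. A minimal Python sketch of the effect and of undoing it follows. Python is not part of the workflow in this thread, and the function names `garble` and `repair` are hypothetical, chosen only for illustration.

```python
def garble(text: str, wrong_codepage: str = "cp437") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong code page."""
    return text.encode("utf-8").decode(wrong_codepage)


def repair(garbled: str, wrong_codepage: str = "cp437") -> str:
    """Undo garble(): recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_codepage).decode("utf-8")


if __name__ == "__main__":
    # Illustrative input only; any text with diacritics behaves the same way.
    scrambled = garble("Gabriel Fauré")
    print(scrambled)
    print(repair(scrambled))
```

Because code page 437 assigns a character to every byte value, this particular kind of garbling loses no information and can be reversed, which is why a search/replace workaround is possible at all.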

To make matters worse, in order to preserve those characters, a script needs to be saved as, for example, UTF-8 (which is what XY uses natively, doesn't it?). But a script that does a simple search/replace for those characters gets scrambled when the script file is encoded in UTF-8, yet it works fine if it's entered as a user script within the program.

I haven't tried actually converting my standard script encoding to something other than UTF-8 as I'm still in the head scratching stage, but it would be easy enough to do. So, should I encode my scripts in a different character set? And if so, which one? UTF-16 little endian? MediaInfo isn't the only utility I use via the command line. If I step out and start using something like the native FLAC tools (which seems safest to me for metadata setting), should I stick with UTF-8 for my scripts? Windows 1252? Or....?

One thing I've noticed is that a lot of the open source tools that I use lean a bit towards Linux and tend to use UTF-8, but I'm still unclear about how the Win 10 command line handles that.

The other thing I get with metadata inside XY is a lot of <?> symbols, which is what's usual for native XY and, if I recall correctly, Windows canonical properties. That is even worse than running it through DOS 437, as the original letters aren't "fixable".

At this point, I'm kind of scared to even try to set metadata in my flac files via XY scripting, even using flac tools. Classical music can be a royal pain to tag properly. And my personal tastes run heavily into 20th Century French music, so there are a lot of diacritics, and even though it would be easier to just lose them, Faure just looks wrong to me - and I certainly don't want to get stuck with Faur or Faur<?> or Fauré or something even worse.

Anyone still with me and want to give an American some advice on how Windows handles UTF-8 via the command line, especially with Linux-oriented open source tools compiled for Windows? And should I be using a different character encoding for my script files? Highend once warned me off changing metadata with ffmpeg, but without understanding how open source tools, XY, and the Win 10 command line interact, I'm hesitant even to use the flac tools, which should be reliable.

So...advice on how to deal with all this properly?

highend
Posts: 13309
Joined: 06 Feb 2011 00:33

Re: XY, character sets, and cmd.exe

Post by highend »

I don't have any problems with UTF-8 output from a command line utility (that supports it):

text runret("""D:\Tools\MediaInfo\MediaInfo.exe"" ""R:\test.mp3""", %TEMP%, 65001);


...
Album                                    : Eugène - album
Track name                               : Eugène - title
Performer                                : Eugène - artist
...
That works as long as the console uses a font that can display the Unicode characters involved. Lucida Console should work fine.
I store my XY scripts as UTF-8 (without BOM, but that's just personal preference).
Storing XY scripts as Windows-1252 wouldn't make sense to me; any UTF-8 character used in them would get scrambled...
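The 65001 in the runret call above is the Windows code page number for UTF-8. The general idea, capturing a tool's raw output bytes and decoding them with the encoding the tool really emits, can be sketched outside XY as well. The Python below is only an analogy and not how XY implements runret. The helper name `run_and_decode` and the `DEMO_COMMAND` stand-in (a child Python process that prints one UTF-8 line, used in place of MediaInfo) are assumptions for illustration.

```python
import subprocess
import sys
from typing import Sequence


def run_and_decode(command: Sequence[str], encoding: str = "utf-8") -> str:
    """Run a command and decode its captured stdout with an explicit encoding."""
    completed = subprocess.run(list(command), capture_output=True, check=True)
    return completed.stdout.decode(encoding)


# Hypothetical stand-in for a UTF-8-emitting tool: writes "Eugène - album"
# as raw UTF-8 bytes. The \xe8 escape keeps the command line pure ASCII.
DEMO_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write('Eug\\xe8ne - album'.encode('utf-8'))",
]

if __name__ == "__main__":
    print(run_and_decode(DEMO_COMMAND))           # decoded as UTF-8
    print(run_and_decode(DEMO_COMMAND, "cp437"))  # same bytes, wrong code page
```

The bytes coming out of the tool are identical in both calls; only the decoding step differs, which is the same choice the code page argument makes in the runret call.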
