Writing unicode files

MBaas
Posts: 598
Joined: 15 Feb 2016 21:08

Writing unicode files

Post by MBaas »

I just noticed a problem when I tried to process a file that I exported from XY. For a repro, try this:

Code: Select all

    writefile("foo1.txt","""Mein Behälter"" C:\Data\Mein Behälter\",,"utf8bom");
Open foo1.txt with a binary editor and examine the two "ä"s: the first one is written as "C3 A4" - the second one as "61 CC 88".
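
The two byte sequences correspond to two different Unicode spellings of the same visible character: the precomposed "ä" (U+00E4) and a plain "a" followed by a combining diaeresis (U+0308). A minimal sketch in Python (not XYplorer script; the helper name `hex_bytes` is made up for illustration) shows how each spelling encodes in UTF-8:

```python
import unicodedata

# Two spellings that render identically on screen.
precomposed = "\u00e4"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # "a" followed by U+0308 COMBINING DIAERESIS


def hex_bytes(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


print(hex_bytes(precomposed))  # C3 A4
print(hex_bytes(decomposed))   # 61 CC 88

# The strings differ code point by code point...
print(precomposed == decomposed)  # False
# ...but compare equal once both are in the same normalization form (NFC).
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Both sequences are valid UTF-8, so an encoder that writes "61 CC 88" is most likely reproducing its input rather than corrupting it.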

I suppose this is "somewhat legal" - but it's not entirely correct:
VS Code (which is very tolerant) has no problems with it, UltraEdit can also handle it - but its hex mode shows there's some weirdness.
[Screenshot: 05-05-2024_15-02-01.png]
And the environment I want to read it from simply injects blanks - which messes up subsequent processing, ofc.

Unfortunately I don't know this stuff well enough to provide a better explanation - but I hope it is reproducible and fixable anyway ;)
______________________________________________
Happy user ;-)

Horst
Posts: 1112
Joined: 24 Jan 2021 12:27
Location: Germany

Re: Writing unicode files

Post by Horst »

You are writing UTF-8 with a BOM, so you get the BOM characters.
What else did you expect?
Windows 11 Home x64 Version 23H2 (OS Build 22631.3672)
Portable XYplorer (current version, including betas)
Everything 1.5.0.1376a (x64), Everything Toolbar 1.3.3, Listary Pro 6.3.0.78

admin
Site Admin
Posts: 61011
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%
Contact:

Re: Writing unicode files

Post by admin »

Looks like a bug, but I ran your script and the resulting file looks as it should:
[Screenshot: 2024-05-05_164411.png]
I'm using the English locale though, which might (but shouldn't!) make a difference:

Code: Select all

System / Thread Locale ID: 1033 (en-US) / 1033 (en-US)
Default ANSI Code Page: 1252  (ANSI - Latin I)
Active ANSI Code Page: 1252  (ANSI - Latin I)
Default OEM code page: 437   (OEM - United States)
Active OEM Code Page: 437   (OEM - United States)

MBaas

Re: Writing unicode files

Post by MBaas »

Just to confirm...I changed my system settings - but it did not affect the result of writefile!

My settings now match yours: (at least we know now that THIS did not cause the difference - as you said!)

Code: Select all

System / Thread Locale ID: 1033 (en-US) / 1033 (en-US)
Default ANSI Code Page: 1252  (ANSI - Latin I)
Active ANSI Code Page: 1252  (ANSI - Latin I)
Default OEM code page: 437   (OEM - United States)
Active OEM Code Page: 437   (OEM - United States)

admin

Re: Writing unicode files

Post by admin »

Try this:

Code: Select all

text utf8encode("""Mein Behälter"" C:\Data\Mein Behälter\", 1, 1);

MBaas

Re: Writing unicode files

Post by MBaas »

Code: Select all

"Mein Behälter" C:\Data\Mein Behälter\

admin

Re: Writing unicode files

Post by admin »

And this:

Code: Select all

text hexdump(utf8encode("""Mein Behälter"" C:\Data\Mein Behälter\", 0, 1), 1);

MBaas

Re: Writing unicode files

Post by MBaas »

Looks good as well:

Code: Select all

00000000: 22 4D 65 69 6E 20 42 65 68 C3 A4 6C 74 65 72 22   "Mein Behälter"
00000010: 20 43 3A 5C 44 61 74 61 5C 4D 65 69 6E 20 42 65    C:\Data\Mein Be
00000020: 68 C3 A4 6C 74 65 72 5C                           hälter\        
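
The dump above is produced from the string literal in the script, not from the file on disk. To inspect the file's own bytes, a hex dump outside XYplorer can help. The following is a rough Python sketch, not XYplorer script; the `hexdump` function here is a hypothetical helper that only imitates the offset-and-bytes layout above, and the sample string is constructed with a decomposed "ä" purely for illustration:

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as lines of 'OFFSET: XX XX ...' with `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"{offset:08X}: {hex_part}")
    return "\n".join(lines)


# Illustrative sample: "a" + U+0308 instead of the precomposed U+00E4.
sample = '"Mein Beha\u0308lter"'.encode("utf-8")
print(hexdump(sample))
# 00000000: 22 4D 65 69 6E 20 42 65 68 61 CC 88 6C 74 65 72
# 00000010: 22
```

For a real file, something like `hexdump(open("foo1.txt", "rb").read())` would apply; a file written as UTF-8 with BOM starts with the bytes EF BB BF.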

admin

Re: Writing unicode files

Post by admin »

Hmm. Are you sure that UltraEdit is showing the truth?

MBaas

Re: Writing unicode files

Post by MBaas »

It has never lied to me (so far). I've attached the zipped file if you want to check for yourself.
Attachment: foo1.zip (194 bytes)

admin

Re: Writing unicode files

Post by admin »

Yes, it's true. Can't explain it. Seems impossible. :veryconfused:

What happens when you put more "ä" characters in the string? Any patterns?

MBaas

Re: Writing unicode files

Post by MBaas »

It's weird! I tried to write lots of "ä"s - they all were correct. Just my real use case (with the folder name "Gehälter") and the random repro seem to expose that behavior...

admin

Re: Writing unicode files

Post by admin »

It might be a different "ä" pasted from somewhere. Try to delete this character and type it again.

MBaas

Re: Writing unicode files

Post by MBaas »

Indeed - the folder name had a decomposed "ä" in it ("a" plus a combining diaeresis, 61 CC 88), which was faithfully reproduced when the file was created. The folder is many years old - I have no idea how that character got in there. Possibly I used an inferior file manager, and for sure a different OS. What a weird one - thanks for bearing with me and apologies for being so blind! :oops:
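
For anyone hitting the same thing: a decomposed character in a name can be detected, and the text normalized before further processing. This is a minimal Python sketch, not XYplorer script; `is_nfc` and `to_nfc` are hypothetical helper names, the folder name is rebuilt here with a combining diaeresis for illustration, and `unicodedata.is_normalized` assumes Python 3.8 or later:

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if text is already in Unicode Normalization Form C (composed)."""
    return unicodedata.is_normalized("NFC", text)


def to_nfc(text: str) -> str:
    """Compose sequences such as 'a' + U+0308 into single code points."""
    return unicodedata.normalize("NFC", text)


# Illustrative name spelled with "a" + U+0308 COMBINING DIAERESIS.
folder = "Geha\u0308lter"
print(is_nfc(folder))   # False

fixed = to_nfc(folder)
print(is_nfc(fixed))    # True
print(fixed.encode("utf-8").hex(" ").upper())  # 47 65 68 C3 A4 6C 74 65 72
```

This only normalizes the text being written; renaming the folder itself so that its name is stored in composed form is a separate step not shown here.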
