dealing with unicode

Things you’d like to miss in the future...
MBaas
Posts: 582
Joined: 15 Feb 2016 21:08

Post by MBaas »

I have a small text file that has no BOM. When I look at it with F11 (or in the F12 preview) all is well.
But trying to ::copytext readfile(<curitem>) loses the UTF-8 encoding somewhere. Also, ::msg readfile(<curitem>) shows the data incorrectly.

What's even more confusing: even with mode "ru" it displays the data wrong. But I did check ::msg "⍳⍴⍵" and that worked, so clearly msg can deal with Unicode.
So something seems to be wrong with readfile, both with and without auto-detection...

---wrong---
assertion failed: dt TMatch 22 ns.⎕ATX fn ⍝ at ⎕SE.LinkTest.test_attributes[16]
test_attributes[16] assert'dt TMatch 22 ns.⎕ATX fn'
∧
---correct---
assertion failed: dt TMatch 22 ns.⎕ATX fn ⍝ at ⎕SE.LinkTest.test_attributes[16]
test_attributes[16] assert'dt TMatch 22 ns.⎕ATX fn'

-----
Attachments
test1-20240302.zip
(277 Bytes)
______________________________________________
Happy user ;-)

admin
Site Admin
Posts: 60619
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%

Re: dealing with unicode

Post by admin »

ERR is probably not among your Text Files extensions (Configuration | Preview | Previewed Formats), therefore it does not get the UTF-8 treatment.

Btw, the setting of Configuration | Preview | Preview | Text preview | UTF-8 auto-detection does not affect SC readfile. SC readfile always checks for BOM-less UTF-8 with text files, and never for non-text files. ---> WRONG (the setting does affect SC readfile)
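
For readers unfamiliar with what "checking for BOM-less UTF-8" involves, here is a minimal Python sketch of the general technique. It is not XYplorer's actual implementation: the function name `decode_text`, its parameters, and the `latin-1` fallback are assumptions chosen only for illustration. The idea is to honor a BOM if present, otherwise try a strict UTF-8 decode, and fall back to a single-byte code page when that fails or when detection is switched off.

```python
import codecs

def decode_text(raw: bytes, detect_utf8: bool = True, fallback: str = "latin-1") -> str:
    """Decode file bytes, optionally auto-detecting BOM-less UTF-8 (illustrative only)."""
    # A UTF-8 BOM settles the question regardless of auto-detection.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if detect_utf8:
        try:
            # Strict decoding raises on byte sequences that are not valid UTF-8.
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass
    # Without detection every byte becomes one character,
    # which garbles multi-byte UTF-8 input.
    return raw.decode(fallback)

raw = "⍳⍴⍵".encode("utf-8")                        # BOM-less UTF-8 bytes
detected = decode_text(raw)                        # decoded as UTF-8
undetected = decode_text(raw, detect_utf8=False)   # decoded byte by byte

print(detected == "⍳⍴⍵")
print(len(detected), len(undetected))
```

With detection off, the three APL characters come back as nine unrelated characters, which is the kind of garbling described in the first post.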


Re: dealing with unicode

Post by MBaas »

admin wrote: 02 Mar 2024 20:41 Btw, the setting of Configuration | Preview | Preview | Text preview | UTF-8 auto-detection does not affect SC readfile.
I guess you wanted to say "does affect" ;)

Actually...the world has changed for me - even random extensions can have unicode content. But, OTOH the list of binary files is random as well...


Re: dealing with unicode

Post by admin »

MBaas wrote: 03 Mar 2024 08:13
admin wrote: 02 Mar 2024 20:41 Btw, the setting of Configuration | Preview | Preview | Text preview | UTF-8 auto-detection does not affect SC readfile.
I guess you wanted to say "does affect" ;)

Actually...the world has changed for me - even random extensions can have unicode content. But, OTOH the list of binary files is random as well...
1. Damn it. :ninja: But I think SC readfile should have a way to become independent of that GUI setting; maybe by adding another mode flag. Any opinions?

2. "the list of binary files is random as well" ... not sure what you mean

3. I'm wondering if explicitly passing mode "t" with SC readfile shouldn't treat the file as a text file (without the need to add the extension to "Text Files"). Any opinions?
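
To make points 1 and 3 concrete, here is one possible shape of the decision logic, sketched in Python. Everything in it is hypothetical: the function `wants_utf8_check`, the `TEXT_EXTENSIONS` set standing in for the "Text Files" list, and the choice to let a single "t" flag override both the extension list and the GUI setting are assumptions for discussion, not a description of how SC readfile works.

```python
import os

# Hypothetical stand-in for the user's "Text Files" extension list.
TEXT_EXTENSIONS = {".txt", ".ini", ".log"}

def wants_utf8_check(path: str, mode: str = "", gui_autodetect: bool = True) -> bool:
    """Decide whether a BOM-less UTF-8 check should run for this file."""
    if "t" in mode:
        # Explicit text mode: ignore both the extension list and the GUI setting.
        return True
    ext = os.path.splitext(path)[1].lower()
    # Default: only listed text files, and only if the GUI setting allows it.
    return ext in TEXT_EXTENSIONS and gui_autodetect

print(wants_utf8_check("report.err"))                       # False: .err is not listed
print(wants_utf8_check("report.err", mode="t"))             # True: forced text mode
print(wants_utf8_check("notes.txt", gui_autodetect=False))  # False: GUI setting is off
```

The point of the sketch is that an explicit flag in the script makes the result predictable regardless of how a particular installation is configured.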


Re: dealing with unicode

Post by MBaas »

1+3. I totally agree and I like the "t" idea!

2. Sorry, that was useless. I had the idea that readfile could always work in "sensible" mode, except when dealing with binary files. But those are just as hard to enumerate. I think 1+3 will solve this for good.


Re: dealing with unicode

Post by MBaas »

:appl: THANK YOU - so quick and works like a charm! :cup: :cup: :cup:
