Unicode in scripting

Post by nf_xp »

[Attachment: 2009-8-23 10-36-19.gif, screenshot of the error]
Don, I just tried Muroph's Tag Manager v2.2 and ran into the error shown above: after line #23 was processed, the first keyword 'input' of line #24 was split across lines #23 and #24, and likewise 'replace' in line #25, 'substr' in line #26, etc.
After I replaced the ▲ and ▼ in line #23 (in the raw script they are in lines #190, #191, #211 and #212), the script runs fine. So I guess Unicode is the cause of this bug.

Post by admin »

nf_xp wrote: After I replaced the ▲ and ▼ in line #23 (in the raw script they are in lines #190, #191, #211 and #212), the script runs fine. So I guess Unicode is the cause of this bug.
Yes, looks like it. Currently I have no idea how to fix that, because I don't even know exactly where or why the script was broken. It seems that the initial line parsing already chokes, but why is "input" broken between "in" and "put"? What happens if you put any other Unicode character there? Same error?

Post by nf_xp »

Test 1:

    $a = "▲";
    msg "test";
[Attachment: 2009-8-23 23-36-47.gif, screenshot of the Test 1 result]
Test 2:

    $a = "▲▲";
    msg "test";
[Attachment: 2009-8-23 23-37-06.gif, screenshot of the Test 2 result]
Raw view of the first test file in MBCS system:
[Attachment: 2009-8-23 23-55-20.gif, raw view of the first test file]
My guess: as the raw view shows, the MBCS string '$a = "▲";' takes 10 bytes in the file, but it is only 9 Unicode chars once it has been read into memory. If the byte count is then used to break the lines, extra chars from the next line get pulled in.
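
To make the guess above concrete, here is a small Python sketch of the suspected mismatch. It is not XYplorer's code, which the thread does not show. The GBK code page and both helper functions are assumptions made for illustration only; the thread says "MBCS system" without naming the code page.

```python
# Sketch of the suspected bug: a line is measured in bytes (MBCS),
# but the cut is applied to the decoded Unicode text.

ENCODING = "gbk"  # assumed MBCS code page; the thread does not name one


def first_line_by_bytes(text: str) -> str:
    """Deliberately buggy (hypothetical): use the byte length as a char count."""
    line1 = text.split("\n", 1)[0]
    byte_len = len(line1.encode(ENCODING))  # "▲" counts as 2 bytes here
    return text[:byte_len]                  # but as 1 char here


def first_line_by_chars(text: str) -> str:
    """Correct variant (hypothetical): measure and cut in the same unit."""
    line1 = text.split("\n", 1)[0]
    return text[:len(line1)]


test1 = '$a = "▲";\nmsg "test";'
test2 = '$a = "▲▲";\nmsg "test";'

for script in (test1, test2):
    print(repr(first_line_by_bytes(script)), repr(first_line_by_chars(script)))
```

Under these assumptions, every double-byte character in a line makes the byte-based cut run one char further into what follows, which would fit keywords on the next line being split.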

Post by admin »

Thanks, really interesting! I forgot that there are Unicode characters that resolve to 2 bytes when converted to ANSI.

Looks like I have to rewrite a couple of heavily used functions.

Post by nf_xp »

Fixed :)
