Overall poor unicode support, please help

Post by **Slideshow BoB** » 10 Apr 2024 02:20

Hello world.
I'm not exactly sure how or where XYplorer goes wrong, but the following commands
readfile(), writefile(), gettoken(), perhaps replace() and utf8decode()
do not seem to deal properly with Unicode.
download() itself appears to be fine: the downloaded pages preserve the original characters intact.

The following YouTube link
https://www.youtube.com/watch?v=H4ZCmsyhtg4
has as a title
?? (apenas pra quem REALMENTE quer mudar de vida) 4 NEGÓCIOS DIGITAIS PRA FAZER DINHEIRO EM CASA
which I'm failing to obtain using XY commands. The "??" part is corruption introduced by Notepad2, which I'm using to write this text, so that is expected. Depending on how I write the instructions for XYplorer, even accented words like "NEGÓCIOS" come out corrupted.
I'd really like to be able to rely on XYplorer to obtain proper titles from online documents. That's why I'm asking for help.

Post by **highend** » 10 Apr 2024 02:47

Slideshow BoB wrote: which I'm failing to obtain using XY commands

Then post the script?

Post by **jupe** » 10 Apr 2024 02:48

If you are using download() to download a page and then readfile() to read it, you'll need to specify the 65001 codepage. That is probably where you are going wrong, and it is making the other cmds look like they aren't working. Alternatively, you could use readurlutf8() instead.
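
For readers following along outside XYplorer, the effect of reading UTF-8 bytes with the wrong codepage can be reproduced in a few lines. This is a minimal Python sketch, not XYplorer script: the sample string is borrowed from the title above, and the choice of Windows-1252 as the "wrong" codepage is an assumption made for illustration.

```python
# Illustrative sketch only; cp1252 as the mistaken codepage is an assumption.
title = "NEGÓCIOS DIGITAIS"

# A UTF-8 page on disk is just bytes; "Ó" takes two bytes (0xC3 0x93).
raw = title.encode("utf-8")

# Decoding those bytes with a single-byte codepage turns "Ó" into two characters.
wrong = raw.decode("cp1252")

# Decoding them as UTF-8 (codepage 65001 on Windows) restores the original text.
right = raw.decode("utf-8")
```

Here `wrong` no longer equals the original title while `right` does, which matches the kind of corruption described for accented words.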

Post by **Slideshow BoB** » 15 Apr 2024 13:06

highend wrote: ↑10 Apr 2024 02:47
Slideshow BoB wrote: which I'm failing to obtain using XY commands
Then post the script?

Sure. Thank you for taking the time to look into this.

$target = "<xyscripts>\Logs\";
// step;
$log = "$target" . "youTubeLogger.txt";

// $source = "<clipboard>";
// $source = "https://www.youtube.com/watch?v=H4ZCmsyhtg4"; //won't work
$source = "https://www.youtube.com/watch?v=73Kt4mbm-_U"; //works
// $source = "https://m.youtube.com/watch?v=H4ZCmsyhtg4";
// $source = "https://youtu.be/watch?v=H4ZCmsyhtg4";

$source = gettoken ("$source", 1, "&", , 1);
$source = "https://www.youtube.com/watch?v=" . gettoken ("$source", -1, "=", , 2);
copytext "$source";
writefile ("$log", "$source - ", a, tu);
echo "Copied<crlf><crlf>" . "$source" . "<crlf><crlf>into clipboard.<crlf><crlf>Click OK when ready to continue...";
// download "$source", "<curpath>\temp.htm", o;
download "$source", "<xydata>\Temp\temp.htm", o;

// $tmp = readfile ("<xydata>\Temp\temp.htm", t, , 65001);
// $tmp = readfile ("<xydata>\Temp\temp.htm", t, , utf16);
$tmp = readfile ("<xydata>\Temp\temp.htm", , , utf8); // works BEST

// $tmp = readfile ("<curpath>\temp.htm", t, , utf8);
// step;
$tmp = gettoken ("$tmp", 1, "</title><meta name=");
$tmp = gettoken ("$tmp", 2, "><title>");
copytext replace(utf8decode("$tmp"), " - YouTube", "");
text "<clipboard>";
// writefile ("$log", "<clipboard><crlf>", a, utf8);
// writefile ("$log", "<clipboard><crlf>", a, utf16);
writefile ("$log", "<clipboard><crlf>", a, 65001);
beep; beep; beep;
status "Clipboard: <clipboard>";
delete 0, 0, "<curpath>\temp.htm"; //sometimes not working - XY is still using it

Post by **Slideshow BoB** » 15 Apr 2024 13:09

jupe wrote: ↑10 Apr 2024 02:48 If you are using download() to download a page and then readfile() to read it, you'll need to specify the 65001 codepage. That is probably where you are going wrong, and it is making the other cmds look like they aren't working. Alternatively, you could use readurlutf8() instead.

Hello, thanks for the input, which I did put to use.
I'm just not sure how I could use the readurlutf8() part tho.
Please take a look at the code I posted in my previous post, TIA

Post by **highend** » 15 Apr 2024 13:33

utf8 for the codepage? oO

This is the correct command:
$tmp = readfile ("<xydata>\Temp\temp.htm", "ru", , 65001);

And then you get:
🤫 (apenas pra quem REALMENTE quer mudar de vida) 4 NEGÓCIOS DIGITAIS PRA FAZER DINHEIRO EM CASA
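
Once the page has been decoded correctly, pulling out the title is plain string handling. The following is a minimal Python sketch of that step, not XYplorer script: the function name `extract_title`, the sample page, and the assumption that the title sits between literal `<title>` and `</title>` tags are all illustrative.

```python
import html


def extract_title(page: str, suffix: str = " - YouTube") -> str:
    """Return the text between <title> and </title>, minus a trailing suffix.

    Hypothetical helper for illustration; returns "" when no title is found.
    """
    start = page.find("<title>")
    end = page.find("</title>", start)
    if start == -1 or end == -1:
        return ""
    # Resolve HTML entities such as &amp; and trim surrounding whitespace.
    title = html.unescape(page[start + len("<title>"):end]).strip()
    if title.endswith(suffix):
        title = title[: -len(suffix)]
    return title


# Made-up sample page, already decoded as UTF-8.
sample = '<html><head><title>4 NEGÓCIOS DIGITAIS - YouTube</title><meta name="title"></head></html>'
result = extract_title(sample)
```

Because the decoding happens before the search, the accented characters pass through this step unchanged.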

Post by **Slideshow BoB** » 15 Apr 2024 13:44

Thank you very much. It looks like that's enough for the whole script to function properly.

Post by **jupe** » 15 Apr 2024 20:21

Slideshow BoB wrote: ↑15 Apr 2024 13:09 I'm just not sure how I could use the readurlutf8() part tho.

You just want the page title, right? I meant something like this; it saves you from reading, writing, and deleting a file.

Code:

text gettoken(readurlutf8("https://youtu.be/watch?v=H4ZCmsyhtg4",, 1), 1, " - YouTube");

I thought reading the help for the cmd would be enough to get you going. BTW, your posted script seems to delete the temp file from a different path than the one it was written to, so that is probably where your issue is.
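
The path mismatch is easy to avoid by storing the temp file location once and reusing it for writing, reading, and deleting. Below is a minimal Python sketch of that pattern, not XYplorer script: the function name `read_via_temp_file` and the file name `temp_page_example.htm` are hypothetical.

```python
import os
import tempfile


def read_via_temp_file(data: bytes) -> str:
    """Write bytes to a temp file, read them back as UTF-8, then delete the file.

    Hypothetical helper: one path variable is used for every step, so the
    delete can never point at a different location than the write.
    """
    temp_path = os.path.join(tempfile.gettempdir(), "temp_page_example.htm")
    with open(temp_path, "wb") as f:
        f.write(data)
    try:
        # The file is closed again before the delete in the finally block runs.
        with open(temp_path, "rb") as f:
            text = f.read().decode("utf-8")
    finally:
        os.remove(temp_path)
    return text


roundtrip = read_via_temp_file("NEGÓCIOS".encode("utf-8"))
```

Closing the file before removing it also sidesteps the situation where a delete fails because the file is still in use.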

Post by **Slideshow BoB** » 16 Apr 2024 11:31

Hello there jupe.
Thanks a lot for the enlightenment, it's cool to have such a fast option to rely on.

As it turns out, I'm considering not deleting the original content and instead exporting it to a different location under its title for further scrutiny. I'm working on this so that I have faster access to my YouTube history, since I recently discovered that Chrome gets heavy when scrolling down to month 3 and beyond. I'm open to ideas from anyone who knows of a lighter option I could use for such a mundane task.

Too bad we're entering days with no rights to memory.
