Overall poor unicode support, please help

Slideshow BoB
Posts: 6
Joined: 03 Apr 2024 18:46

Post by Slideshow BoB »

Hello world.
I'm not sure exactly how or where XYplorer goes wrong, but the following commands
readfile(), writefile(), gettoken(), and perhaps replace() and utf8decode()
do not seem to handle Unicode properly.
download() itself saves the source material correctly: the downloaded pages keep their characters intact.

The following YouTube link
https://www.youtube.com/watch?v=H4ZCmsyhtg4
has as a title
?? (apenas pra quem REALMENTE quer mudar de vida) 4 NEGÓCIOS DIGITAIS PRA FAZER DINHEIRO EM CASA
which I'm failing to obtain using XY commands. The "??" part is a corruption introduced by Notepad2, which I'm using to write this text, and is expected. Depending on how I write the instructions for XYplorer, even accented words like "NEGÓCIOS" come out corrupted.
I'd really like to be able to rely on XYplorer to obtain proper titles from online documents, which is why I'm asking for help.

highend
Posts: 13338
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: Overall poor unicode support, please help

Post by highend »

which I'm failing to obtain using XY commands
Then post the script?
One of my scripts helped you out? Please donate via Paypal

jupe
Posts: 2809
Joined: 20 Oct 2017 21:14
Location: Win10 22H2 120dpi

Re: Overall poor unicode support, please help

Post by jupe »

If you are using download() to fetch a page and then readfile() to read it, you'll need to specify the 65001 (UTF-8) code page. That is probably where you are going wrong, and it makes the other commands look like they aren't working. Alternatively, you could use readurlutf8() instead.
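
To see why the code page matters, here is a minimal Python sketch (outside XYplorer, purely as an illustration) of what happens when UTF-8 bytes are decoded with a legacy Windows code page. Using cp1252 as the "wrong" code page is an assumption; the actual default depends on the system.

```python
# Illustration only: UTF-8 bytes read back with the wrong code page.
# cp1252 stands in for a legacy ANSI default (an assumption; yours may differ).
title = "NEGÓCIOS"

raw = title.encode("utf-8")      # the bytes a UTF-8 web page would contain
wrong = raw.decode("cp1252")     # decoded as a legacy code page: mojibake
right = raw.decode("utf-8")      # decoded as UTF-8 (Windows code page 65001)

print(wrong)
print(right)
```

The accented letter takes two bytes in UTF-8, so the wrong decoding turns it into two unrelated characters, which is the kind of corruption described for "NEGÓCIOS".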

Slideshow BoB
Posts: 6
Joined: 03 Apr 2024 18:46

Re: Overall poor unicode support, please help

Post by Slideshow BoB »

highend wrote: 10 Apr 2024 02:47
which I'm failing to obtain using XY commands
Then post the script?
Sure. Thank you for taking the time to look into this.


$target = "<xyscripts>\Logs\";
// step;
$log = "$target" . "youTubeLogger.txt";

// $source = "<clipboard>";
// $source = "https://www.youtube.com/watch?v=H4ZCmsyhtg4"; //wont work
$source = "https://www.youtube.com/watch?v=73Kt4mbm-_U"; //works
// $source = "https://m.youtube.com/watch?v=H4ZCmsyhtg4";
// $source = "https://youtu.be/watch?v=H4ZCmsyhtg4";

$source = gettoken ("$source", 1, "&", , 1);
$source = "https://www.youtube.com/watch?v=" . gettoken ("$source", -1, "=", , 2);
copytext "$source";
writefile ("$log", "$source - ", a, tu);
echo "Copied<crlf><crlf>" . "$source" . "<crlf><crlf>into clipboard.<crlf><crlf>Click OK when ready to continue...";
// download "$source", "<curpath>\temp.htm", o;
download "$source", "<xydata>\Temp\temp.htm", o;

// $tmp = readfile ("<xydata>\Temp\temp.htm", t, , 65001);
// $tmp = readfile ("<xydata>\Temp\temp.htm", t, , utf16);
$tmp = readfile ("<xydata>\Temp\temp.htm", , , utf8); // works BEST

// $tmp = readfile ("<curpath>\temp.htm", t, , utf8);
// step;
$tmp = gettoken ("$tmp", 1, "</title><meta name=");
$tmp = gettoken ("$tmp", 2, "><title>");
copytext replace(utf8decode("$tmp"), " - YouTube", "");
text "<clipboard>";
// writefile ("$log", "<clipboard><crlf>", a, utf8);
// writefile ("$log", "<clipboard><crlf>", a, utf16);
writefile ("$log", "<clipboard><crlf>", a, 65001);
beep; beep; beep;
status "Clipboard: <clipboard>";
delete 0, 0, "<curpath>\temp.htm"; //sometimes not working - XY is still using it

Slideshow BoB
Posts: 6
Joined: 03 Apr 2024 18:46

Re: Overall poor unicode support, please help

Post by Slideshow BoB »

jupe wrote: 10 Apr 2024 02:48 If you are using download() to download a page then using readfile() you'll need to specify the 65001 codepage, that is probably where you are going wrong, which is making the other cmds look like they aren't working, you could alternatively use readurlutf8() instead.
Hello, thanks for the input, which I have actually been working with.
I'm just not sure how I could use the readurlutf8() part tho.
Please take a look at the code in my previous post. TIA

highend
Posts: 13338
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: Overall poor unicode support, please help

Post by highend »

utf8 for the codepage? oO

This is the correct command:
$tmp = readfile ("<xydata>\Temp\temp.htm", "ru", , 65001);

And then you get:
🤫 (apenas pra quem REALMENTE quer mudar de vida) 4 NEGÓCIOS DIGITAIS PRA FAZER DINHEIRO EM CASA

Slideshow BoB
Posts: 6
Joined: 03 Apr 2024 18:46

Re: Overall poor unicode support, please help

Post by Slideshow BoB »

Thank you very much. It looks like that's enough for the whole script to work properly.

jupe
Posts: 2809
Joined: 20 Oct 2017 21:14
Location: Win10 22H2 120dpi

Re: Overall poor unicode support, please help

Post by jupe »

Slideshow BoB wrote: 15 Apr 2024 13:09 I'm just not sure how I could use the readurlutf8() part tho.
You just want the page title, right? I meant something like this, which saves you from reading, writing, and deleting a file.

text gettoken(readurlutf8("https://youtu.be/watch?v=H4ZCmsyhtg4",, 1), 1, " - YouTube");
I thought reading the help for the command would be enough to get you going. BTW, in your posted script you seem to delete the temp file from what looks like a different path than the one you wrote it to, so that is probably where your issue is.
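
For comparison, the same title extraction can be sketched in Python. This only illustrates the technique (decode as UTF-8, take the text between the title tags, drop the site suffix) and is not XYplorer code; `page_title` and the sample HTML are made up for the example.

```python
import html
import re


def page_title(page: str, suffix: str = " - YouTube") -> str:
    """Return the text of the first <title> element, without the site suffix."""
    match = re.search(r"<title>(.*?)</title>", page, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return ""  # no title element found
    # Resolve entities such as &amp; and trim surrounding whitespace.
    title = html.unescape(match.group(1)).strip()
    if title.endswith(suffix):
        title = title[: -len(suffix)]
    return title


# Hypothetical sample page, stored as UTF-8 bytes like a downloaded file.
raw = '<html><head><title>4 NEGÓCIOS DIGITAIS - YouTube</title></head></html>'.encode("utf-8")

# Decode explicitly as UTF-8 before any string processing.
print(page_title(raw.decode("utf-8")))
```

Decoding first and parsing second keeps every later string operation working on real characters rather than raw bytes.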

Slideshow BoB
Posts: 6
Joined: 03 Apr 2024 18:46

Re: Overall poor unicode support, please help

Post by Slideshow BoB »

Hello there jupe.
Thanks a lot for the enlightenment, it's cool to have such a fast option to rely on.

Turns out I'm considering not deleting the original content, and instead exporting it to a different location under its title for further scrutiny. I'm in the middle of building something so I have faster access to my YouTube history, since I recently discovered that Chrome gets heavy when scrolling down past the third month. I'm open to ideas from anyone who knows of a lighter option for such a mundane task.

Too bad we're entering days with no rights to memory.
