Hello world.
I'm not exactly sure how and where XYplorer is doing this, but the following commands
readfile(), writefile(), gettoken(), perhaps replace() and utf8decode()
seem not to deal properly with Unicode.
download() itself saves the original source material correctly: the downloaded pages preserve the characters intact.
The following YouTube link
https://www.youtube.com/watch?v=H4ZCmsyhtg4
has as a title
?? (apenas pra quem REALMENTE quer mudar de vida) 4 NEGÓCIOS DIGITAIS PRA FAZER DINHEIRO EM CASA
which I'm failing to obtain using XY commands. The "??" part is a corruption introduced by Notepad2, which I'm using to write this text, so that is expected. Depending on how I write the instructions for XYplorer, even accented words like "NEGÓCIOS" come out corrupted.
I'd really like to be able to rely on XYplorer to obtain proper titles from online documents, which is why I'm asking for help.
Overall poor unicode support, please help
Re: Overall poor unicode support, please help
Regarding "which I'm failing to obtain using XY commands": then post the script?
Re: Overall poor unicode support, please help
If you are using download() to fetch a page and then readfile() to read it, you need to specify codepage 65001 (UTF-8). That is probably where you are going wrong, and it makes the other commands look like they aren't working. Alternatively, you could use readurlutf8() instead.
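To illustrate why the codepage matters, here is a minimal Python sketch (not XYplorer script). It assumes cp1252 as the legacy default codepage, which may differ on other systems, and uses a shortened version of the title. The same UTF-8 bytes are decoded once with the wrong codepage and once as UTF-8, which is what codepage 65001 means on Windows.

```python
# A shortened version of the title: an emoji plus an accented word.
title = "🤫 4 NEGÓCIOS DIGITAIS"

# What ends up on disk after downloading the page: raw UTF-8 bytes.
raw = title.encode("utf-8")

# Decoding those bytes with a legacy single-byte codepage (cp1252 is an
# assumption here) turns each multi-byte character into several wrong ones.
wrong = raw.decode("cp1252", errors="replace")

# Decoding them as UTF-8 (Windows codepage 65001) restores the text.
right = raw.decode("utf-8")

print(wrong)
print(right)
```

The accented "Ó" and the emoji are the characters that break, because they occupy more than one byte in UTF-8; plain ASCII letters survive either way, which is why only part of a title looks corrupted.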
-
- Posts: 6
- Joined: 03 Apr 2024 18:46
Re: Overall poor unicode support, please help
Sure. Thank you for taking the time to look into this.
$target = "<xyscripts>\Logs\";
// step;
$log = "$target" . "youTubeLogger.txt";
// $source = "<clipboard>";
// $source = "https://www.youtube.com/watch?v=H4ZCmsyhtg4"; //wont work
$source = "https://www.youtube.com/watch?v=73Kt4mbm-_U"; //works
// $source = "https://m.youtube.com/watch?v=H4ZCmsyhtg4";
// $source = "https://youtu.be/watch?v=H4ZCmsyhtg4";
$source = gettoken ("$source", 1, "&", , 1);
$source = "https://www.youtube.com/watch?v=" . gettoken ("$source", -1, "=", , 2);
copytext "$source";
writefile ("$log", "$source - ", a, tu);
echo "Copied<crlf><crlf>" . "$source" . "<crlf><crlf>into clipboard.<crlf><crlf>Click OK when ready to continue...";
// download "$source", "<curpath>\temp.htm", o;
download "$source", "<xydata>\Temp\temp.htm", o;
// $tmp = readfile ("<xydata>\Temp\temp.htm", t, , 65001);
// $tmp = readfile ("<xydata>\Temp\temp.htm", t, , utf16);
$tmp = readfile ("<xydata>\Temp\temp.htm", , , utf8); // works BEST
// $tmp = readfile ("<curpath>\temp.htm", t, , utf8);
// step;
$tmp = gettoken ("$tmp", 1, "</title><meta name=");
$tmp = gettoken ("$tmp", 2, "><title>");
copytext replace(utf8decode("$tmp"), " - YouTube", "");
text "<clipboard>";
// writefile ("$log", "<clipboard><crlf>", a, utf8);
// writefile ("$log", "<clipboard><crlf>", a, utf16);
writefile ("$log", "<clipboard><crlf>", a, 65001);
beep; beep; beep;
status "Clipboard: <clipboard>";
delete 0, 0, "<xydata>\Temp\temp.htm"; //sometimes not working - XY is still using it
Re: Overall poor unicode support, please help
Hello, thanks for the input, which I actually worked with.
I'm just not sure how I could use the readurlutf8() part tho.
Please take a look at the code I posted in my previous post, TIA
Re: Overall poor unicode support, please help
utf8 for the codepage? oO
This is the correct command:
$tmp = readfile ("<xydata>\Temp\temp.htm", "ru", , 65001);
And then you get:
🤫 (apenas pra quem REALMENTE quer mudar de vida) 4 NEGÓCIOS DIGITAIS PRA FAZER DINHEIRO EM CASA
Re: Overall poor unicode support, please help
Thank you very much. It looks like that's enough for the whole script to function properly.
Re: Overall poor unicode support, please help
Slideshow BoB wrote: ↑15 Apr 2024 13:09 I'm just not sure how I could use the readurlutf8() part tho.
You just want the page title, right? I meant something like this, which saves you from reading, writing, and deleting a file.
text gettoken(readurlutf8("https://youtu.be/watch?v=H4ZCmsyhtg4",, 1), 1, " - YouTube");
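For readers who want to see the title-extraction idea outside XYplorer, here is a minimal Python sketch. It only illustrates taking the text between <title> and </title> and dropping the " - YouTube" suffix; the function name extract_title and the sample HTML string are hypothetical, and no page is actually downloaded.

```python
from html import unescape


def extract_title(page: str, suffix: str = " - YouTube") -> str:
    """Return the text between <title> and </title>, without the site suffix."""
    start = page.find("<title>")
    if start == -1:
        return ""
    start += len("<title>")
    end = page.find("</title>", start)
    if end == -1:
        return ""
    # Decode HTML entities such as &amp; that may appear in a page title.
    title = unescape(page[start:end]).strip()
    # Drop the trailing site name if it is present.
    if title.endswith(suffix):
        title = title[: -len(suffix)]
    return title


# Hypothetical sample input standing in for an already-decoded UTF-8 page.
sample = '<html><head><title>4 NEGÓCIOS DIGITAIS - YouTube</title><meta name="x"></head></html>'
print(extract_title(sample))
```

The sketch assumes the page text has already been decoded as UTF-8; if it were decoded with the wrong codepage, the extracted title would carry the same corruption.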
Re: Overall poor unicode support, please help
Hello there jupe.
Thanks a lot for the enlightenment, it's cool to have such a fast option to rely on.
It turns out I'm considering not deleting the original content, and instead exporting it to a different location under its title for further scrutiny. I'm in the middle of building something so that I have faster access to my YouTube history, since I recently discovered that Chrome gets heavy when scrolling down to month 3 and beyond. I'm open to ideas from anyone who knows of a lighter option I could use for such a mundane task.
Too bad we're entering days with no rights to memory.