[S] Tool that converts UTF-8 entities to Windows1252?
Hi,
I have several exported iTunes databases (on Windows 7), and I have to back up all the files they reference to a different directory.
I'll write a simple script that goes line by line through the cleaned-up .xml file to check whether each file exists.
Cleaning up the .xml file so that only the file names (with their paths) remain is no problem, but all umlauts and special characters are encoded.
E.g.:
%C3%A4 = ä
%C3%A9 = é
%5B = [
etc.
Is there any software (pay- or freeware, I don't mind) that can convert all these entities back to their original characters?
It must handle all known UTF-8 entities; I don't want to do this manually!
Re: [S] Tool that converts UTF-8 entities to Windows1252?
It's URI encoding. Use this: http://www.url-encode-decode.com/
Paste your text into the left pane, select UTF-8, then click URL Decode.
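For anyone who would rather do the same decode locally, here is a minimal Python sketch. It is not what the site does internally; `decode_percent` is a made-up helper name, and it assumes the escapes encode UTF-8 bytes, as in the examples from the first post.

```python
from urllib.parse import unquote

def decode_percent(text: str) -> str:
    """Decode %XX escapes, interpreting the resulting bytes as UTF-8."""
    # errors="strict" raises on invalid byte sequences instead of
    # silently substituting a replacement character.
    return unquote(text, encoding="utf-8", errors="strict")

print(decode_percent("%C3%A4"))  # ä
print(decode_percent("%C3%A9"))  # é
print(decode_percent("%5B"))     # [
```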
Re: [S] Tool that converts UTF-8 entities to Windows1252?
If you're running it line by line through a script anyway, why not use the SC utf8decode on every line as well?
Ralph
Re: [S] Tool that converts UTF-8 entities to Windows1252?
Nearly
This is URL encoding
Wiki about percent-encoding
Code:
Reserved characters after percent-encoding
!   #   $   &   '   (   )   *   +   ,   /   :   ;   =   ?   @   [   ]
%21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D
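The table above can be reproduced with a few lines of Python, which is a handy way to look up the escape for any other character. This is only a sketch; `RESERVED` and `encode_reserved` are names chosen here for illustration.

```python
from urllib.parse import quote

RESERVED = "!#$&'()*+,/:;=?@[]"

def encode_reserved(chars: str = RESERVED) -> dict:
    """Map each character to its percent-encoded form."""
    # safe="" turns off the default exemption for "/", so it is encoded too.
    return {ch: quote(ch, safe="") for ch in chars}

for ch, enc in encode_reserved().items():
    print(ch, enc)
```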
XYplorer scripting has urlencode() and urldecode()
( also utf8encode and utf8decode too )
Help wrote:
urldecode()
Decodes URL-encoded string.
Syntax
urldecode(string, raw=0)
string: String to decode (max length is 2083 characters).
TEST with XYplorer scripting
Code:
$myXMLInput = "%C3 %A4 %C3 %A9 %5B %21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D";
$out = urldecode($myXMLInput);
text "$myXMLInput<crlf>$out";
Results in
Code:
%C3 %A4 %C3 %A9 %5B %21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D
à ¤ à © [ ! # $ & ' ( ) * + , / : ; = ? @ [ ]
- - -
Note: the max length is 2083 characters, so it's better to use it line-wise:
Code:
$myXMLInput = "%C3 %A4 %C3 %A9 %5B<crlf>%21 %23 %24 %26<crlf>%27 %28 %29 %2A %2B<crlf>%2C %2F %3A %3B %3D<crlf>%3F %40 %5B %5D";
$out="";
foreach( $LINE, $myXMLInput, "<crlf>" ){
    $out = $out . urldecode($LINE) . "<crlf>";
}
text "$myXMLInput<crlf 3>$out";
Result
Code:
%C3 %A4 %C3 %A9 %5B
%21 %23 %24 %26
%27 %28 %29 %2A %2B
%2C %2F %3A %3B %3D
%3F %40 %5B %5D
à ¤ à © [
! # $ &
' ( ) * +
, / : ; =
? @ [ ]
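The odd leading characters in the result are probably down to the spaces in the test string: ä is the two-byte UTF-8 sequence C3 A4, and once the two escapes are separated, each byte gets decoded on its own. A rough Python illustration of the effect follows. It does not reproduce XYplorer's implementation, and cp1252 as the single-byte code page is an assumption.

```python
from urllib.parse import unquote, unquote_to_bytes

# Adjacent escapes form one UTF-8 sequence and decode to a single character.
together = unquote("%C3%A4", encoding="utf-8")

# Decoded one byte at a time in a single-byte code page, each escape
# becomes its own character instead.
separate = [unquote_to_bytes(tok).decode("cp1252") for tok in "%C3 %A4".split()]

print(together)  # ä
print(separate)  # ['Ã', '¤']
```

So for the real file, where the escapes sit directly next to each other (%C3%A4), the bytes stay together and can be interpreted as UTF-8.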
Find me: Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI) under certain circumstances.
1300
Re: [S] Tool that converts UTF-8 entities to Windows1252?
Doesn't this command do it?
File | Rename Special | UrlUnescape (%20 > Space ...)
Re: [S] Tool that converts UTF-8 entities to Windows1252?
I think we're talking about file content (parsing an XML file).
But if we had to rename a file NAME, then yes, File | Rename Special would be the way.
Re: [S] Tool that converts UTF-8 entities to Windows1252?
Ah, content, what's content?!
Re: [S] Tool that converts UTF-8 entities to Windows1252?
Thanks guys.
Going through 480k lines is a bit too much for XY.
I had to write a small .ahk script instead (which uses a UriDecode function that I found in the AutoHotkey forums).
It takes 2-3 seconds now, and it's all done.
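The .ahk script itself was not posted. For comparison, the same job (decode each line, then check whether the file exists) might look roughly like this in Python. `missing_files` and the input file name are placeholders, and the sketch assumes one percent-encoded path per line.

```python
import os
from urllib.parse import unquote

def missing_files(lines):
    """Yield decoded paths from `lines` that do not exist on disk."""
    for line in lines:
        # Strip the line break, then turn %XX escapes back into characters.
        path = unquote(line.strip(), encoding="utf-8")
        if path and not os.path.exists(path):
            yield path

# Usage (the file name is a placeholder):
# with open("cleaned_paths.txt", encoding="utf-8") as fh:
#     for path in missing_files(fh):
#         print(path)
```

Because it is a generator that handles one line at a time, memory use stays flat regardless of how many lines the export has.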
Re: [S] Tool that converts UTF-8 entities to Windows1252?
Yeah, I feel XY doesn't process strings very fast...
Re: [S] Tool that converts UTF-8 entities to Windows1252?
Depends how you script it. But, hey, this is a file manager.
Re: [S] Tool that converts UTF-8 entities to Windows1252?
admin wrote: Depends how you script it. But, hey, this is a file manager.
I think we had this topic years ago?
Looking at Stefan's script example, he concatenates strings to a variable in a loop, line by line, and highend talked about 480k lines.
As far as I remember, XY concatenates strings to a variable by just linking pieces of storage, so in the end the variable would be a "list" of 480k pieces of storage.
You *can* work around this by maintaining a counter: after concatenating, say, 100 elements, assign the variable to another one, and the string/storage gets "reorganized". But I don't know if it's worth it (in this case).
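In languages that offer it, a common way to sidestep this kind of cost is to collect the pieces in a list and join them once at the end. The Python sketch below only illustrates that general idea. It says nothing about how XY stores strings internally, and `decode_lines` is a hypothetical name.

```python
from urllib.parse import unquote

def decode_lines(lines):
    """Decode each line and join the results in one step."""
    decoded = [unquote(line, encoding="utf-8") for line in lines]
    # A single join at the end avoids growing one big string inside the loop.
    return "\r\n".join(decoded)

print(decode_lines(["%C3%A4", "%5B"]))
```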
Re: [S] Tool that converts UTF-8 entities to Windows1252?
You can easily read the whole file into one string (takes almost no time) and do your conversions on that one string. Finally write the string back to file.
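As a sketch of that read, convert, write-back flow outside XY, here is a Python version. The function name and paths are placeholders, and it assumes the source file is UTF-8 text whose escapes encode UTF-8 bytes.

```python
from pathlib import Path
from urllib.parse import unquote

def decode_file(src: Path, dst: Path) -> None:
    """Read src as one string, decode all %XX escapes, write to dst."""
    text = src.read_text(encoding="utf-8")
    dst.write_text(unquote(text, encoding="utf-8"), encoding="utf-8")

# Usage (paths are placeholders):
# decode_file(Path("library_paths.txt"), Path("library_paths_decoded.txt"))
```

If the consumer expects Windows-1252, as the thread title suggests, `encoding="cp1252"` could be passed to `write_text` instead. Characters outside that code page would then raise an error rather than be written incorrectly.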
Re: [S] Tool that converts UTF-8 entities to Windows1252?
admin wrote: You can easily read the whole file into one string (takes almost no time) and do your conversions on that one string. Finally write the string back to file.
> read the whole file into one string
What about the 2083-character limit? That's why I suggested a line-by-line loop.
Help wrote:
urldecode()
Decodes URL-encoded string.
Syntax
urldecode(string, raw=0)
string: String to decode (max length is 2083 characters).
Re: [S] Tool that converts UTF-8 entities to Windows1252?
And what about:
1. reading the source file into a variable $source
2. getting all the matches via regexmatches of "%[0-9a-f]{2}"
3. sorting and deduplicating this list, with a comma as separator, and storing it in $encodedchars
4. decoding this into $decodedchars
5. performing a replacelist on $source using $encodedchars and $decodedchars as search list and replace list, respectively
Would this be faster?
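The steps above can be sketched in Python to see how the idea behaves; whether it would be faster in XY is not something this shows. Two details might bite: decoding single %XX tokens separately would split multi-byte sequences such as %C3%A4, and a comma separator would clash with a decoded %2C. The sketch therefore matches whole runs of escapes and keeps them in a dictionary instead of a separated list. All names here are hypothetical.

```python
import re
from urllib.parse import unquote

# One or more consecutive %XX escapes, so multi-byte UTF-8 sequences stay together.
ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def decode_via_lookup(source: str) -> str:
    """Decode each distinct escape run once, then substitute it everywhere."""
    table = {run: unquote(run, encoding="utf-8")
             for run in set(ESCAPE_RUN.findall(source))}
    return ESCAPE_RUN.sub(lambda m: table[m.group(0)], source)

print(decode_via_lookup("M%C3%A4rz%20%5B1%5D.mp3"))  # März [1].mp3
```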
Re: [S] Tool that converts UTF-8 entities to Windows1252?
OK, I should not give quick answers without looking into the help first...