[S] Tool that converts UTF-8 entities to Windows1252?

highend
Posts: 13274
Joined: 06 Feb 2011 00:33

Post by highend »

Hi,

I have several exported iTunes databases (on Windows 7) and need to back up all files referenced in them to a different directory.
I'll write a simple script that goes through the cleaned-up .xml file line by line and checks whether each file exists.

It's no problem to clean up the .xml file so that only the file names (with their paths) remain, but all umlauts and special
characters are encoded.

E.g.:
%C3%A4 = ä
%C3%A9 = é
%5B = [

etc.

Is there any software (payware or freeware, doesn't matter) that can convert all these entities back to their original characters?
It must handle all known UTF-8 entities; I don't want to do this manually!
One of my scripts helped you out? Please donate via Paypal

binocular222
Posts: 1416
Joined: 04 Nov 2008 05:35
Location: Hanoi, Vietnam

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by binocular222 »

That's URI (percent) encoding. Use this: http://www.url-encode-decode.com/
Paste your text into the left pane, select UTF-8, then click URL Decode.
I'm a casual coder using AHK language. All of my xys scripts:
http://www.xyplorer.com/xyfc/viewtopic. ... 243#p82488
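
For anyone who prefers to do the same thing offline, here is a minimal Python sketch of that decode step. It is only an illustration and not something anyone in this thread used; the helper name `decode_percent` is made up. `urllib.parse.unquote` is part of the Python standard library and treats consecutive %XX escapes as UTF-8 bytes by default.

```python
from urllib.parse import unquote


def decode_percent(text: str) -> str:
    """Decode %XX escapes, reading runs of escaped bytes as UTF-8."""
    # errors="strict" raises instead of silently inserting replacement
    # characters when the escaped bytes are not valid UTF-8.
    return unquote(text, encoding="utf-8", errors="strict")


print(decode_percent("%C3%A4 %C3%A9 %5B"))  # prints: ä é [
```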

RalphM
Posts: 1932
Joined: 27 Jan 2005 23:38
Location: Cairns, Australia

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by RalphM »

If you're running it line by line through a script anyway, why not use the SC utf8decode on every line as well?
Ralph :)
(OS: W11 22H2 Home x64 - XY: Current beta - Office 2019 32-bit - Display: 1920x1080 @ 125%)

Stefan
Posts: 1360
Joined: 18 Nov 2008 21:47
Location: Europe

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by Stefan »

Nearly :P

This is URL encoding

Wiki about percent-encoding

Code:

Reserved characters after percent-encoding
! 	# 	$ 	& 	' 	( 	) 	* 	+ 	, 	/ 	: 	; 	= 	? 	@ 	[ 	]
%21 	%23 	%24 	%26 	%27 	%28 	%29 	%2A 	%2B 	%2C 	%2F 	%3A 	%3B 	%3D 	%3F 	%40 	%5B 	%5D


XYplorer scripting has urlencode() and urldecode()
(and also utf8encode() and utf8decode()).
Help wrote:
urldecode()
Decodes URL-encoded string.

Syntax
urldecode(string, raw=0)

string String to decode (max length is 2083 characters).

TEST with XYplorer scripting

Code:

$myXMLInput = "%C3 %A4 %C3 %A9 %5B %21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D";
$out = urldecode($myXMLInput);
text "$myXMLInput<crlf>$out";

Results in:

Code:

%C3 %A4 %C3 %A9 %5B %21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D
à ¤ à © [ ! # $ & ' ( ) * + , / : ; = ? @ [ ]

- - -

Note the limit (max length is 2083 characters), so it's better to use it line by line:

Code:

$myXMLInput = "%C3 %A4 %C3 %A9 %5B<crlf>%21 %23 %24 %26<crlf>%27 %28 %29 %2A %2B<crlf>%2C %2F %3A %3B %3D<crlf>%3F %40 %5B %5D";

$out = "";
foreach( $LINE, $myXMLInput, "<crlf>" ){
  $out = $out . urldecode($LINE) . "<crlf>";
}

text "$myXMLInput<crlf 3>$out";

Result:

Code:

%C3 %A4 %C3 %A9 %5B
%21 %23 %24 %26
%27 %28 %29 %2A %2B
%2C %2F %3A %3B %3D
%3F %40 %5B %5D


à ¤ à © [
! # $ &
' ( ) * +
, / : ; =
? @ [ ]
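
A side note on the stray characters in the results above, as one plausible reading rather than a statement about how XYplorer's urldecode() works internally: in the test string %C3 and %A4 are separated by a space, so the two bytes of each UTF-8 sequence end up decoded one at a time. The Python sketch below (the helper name `decode_both_ways` is made up) shows the difference between reading the escaped bytes as UTF-8 and reading them byte by byte as Windows-1252, and what the Windows-1252 byte for the intended character would be.

```python
from urllib.parse import unquote_to_bytes


def decode_both_ways(escaped: str) -> tuple[str, str, bytes]:
    """Return (UTF-8 reading, byte-wise cp1252 reading, cp1252 bytes of the UTF-8 reading)."""
    raw = unquote_to_bytes(escaped)        # "%C3%A4" -> b"\xc3\xa4"
    as_utf8 = raw.decode("utf-8")          # both bytes form one character
    as_cp1252 = raw.decode("cp1252")       # each byte becomes its own character
    target = as_utf8.encode("cp1252")      # the single byte Windows-1252 uses
    return as_utf8, as_cp1252, target


print(decode_both_ways("%C3%A4"))  # prints: ('ä', 'Ã¤', b'\xe4')
```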



Find me: Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI) under certain circumstances.

admin
Site Admin
Posts: 60357
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%
Contact:

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by admin »

Doesn't this command do it?
File | Rename Special | UrlUnescape (%20 > Space ...)

Stefan
Posts: 1360
Joined: 18 Nov 2008 21:47
Location: Europe

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by Stefan »

I think we're talking about file content (parsing an XML file) :P

But if we had to rename a file NAME, then yes, File | Rename Special would be the way.



 

admin
Site Admin
Posts: 60357
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%
Contact:

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by admin »

Ah, content, what's content?! :whistle: :mrgreen:

highend
Posts: 13274
Joined: 06 Feb 2011 00:33

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by highend »

Thanks guys.

Going through 480k lines is a bit too much for XY ;)
I had to write a small .ahk script instead (which uses
a UriDecode function that I found in the AutoHotkey
forums).

Takes 2-3 seconds now, and it's all done.
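
The .ahk script itself isn't posted here, so the following is not that script. It is a rough Python sketch of the same idea (decode each line, then check whether the file exists), assuming the cleaned-up file holds one percent-encoded path per line. The function name `missing_files` and that file layout are assumptions.

```python
from pathlib import Path
from urllib.parse import unquote


def missing_files(list_path, encoding="utf-8"):
    """Return the decoded paths from list_path that do not exist on disk."""
    missing = []
    with open(list_path, encoding=encoding) as handle:
        for line in handle:
            decoded = unquote(line.strip())   # %5B -> [, %C3%A4 -> ä, ...
            if decoded and not Path(decoded).exists():
                missing.append(decoded)
    return missing
```

Calling `missing_files("cleaned.txt")` (a placeholder file name) would return the list of entries that still need attention.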

binocular222
Posts: 1416
Joined: 04 Nov 2008 05:35
Location: Hanoi, Vietnam

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by binocular222 »

Yeah, I feel XY doesn't process strings very fast...

admin
Site Admin
Posts: 60357
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%
Contact:

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by admin »

Depends how you script it. But, hey, this is a file manager.

PeterH
Posts: 2776
Joined: 21 Nov 2005 20:39
Location: Germany

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by PeterH »

admin wrote:
Depends how you script it. But, hey, this is a file manager.

I think we had this topic years ago?

Looking at Stefan's script example, he concatenates strings to a variable in a loop, line by line, and highend talked about 480k lines.
As far as I remember, XY concatenates strings to a variable by just linking pieces of storage, so in the end the variable would be a "list" of 480k pieces of storage :shock:

You *can* work around this problem by maintaining a counter: after having concatenated e.g. 100 elements, you assign this variable to another one, and this way the string/storage gets "reorganized". But I don't know if it's worth it (in this case). :whistle:
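
To make the counter idea concrete, here is a small Python sketch of that pattern. It only illustrates the bookkeeping: it is an adaptation with a short-lived buffer that gets folded into the main result every `chunk_size` lines, it says nothing about whether this actually speeds things up in XYplorer, and the function name and parameters are made up.

```python
from urllib.parse import unquote


def decode_lines_chunked(lines, chunk_size=100):
    """Decode lines, consolidating the accumulated text every chunk_size lines."""
    result = ""   # consolidated output so far
    buffer = ""   # short-lived accumulator for the current chunk
    count = 0
    for line in lines:
        buffer += unquote(line) + "\r\n"
        count += 1
        if count == chunk_size:
            result += buffer   # fold the chunk into the main string
            buffer = ""
            count = 0
    return result + buffer     # append whatever is left in the last chunk
```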
W7(x64) SP1 German
( +WXP SP3 )

admin
Site Admin
Posts: 60357
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%
Contact:

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by admin »

You can easily read the whole file into one string (takes almost no time) and do your conversions on that one string. Finally write the string back to file.
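
Outside XYplorer, the read-everything, convert, write-back approach could look like the Python sketch below. This is an illustration only: the function name `decode_file` and the choice of UTF-8 for both files are assumptions, and `urllib.parse.unquote` has no length limit comparable to the one documented for urldecode().

```python
from pathlib import Path
from urllib.parse import unquote


def decode_file(src, dst, encoding="utf-8"):
    """Read src as one string, decode all %XX escapes, write the result to dst."""
    text = Path(src).read_text(encoding=encoding)
    Path(dst).write_text(unquote(text), encoding=encoding)
```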

Stefan
Posts: 1360
Joined: 18 Nov 2008 21:47
Location: Europe

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by Stefan »

admin wrote:
You can easily read the whole file into one string (takes almost no time) and do your conversions on that one string. Finally write the string back to file.
>read the whole file into one string

:shock: :?:


What about the 2083-character limit? That's why I suggested a line-by-line loop.
Help wrote:
urldecode()
Decodes URL-encoded string.

Syntax
urldecode(string, raw=0)

string String to decode (max length is 2083 characters).


 

Marco
Posts: 2347
Joined: 27 Jun 2011 15:20

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by Marco »

And what about:

1. reading the source file into a variable $source
2. getting all the matches via regexmatches of "%[0-9a-f]{2}"
3. sorting and deduplicating that list, setting a comma as separator, and storing it in $encodedchars
4. decoding this into $decodedchars
5. performing a replacelist on $source using $encodedchars and $decodedchars as search list and replace list, respectively

Would this be faster?
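
The steps above can be sketched in Python as follows, with no claim about whether it is faster in XYplorer. One deliberate deviation, labeled here as my assumption: the pattern matches whole runs of escapes such as %C3%A4 instead of single %XX tokens, because decoding each byte of a multi-byte UTF-8 sequence on its own would not give the intended character. The names `ESCAPE_RUN` and `decode_via_table` are made up.

```python
import re
from urllib.parse import unquote

# One or more consecutive %XX escapes, so multi-byte UTF-8 sequences stay together.
ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def decode_via_table(source: str) -> str:
    """Decode each distinct escape run once, then replace all occurrences."""
    # Steps 2-3: collect the matches and deduplicate them.
    unique_runs = set(ESCAPE_RUN.findall(source))
    # Step 4: decode every distinct run exactly once.
    table = {run: unquote(run) for run in unique_runs}
    # Step 5: replace each run in the source with its decoded form.
    return ESCAPE_RUN.sub(lambda match: table[match.group(0)], source)


print(decode_via_table("a%C3%A4b%5Bc%5D"))  # prints: aäb[c]
```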
Tag Backup - SimpleUpdater - XYplorer Messenger - The Unofficial XYplorer Archive - Everything in XYplorer
Don sees all [cit. from viewtopic.php?p=124094#p124094]

admin
Site Admin
Posts: 60357
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%
Contact:

Re: [S] Tool that converts UTF-8 entities to Windows1252?

Post by admin »

OK, I should not give quick answers without looking into the help first... :whistle:
