[S] Tool that converts UTF-8 entities to Windows1252?

Post by highend » 25 Nov 2013 00:35

Hi,

I have several exported iTunes databases (on Windows 7), and I have to back up all the files they reference to a different directory.
I'll write a simple script that goes line by line through the cleaned-up .xml file to check whether each file exists.

Cleaning up the .xml file to get only the file names (with their paths) is no problem, but all umlauts and special characters are encoded.

E.g.:
%C3%A4 = ä
%C3%A9 = é
%5B = [

etc.
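For reference, a minimal Python sketch (Python is my choice here; the thread itself uses XYplorer scripting and AHK) showing that these are percent-escaped UTF-8 bytes: `%C3%A4` is the two-byte UTF-8 sequence for ä, and the standard library's `urllib.parse.unquote` turns each example back into its character.

```python
from urllib.parse import unquote

# The examples from the post: each key is a percent-escaped UTF-8 byte sequence.
examples = {"%C3%A4": "ä", "%C3%A9": "é", "%5B": "["}

for encoded, expected in examples.items():
    decoded = unquote(encoded)  # unquote() decodes the escaped bytes as UTF-8 by default
    print(f"{encoded} -> {decoded}")
    assert decoded == expected
```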

Is there any software (paid or freeware) that can convert all these entities back to their "original" characters?
It must handle all known UTF-8 entities; I don't want to do this manually!
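The plan described above (decode each line, then check whether the file exists) can be sketched in Python. This assumes the cleaned-up file holds exactly one percent-encoded path per line; `find_missing` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
import os
from urllib.parse import unquote

def find_missing(lines):
    """Return decoded paths that do not exist on disk (one encoded path per line)."""
    missing = []
    for line in lines:
        path = unquote(line.strip())  # e.g. %C3%A4 -> ä, %5B -> [
        if path and not os.path.exists(path):  # skip blank lines
            missing.append(path)
    return missing
```

Usage, with a placeholder file name: `find_missing(open("cleaned_paths.txt", encoding="utf-8"))` returns the list of decoded paths that could not be found.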
One of my scripts helped you out? Please donate via Paypal or highend (at) web (dot) de

Post by binocular222 » 25 Nov 2013 02:56

It's URI encoding. Use this: http://www.url-encode-decode.com/
Paste your text into the left pane, select UTF-8, then click URL Decode.
I'm a casual coder using AHK language. All of my xys scripts:
http://www.xyplorer.com/xyfc/viewtopic. ... 243#p82488

Post by RalphM » 25 Nov 2013 03:56

If you're running it line by line through a script anyway, why not use the SC utf8decode on every line as well?
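The decoding actually has two layers. First, the `%XX` escapes become raw bytes, and then those bytes are read as UTF-8. A small Python illustration using only the standard library (not XYplorer's own commands):

```python
from urllib.parse import unquote_to_bytes

# Layer 1: percent-decoding turns each %XX escape into one raw byte.
raw = unquote_to_bytes("%C3%A4")
print(raw)  # b'\xc3\xa4'

# Layer 2: UTF-8 decoding turns that two-byte sequence into one character.
text = raw.decode("utf-8")
print(text)  # ä
```

A plain UTF-8 decoder applied to text that still contains `%XX` escapes only covers the second layer; the escapes themselves are ASCII and pass through unchanged.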
Ralph :-)
(OS: W10 1809 Home x64 - XY: Current beta)

Post by Stefan » 25 Nov 2013 08:41

Nearly :P

This is URL encoding

Wiki about percent-encoding

Reserved characters after percent-encoding:
!    #    $    &    '    (    )    *    +    ,    /    :    ;    =    ?    @    [    ]
%21  %23  %24  %26  %27  %28  %29  %2A  %2B  %2C  %2F  %3A  %3B  %3D  %3F  %40  %5B  %5D
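To double-check the table, Python's standard library produces the same codes. This is a small sketch with Python standing in for XY scripting. `quote(..., safe="")` is used so that "/" is escaped too, because by default `quote` leaves "/" alone.

```python
from urllib.parse import quote, unquote

reserved = "!#$&'()*+,/:;=?@[]"

# safe="" makes quote() escape "/" as well; hex digits come out upper-case.
encoded = [quote(ch, safe="") for ch in reserved]
print(" ".join(encoded))

# Round trip: unquote() restores every reserved character.
assert "".join(unquote(e) for e in encoded) == reserved
```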


XYplorer scripting has urlencode() and urldecode() (and also utf8encode() and utf8decode()).
Help wrote:
urldecode()
Decodes URL-encoded string.

Syntax
urldecode(string, raw=0)

string String to decode (max length is 2083 characters).

Test with XYplorer scripting:

$myXMLInput = "%C3 %A4 %C3 %A9 %5B %21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D";
  $out = urldecode($myXMLInput); 
  text "$myXMLInput<crlf>$out";
Results in:

%C3 %A4 %C3 %A9 %5B %21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D
à ¤ à © [ ! # $ & ' ( ) * + , / : ; = ? @ [ ]
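The spaces in the test string split each two-byte sequence, so %C3 and %A4 are decoded as separate single bytes instead of as ä. The first line of the output above resembles those single bytes shown in an ANSI code page. Here is a Python sketch of the effect, assuming Windows-1252 as that code page (my assumption; I have not confirmed how XYplorer's urldecode handles lone bytes):

```python
from urllib.parse import unquote

split = "%C3 %A4 %C3 %A9"   # as in the test above: spaces between the escapes
joined = "%C3%A4%C3%A9"     # as in the original question: escapes back to back

# Decoded as UTF-8, only the joined form yields the intended characters.
print(unquote(joined))                    # äé
# Read as single Windows-1252 bytes, the split form shows the byte values instead.
print(unquote(split, encoding="cp1252"))  # Ã ¤ Ã ©
```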

- - -

Note the limit (max length is 2083 characters), so it's better to use it line by line:

$myXMLInput = "%C3 %A4 %C3 %A9 %5B<crlf>%21 %23 %24 %26<crlf>%27 %28 %29 %2A %2B<crlf>%2C %2F %3A %3B %3D<crlf>%3F %40 %5B %5D";

  $out="";
  foreach( $LINE, $myXMLInput, "<crlf>" ){
    $out = $out . urldecode($LINE) . "<crlf>"; 
  }

  text "$myXMLInput<crlf 3>$out";
Result:

%C3 %A4 %C3 %A9 %5B
%21 %23 %24 %26
%27 %28 %29 %2A %2B
%2C %2F %3A %3B %3D
%3F %40 %5B %5D


à ¤ à © [
! # $ &
' ( ) * +
, / : ; =
? @ [ ]
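The same line-by-line loop, sketched in Python for comparison. Python's `unquote` has no 2083-character input limit, so here the per-line split only mirrors the structure of the XY script. Unlike the XY loop, this version does not append a trailing CRLF after the last line. `decode_lines` is a hypothetical name.

```python
from urllib.parse import unquote

def decode_lines(text: str) -> str:
    # Decode each CRLF-separated line on its own, like the foreach loop above.
    decoded = [unquote(line) for line in text.split("\r\n")]
    return "\r\n".join(decoded)

sample = "%C3%A4%20%5B\r\n%21%20%23\r\n%3F%40%5D"
print(decode_lines(sample))
```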



Find me: Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI) under certain circumstances.

Post by admin » 25 Nov 2013 10:15

Doesn't this command do it?
File | Rename Special | UrlUnescape (%20 > Space ...)

Post by Stefan » 25 Nov 2013 10:17

I think we're talking about file content (parsing an XML file) :P

But if we had to rename a fileNAME, then yes, File | Rename Special would be the way.



 

Post by admin » 25 Nov 2013 10:34

Ah, content, what's content?! :whistle: :mrgreen:

Post by highend » 25 Nov 2013 11:26

Thanks guys.

Going through 480k lines is a bit too much for XY ;)
I had to write a small .ahk script instead (which uses a UriDecode function that I found in the AutoHotkey forums).

It takes 2-3 seconds now, and it's all done.

Post by binocular222 » 25 Nov 2013 12:17

Yeah, I feel XY doesn't process strings very fast...

Post by admin » 25 Nov 2013 12:25

Depends how you script it. But, hey, this is a file manager.

Post by PeterH » 25 Nov 2013 13:08

admin wrote: Depends how you script it. But, hey, this is a file manager.
I think we had this topic years ago?

In Stefan's script example, he concatenates strings to a variable in a loop, line by line, and highend talked about 480k lines.
As far as I remember, XY concatenates strings to a variable by just linking pieces of storage, so in the end the variable would be a "list" of 480k pieces of storage :shock:

You *can* work around this problem by maintaining a counter: after concatenating e.g. 100 elements, assign the variable to another one, and the string/storage will be "reorganized". But I don't know if it's worth it (in this case). :whistle:
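This is a literal translation of the counter idea into Python, meant only to make the mechanics concrete. It does not describe how XY stores strings internally. Whether batching pays off depends entirely on the language's string implementation; in Python, a single `"".join(...)` over a list is the usual idiom. `decode_batched` and `batch_size` are hypothetical names.

```python
from urllib.parse import unquote

def decode_batched(lines, batch_size=100):
    """Concatenate decoded lines in batches, following the counter idea above."""
    result = ""
    batch = ""
    for count, line in enumerate(lines, start=1):
        batch += unquote(line) + "\r\n"
        if count % batch_size == 0:  # every batch_size lines, fold the batch into the result
            result += batch
            batch = ""
    return result + batch  # append whatever is left over
```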
W7(x64) SP1 German
( +WXP SP3 )

Post by admin » 25 Nov 2013 13:19

You can easily read the whole file into one string (takes almost no time) and do your conversions on that one string. Finally write the string back to file.
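In a language whose decoder has no input-length limit, this whole-file approach takes only a few lines. Below is a minimal Python sketch with placeholder file names. It assumes the cleaned-up file contains only the encoded paths, because `unquote` would also decode any literal `%XX` sequence elsewhere in the text. Note that XYplorer's urldecode() has the 2083-character limit discussed in this thread.

```python
from pathlib import Path
from urllib.parse import unquote

def decode_file(src: str, dst: str) -> None:
    """Read the whole file into one string, decode it once, write it back out."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(unquote(text), encoding="utf-8")
```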

Post by Stefan » 25 Nov 2013 13:45

admin wrote: You can easily read the whole file into one string (takes almost no time) and do your conversions on that one string. Finally write the string back to file.
>read the whole file into one string

:shock: :?:


What about the 2083-character limit? That's why I suggested a line-by-line loop.
Help wrote:
urldecode()
Decodes URL-encoded string.

Syntax
urldecode(string, raw=0)

string String to decode (max length is 2083 characters).


 

Post by Marco » 25 Nov 2013 13:57

And what about:

1. reading the source file into a variable $source
2. getting all the matches via regexmatches of "%[0-9a-f]{2}"
3. sorting and deduplicating this list, with a comma as separator, and storing it in $encodedchars
4. decoding this into $decodedchars
5. performing a replacelist in $source using $encodedchars and $decodedchars as the search list and replace list, respectively

Would this be faster?
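Marco's steps can be sketched in Python. Here Python's `re` module stands in for regexmatches/replacelist, and `decode_via_lookup` is a hypothetical name. There is one deliberate adjustment: the pattern matches whole runs of consecutive escapes instead of single `%XX` pairs. A character like ä is two escapes (%C3%A4), so decoding each pair on its own would not give ä. No claim is made here about which variant is faster in XY.

```python
import re
from urllib.parse import unquote

# Matches runs of consecutive escapes (e.g. "%C3%A4"), so multi-byte
# UTF-8 sequences are decoded together rather than byte by byte.
ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def decode_via_lookup(source: str) -> str:
    # Steps 2-3: collect each distinct escape run once (the set deduplicates).
    encoded = set(ESCAPE_RUN.findall(source))
    # Step 4: decode every distinct run a single time.
    decoded = {run: unquote(run) for run in encoded}
    # Step 5: replace all runs in one pass using the lookup table.
    return ESCAPE_RUN.sub(lambda m: decoded[m.group(0)], source)

print(decode_via_lookup("C:/Music/B%C3%A4r%20%5BLive%5D.mp3"))
```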
Tag Backup - SimpleUpdater - XYplorer Messenger - The Unofficial XYplorer Archive - Everything in XYplorer
Don sees all [cit. from viewtopic.php?p=124094#p124094]

Post by admin » 25 Nov 2013 14:16

OK, I should not give quick answers without looking into the help first... :whistle:
