
*##### scheme for unicode characters in edit boxes

Posted: 18 Apr 2008 21:42
by admin
With v7.00.0021 - 2008-04-09 15:04 - I introduced a makeshift way to deal with UNICODE characters in XY edit boxes (which do not support unicode natively and can only show "?" chars instead -- at least in Germany!).

Code: Select all

  +++ Renaming UNICODE file names via edit box (using F2): I finally 
      developed a workable way to rename files containing unicode 
      characters in the name. Sure it's not ideal, but far better than 
      nothing.
      It works quite simply: Before being displayed in the edit box I 
      convert all unicode chars to their numerical equivalents prefixed 
      with * (asterisk). You will see something like this:
        *48149*46041*44397abc.txt
      You can now normally edit the name (even the numbers if you know 
      what you are doing) and apply the new name. The escape sequences 
      will be converted back to unicode and displayed in the file list 
      (which does support unicode, of course).
      Works everywhere: Tree, List, Catalog, and all other sorts of 
      lists with a rename functionality.
      Note: This revolutionary *##### scheme will provide easy UNICODE 
      support for scripting (converter commands will be added soon...), 
      and it will allow you to store UNICODE in normal ASCII text files!
  +++ Rename Special: Works with UNICODE file names now. Sure, you have 
      to adopt the *##### scheme explained here above, but once you get 
      used to it, you can do all the nice Rename Special tricks with 
      your Chinese cooking recipes!
      You also can do what's called a "transliteration" (mapping of 
      charsets), e.g. from Cyrillic to Latin, using the "Search and 
      Replace" type of rename! Here's a hint how:
        *#####*#####*#####>>abc
      Of course, this is only a 1:1 char mapping. Support for 1:many 
      mapping comes tomorrow via scripting.
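
The conversion described in the changelog can be sketched roughly as follows. This is only an illustration in Python, not XY's actual code: the function names `encode_name` and `decode_name` are made up, and it assumes that every character above ASCII 127 gets escaped and that only characters up to 65535 occur (five decimal digits cannot hold more).

```python
import re

def encode_name(name: str) -> str:
    """Replace each non-ASCII character with '*' plus its 5-digit decimal code.

    Assumption: everything above 127 is escaped; the real cutoff may differ.
    """
    out = []
    for ch in name:
        code = ord(ch)
        if code > 0xFFFF:
            # Five decimal digits only cover 0..65535 safely in this sketch.
            raise ValueError("sketch handles characters up to 65535 only")
        out.append(ch if code < 128 else f"*{code:05d}")
    return "".join(out)

def decode_name(text: str) -> str:
    """Turn every '*' followed by exactly five digits back into a character."""
    return re.sub(r"\*([0-9]{5})", lambda m: chr(int(m.group(1))), text)

original = "\ubc15\ub3d9\uad6dabc.txt"
escaped = encode_name(original)
print(escaped)  # -> *48149*46041*44397abc.txt
assert decode_name(escaped) == original
```

Plain ASCII names pass through both functions unchanged, so the escaping only becomes visible when a name actually contains such characters.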
I did not get much feedback on this radical approach, with one exception: on Japanese Windows this strategy fails dramatically. Characters that were displayed correctly before are now replaced by numbers. :(
So, at least for Japan, and probably also Korea and other DBCS systems (if that turns out to be the crucial criterion), I have to disable the *##### scheme.

But can I get some feedback from other users in other parts of the world, please? How do you like this feature? Should I make this scheme optional in config?

Posted: 18 Apr 2008 22:50
by mwb1100
I very rarely deal with non-ASCII characters, so right up front I'll admit that any suggestion I make regarding UNICODE should be given nearly zero weight.

But if I ever find myself having to deal with this interface, I know already that all the UNICODE tables I've ever come across list the codepoints in hex, so it would be nice if the hex codes are acceptable (if they aren't already).

So

Code: Select all

*0xBC15*0xB3D9*0xAD6Dabc.txt 
or maybe

Code: Select all

*U+BC15*U+B3D9*U+AD6Dabc.txt
("U+xxxx" is how the UNICODE standard does code points).

would be equivalent to

Code: Select all

*48149*46041*44397abc.txt
Also, in your example, how does your code know when the numbering for the last UNICODE character stops? Maybe the user wanted UNICODE character *4439 followed by "7abc.txt" in the filename?
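
To check that the hex spellings above really match the decimal one, here is a small Python sketch. The helper name `hex_to_decimal_escapes` is hypothetical, and it assumes the hex forms always carry exactly four hex digits; without that fixed width, the "abc" in "abc.txt" would be swallowed as hex digits too.

```python
import re

# Matches *0xHHHH or *U+HHHH with exactly four hex digits (an assumption).
HEX_ESCAPE = re.compile(r"\*(?:0x|U\+)([0-9A-Fa-f]{4})")

def hex_to_decimal_escapes(text: str) -> str:
    """Rewrite hex escapes as '*' plus the 5-digit decimal code."""
    return HEX_ESCAPE.sub(lambda m: f"*{int(m.group(1), 16):05d}", text)

print(hex_to_decimal_escapes("*0xBC15*0xB3D9*0xAD6Dabc.txt"))
# -> *48149*46041*44397abc.txt
print(hex_to_decimal_escapes("*U+BC15*U+B3D9*U+AD6Dabc.txt"))
# -> *48149*46041*44397abc.txt
```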

Posted: 19 Apr 2008 08:20
by admin
Your last question: *4439 would be *04439 in my scheme (leading zeros to fill 5 digits), so that's no issue.
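
A quick way to see why the fixed width removes the ambiguity, as a Python sketch (the decoder `decode_fixed_width` is a hypothetical stand-in, assuming exactly five digits are always consumed after the asterisk):

```python
import re

def decode_fixed_width(text: str) -> str:
    """Consume exactly five digits after each '*', never more, never fewer."""
    return re.sub(r"\*([0-9]{5})", lambda m: chr(int(m.group(1))), text)

# Character 44397 followed by "abc.txt"
a = decode_fixed_width("*44397abc.txt")
# Character 4439 (written as 04439) followed by "7abc.txt"
b = decode_fixed_width("*044397abc.txt")

assert a == chr(44397) + "abc.txt"
assert b == chr(4439) + "7abc.txt"
```

Because the decoder never reads a sixth digit, a digit that belongs to the file name itself simply has to come after the padded five.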

I'm aware of U+FFFF and I would prefer to use it. The disadvantage, however, is that these are all legal chars in filenames, so the likelihood of an unwanted conversion is higher than in my scheme, which uses "*". A compromise would be U*FFFF or, probably best, your idea *U+FFFF.
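
The unwanted-conversion risk can be illustrated with a short Python sketch. Both patterns and the file name are made up for the example, and four hex digits per escape are assumed. Since "*" cannot appear in a Windows file name, the starred form can only come from the edit box, never from the original name.

```python
import re

BARE = re.compile(r"U\+([0-9A-Fa-f]{4})")       # plain U+FFFF style
STARRED = re.compile(r"\*U\+([0-9A-Fa-f]{4})")  # *U+FFFF style

def decode(pattern: re.Pattern, text: str) -> str:
    """Replace every match of the pattern with the character it names."""
    return pattern.sub(lambda m: chr(int(m.group(1), 16)), text)

# A perfectly legal file name that merely mentions a code point.
name = "Notes on U+0041.txt"

print(decode(BARE, name))     # -> Notes on A.txt  (unwanted conversion)
print(decode(STARRED, name))  # -> Notes on U+0041.txt  (left alone)
```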

Any other opinions or objections against *U+FFFF?