binary content search using regex seems broken
Searching for content in binary files using regex doesn't work correctly.
Searching for something like this works:
\x49
But searching for something like this does not:
\xFF
It can only find bytes in the range \x00 to \x7F.
Seems something somewhere ought to be an unsigned variable instead of a signed one.
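For comparison, a byte-oriented regex search would be expected to treat every value from 0x00 to 0xFF alike. The following is a minimal Python sketch of that expected behaviour, not XYplorer's implementation; the sample data and the helper name `find_byte_offsets` are made up for illustration.

```python
import re

def find_byte_offsets(data: bytes, pattern: bytes) -> list[int]:
    """Return the offset of every match of a bytes regex in raw data."""
    return [m.start() for m in re.finditer(pattern, data)]

# Hypothetical sample: a JPEG-like header followed by an ASCII 'I' (0x49).
sample = b"\xff\xd8\xff\xe0I"

print(find_byte_offsets(sample, rb"\x49"))  # low byte (below 0x80)
print(find_byte_offsets(sample, rb"\xFF"))  # high byte (0x80 and above)
```

Because both the pattern and the data are `bytes`, no code page is involved at any point.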
-
- Site Admin
- Posts: 60595
- Joined: 22 May 2004 16:48
- Location: Win8.1 @100%, Win10 @100%
- Contact:
Re: binary content search using regex seems broken
Works fine here.
- Attachments
-
- 2015-09-14_200053.png (5.66 KiB) Viewed 4134 times
FAQ | XY News RSS | XY Twitter
Re: binary content search using regex seems broken
Interesting. Maybe it somehow depends on options set?
Here is mine: I also tried with ::fresh, got the same result.
edit:
Can someone else perhaps confirm?
Re: binary content search using regex seems broken
Looks like mine.
What kind of files are you searching? Can you send me one?
Re: binary content search using regex seems broken
I originally tried to search for something in a bunch of JPEG files, but now I tried to search for \xFF in the entire C:\Program Files, and so far no hit (and there should be thousands of hits!). It just can't find \xFF at all. I will experiment with this further to see if I can make it work somehow.
Re: binary content search using regex seems broken
Rights? Run as Admin?
Re: binary content search using regex seems broken
Full admin rights.
And I got two hits after all, but those were some XML files that didn't even contain a \xFF byte.
Re: binary content search using regex seems broken
Try Binary instead of Text and Binary. That should remove the XML files.
Re: binary content search using regex seems broken
The Bible says
The characters that ‹\x80› through ‹\xFF› match depends on how your regex engine
interprets them, and which code page your subject text is encoded in. We recommend
that you not use ‹\x80› through ‹\xFF›. Instead, use the Unicode code point token
described in Recipe 2.7.
Tag Backup - SimpleUpdater - XYplorer Messenger - The Unofficial XYplorer Archive - Everything in XYplorer
Don sees all [cit. from viewtopic.php?p=124094#p124094]
Re: binary content search using regex seems broken
admin wrote: Try Binary instead of Text and Binary. That should remove the XML files.
Nope, they are found when any of the 3 options (Text, Binary, Text and Binary) is set.
I have managed to reduce one of those XML files to just 5 bytes, where if I remove any single one of them, it no longer is found using \xFF. \xFF itself is not among those 5 bytes of course. Sample attached.
It's possible that this is related to the code page, so I will try changing the regional settings, but not right now, as that requires a computer restart.
update:
attached file was wrong, uploaded again
- Attachments
-
- specimen_.zip
- (167 Bytes) Downloaded 121 times
Last edited by xman on 14 Sep 2015 20:39, edited 1 time in total.
Re: binary content search using regex seems broken
Marco wrote: The Bible says
The characters that ‹\x80› through ‹\xFF› match depends on how your regex engine
interprets them, and which code page your subject text is encoded in. We recommend
that you not use ‹\x80› through ‹\xFF›. Instead, use the Unicode code point token
described in Recipe 2.7.
The Bible talks about searching text; I am searching binary, so encoding should be totally irrelevant.
Re: binary content search using regex seems broken
Yes, something is wrong.
The specimen_.xml you sent is a UTF-8 file that is interpreted as byte 0xFF. So it's okay when "Text and Binary" matches 0xFF.
However, "Binary" alone should not match it. Gonna fix...
Re: binary content search using regex seems broken
Thanks, now the XML file is not matched. But it doesn't solve the original problem (not being able to find bytes in range \x80 to \xFF).
I investigated and there is more. Download and extract the attached png file to a new folder. Now try to find \x89 in that image. 0x89 is the first byte and this value is not found anywhere else in the image. It should match, here it doesn't.
Now try to find \xA9 in that image, it should match.
Now go to regional settings and change format to "Chinese (Simplified, PRC)", it doesn't require restart. Chinese is not what I use, but it seems to work more consistently. And now try to find \xA9 again.
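One possible explanation, which is only a guess and not confirmed for XYplorer, is that the file is decoded with the system's ANSI code page before matching. The Python sketch below assumes Windows-1252 for a Western locale and GBK for Chinese (Simplified); the two-byte `sample` is made up for illustration. Under that assumption \x89 gets lost in Windows-1252 while \xA9 survives, and \xA9 gets lost in GBK.

```python
import re

png_signature = b"\x89PNG\r\n\x1a\n"   # the first eight bytes of any PNG file
sample = b"\xa9\xa4"                    # hypothetical data containing byte 0xA9

# Windows-1252 maps byte 0x89 to U+2030, so code point U+0089 never appears.
as_western = png_signature.decode("cp1252")
print(re.search("\x89", as_western) is not None)

# Windows-1252 maps byte 0xA9 to U+00A9, so this one still matches.
print(re.search("\xa9", sample.decode("cp1252")) is not None)

# In GBK, 0xA9 is the lead byte of a two-byte character and no longer stands alone.
as_chinese = sample.decode("gbk", errors="replace")
print(re.search("\xa9", as_chinese) is not None)
```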
- Attachments
-
- TitleButtonIcon.zip
- (319 Bytes) Downloaded 139 times
Re: binary content search using regex seems broken
Confirmed. ATM I cannot solve this riddle.
I suggest you use XY's built-in hex content search meanwhile:
- Attachments
-
- 2015-09-15_202133.png (5.64 KiB) Viewed 4052 times
Re: binary content search using regex seems broken
Thanks. Hex-content search was what I used before, but it doesn't really work, when one wants to find all files that start with some pattern (for instance).
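As an illustration of the "starts with some pattern" case, here is a rough Python sketch of the kind of check meant; it says nothing about how XYplorer works internally, and the helper name `starts_with` and the 64-byte read limit are made up for the example.

```python
import re
from pathlib import Path

def starts_with(path: Path, pattern: bytes, head_size: int = 64) -> bool:
    """Check whether the first bytes of a file match a bytes regex."""
    with path.open("rb") as f:      # binary mode: no decoding, no code page
        head = f.read(head_size)
    # re.match anchors at offset 0, so later occurrences are ignored.
    return re.match(pattern, head) is not None
```

With a hypothetical file name, `starts_with(Path("image.png"), rb"\x89PNG")` would be true only if the file begins with the PNG signature bytes.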
For the record, this bug seems to occur not just with Chinese, but also with Arabic, Russian, Serbian, Czech, Greek and probably many other languages, although sometimes the search did work with these languages; I don't know why. I couldn't reproduce the bug at all (aside from the first of the two problems) with English, German, Spanish, or French.
It seems to me that the regex engine tries to interpret files based on system settings and doesn't really treat them as just a bunch of bytes.
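To illustrate what "just a bunch of bytes" would mean, here is a small Python sketch; it illustrates the idea only and does not describe how XYplorer's engine works. If an engine can only match text, decoding with Latin-1 (ISO-8859-1) is one locale-independent option, because it maps each byte value to the code point with the same number. The helper name `find_byte` is hypothetical.

```python
import re

def find_byte(data: bytes, byte_value: int) -> bool:
    """Search for one byte value via a text regex, independent of locale."""
    # Latin-1 maps byte n to code point n for all 256 values, so it is lossless.
    text = data.decode("latin-1")
    return re.search(re.escape(chr(byte_value)), text) is not None

# Check every byte value against data that contains all 256 values.
all_bytes = bytes(range(256))
print(all(find_byte(all_bytes, b) for b in range(256)))
```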