I've tried to do text searches within PDF files, and failed with XY. I know I can use other tools to find text (and have done so). I also understand that some PDFs are not searchable, depending on how they were created.
Anyone have feedback on this? Am I missing something when I say that XY's search does not search inside PDFs?
Orson
Searching for text inside PDF files
-
admin
- Site Admin
- Posts: 64849
- Joined: 22 May 2004 16:48
- Location: Win8.1, Win10, Win11, all @100%
Re: Searching for text inside PDF files
XY does search inside any file, but it does not interpret the contents. It just searches the raw bytes. Since PDFs are usually compressed, you won't find your strings that way.
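To illustrate the raw-bytes point, here is a minimal Python sketch. It is not XY's code. The page content is a made-up example, and it assumes the PDF stores its page content deflate-compressed (the common FlateDecode filter), which is typical but not universal.

```python
import zlib

# Hypothetical page content; a real PDF content stream carries more operators.
page_text = b"BT /F1 12 Tf (ADSL2 is specified in ITU-T G.992.3; see G.992.3 Annex A) Tj ET"
needle = b"G.992.3"

# Assumption: the PDF writer deflate-compressed the content stream.
compressed = zlib.compress(page_text)

# A raw byte search only ever sees the compressed form.
found_in_raw = needle in compressed

# Once the stream is inflated, the text is visible again.
found_after_inflate = needle in zlib.decompress(compressed)

print("raw bytes:", found_in_raw)              # raw bytes: False
print("after inflate:", found_after_inflate)   # after inflate: True
```

The same string is present in both cases. Only the encoding on disk differs, which is why a tool that interprets the file can find it while a plain byte search cannot.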
FAQ | XY News RSS | XY X
Just a little experiment: I created a PDF containing the string "G.992.3." I used XY to search for "G.992" and "G.992.3" and got nothing. When I searched for "G." I found the PDF file. I could find a similar Word .DOC file with all those strings entered.
Another file manager (Xplorer2) found the PDF with these text strings entered: "G.", "G.9", "G.992" and "G.992.3" -- so there's something different going on.
If there's a way of enabling XY's search to find such a wider array of text strings within PDF files, I'd find that useful, obviously.
Orson
-
admin
There is a way for me to add this, yes. I have to tap into certain document interpreters that are shipped with the system or with the applications (they are called "filters", I think). Later, man...
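For anyone curious what "interpreting" means in the simplest case, below is a rough Python sketch. It is a crude stand-in, not the system filters mentioned above and not anything XY does: it just inflates every deflate-compressed stream it can find and searches the result. The names `pdf_bytes_contain` and `make_toy_pdf` are hypothetical, and the toy file is an incomplete PDF fragment built only for the demo. Real PDFs can also use other stream filters, encryption, custom font encodings, or text split across several operators, none of which this handles.

```python
import re
import zlib

# Matches everything between a "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def pdf_bytes_contain(data: bytes, needle: bytes) -> bool:
    """Return True if needle occurs in the raw bytes or in any inflated stream."""
    if needle in data:
        return True
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not deflate data (image, other filter, encrypted, ...)
        if needle in inflated:
            return True
    return False

def make_toy_pdf(text: bytes) -> bytes:
    """Build a hypothetical, incomplete PDF fragment with one compressed stream."""
    body = zlib.compress(b"BT /F1 12 Tf (" + text + b") Tj ET")
    header = (
        b"%PDF-1.4\n1 0 obj\n<< /Length "
        + str(len(body)).encode("ascii")
        + b" /Filter /FlateDecode >>\nstream\n"
    )
    return header + body + b"\nendstream\nendobj\n"

toy = make_toy_pdf(b"Test document: G.992.3 and again G.992.3 and G.992.3")
print(b"G.992.3" in toy)                   # plain raw-byte search
print(pdf_bytes_contain(toy, b"G.992.3"))  # search after inflating streams
```

With this toy file the first search should print False and the second True. A very short needle such as "G." can still turn up by chance somewhere in the raw bytes, which may be why that one search matched in the experiment.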
-
j_c_hallgren
- XY Blog Master
- Posts: 5826
- Joined: 02 Jan 2006 19:34
- Location: So. Chatham MA/Clearwater FL
AFAIK, almost all the text in a PDF is stored not in "normal" form, but more as one would store text that was part of a JPG, so it's in graphical format. Thus XY has no present way to locate it. The only way I know of is to use some PDF reader/interface to decode the internal format so that it can be searched.
Addendum: Just recalled that we'd talked about this recently elsewhere!
See http://www.xyplorer.com/xyfc/viewtopic.php?t=2230 for the other prior thread.
Orson, you've been here long enough, so you didn't find that thread via a search before starting this one?
Still spending WAY TOO much time here! But it's such a pleasure helping XY be a treasure!
(XP on laptop with touchpad and thus NO mouse!) Using latest beta vers when possible.
JC,
Sorry about that. I actually did a quick search for "PDF" and did not see the thread you mentioned before I started my post.
I don't know the technology used to embed text in a PDF file. I just know some search tools (such as apparently Xplorer2) can find text within a PDF.
I understand it's not a high priority for now. I'm just saying it would be very nice to have.
Orson
XYplorer Beta Club