Searching for text inside PDF files
Posted: 27 Jun 2008 20:49
by Orson
I've tried to do text searches within PDF files, and failed with XY. I know I can use other tools to find text (and have done so). I also understand that some PDFs are not searchable, depending on how they were created.
Anyone have feedback on this? Am I missing something when I say that XY's search does not search inside PDFs?
Orson
Re: Searching for text inside PDF files
Posted: 27 Jun 2008 20:56
by admin
XY does search inside any file, but it does not interpret the contents. It just searches the raw bytes. Since PDFs are usually compressed, you won't find your strings that way.
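To illustrate why a raw-byte search misses text in a compressed PDF, here is a minimal Python sketch. It assumes a single PDF content stream compressed with /FlateDecode (zlib), which many PDF writers use; `raw_search` and `decoded_search` are hypothetical helpers for illustration, not XY's actual code.

```python
import zlib

# A tiny PDF content stream that paints the text "G.992.3".
# BT/ET, Tf, Td and Tj are standard PDF text operators.
content = b"BT /F1 12 Tf 72 720 Td (G.992.3) Tj ET"

# Assumption: the writer stored this stream with /FlateDecode, i.e. zlib.
stored = zlib.compress(content)

def raw_search(data: bytes, needle: str) -> bool:
    """Look for the needle in the bytes exactly as they are stored."""
    return needle.encode("latin-1") in data

def decoded_search(data: bytes, needle: str) -> bool:
    """Inflate the Flate-compressed stream first, then look for the needle."""
    return needle.encode("latin-1") in zlib.decompress(data)

print(raw_search(content, "G.992.3"))     # uncompressed: the text is visible
print(raw_search(stored, "G.992.3"))      # compressed: the bytes no longer match
print(decoded_search(stored, "G.992.3"))  # inflated again: found
```

The only difference between the two searches is the decompression step, which is the "interpreting" that a plain byte search skips.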
Posted: 27 Jun 2008 21:46
by Orson
Just a little experiment: I created a PDF containing the string "G.992.3". I used XY to search for "G.992" and "G.992.3" and got nothing. When I searched for "G.", XY found the PDF file. With a similar Word .DOC file, XY found it for every one of those search strings.
Another file manager (Xplorer2) found the PDF with each of these search strings: "G.", "G.9", "G.992" and "G.992.3", so something different is going on.
If there's a way to make XY's search find this wider range of strings within PDF files, I'd obviously find that useful.
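One possible explanation for the "G." hit without a "G.992" hit, offered as an assumption rather than a diagnosis of this particular file: PDF writers may split one line of text into several string pieces inside a TJ array, with kerning adjustments between them, so the full string never appears as one byte run even in an uncompressed stream. (A two-byte pattern like "G." can also turn up by chance in a large binary file.) The sketch below uses a hypothetical content stream and a toy extractor, `naive_tj_text`, that ignores escapes, hex strings and font encodings.

```python
import re

# Hypothetical uncompressed content stream: the writer split "G.992.3" into
# two string pieces with a kerning adjustment (-15) between them.
content = b"BT /F1 12 Tf 72 720 Td [(G.)-15(992.3)] TJ ET"

def naive_tj_text(stream: bytes) -> str:
    """Join the literal strings inside each TJ array (toy extractor).

    Ignores escape sequences, nested parentheses, hex strings and font
    encodings, so it only sketches the idea.
    """
    pieces = []
    for array in re.findall(rb"\[(.*?)\]\s*TJ", stream, flags=re.S):
        # Keep the (...) string operands, drop the numeric kerning values.
        pieces.append(b"".join(re.findall(rb"\(([^)]*)\)", array)))
    return b" ".join(pieces).decode("latin-1")

print(b"G." in content)                     # the short piece matches
print(b"G.992" in content)                  # the split string does not
print("G.992.3" in naive_tj_text(content))  # rejoined text matches
```

A tool that interprets the text operators before searching sees the joined string; a raw byte search only sees the pieces.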
Orson
Posted: 27 Jun 2008 21:49
by admin
There is a way for me to add this, yes. I have to tap into certain document interpreters that are shipped with the system or with the applications (they are called "filters", I think). Later, man...
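The "filters" idea amounts to dispatching each file to an extractor that understands its format and searching the extracted text, with a raw-byte fallback for everything else. The sketch below shows only that dispatch pattern in Python; `FILTERS`, `toy_pdf_filter` and `file_contains` are hypothetical names, and this is not the Windows filter API. The toy PDF filter merely inflates Flate-compressed streams and does not decode fonts or text operators.

```python
import re
import zlib
from pathlib import Path
from typing import Callable, Dict

# An extractor turns a file's raw bytes into text that can be searched.
Extractor = Callable[[bytes], str]

def raw_bytes(data: bytes) -> str:
    """Fallback: no interpretation, roughly what a plain byte search sees."""
    return data.decode("latin-1")

# Matches the body of each PDF stream object (stream ... endstream).
_STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.S)

def toy_pdf_filter(data: bytes) -> str:
    """Toy PDF 'filter': inflate each Flate-compressed stream it can.

    A real filter would also handle fonts, encodings and text operators;
    this one only undoes the compression.
    """
    parts = []
    for body in _STREAM.findall(data):
        try:
            # decompressobj stops at the end of the zlib data and leaves the
            # end-of-line bytes before 'endstream' in unused_data.
            parts.append(zlib.decompressobj().decompress(body))
        except zlib.error:
            parts.append(body)  # not zlib data: keep the bytes as they are
    return raw_bytes(b"\n".join(parts))

# Hypothetical registry: file extension -> extractor.
FILTERS: Dict[str, Extractor] = {".pdf": toy_pdf_filter}

def file_contains(path: Path, needle: str) -> bool:
    """Search a file through its registered filter, else its raw bytes."""
    extractor = FILTERS.get(path.suffix.lower(), raw_bytes)
    return needle in extractor(path.read_bytes())
```

Keeping the extractors behind one lookup table means new formats can be supported without touching the search loop itself.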
Posted: 27 Jun 2008 22:39
by j_c_hallgren
AFAIK, almost all the text in a PDF is not stored in "normal" form but more the way text that is part of a JPG would be, i.e. in a graphical format... so XY currently has no way to locate it. The only way I know of is to use some PDF reader/interface to decode the internal format so that it can be searched.
Addendum: Just recalled that we'd talked about this recently elsewhere! See http://www.xyplorer.com/xyfc/viewtopic.php?t=2230 for the earlier thread.
Orson, you've been here long enough; didn't you find that thread with a search before starting this one?

Posted: 27 Jun 2008 22:57
by Orson
JC,
Sorry about that. I did a quick search for "PDF" before starting my post, but I didn't see the thread you mentioned.
I don't know the technology used to embed text in a PDF file. I just know that some search tools (apparently including Xplorer2) can find text within a PDF.
I understand it's not a high priority for now. I'm just saying it would be very nice to have.
Orson