Searching for text inside PDF files

Orson
Posts: 85
Joined: 24 Oct 2006 20:31

Post by Orson »

I've tried to do text searches within PDF files, and failed with XY. I know I can use other tools to find text (and have done so). I also understand that some PDFs are not searchable, depending on how they were created.

Anyone have feedback on this? Am I missing something when I say that XY's search does not search inside PDFs?

Orson

admin
Site Admin
Posts: 64854
Joined: 22 May 2004 16:48
Location: Win8.1, Win10, Win11, all @100%

Re: Searching for text inside PDF files

Post by admin »

XY does search inside any file, but it does not interpret the contents; it just searches the raw bytes. Since PDFs are usually compressed, you won't find your strings that way.
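To illustrate the point about compression, here is a minimal Python sketch. It assumes a hypothetical PDF fragment whose single page-content stream uses the common /FlateDecode (zlib) filter. A byte-level search misses the text, while inflating the stream first finds it. The fragment, the regex, and the function names are illustrative assumptions, not how XY or any particular PDF tool works.

```python
import re
import zlib

# Hypothetical page content: draws "G.992.3", followed by repetitive filler
# so that zlib clearly compresses it (as real page content usually is).
content = (b"BT /F1 12 Tf 72 720 Td (G.992.3) Tj ET\n"
           + b"BT /F1 12 Tf 72 700 Td (filler text) Tj ET\n" * 20)

# Store it the way PDFs commonly do: a Flate (zlib) compressed stream object.
compressed = zlib.compress(content)
pdf_fragment = (b"4 0 obj\n<< /Length " + str(len(compressed)).encode()
                + b" /Filter /FlateDecode >>\nstream\n" + compressed
                + b"\nendstream\nendobj\n")

def raw_search(data: bytes, needle: bytes) -> bool:
    """Byte-level search: look for the needle exactly as it is stored."""
    return needle in data

def decoded_search(data: bytes, needle: bytes) -> bool:
    """Inflate each stream that is valid zlib data, then search the result."""
    for match in re.finditer(rb"stream\n(.*?)\nendstream", data, re.DOTALL):
        try:
            if needle in zlib.decompress(match.group(1)):
                return True
        except zlib.error:
            continue  # not Flate data, or a filter this sketch does not handle
    return False

print(raw_search(pdf_fragment, b"G.992"))      # False
print(decoded_search(pdf_fragment, b"G.992"))  # True
```

Real PDFs can also use other filters (for example LZWDecode or ASCII85Decode) and custom font encodings, so decompression alone is not always enough.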

Orson

Post by Orson »

Just a little experiment: I created a PDF containing the string "G.992.3". I used XY to search for "G.992" and for "G.992.3" and got nothing. When I searched for "G.", XY found the PDF file. By contrast, XY found a similar Word .DOC file with every one of those search strings.

Another file manager (Xplorer2) found the PDF with each of these search strings: "G.", "G.9", "G.992" and "G.992.3". So something different is going on.
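One possible explanation for these results, offered as an assumption since the actual file was not examined: a two-byte string such as "G." can turn up by chance in compressed binary data, and even when a content stream is uncompressed, a PDF producer may split one word into several pieces inside a TJ array, separated by kerning numbers. The Python sketch below uses a hypothetical uncompressed content stream to show the second effect: "G." matches the raw bytes, "G.992" does not, and joining the string pieces of each BT ... ET text object recovers "G.992.3". It ignores hex strings, nested parentheses, escape decoding, and font encodings.

```python
import re

# Hypothetical, uncompressed content stream: "G.992.3" is drawn as a TJ array
# whose two pieces are separated by a kerning adjustment (-20).
content = b"BT /F1 12 Tf 72 720 Td [(G.)-20(992.3)] TJ ET"

# A PDF literal string "( ... )" with backslash escapes; nesting is not handled.
LITERAL = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.DOTALL)
# A text object between the BT and ET operators.
TEXT_OBJECT = re.compile(rb"\bBT\b(.*?)\bET\b", re.DOTALL)

def extract_text(stream: bytes) -> list[bytes]:
    """Join the literal string pieces inside each BT ... ET text object."""
    return [b"".join(LITERAL.findall(block))
            for block in TEXT_OBJECT.findall(stream)]

print(b"G." in content)          # True: the raw bytes contain "(G.)"
print(b"G.992" in content)       # False: the kerning number sits in between
print(extract_text(content))     # [b'G.992.3']
```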

If there's a way of enabling XY's search to find such a wider array of text strings within PDF files, I'd find that useful, obviously.

Orson

admin

Post by admin »

There is a way for me to add this, yes. I would have to tap into certain document interpreters that ship with the system or with the applications (I think they are called "filters"). Later, man...

j_c_hallgren
XY Blog Master
Posts: 5826
Joined: 02 Jan 2006 19:34
Location: So. Chatham MA/Clearwater FL

Post by j_c_hallgren »

AFAIK, most of the text in a PDF is not stored in "normal" form: it is usually compressed, and in scanned PDFs it exists only as an image, much like text in a JPG. Thus XY has no present way to locate it. The only way I know of is to use some PDF reader/interface to decode the internal format so that it can be searched.

Addendum: Just recalled that we'd talked about this recently elsewhere!
See http://www.xyplorer.com/xyfc/viewtopic.php?t=2230 for that earlier thread.

Orson, you've been here long enough, so you didn't find that thread via a search before starting this one? :wink:

Orson

Post by Orson »

JC,

Sorry about that. I actually did a quick search for "PDF" and did not see the thread you mentioned before I started my post.

I don't know the technology used to embed text in a PDF file. I just know that some search tools (apparently including Xplorer2) can find text within a PDF.

I understand it's not a high priority for now. I'm just saying it would be very nice to have.

Orson
