mark searchable PDF files

mgroen
Posts: 8
Joined: 19 May 2015 17:08

mark searchable PDF files

Post by mgroen »

Hi,

Can XYplorer set a marker so that searchable and non-searchable PDF files can be told apart?
For example, by creating a column "searchable" that shows an S (or any other marker) on each line for a searchable PDF file,
and shows nothing for non-searchable ones?

Explanation:
I have lots of PDF files; some of them are searchable, some are not.
I want an overview of which are searchable and which are not, without opening each file manually to check whether it's searchable.

The main goal behind this question is to make all of my PDF files searchable, but to do that I first need an overview of which PDF files are already searchable and which are not.

Is XYplorer able to help me?

Thanks,
Mathijs

notabot
Posts: 60
Joined: 24 Feb 2021 12:34

Re: mark searchable PDF files

Post by notabot »

I thought your question was already answered nicely on the Total Commander forum?
(short answer: Use pdfOCR)


I wrote something for that a while ago. With some minor modifications, it could be adapted to your use case to generate a list of filenames (including path) that need OCR.

highend
Posts: 14566
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: mark searchable PDF files

Post by highend »

Get the Xpdf command line tools and use a custom column like:

Code:

$tool = "D:\some path\Xpdf tools_x64\bin64\pdftotext.exe";
    // Write the PDF's text to stdout ("-") and read it back as UTF-8 (codepage 65001)
    $output = trim(runret("""$tool"" -simple -nopgbrk ""<cc_item>"" -", %TEMP%, 65001), <crlf>, "R");
    // Any remaining text means the PDF has a text layer, i.e. it is searchable
    if ($output) { return "S"; }

Why not use XY's internal scripting command extracttext()? Because it can throw an unavoidable script error on non-searchable PDFs...
One of my scripts helped you out? Please donate via Paypal
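
For anyone who wants the same check outside XYplorer, e.g. to produce a plain list of files that still need OCR, here is a minimal Python sketch of the idea behind the custom column above. It is only a sketch under assumptions: it assumes a pdftotext executable (Xpdf or Poppler) can be found on the PATH, and the names extract_text, is_searchable and needs_ocr are made up for illustration; they are not part of XYplorer, Xpdf or any tool mentioned in this thread.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def extract_text(pdf: Path, tool: str = "pdftotext") -> str:
    """Run pdftotext on one file and return whatever text it prints.

    Assumption: `tool` names a pdftotext binary that can be found on the PATH.
    "-" as the output file sends the text to stdout; "-nopgbrk" suppresses the
    page-break characters so an image-only PDF yields (almost) nothing.
    """
    result = subprocess.run(
        [tool, "-enc", "UTF-8", "-nopgbrk", str(pdf), "-"],
        capture_output=True,
    )
    if result.returncode != 0:
        # Damaged or protected files end up here and are treated as "no text".
        return ""
    return result.stdout.decode("utf-8", errors="replace")


def is_searchable(text: str) -> bool:
    """A PDF counts as searchable if any non-whitespace text was extracted."""
    return bool(text.strip())


def needs_ocr(
    pdfs: Iterable[Path],
    extractor: Callable[[Path], str] = extract_text,
) -> List[Path]:
    """Return the files for which the extractor produced no usable text."""
    return [pdf for pdf in pdfs if not is_searchable(extractor(pdf))]
```

A possible use would be to call needs_ocr(sorted(Path("some folder").rglob("*.pdf"))) and print the returned paths. Note that a PDF with only a few stray characters of text would still count as searchable here, just as in the custom column script.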

Horst
Posts: 1329
Joined: 24 Jan 2021 12:27
Location: Germany

Re: mark searchable PDF files

Post by Horst »

highend wrote: 03 Mar 2021 19:25 Get the Xpdf command line tools and use a custom column like:

Code:

$tool = "D:\some path\Xpdf tools_x64\bin64\pdftotext.exe";
    $output = trim(runret("""$tool"" -simple -nopgbrk ""<cc_item>"" -", %TEMP%, 65001), <crlf>, "R");
    if ($output) { return "S"; }

Why not use XY's internal scripting command extracttext()? Because it can throw an unavoidable script error on non-searchable PDFs...

I would like to use this script and have defined custom column 16 with it.
But I don't understand the help file topic on how to find files with custom columns.
Where in the Find Files fields do I have to enter cc16:s, which is my understanding of what to search for?
[Edit]
I found that it works in Quick Search, but I would still like to know how it can be done with the Find Files dialog.
[Edit]
Found it by reading further in the help file.
So I can enter things like !cc16:s in the Name field.
I don't find this very intuitive or logical.
Windows 11 Home, Version 25H2 (OS Build 26200.7171)
Portable x64 XYplorer (Actual version, including betas)
Display settings 1920 x 1080 Scale 100%
Everything 1.5.0.1400a (x64), Everything Toolbar 2.1.0, Listary Pro 6.3.6.99

mgroen
Posts: 8
Joined: 19 May 2015 17:08

Re: mark searchable PDF files

Post by mgroen »

notabot wrote: 03 Mar 2021 19:19 I thought your question was already answered nicely on the Total Commander forum?
(short answer: Use pdfOCR)


I wrote something for that a while ago. With some minor modifications, it could be adapted to your use case to generate a list of filenames (including path) that need OCR.

My question was whether XYplorer can do that by itself, i.e. without plugins etc.

Horst
Posts: 1329
Joined: 24 Jan 2021 12:27
Location: Germany

Re: mark searchable PDF files

Post by Horst »

mgroen wrote: 04 Mar 2021 12:13
notabot wrote: 03 Mar 2021 19:19 I thought your question was already answered nicely on the Total Commander forum?
(short answer: Use pdfOCR)


I wrote something for that a while ago. With some minor modifications, it could be adapted to your use case to generate a list of filenames (including path) that need OCR.
My question was whether XYplorer can do that by itself, i.e. without plugins etc.

The same answer as you got on the Total Commander forum.
Why should a file manager have such a specialized function in native code?
And what's the problem with using highend's script in XYplorer? It works perfectly for me.

highend
Posts: 14566
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: mark searchable PDF files

Post by highend »

I can't see any specific "XY should handle this without external tools" part in the initial question...

Apart from that, if Don implements: viewtopic.php?f=5&t=22805

you can use extracttext() instead of the external xpdf tool...

mgroen
Posts: 8
Joined: 19 May 2015 17:08

Re: mark searchable PDF files

Post by mgroen »

Horst wrote:
The same answer as you got on the Total Commander forum.
Why should a file manager have such a specialized function in native code?
And what's the problem with using highend's script in XYplorer? It works perfectly for me.

Because it's about FILES and we are talking about a FILE manager.

Scripting is very cumbersome compared to having the functionality built into the application. It is also very error-prone, which is why I am looking for a good file manager with this functionality built in. Also, is the Total Commander script not usable in XYplorer? (This is a genuine question from my side.)

Horst
Posts: 1329
Joined: 24 Jan 2021 12:27
Location: Germany

Re: mark searchable PDF files

Post by Horst »

mgroen wrote: 04 Mar 2021 14:14
Horst wrote:
The same answer as you got on the Total Commander forum.
Why should a file manager have such a specialized function in native code?
And what's the problem with using highend's script in XYplorer? It works perfectly for me.

Because it's about FILES and we are talking about a FILE manager.

Scripting is very cumbersome compared to having the functionality built into the application. It is also very error-prone, which is why I am looking for a good file manager with this functionality built in. Also, is the Total Commander script not usable in XYplorer? (This is a genuine question from my side.)

There is currently no script for this purpose in Total Commander.
But highend's script above shows that such scripting is easier to do in XYplorer.

highend
Posts: 14566
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: mark searchable PDF files

Post by highend »

And now with v21.50.0130

you can just use:
$output = trim(extracttext(<cc_item>, , 1), <crlf>, "R");
return $output ? "S" : "";

Horst
Posts: 1329
Joined: 24 Jan 2021 12:27
Location: Germany

Re: mark searchable PDF files

Post by Horst »

highend wrote: 04 Mar 2021 16:34 And now with v21.50.0130

you can just use:
$output = trim(extracttext(<cc_item>, , 1), <crlf>, "R");
return $output ? "S" : "";

This doesn't work at all for me.
It's extremely slow with the SumatraPDF iFilter,
and Quick find returns all 220 of my tested PDF files as not searchable,
although only 26 of them are actually not searchable.
This is independent of the iFilter software used; I tested with SumatraPDF and TETPDFiFilter.
The pdftotext solution is fast and delivers correct results regardless of the iFilter software used.
Last edited by Horst on 04 Mar 2021 18:06, edited 2 times in total.

highend
Posts: 14566
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: mark searchable PDF files

Post by highend »

Works fine here and all pdfs are classified correctly.

What do these commands yield on one of those pdfs?

Code:

text extracttext(, 32);
text extracttext(, 64);

Horst
Posts: 1329
Joined: 24 Jan 2021 12:27
Location: Germany

Re: mark searchable PDF files

Post by Horst »

highend wrote: 04 Mar 2021 17:39 Works fine here and all pdfs are classified correctly.

What do these commands yield on one of those pdfs?

Code:

text extracttext(, 32);
text extracttext(, 64);

text extracttext(, 32); always gives an error.
text extracttext(, 64); always returns empty output, regardless of whether the PDF is searchable or not.

highend
Posts: 14566
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: mark searchable PDF files

Post by highend »

Attach that pdf here (zipped)...

Horst
Posts: 1329
Joined: 24 Jan 2021 12:27
Location: Germany

Re: mark searchable PDF files

Post by Horst »

highend wrote: 04 Mar 2021 17:47 Attach that pdf here (zipped)...
Attached 2 examples.
The file "Drive Snapshot - Tips und Tricks.pdf" is searchable.
The file "Drive Snapshot - Kommandozeile.pdf" is not searchable.
Tested by trying to select text with SumatraPDF.
Attachments
files.zip
(968.38 KiB) Downloaded 130 times
