mark searchable PDF files
Posted: 03 Mar 2021 18:19
by mgroen
Hi,
Is XYplorer able to set a marker so that the user can distinguish searchable from non-searchable PDF files?
For example, by creating a column "searchable" and putting an S (or any marker) on each line for searchable PDF files
(and not setting a marker for non-searchable ones)?
Explanation:
I have lots of PDF files; some of them are searchable, some are not.
I want to get an overview of searchable/non-searchable files without opening each file manually and checking whether it's searchable.
The main goal behind this question is that I want to make all my PDF files searchable, but to do that I first need an overview of which ones are already searchable and which are not.
Is XYplorer able to help me?
Thanks,
Mathijs
Re: mark searchable PDF files
Posted: 03 Mar 2021 19:19
by notabot
I thought your question was already answered nicely on the Total Commander forum?
(short answer: Use pdfOCR)
I wrote something for that a while ago. With some minor modifications, it could be adapted to your use case and generate a list of filenames (including path) that need OCR.
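notabot's script itself isn't shown in the thread. As a rough illustration of the idea only (not his script), here is a minimal Python sketch that walks a folder and lists the full paths of PDFs whose extracted text is blank. It assumes a `pdftotext` executable (from Xpdf or Poppler) is on the PATH; the names `find_pdfs`, `pdftotext_extract` and `pdfs_needing_ocr` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Iterable, Iterator


def find_pdfs(root: Path) -> Iterator[Path]:
    # Recursively yield full paths of *.pdf files (extension matched case-insensitively).
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            yield path.resolve()


def pdftotext_extract(pdf: Path, tool: str = "pdftotext") -> str:
    # Assumption: an Xpdf/Poppler pdftotext executable; "-" sends the text to stdout.
    # A failed run (e.g. an unreadable file) gives empty stdout, so it is flagged too.
    result = subprocess.run([tool, "-nopgbrk", str(pdf), "-"], capture_output=True)
    # Only emptiness matters here, so undecodable bytes are simply replaced.
    return result.stdout.decode("utf-8", errors="replace")


def pdfs_needing_ocr(paths: Iterable[Path],
                     extract: Callable[[Path], str] = pdftotext_extract) -> list[Path]:
    # A PDF "needs OCR" if its extracted text is empty or whitespace only.
    return [p for p in paths if not extract(p).strip()]


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for pdf in pdfs_needing_ocr(find_pdfs(root)):
        print(pdf)
```

Pass a folder as the first argument; each printed line is a PDF that probably needs OCR. The extractor is a parameter so another text extractor could be swapped in. Treating whitespace-only output as "not searchable" is a design choice of this sketch, slightly stricter than trimming only CR/LF.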
Re: mark searchable PDF files
Posted: 03 Mar 2021 19:25
by highend
Get the Xpdf command line tools and use a custom column like:
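Code:
// Path to pdftotext.exe from the Xpdf command line tools (adjust to your install)
$tool = "D:\some path\Xpdf tools_x64\bin64\pdftotext.exe";
// Write the extracted text to stdout ("-"), then trim trailing CR/LF characters
$output = trim(runret("""$tool"" -simple -nopgbrk ""<cc_item>"" -", %TEMP%, 65001), <crlf>, "R");
// Any remaining text means the PDF has a text layer: mark it with "S"
if ($output) { return "S"; }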
Why isn't XY's internal SC extracttext() used? Because it can throw an unavoidable script error on non-searchable PDFs...
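For anyone who wants to check the same rule outside XYplorer (for example, on a batch of files from a command prompt), here is a minimal Python sketch of highend's custom-column logic. It assumes `pdftotext` from the Xpdf tools is on the PATH or passed as `tool`; the helper names `build_command`, `marker_from_text` and `searchable_marker` are hypothetical. It uses the same `-simple -nopgbrk ... -` switches and the same rule: strip trailing CR/LF characters, and mark "S" if anything is left.

```python
import subprocess
import sys


def build_command(tool: str, pdf_path: str) -> list[str]:
    # Same switches as the XY custom column: simple layout, no form feeds
    # between pages, and "-" as the output file so the text goes to stdout.
    return [tool, "-simple", "-nopgbrk", pdf_path, "-"]


def marker_from_text(text: str) -> str:
    # Mirror trim(..., <crlf>, "R"): strip only trailing CR/LF characters,
    # then return "S" if any text remains, otherwise an empty marker.
    return "S" if text.rstrip("\r\n") else ""


def searchable_marker(pdf_path: str, tool: str = "pdftotext") -> str:
    # Hypothetical helper: run pdftotext and classify its stdout.
    # A failed run yields empty stdout, i.e. "not searchable", as in the XY script.
    result = subprocess.run(build_command(tool, pdf_path), capture_output=True)
    # Only emptiness matters, so undecodable bytes are simply replaced.
    return marker_from_text(result.stdout.decode("utf-8", errors="replace"))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{searchable_marker(path) or '-'}\t{path}")
```

Each output line shows "S" for a searchable file or "-" for one without a text layer, followed by the path.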
Re: mark searchable PDF files
Posted: 03 Mar 2021 20:19
by Horst
In reply to highend (03 Mar 2021 19:25):
I would like to use this script and have defined custom column 16 with it.
But I don't understand the help file topic "how to find files with custom columns".
Where in the Find Files fields do I have to enter cc16:s, which is my understanding of what to search for?
[Edit]
I found that it works in Quick Search, but I would still like to know how it can be done with the Find Files dialog.
[Edit]
Found it by reading the help file further.
So I can enter things like !cc16:s in the Name field.
I don't find this very intuitive or logical.
Re: mark searchable PDF files
Posted: 04 Mar 2021 12:13
by mgroen
In reply to notabot (03 Mar 2021 19:19):
My question was whether XYplorer can do that by itself, so without plugins etc.
Re: mark searchable PDF files
Posted: 04 Mar 2021 12:19
by Horst
In reply to mgroen (04 Mar 2021 12:13):
The same answer as you got on the Total Commander forum.
Why should a file manager have such a special function in native code?
And what's the problem with using highend's script in XYplorer? It works perfectly for me.
Re: mark searchable PDF files
Posted: 04 Mar 2021 13:08
by highend
I can't see any specific "XY should handle this without external tools" part in the initial question...
Apart from that, if Don implements viewtopic.php?f=5&t=22805, you can use extracttext() instead of the external Xpdf tool...
Re: mark searchable PDF files
Posted: 04 Mar 2021 14:14
by mgroen
In reply to Horst (04 Mar 2021 12:19):
Because it's about FILES, and we are talking about a FILE manager.
Scripting is very cumbersome compared to having the functionality built into the application, and also very error-prone; that's why I am looking for a good file manager with this functionality built in. Also, is the Total Commander script not usable in XYplorer? (Just a question from my side.)
Re: mark searchable PDF files
Posted: 04 Mar 2021 15:57
by Horst
In reply to mgroen (04 Mar 2021 14:14):
There is currently no script for this purpose in Total Commander.
But highend's script above shows that such scripting can be done more easily in XYplorer.
Re: mark searchable PDF files
Posted: 04 Mar 2021 16:34
by highend
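And now, with v21.50.0130, you can just use:
// Built-in text extraction for the column's item; trim trailing CR/LF characters
$output = trim(extracttext(<cc_item>, , 1), <crlf>, "R");
// Mark the item with "S" if any text was extracted, otherwise leave the cell empty
return $output ? "S" : "";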
Re: mark searchable PDF files
Posted: 04 Mar 2021 17:26
by Horst
In reply to highend (04 Mar 2021 16:34):
This doesn't work at all for me.
It's extremely slow with the SumatraPDF iFilter,
and Quick find returns all 220 of my tested PDF files as not searchable,
although only 26 of them are actually not searchable.
This is independent of the iFilter software used; I tested with SumatraPDF and TETPDFiFilter.
The pdftotext solution is fast and delivers correct results regardless of the iFilter software used.
Re: mark searchable PDF files
Posted: 04 Mar 2021 17:39
by highend
Works fine here, and all PDFs are classified correctly.
What do these commands yield on one of those PDFs?
Code:
text extracttext(, 32);
text extracttext(, 64);
Re: mark searchable PDF files
Posted: 04 Mar 2021 17:46
by Horst
In reply to highend (04 Mar 2021 17:39):
text extracttext(, 32); always gives an error.
text extracttext(, 64); always returns empty output, regardless of whether the PDF is searchable or not.
Re: mark searchable PDF files
Posted: 04 Mar 2021 17:47
by highend
Attach that PDF here (zipped)...
Re: mark searchable PDF files
Posted: 04 Mar 2021 17:55
by Horst
In reply to highend (04 Mar 2021 17:47):
Attached are 2 examples:
The file "Drive Snapshot - Tips und Tricks.pdf" is searchable.
The file "Drive Snapshot - Kommandozeile.pdf" is not searchable.
Tested by trying to select text in SumatraPDF.