Extract images from .pdf

highend
Posts: 13274
Joined: 06 Feb 2011 00:33

Extract images from .pdf

Post by highend »

This is an alternative version to the one from 1024mb (viewtopic.php?t=25770)

It uses poppler's pdfimages.exe command line tool to extract the images (https://github.com/oschwartz10612/poppl ... s/releases)

It's a single .xys script; it doesn't require anything apart from pdfimages.exe (and the DLLs that ship with it).

This is the configuration section of the script:

    // Set the full path to "pdfimages.exe"
    $pdfimages = "D:\Tools\@Command Line Tools\poppler_x64\pdfimages.exe";

    // Image name structure: "pg. 01 - #001.jpg"
    $frontName  = "pg. ";
    $middleName = " - #";

    // Keep stencil / mask file(s)
    // Default = false, can be set to: true
    $keepStencils = false;
    $keepSMasks   = false;

It automatically processes every .pdf file in the current path and extracts its images.
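
For readers who want to try the same idea outside XYplorer, here is a minimal Python sketch of the two steps described above: building a pdfimages call for each .pdf in a folder, and renaming the output to the "pg. 01 - #001.jpg" structure. It is not the .xys script itself. The helper names (`build_command`, `target_name`, `pdfs_in`) are made up for illustration, and the sketch assumes pdfimages is called with `-all` (native image formats) and `-p` (page number in the file name), which yields names like `<root>-<page>-<image>.<ext>`. The two-digit page padding is also an assumption taken from the example name in the configuration.

```python
import re
from pathlib import Path
from typing import List

# Assumed naming produced by "pdfimages -p": <root>-<page>-<image>.<ext>
EXTRACTED = re.compile(
    r"^(?P<root>.+)-(?P<page>\d{3,})-(?P<num>\d{3,})\.(?P<ext>\w+)$"
)


def pdfs_in(folder: Path) -> List[Path]:
    """All .pdf files directly inside the folder (no recursion)."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def build_command(pdfimages: str, pdf: Path, out_dir: Path) -> List[str]:
    """Argument list for one PDF; the last item is the output file root."""
    return [pdfimages, "-all", "-p", str(pdf), str(out_dir / pdf.stem)]


def target_name(extracted: str, front: str = "pg. ", middle: str = " - #") -> str:
    """Map a pdfimages file name to the front/middle naming structure."""
    match = EXTRACTED.match(extracted)
    if match is None:
        raise ValueError(f"unexpected file name: {extracted}")
    page = int(match["page"])
    num = int(match["num"])
    # Image numbers are kept exactly as pdfimages assigned them.
    return f"{front}{page:02d}{middle}{num:03d}.{match['ext']}"
```

Each list returned by `build_command` can be passed to `subprocess.run`; afterwards the files in the output folder can be renamed with `target_name`.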

Why did I write it when there is already an existing version?

I've found pdfimages.exe way more reliable than the mutool.exe version.

I've used e.g.: https://www.phanteks.com/assets/manuals ... 00PSTG.pdf
for testing it.

With that file, mutool.exe extracts 73 items, of which only 43 are images.
pdfimages.exe extracts 190 items; with $keepStencils and $keepSMasks set to false, the script removes the 63 unwanted stencil / mask items, which leaves 127 images.
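
To see where the difference between 190 and 127 comes from, it can help to look at the output of `pdfimages -list`, which prints one row per embedded object with a type column (image, mask, stencil or smask). The Python sketch below counts those types and applies the two keep switches. How the .xys script actually identifies stencils and soft masks is not shown in this thread, so this is only one possible approach; the function names and the sample listing are made up for illustration.

```python
from collections import Counter

# Made-up sample in the column layout of "pdfimages -list"
SAMPLE = """\
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     800   600  rgb     3   8  jpeg   no        10  0    96    96 45.2K 3.2%
   1     1 smask     800   600  gray    1   8  image  no        10  0    96    96 1.1K  0.2%
   2     2 stencil    64    64  -       1   1  image  no        14  0   300   300  128B  25%
   3     3 image     400   300  rgb     3   8  jpeg   no        19  0    96    96 20.0K 5.7%
"""


def count_types(listing: str) -> Counter:
    """Count rows per type; data rows are those starting with a page number."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit():
            counts[fields[2]] += 1
    return counts


def wanted(counts: Counter, keep_stencils: bool = False, keep_smasks: bool = False) -> int:
    """Number of items left after dropping the types that are switched off."""
    skip = set()
    if not keep_stencils:
        skip.add("stencil")
    if not keep_smasks:
        skip.add("smask")
    return sum(n for kind, n in counts.items() if kind not in skip)
```

Running `count_types` on the real listing of a PDF shows how many of the extracted items are actual images before anything is deleted.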

Current version: v0.1
Attachment: Extract images from pdf v0.1.xys (6.33 KiB)
One of my scripts helped you out? Please donate via Paypal

klownboy
Posts: 4109
Joined: 28 Feb 2012 19:27

Re: Extract images from .pdf

Post by klownboy »

Thanks for the script, highend. I take it that whether an image can be extracted depends on how the picture is embedded in the PDF and what kind of image it is (e.g., JPG or something else). I ask because for a few PDFs I received a message that no images are attached to the PDF file. Also, it seems you do need the whole package, not just the single file "pdfimages.exe". Without the other files, I received the "no images attached" message for every PDF file. Thanks again.
Windows 11, 22H2 Build 22621.1555 at 100% 2560x1440

highend
Posts: 13274
Joined: 06 Feb 2011 00:33

Re: Extract images from .pdf

Post by highend »

Hi Ken,

Are you sure those .pdfs were not password protected?

As for the required files, in my experience only these are needed:

# Required for pdfimages.exe
deflate.dll
freetype.dll
lcms2.dll
Lerc.dll
libcrypto-3-x64.dll
libcurl.dll
liblzma.dll
libpng16.dll
libssh2.dll
libtiff.dll
openjp2.dll
pdfimages.exe
poppler.dll
tiff.dll
zlib.dll
zstd.dll

# Possibly also needed (Visual C++ runtime):
msvcp140.dll
vcruntime140.dll
vcruntime140_1.dll
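
If pdfimages.exe is copied out of the package on its own, a quick check like the following Python sketch can tell which of the files listed above are missing from its folder. The list is taken from this post; the function names are made up for illustration, and the comparison ignores case because Windows file names are case-insensitive.

```python
from pathlib import Path
from typing import Iterable, List

# File names from the list above (runtime DLLs not included)
REQUIRED = [
    "deflate.dll", "freetype.dll", "lcms2.dll", "Lerc.dll",
    "libcrypto-3-x64.dll", "libcurl.dll", "liblzma.dll", "libpng16.dll",
    "libssh2.dll", "libtiff.dll", "openjp2.dll", "pdfimages.exe",
    "poppler.dll", "tiff.dll", "zlib.dll", "zstd.dll",
]


def missing_from(present: Iterable[str], required: List[str] = REQUIRED) -> List[str]:
    """Required names that do not appear in 'present' (case-insensitive)."""
    have = {name.lower() for name in present}
    return [name for name in required if name.lower() not in have]


def missing_in_folder(folder: str) -> List[str]:
    """Same check against the actual contents of a folder."""
    return missing_from(p.name for p in Path(folder).iterdir())
```

Calling `missing_in_folder` with the folder that holds pdfimages.exe returns an empty list when everything from the list is in place.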

klownboy
Posts: 4109
Joined: 28 Feb 2012 19:27

Re: Extract images from .pdf

Post by klownboy »

Thanks, highend, for the list of support files. I think I'll just keep everything from the zip for now. Using the script again, I no longer get the message I was getting before. That was probably because I had pointed the script at a standalone copy of the exe instead of the folder containing the entire package. So, all is good. Thanks.
