[poppler] pdftohtml image quality

Josh Richardson jric at chegg.com
Tue Nov 22 08:42:32 PST 2011


My bad.  I forgot that the -dpi option is something I added that hasn't been merged back in yet.  I think your options are:

 1.  Use my version (email me offline if you want it, and I'll send you an invite to my source; it also has other enhancements to pdftohtml, which are described in the mailing list archives),
 2.  Change the source of pdftohtml.cc so that the default sampling resolution is 96 dpi instead of 72 dpi, or
 3.  Wait for my changes to get merged back into the main repo.  I'm not sure when that's going to be done.
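
For a sense of scale behind option 2, here is a rough sketch of the arithmetic in Python. It assumes the usual PDF convention of 72 points per inch, and the US Letter page size (612 x 792 points) is picked purely for illustration; the function name is made up and is not part of poppler.

```python
# Illustrative arithmetic only; nothing here calls poppler.
POINTS_PER_INCH = 72  # PDF convention: 1 point = 1/72 inch


def pixels_at_dpi(width_pt, height_pt, dpi):
    """Return the pixel size of a page area rendered at the given dpi."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# Hypothetical example page: US Letter, 612 x 792 points.
for dpi in (72, 96):
    print(dpi, pixels_at_dpi(612, 792, dpi))
```

Going from 72 to 96 dpi is a factor of 4/3 per side, so the rendered image carries roughly 1.78 times as many pixels.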

Best, --josh

From: Craig Whitcombe <craig.whitcombe at gmail.com>
Date: Tue, 22 Nov 2011 06:20:20 -0800
To: Josh Richardson <jric at chegg.com>
Cc: "poppler at lists.freedesktop.org" <poppler at lists.freedesktop.org>
Subject: Re: [poppler] pdftohtml image quality

Sorry Josh, but I cannot see this -dpi setting:

pdftohtml.exe -help

pdftohtml version 0.18.0
Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2004 Glyph & Cog, LLC

Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  -q                : don't print any messages or errors
  -h                : print usage information
  -help             : print usage information
  -p                : exchange .pdf links by .html
  -c                : generate complex document
  -s                : generate single document that includes all pages
  -i                : ignore images
  -noframes         : generate no frames
  -stdout           : use standard output
  -zoom <fp>        : zoom the pdf document (default 1.5)
  -xml              : output for XML post-processing
  -hidden           : output hidden text
  -nomerge          : do not merge paragraphs
  -enc <string>     : output text encoding name
  -dev <string>     : output device name for Ghostscript (png16m, jpeg etc)
  -fmt <string>     : image file format for Splash output (png or jpg)
  -v                : print copyright and version info
  -opw <string>     : owner password (for encrypted files)
  -upw <string>     : user password (for encrypted files)
  -nodrm            : override document DRM settings


Trying to use -dpi 96 anyway just prints the help message above.
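
One way to check up front whether a given build accepts an option is to scan its usage text, as in this minimal Python sketch. It assumes the help output lists one option per line in the "  -name <type> : description" layout quoted above; the helper name and the shortened sample text are illustrative only.

```python
import re


def listed_options(help_text):
    """Collect option names from usage lines shaped like '  -name ... : description'."""
    options = set()
    for line in help_text.splitlines():
        # Option lines are indented and start with a dash; the 'Usage:' line is not.
        match = re.match(r"\s+(-[A-Za-z]+)\b.*:", line)
        if match:
            options.add(match.group(1))
    return options


# A few lines copied from the 0.18.0 help output quoted above.
SAMPLE_HELP = """\
Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
  -c                : generate complex document
  -zoom <fp>        : zoom the pdf document (default 1.5)
  -fmt <string>     : image file format for Splash output (png or jpg)
"""

options = listed_options(SAMPLE_HELP)
print("-dpi" in options, "-zoom" in options)
```

In practice the text would come from running pdftohtml -help and capturing whatever it prints, rather than from a hard-coded string.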

Regards,
Craig





On 22 November 2011 06:45, Josh Richardson <jric at chegg.com> wrote:
By default pdftohtml samples the original image at 72 dpi, whereas your browser is probably displaying it at 96 dpi or more.  I recommend you try bumping up the -dpi parameter.

--josh

From: Craig Whitcombe <craig.whitcombe at gmail.com>
Date: Sun, 20 Nov 2011 08:02:39 -0800
To: "poppler at lists.freedesktop.org" <poppler at lists.freedesktop.org>
Subject: [poppler] pdftohtml image quality

Hello,

Using pdftohtml -c to create a complex document from a PDF, I find that the generated PNG images look noticeably worse than the original images inside the source PDF.

Is there something that I can do to improve the output quality?

Using version 0.18 with pdftohtml -c somepdf.pdf
Regards,
Craig
