I'll take option 3 for now; this is only a minor annoyance.<div><br></div><div>Cheers,</div><div>Craig<br><br><div class="gmail_quote">On 22 November 2011 17:42, Josh Richardson <span dir="ltr">&lt;<a href="mailto:jric@chegg.com">jric@chegg.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif"><div>My bad. I forgot that the -dpi option is something I added that hasn't been merged back in yet. I think your options are:</div>
<ol><li>Use my version (email me offline if you want it, and I'll send you an invite to my source; it also has other enhancements to pdftohtml, see the mailing list archives for more info),</li><li>Change the source of pdftohtml.cc to make the default sampling 96 dpi instead of 72 dpi, or</li>
<li>Wait for my changes to get merged back into the main repo. I'm not sure when that's going to be done.</li></ol><div>Best, --josh</div><div><br></div><span><div style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium none;BORDER-LEFT:medium none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df 1pt solid;BORDER-RIGHT:medium none;PADDING-TOP:3pt">
<span style="font-weight:bold">From: </span> Craig Whitcombe &lt;<a href="mailto:craig.whitcombe@gmail.com" target="_blank">craig.whitcombe@gmail.com</a>&gt;<br><span style="font-weight:bold">Date: </span> Tue, 22 Nov 2011 06:20:20 -0800<br>
<span style="font-weight:bold">To: </span> Josh Richardson &lt;<a href="mailto:jric@chegg.com" target="_blank">jric@chegg.com</a>&gt;<br><span style="font-weight:bold">Cc: </span> "<a href="mailto:poppler@lists.freedesktop.org" target="_blank">poppler@lists.freedesktop.org</a>" &lt;<a href="mailto:poppler@lists.freedesktop.org" target="_blank">poppler@lists.freedesktop.org</a>&gt;<br>
<span style="font-weight:bold">Subject: </span> Re: [poppler] pdftohtml image quality<br></div><div><div class="h5"><div><br></div>Sorry Josh, but I cannot see this -dpi setting<div><br></div><div><div>pdftohtml.exe -help</div>
<div><br></div><div>pdftohtml version 0.18.0</div><div>Copyright 2005-2011 The Poppler Developers - <a href="http://poppler.freedesktop.org" target="_blank">http://poppler.freedesktop.org</a></div><div>Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch</div>
<div>Copyright 1996-2004 Glyph &amp; Cog, LLC</div><div><br></div><div>Usage: pdftohtml [options] &lt;PDF-file&gt; [&lt;html-file&gt; &lt;xml-file&gt;]</div><div> -f &lt;int&gt; : first page to convert</div><div>
-l &lt;int&gt; : last page to convert</div><div> -q : don't print any messages or errors</div><div> -h : print usage information</div><div> -help : print usage information</div>
<div> -p : exchange .pdf links by .html</div><div> -c : generate complex document</div><div> -s : generate single document that includes all pages</div><div> -i : ignore images</div>
<div> -noframes : generate no frames</div><div> -stdout : use standard output</div><div> -zoom &lt;fp&gt; : zoom the pdf document (default 1.5)</div><div> -xml : output for XML post-processing</div>
<div> -hidden : output hidden text</div><div> -nomerge : do not merge paragraphs</div><div> -enc &lt;string&gt; : output text encoding name</div><div> -dev &lt;string&gt; : output device name for Ghostscript (png16m, jpeg etc)</div>
<div> -fmt &lt;string&gt; : image file format for Splash output (png or jpg)</div><div> -v : print copyright and version info</div><div> -opw &lt;string&gt; : owner password (for encrypted files)</div>
<div> -upw &lt;string&gt; : user password (for encrypted files)</div><div> -nodrm : override document DRM settings</div><div><br></div><div><br></div><div>Trying to use -dpi 96 anyway results in the above help message.</div>
<div><br></div><div>Regards,</div><div>Craig</div><div><br></div><div><br></div><div><br></div><div><br></div><br><div class="gmail_quote">
On 22 November 2011 06:45, Josh Richardson <span dir="ltr">&lt;<a href="mailto:jric@chegg.com" target="_blank">jric@chegg.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif"><div>By default, pdftohtml samples the original image at 72 dpi, whereas your browser is probably displaying it at 96 dpi or more. I recommend you try bumping up the -dpi parameter.</div>
<div><br></div><div>--josh</div><div><br></div><span><div style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium none;BORDER-LEFT:medium none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df 1pt solid;BORDER-RIGHT:medium none;PADDING-TOP:3pt">
<span style="font-weight:bold">From: </span> Craig Whitcombe &lt;<a href="mailto:craig.whitcombe@gmail.com" target="_blank">craig.whitcombe@gmail.com</a>&gt;<br><span style="font-weight:bold">Date: </span> Sun, 20 Nov 2011 08:02:39 -0800<br>
<span style="font-weight:bold">To: </span> "<a href="mailto:poppler@lists.freedesktop.org" target="_blank">poppler@lists.freedesktop.org</a>" &lt;<a href="mailto:poppler@lists.freedesktop.org" target="_blank">poppler@lists.freedesktop.org</a>&gt;<br>
<span style="font-weight:bold">Subject: </span> [poppler] pdftohtml image quality<br></div><div><div><div><br></div>Hello,<div><br></div><div>Using pdftohtml -c to create a complex document from a pdf, I find that the generated png images are not very good when compared to the original inside the source pdf.</div>
<div><br></div><div>Is there something that I can do to improve the output quality?</div><div><br></div><div>Using version 0.18 with pdftohtml -c somepdf.pdf</div><div>Regards,</div><div>Craig</div></div></div></span></div>
</blockquote></div><br></div></div></div></span></div>
</blockquote></div><br></div>
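<div>For reference, the 72 dpi versus 96 dpi mismatch discussed in this thread can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the figures Josh gives (sampling at 72 dpi, display at 96 dpi) and a US Letter page of 612 x 792 points. The function names are hypothetical and are not part of poppler or pdftohtml.</div>

```python
# Illustrative arithmetic only; these names are hypothetical, not poppler APIs.
POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def rendered_pixels(size_pt, dpi):
    """Pixels along one axis when a length in PDF points is sampled at `dpi`."""
    return round(size_pt * dpi / POINTS_PER_INCH)


def upscale_factor(sample_dpi, display_dpi):
    """How much an image sampled at `sample_dpi` must be stretched to cover
    the same physical size on a `display_dpi` screen."""
    return display_dpi / sample_dpi


# Assumed example page: US Letter, 612 x 792 points.
letter_pt = (612, 792)
at_72 = tuple(rendered_pixels(side, 72) for side in letter_pt)
at_96 = tuple(rendered_pixels(side, 96) for side in letter_pt)

print(at_72)                   # (612, 792)
print(at_96)                   # (816, 1056)
print(upscale_factor(72, 96))  # about 1.33
```

<div>Under these assumptions, an image sampled at 72 dpi has to be stretched by roughly a third to fill the same area on a 96 dpi display, which would be consistent with the softer-looking PNGs Craig reports.</div>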