[poppler] Help with pdftohtml background image resolution

mpsuzuki at hiroshima-u.ac.jp mpsuzuki at hiroshima-u.ac.jp
Fri Aug 13 09:33:49 PDT 2010


Hi,

While I'm still waiting for Albert's comment, I've started
a preliminary experiment. Attached is a patch from
my working source tree. This is NOT a solution; it
is just a status report.

The effect of this patch is almost the same as that of the
"-zoom" option. I guess you don't want to zoom the document
in the browser; you just want a higher-resolution
image in the background. Maybe we have to tune the
size written in the IMG tag of the generated HTML. I have
to check HtmlOutputDev.cc in more detail.
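
As a rough illustration of how the value after -r relates to the
values after -g in the Ghostscript command line, here is a minimal
sketch. It assumes the page size is known in PostScript points
(1/72 inch) before any scaling; pointsToPixels, GsGeometry and
gsGeometry are hypothetical names, not poppler functions.

```cpp
#include <cassert>

// Hypothetical helper, not part of poppler: converts a length in PostScript
// points (1/72 inch) to whole device pixels at the given resolution.
// Truncates toward zero, like the static_cast<int> used in the patch.
static int pointsToPixels(double points, double dpi) {
    assert(dpi > 0.0);
    return static_cast<int>(points * dpi / 72.0);
}

// Hypothetical container for the numbers that would follow -r and -g.
struct GsGeometry {
    int dpi;     // value for -r
    int width;   // first value for -g, in pixels
    int height;  // second value for -g, in pixels
};

// Assumption: pageWidthPt and pageHeightPt are the unscaled page size in
// points. The effective resolution is resolution * scale, as in the patch.
static GsGeometry gsGeometry(double pageWidthPt, double pageHeightPt,
                             int resolution, double scale) {
    const double dpi = resolution * scale;
    return GsGeometry{static_cast<int>(dpi),
                      pointsToPixels(pageWidthPt, dpi),
                      pointsToPixels(pageHeightPt, dpi)};
}
```

For a 595x842 pt page, gsGeometry(595.0, 842.0, 216, 1.0) gives
-r216 -g1785x2526, and gsGeometry(595.0, 842.0, 72, 1.0) gives
-r72 -g595x842. If -g is not scaled together with -r, the rendered
page no longer fits the bitmap.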

Regards,
mpsuzuki

diff --git a/utils/pdftohtml.cc b/utils/pdftohtml.cc
index 3c74c6e..a66ea49 100644
--- a/utils/pdftohtml.cc
+++ b/utils/pdftohtml.cc
@@ -64,6 +64,7 @@ GBool complexMode=gFalse;
 GBool ignore=gFalse;
 //char extension[5]=".png";
 double scale=1.5;
+int resolution=72;
 GBool noframes=gFalse;
 GBool stout=gFalse;
 GBool xml=gFalse;
@@ -107,6 +108,8 @@ static const ArgDesc argDesc[] = {
    "use standard output"},
   {"-zoom",   argFP,    &scale,         0,
    "zoom the pdf document (default 1.5)"},
+  {"-r",      argInt,   &resolution, 0,
+   "resolution to render the pdf document (default 72)"},
   {"-xml",    argFlag,    &xml,         0,
    "output for XML post-processing"},
   {"-hidden", argFlag,   &showHidden,   0,
@@ -333,7 +336,7 @@ int main(int argc, char *argv[]) {
 
   if (htmlOut->isOk())
   {
-    doc->displayPages(htmlOut, firstPage, lastPage, 72, 72, 0,
+    doc->displayPages(htmlOut, firstPage, lastPage, resolution, resolution, 0,
 		      gTrue, gFalse, gFalse);
   	if (!xml)
 	{
@@ -353,11 +356,11 @@ int main(int argc, char *argv[]) {
     psOut = new PSOutputDev(psFileName->getCString(), doc->getXRef(),
 			    doc->getCatalog(), NULL, firstPage, lastPage, psModePS, w, h);
     psOut->setDisplayText(gFalse);
-    doc->displayPages(psOut, firstPage, lastPage, 72, 72, 0,
+    doc->displayPages(psOut, firstPage, lastPage, resolution, resolution, 0,
 		      gTrue, gFalse, gFalse);
     delete psOut;
 
-    /*sprintf(buf, "%s -sDEVICE=png16m -dBATCH -dNOPROMPT -dNOPAUSE -r72 -sOutputFile=%s%%03d.png -g%dx%d -q %s", GHOSTSCRIPT, htmlFileName->getCString(), w, h,
+    /*sprintf(buf, "%s -sDEVICE=png16m -dBATCH -dNOPROMPT -dNOPAUSE -r%d -sOutputFile=%s%%03d.png -g%dx%d -q %s", GHOSTSCRIPT, resolution, htmlFileName->getCString(), w, h,
       psFileName->getCString());*/
     
     GooString *gsCmd = new GooString(GHOSTSCRIPT);
@@ -365,7 +368,7 @@ int main(int argc, char *argv[]) {
     gsCmd->append(" -sDEVICE=");
 	gsCmd->append(gsDevice);
 	gsCmd->append(" -dBATCH -dNOPROMPT -dNOPAUSE -r");
-    sc = GooString::fromInt(static_cast<int>(72*scale));
+    sc = GooString::fromInt(static_cast<int>(resolution*scale));
     gsCmd->append(sc);
     gsCmd->append(" -sOutputFile=");
     gsCmd->append("\"");
@@ -373,10 +376,10 @@ int main(int argc, char *argv[]) {
     gsCmd->append("%03d.");
 	gsCmd->append(extension);
 	gsCmd->append("\" -g");
-    tw = GooString::fromInt(static_cast<int>(scale*w));
+    tw = GooString::fromInt(static_cast<int>(scale*w*resolution/72.0));
     gsCmd->append(tw);
     gsCmd->append("x");
-    th = GooString::fromInt(static_cast<int>(scale*h));
+    th = GooString::fromInt(static_cast<int>(scale*h*resolution/72.0));
     gsCmd->append(th);
     gsCmd->append(" -q \"");
     gsCmd->append(psFileName);

On Sat, 14 Aug 2010 00:24:22 +0800
ChunWei Ho <fuzzybr80 at gmail.com> wrote:
>Thanks mpsuzuki, for looking at this. I appreciate your help, and
>hopefully the information here adds to it.
>
>You are right that gs is used for the images:
>The command is generally:
>running: gs -sDEVICE=png16m -dBATCH -dNOPROMPT -dNOPAUSE -r<R>
>-sOutputFile="/root/test/ss3%03d.png" -g<X>x<Y> -q "/root/test/ss3.ps"
>
>where <R> = 72 * scale
>and <X> and <Y> are almost always 595x842, regardless of the scale you pick.
>It's strange: these are the values of htmlOut->getPageWidth() and
>htmlOut->getPageHeight() (I don't know where they are set). They are
>divided by scale to form w and h, then multiplied back by scale to
>form tw and th (which are used in the command line).
>
>I've tried adjusting those values there but it appears to distort the
>output. I'm not a graphics person (so I have no idea how the -r value
>fits in with the -gXxY value).
>
>Do you see a way to change the basic htmlOut page width and height
>(which appear to be the arbitrary limit here)?
>
>Thanks!
>
>
>
>On Fri, Aug 13, 2010 at 5:29 PM,  <mpsuzuki at hiroshima-u.ac.jp> wrote:
>> Hi,
>>
>> I'm sorry for joining this discussion so late.
>> As ChunWei Ho has already filed this issue in bugzilla,
>> should I discuss it there? If so, please let me know.
>>
>> This is my first look at utils/pdftohtml.cc, but from a
>> quick glance I found that pdftohtml does not render the
>> images with poppler. pdftohtml produces the text-based
>> part with poppler's HtmlOutputDev, but the image parts
>> are created by running an external Ghostscript.
>>
>> And, the resolution seems to be 72dpi x scaling parameter
>> (given by zoom).
>>
>>
>> So, changing this part may be enough to obtain a high-resolution
>> background image. Albert, please tell me whether this is the
>> right direction. I will work on adding a new option "-r" to change
>> the default resolution passed to Ghostscript.
>>
>> I wish I had enough spare time to move the background image
>> generation from Ghostscript to poppler, but right now I don't...
>>
>> Regards,
>> mpsuzuki
>>
>> On Fri, 13 Aug 2010 15:30:57 +0800
>> ChunWei Ho <fuzzybr80 at gmail.com> wrote:
>>
>>>I also tried poppler-0.5.91 (the earliest that builds for me), but that
>>>has the same issue. I tried reading and diffing the code but did not
>>>see an obvious fix or cause there.
>>>
>>>I've logged a bug at https://bugs.freedesktop.org/show_bug.cgi?id=29551
>>>
>>>It's probably not affecting too many users, but I would appreciate it
>>>if it could be investigated soon, as it would be great to be able to
>>>deploy poppler-utils for our purposes.
>>>
>>>Thanks.
>>>
>>>>>> I've been using pdftohtml (http://pdftohtml.sourceforge.net/) for PDF
>>>>>> to HTML conversion for my application, and recently tried to upgrade
>>>>>> it to use poppler-utils.
>>>>>> I usually invoke it as "pdftohtml -c -noframes [input pdf] [output html]"
>>>>>> The command-line interface is fine, but the images (I understand
>>>>>> a background image is generated per page) are now really bad. I
>>>>>> checked, and under the old pdftohtml project, each background image
>>>>>> (PNG) for a page is 1785x2526 pixels.
>>>>>>
>>>>>> Under poppler-utils, each background image (PNG) is 594x843 pixels.
>>>>>>
>>>>>> Can someone point me in the right direction to change/fix this? There
>>>>>> doesn't appear to be a command line parameter for this.
>>>>>> The new background images are bad to the point of being unusable,
>>>>>> which is a shame, because I really want to move to poppler-utils for
>>>>>> the Unicode and continued support.
>>>>
>>>>>Which poppler version are you using?
>>>>
>>>>>Albert
>>>>
>>>_______________________________________________
>>>poppler mailing list
>>>poppler at lists.freedesktop.org
>>>http://lists.freedesktop.org/mailman/listinfo/poppler
>>

