[poppler] Help with pdftohtml background image resolution

mpsuzuki at hiroshima-u.ac.jp mpsuzuki at hiroshima-u.ac.jp
Fri Aug 13 10:06:33 PDT 2010


Hi,

Please cancel the patch in my previous message.
The "-r" option added by the attached patch does not zoom
the document; it only generates a higher-resolution image.
Please compare the documents generated with "-r 72" and "-r 600".
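
The arithmetic in the attached patch can be sketched on its own.
This is a minimal illustration, not code from pdftohtml.cc: the
struct GsGeometry and the function gsGeometry are hypothetical
names, and w and h simply stand for the variables of the same
name in the patch. It shows that "-r" changes the Ghostscript dpi
and the pixel size of the PNG together.

```cpp
// Hypothetical helper, not part of pdftohtml.cc. It only mirrors the
// expressions the patch uses to build the Ghostscript command line.
struct GsGeometry {
    int dpi;     // value appended after "-r"
    int width;   // first value appended after "-g", in pixels
    int height;  // second value appended after "-g", in pixels
};

// w, h:       the w and h variables as used in the patch
// scale:      the value of the "-zoom" option (default 1.5)
// resolution: the value of the new "-r" option (default 72)
GsGeometry gsGeometry(double w, double h, double scale, int resolution) {
    GsGeometry g;
    g.dpi = static_cast<int>(resolution * scale);
    g.width = static_cast<int>(scale * w * resolution / 72.0);
    g.height = static_cast<int>(scale * h * resolution / 72.0);
    return g;
}
```

With w=595, h=842 and the default zoom of 1.5, resolution 72
gives -r108 -g892x1263, and resolution 600 gives -r900
-g7437x10525.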

Anyway, even if we have a high-resolution PNG image for
the background, that does not mean we get a better
result in an HTML browser. The browser rescales the
higher-resolution image with its own scaler, and the
scaled background sometimes looks ugly.
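
One way to picture the scaling problem: the PNG may hold more
pixels than the size at which the page displays it. The following
is a minimal sketch, assuming the page refers to the background
with an IMG tag whose width and height stay at the 72 dpi layout
size. backgroundImgTag is a hypothetical helper, and how
HtmlOutputDev.cc really writes the tag has not been checked here.

```cpp
#include <string>

// Hypothetical helper, not taken from HtmlOutputDev.cc.
// pageW, pageH: page size in points (1/72 inch)
// scale:        the value of the "-zoom" option
// The returned tag fixes the display size at the zoomed 72 dpi layout
// size, whatever the pixel dimensions of the PNG file are. If the PNG
// is larger than that, the browser has to scale it down on display.
std::string backgroundImgTag(const std::string &src,
                             double pageW, double pageH, double scale) {
    const int shownW = static_cast<int>(pageW * scale);
    const int shownH = static_cast<int>(pageH * scale);
    return "<img src=\"" + src + "\"" +
           " width=\"" + std::to_string(shownW) + "\"" +
           " height=\"" + std::to_string(shownH) + "\"" +
           " alt=\"background image\"/>";
}
```

For a 595x842 page at zoom 1.5 the tag asks for 892x1263 on
screen, so a PNG rendered at a higher resolution is resampled by
the browser to that size.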

Regards,
mpsuzuki

diff --git a/utils/pdftohtml.cc b/utils/pdftohtml.cc
index 3c74c6e..89afe26 100644
--- a/utils/pdftohtml.cc
+++ b/utils/pdftohtml.cc
@@ -64,6 +64,7 @@ GBool complexMode=gFalse;
 GBool ignore=gFalse;
 //char extension[5]=".png";
 double scale=1.5;
+int resolution=72;
 GBool noframes=gFalse;
 GBool stout=gFalse;
 GBool xml=gFalse;
@@ -107,6 +108,8 @@ static const ArgDesc argDesc[] = {
    "use standard output"},
   {"-zoom",   argFP,    &scale,         0,
    "zoom the pdf document (default 1.5)"},
+  {"-r",      argInt,   &resolution, 0,
+   "resolution to render the pdf document (default 72)"},
   {"-xml",    argFlag,    &xml,         0,
    "output for XML post-processing"},
   {"-hidden", argFlag,   &showHidden,   0,
@@ -357,7 +360,7 @@ int main(int argc, char *argv[]) {
 		      gTrue, gFalse, gFalse);
     delete psOut;
 
-    /*sprintf(buf, "%s -sDEVICE=png16m -dBATCH -dNOPROMPT -dNOPAUSE -r72 -sOutputFile=%s%%03d.png -g%dx%d -q %s", GHOSTSCRIPT, htmlFileName->getCString(), w, h,
+    /*sprintf(buf, "%s -sDEVICE=png16m -dBATCH -dNOPROMPT -dNOPAUSE -r%d -sOutputFile=%s%%03d.png -g%dx%d -q %s", GHOSTSCRIPT, resolution, htmlFileName->getCString(), w, h,
       psFileName->getCString());*/
     
     GooString *gsCmd = new GooString(GHOSTSCRIPT);
@@ -365,7 +368,7 @@ int main(int argc, char *argv[]) {
     gsCmd->append(" -sDEVICE=");
 	gsCmd->append(gsDevice);
 	gsCmd->append(" -dBATCH -dNOPROMPT -dNOPAUSE -r");
-    sc = GooString::fromInt(static_cast<int>(72*scale));
+    sc = GooString::fromInt(static_cast<int>(resolution*scale));
     gsCmd->append(sc);
     gsCmd->append(" -sOutputFile=");
     gsCmd->append("\"");
@@ -373,10 +376,10 @@ int main(int argc, char *argv[]) {
     gsCmd->append("%03d.");
 	gsCmd->append(extension);
 	gsCmd->append("\" -g");
-    tw = GooString::fromInt(static_cast<int>(scale*w));
+    tw = GooString::fromInt(static_cast<int>(scale*w*resolution/72.0));
     gsCmd->append(tw);
     gsCmd->append("x");
-    th = GooString::fromInt(static_cast<int>(scale*h));
+    th = GooString::fromInt(static_cast<int>(scale*h*resolution/72.0));
     gsCmd->append(th);
     gsCmd->append(" -q \"");
     gsCmd->append(psFileName);



On Sat, 14 Aug 2010 01:33:49 +0900
mpsuzuki at hiroshima-u.ac.jp wrote:

>Hi,
>
>While I'm still waiting for Albert's comment, I've started
>a preliminary experiment. Attached is a patch from
>my working source tree. This is NOT a solution; it
>is just a status report.
>
>The effect of this patch is almost the same as that of the
>"-zoom" option. I guess you don't want to zoom the document
>in the browser; you just want a higher-resolution
>image for the background. Maybe we have to tune the
>size given in the IMG tag of the generated HTML. I have
>to check HtmlOutputDev.cc in more detail.
>
>Regards,
>mpsuzuki
>
>diff --git a/utils/pdftohtml.cc b/utils/pdftohtml.cc
>index 3c74c6e..a66ea49 100644
>--- a/utils/pdftohtml.cc
>+++ b/utils/pdftohtml.cc
>@@ -64,6 +64,7 @@ GBool complexMode=gFalse;
> GBool ignore=gFalse;
> //char extension[5]=".png";
> double scale=1.5;
>+int resolution=72;
> GBool noframes=gFalse;
> GBool stout=gFalse;
> GBool xml=gFalse;
>@@ -107,6 +108,8 @@ static const ArgDesc argDesc[] = {
>    "use standard output"},
>   {"-zoom",   argFP,    &scale,         0,
>    "zoom the pdf document (default 1.5)"},
>+  {"-r",      argInt,   &resolution, 0,
>+   "resolution to render the pdf document (default 72)"},
>   {"-xml",    argFlag,    &xml,         0,
>    "output for XML post-processing"},
>   {"-hidden", argFlag,   &showHidden,   0,
>@@ -333,7 +336,7 @@ int main(int argc, char *argv[]) {
> 
>   if (htmlOut->isOk())
>   {
>-    doc->displayPages(htmlOut, firstPage, lastPage, 72, 72, 0,
>+    doc->displayPages(htmlOut, firstPage, lastPage, resolution, resolution, 0,
> 		      gTrue, gFalse, gFalse);
>   	if (!xml)
> 	{
>@@ -353,11 +356,11 @@ int main(int argc, char *argv[]) {
>     psOut = new PSOutputDev(psFileName->getCString(), doc->getXRef(),
> 			    doc->getCatalog(), NULL, firstPage, lastPage, psModePS, w, h);
>     psOut->setDisplayText(gFalse);
>-    doc->displayPages(psOut, firstPage, lastPage, 72, 72, 0,
>+    doc->displayPages(psOut, firstPage, lastPage, resolution, resolution, 0,
> 		      gTrue, gFalse, gFalse);
>     delete psOut;
> 
>-    /*sprintf(buf, "%s -sDEVICE=png16m -dBATCH -dNOPROMPT -dNOPAUSE -r72 -sOutputFile=%s%%03d.png -g%dx%d -q %s", GHOSTSCRIPT, htmlFileName->getCString(), w, h,
>+    /*sprintf(buf, "%s -sDEVICE=png16m -dBATCH -dNOPROMPT -dNOPAUSE -r%d -sOutputFile=%s%%03d.png -g%dx%d -q %s", GHOSTSCRIPT, resolution, htmlFileName->getCString(), w, h,
>       psFileName->getCString());*/
>     
>     GooString *gsCmd = new GooString(GHOSTSCRIPT);
>@@ -365,7 +368,7 @@ int main(int argc, char *argv[]) {
>     gsCmd->append(" -sDEVICE=");
> 	gsCmd->append(gsDevice);
> 	gsCmd->append(" -dBATCH -dNOPROMPT -dNOPAUSE -r");
>-    sc = GooString::fromInt(static_cast<int>(72*scale));
>+    sc = GooString::fromInt(static_cast<int>(resolution*scale));
>     gsCmd->append(sc);
>     gsCmd->append(" -sOutputFile=");
>     gsCmd->append("\"");
>@@ -373,10 +376,10 @@ int main(int argc, char *argv[]) {
>     gsCmd->append("%03d.");
> 	gsCmd->append(extension);
> 	gsCmd->append("\" -g");
>-    tw = GooString::fromInt(static_cast<int>(scale*w));
>+    tw = GooString::fromInt(static_cast<int>(scale*w*resolution/72.0));
>     gsCmd->append(tw);
>     gsCmd->append("x");
>-    th = GooString::fromInt(static_cast<int>(scale*h));
>+    th = GooString::fromInt(static_cast<int>(scale*h*resolution/72.0));
>     gsCmd->append(th);
>     gsCmd->append(" -q \"");
>     gsCmd->append(psFileName);
>
>On Sat, 14 Aug 2010 00:24:22 +0800
>ChunWei Ho <fuzzybr80 at gmail.com> wrote:
>>Thanks mpsuzuki, for looking at this. I appreciate your help, and
>>hopefully the information here adds to it.
>>
>>You are right that gs is used for the images:
>>The command is generally:
>>running: gs -sDEVICE=png16m -dBATCH -dNOPROMPT -dNOPAUSE -r<R>
>>-sOutputFile="/root/test/ss3%03d.png" -g<X>x<Y> -q "/root/test/ss3.ps"
>>
>>where <R> = 72 * scale
>>and <X> and <Y> are almost always 595x842 regardless of the scale you pick.
>>It's strange: these are the values of htmlOut->getPageWidth() and
>>htmlOut->getPageHeight() (I don't know where they are set). They are
>>divided by scale to form w and h, then multiplied back by scale to
>>form tw and th (which are used in the command line).
>>
>>I've tried adjusting those values there but it appears to distort the
>>output. I'm not a graphics person (so I have no idea how the -r value
>>fits in with the -gXxY value).
>>
>>Do you see a way to change the basic htmlOut page width and height
>>(which appear to be the arbitrary limit here)?
>>
>>Thanks!
>>
>>
>>
>>On Fri, Aug 13, 2010 at 5:29 PM,  <mpsuzuki at hiroshima-u.ac.jp> wrote:
>>> Hi,
>>>
>>> I'm sorry for joining this discussion so late.
>>> As ChunWei Ho has already filed this issue in bugzilla,
>>> should I discuss it there? If so, please let me know.
>>>
>>> Taking a glance at utils/pdftohtml.cc (sorry, this is my
>>> first look at it), I found that pdftohtml does not
>>> make the images with poppler. pdftohtml makes the text-based
>>> part with poppler's HtmlOutputDev, but the image parts
>>> are created by running an external Ghostscript.
>>>
>>> And the resolution seems to be 72dpi x the scaling parameter
>>> (given by zoom).
>>>
>>>
>>> So, changing this part may work to obtain a high-resolution
>>> background image. Albert, please tell me whether this is the
>>> right direction. I will work on adding a new option "-r" to modify
>>> the default resolution passed to Ghostscript.
>>>
>>> I wish I had enough spare time to move the background
>>> image part from Ghostscript to poppler, but right now I don't...
>>>
>>> Regards,
>>> mpsuzuki
>>>
>>> On Fri, 13 Aug 2010 15:30:57 +0800
>>> ChunWei Ho <fuzzybr80 at gmail.com> wrote:
>>>
>>>>I also tried poppler-0.5.91 (the earliest that builds for me), but that
>>>>has the same issue. I tried looking into/diffing the code but am not
>>>>seeing an obvious fix/issue there.
>>>>
>>>>I've logged a bug at https://bugs.freedesktop.org/show_bug.cgi?id=29551
>>>>
>>>>It's probably not affecting too many users, but I'd appreciate it if it
>>>>could be investigated soon, as it would be great to be able to deploy
>>>>poppler-utils for our purposes.
>>>>
>>>>Thanks.
>>>>
>>>>>>> I've been using pdftohtml (http://pdftohtml.sourceforge.net/) for PDF
>>>>>>> to HTML conversion for my application, and recently tried to upgrade
>>>>>>> it to use poppler-utils.
>>>>>>> I usually invoke it as "pdftohtml -c -noframes [input pdf] [output html]"
>>>>>>> The command-line interface and all is fine, but the images (I understand
>>>>>>> a background image is generated per page) are now really bad. I did a
>>>>>>> check and under the old pdftohtml project, each background image (PNG)
>>>>>>> for a page is 1785x2526 resolution.
>>>>>>>
>>>>>>> Under poppler-utils, each background image (PNG) is 594x843 resolution.
>>>>>>>
>>>>>>> Can someone point me in the right direction to change/fix this? There
>>>>>>> doesn't appear to be a command line parameter for this.
>>>>>>> The new background images are bad to the point of being unusable. Which is
>>>>>>> a shame, because I really want to move to poppler-utils for the
>>>>>>> Unicode and continued support.
>>>>>
>>>>>>Which poppler version are you using?
>>>>>
>>>>>>Albert
>>>>>
>>>>_______________________________________________
>>>>poppler mailing list
>>>>poppler at lists.freedesktop.org
>>>>http://lists.freedesktop.org/mailman/listinfo/poppler
>>>

