[Poppler-bugs] [Bug 48270] New: pdftohtml converts images to garbled "solarized" jpegs

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Tue Apr 3 16:03:11 PDT 2012


https://bugs.freedesktop.org/show_bug.cgi?id=48270

             Bug #: 48270
           Summary: pdftohtml converts images to garbled "solarized" jpegs
    Classification: Unclassified
           Product: poppler
           Version: unspecified
          Platform: x86-64 (AMD64)
        OS/Version: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: pdftohtml
        AssignedTo: poppler-bugs at lists.freedesktop.org
        ReportedBy: info at skierpage.com


Created attachment 59456
  --> https://bugs.freedesktop.org/attachment.cgi?id=59456
garbled-color image produced by pdftohtml

pdftohtml version 0.18.4 from Kubuntu 12.04 beta amd64.

I downloaded the 14MB PDF at
http://www.swanyretail.com/SwanySkiCatalog_final-LO.pdf and ran it through
pdftohtml with no options.
All the resulting jpegs are the right size but have garbled colors. They look
like "solarized" negatives: mostly black, little color. 

To reproduce the problem,
  mkdir bugtemp
  cd bugtemp
  wget http://www.swanyretail.com/SwanySkiCatalog_final-LO.pdf
  pdftohtml -f 1 -l 3 SwanySkiCatalog_final-LO.pdf swany.html
to convert just the first three pages.  Then look at the resulting swanys.html
and/or the individual jpegs.

Here's the first bad image in the original PDF:

4618 0 obj
<</Intent/RelativeColorimetric/Subtype/Image/Length
88781/Filter/DCTDecode/Name/X/BitsPerComponent 8/ColorSpace/DeviceCMYK/Width
629/Height 814/Type/XObject>>stream
ÿØÿî^@^NAdobe^@d<80>^@^@^@^BÿÛ^@<84>^@^L^H^H^H^H^H^L^H^H^L^P^K^K^K^P^T^N^M^M^N^T^X^R^S^S^S^R^X^T^R^T^T^T^T^R^T^T^[^^^^^^^[^T$''''$25552;;;;;;;;;;^A^M^
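
The start of that stream can be checked by hand. The Python below is a minimal sketch, not poppler code: it walks the JPEG marker segments and reports whether an Adobe APP14 segment is present, its colour transform byte, and the component count from the frame header. The function name inspect_jpeg_header and the constant REPORTED_PREFIX are made up for illustration. The bytes in REPORTED_PREFIX are my reading of the caret notation in the dump (^@ = 0x00, ^N = 0x0E, d = 0x64, <80> = 0x80, ^B = 0x02), so treat them as an assumption.

```python
import struct

# First 18 bytes of the stream as read from the dump (assumed decoding of the
# caret notation): SOI, then an APP14 "Adobe" segment of length 14.
REPORTED_PREFIX = b"\xff\xd8\xff\xee\x00\x0eAdobe\x00\x64\x80\x00\x00\x00\x02"


def inspect_jpeg_header(data: bytes) -> dict:
    """Walk JPEG marker segments up to the first frame header (SOFn)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    info = {"adobe_marker": False, "adobe_transform": None, "components": None}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte in front of the real marker
            pos += 1
            continue
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xEE and payload[:5] == b"Adobe":
            info["adobe_marker"] = True
            if len(payload) >= 12:
                # Layout: "Adobe", version(2), flags0(2), flags1(2), transform(1)
                info["adobe_transform"] = payload[11]
        # SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8) and DAC (0xCC)
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            # Layout: precision(1), height(2), width(2), component count(1)
            info["components"] = payload[5]
            break
        if marker == 0xDA:  # start of scan: no frame header found before it
            break
        pos += 2 + length
    return info


if __name__ == "__main__":
    print(inspect_jpeg_header(REPORTED_PREFIX))
```

By Adobe's APP14 convention a transform value of 0 means no colour transform (plain RGB or CMYK), 1 means YCbCr and 2 means YCCK. If the decoding above is right, the reported image carries transform 2, which would fit a four-component image in a DeviceCMYK colour space. The excerpt is cut off before the frame header, so the component count cannot be confirmed from the report alone.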

My guess is that pdftohtml doesn't handle images with /ColorSpace /DeviceCMYK
correctly.
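
One plausible mechanism, offered as a sketch and not a confirmed diagnosis: JPEGs written by Adobe tools commonly store CMYK samples inverted (255 means no ink). A viewer or converter that reads them as ordinary CMYK would turn white paper into black, which would look much like the "solarized" output. The Python below illustrates the arithmetic with a naive conversion formula. The function names are hypothetical and nothing here is taken from poppler.

```python
def cmyk_to_rgb(c, m, y, k):
    """Naive CMYK -> RGB for 8-bit samples where 0 = no ink, 255 = full ink."""
    r = (255 - c) * (255 - k) // 255
    g = (255 - m) * (255 - k) // 255
    b = (255 - y) * (255 - k) // 255
    return (r, g, b)


def inverted_cmyk_to_rgb(c, m, y, k):
    """Same conversion for samples stored inverted (255 = no ink)."""
    return cmyk_to_rgb(255 - c, 255 - m, 255 - y, 255 - k)


if __name__ == "__main__":
    # White paper as an inverted-CMYK sample: no ink in any channel.
    white_sample = (255, 255, 255, 255)
    print(inverted_cmyk_to_rgb(*white_sample))  # -> (255, 255, 255)
    # The same bytes read without undoing the inversion come out black.
    print(cmyk_to_rgb(*white_sample))           # -> (0, 0, 0)
```

Light areas dominate a typical catalog page, so under this assumption most pixels would land near black, with only the dark, saturated regions keeping any colour.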

pdfimages extracts these as .ppm files that preview fine in Gwenview.
pdfimages' -j option does nothing.

I'll attach the bad jpeg from pdftohtml, the good ppm from pdfimages, and
pages 1-3 of the PDF extracted with pdfseparate/pdfunite.
