[Poppler-bugs] [Bug 13729] New: Eats lots of memory with buggy CCITTFaxDecode image

Tue Dec 18 19:24:35 PST 2007

http://bugs.freedesktop.org/show_bug.cgi?id=13729

           Summary: Eats lots of memory with buggy CCITTFaxDecode image
           Product: poppler
           Version: unspecified
          Platform: Other
        OS/Version: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: cairo backend
        AssignedTo: poppler-bugs at lists.freedesktop.org
        ReportedBy: ph.silva at gmail.com

I'm trying to render the attached PDF (Evince using the poppler cairo backend)
and poppler eats lots and lots of memory (around 100MB per page).

However, I'm not sure that Evince/poppler is doing anything wrong. This PDF
uses images in CCITTFaxDecode format, and the declared dimensions of the image
are huge:

/Type /XObject
/Subtype /Image
/Name /Im1
/Filter [ /CCITTFaxDecode ]
/Width 5120 /Height 6600 /BitsPerComponent 1
/ColorSpace /DeviceGray
/Length 5 0 R
/DecodeParms [ << /K -1 /Columns 5120 /Rows 6600 /EndOfBlock false /BlackIs1
false >> ]
>>
stream
...
endstream
endobj

5 0 obj
207775
endobj

Anyway, it looks like the dimensions were guessed by the producer, because the
stream doesn't have enough samples: 207775 bytes * 8 bits/byte = 1662200
samples (grayscale colorspace), roughly a 1290x1290px image, not nearly as huge
as the dictionary claims. Is this calculation right? I'm really just guessing
at how CCITT works.
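
The estimate above can be reproduced with a few lines of Python. This is only
a sketch of the arithmetic in this report. It assumes, as the report does, one
1-bit sample per bit of stream data. CCITTFaxDecode data is compressed, so the
figure is at best a rough lower bound on the number of samples, not an exact
count. The variable names are made up for illustration.

```python
import math

stream_bytes = 207775      # value of object "5 0 obj" (the stream /Length)
declared_width = 5120      # /Width from the image dictionary
declared_height = 6600     # /Height from the image dictionary

# Assumption taken from the report: one bit of stream data per sample.
samples_in_stream = stream_bytes * 8

# Side of a square image holding that many samples (integer square root).
square_side = math.isqrt(samples_in_stream)

# Number of samples the dictionary claims the image has.
declared_samples = declared_width * declared_height

print(samples_in_stream)   # 1662200
print(square_side)         # 1289
print(declared_samples)    # 33792000
```

The declared size is about 20 times the naive count, which is where the
question in this report comes from.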

Note that the stream's filter parameters contain "/EndOfBlock false", which may
confirm that the producer guessed the dimensions of the image.

PDF Reference about "EndOfBlock":

"A flag indicating whether the filter expects the encoded data to be          
terminated by an end-of-block pattern, overriding the Rows parameter. **If
false, the filter stops when it has decoded the number of lines indicated by
Rows or when its data has been exhausted**, whichever occurs
first."[...][emphasis added]
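
The stopping rule quoted above can be modelled as follows. This is a minimal
sketch, not poppler code: `decode_rows` is a hypothetical helper, and the
"decoder" is just an iterator of fake scan lines standing in for a real CCITT
decoder.

```python
from itertools import islice

def decode_rows(decoded_lines, rows):
    """Collect scan lines until `rows` lines were read or the source runs out,
    whichever occurs first (the /EndOfBlock false behaviour)."""
    return list(islice(decoded_lines, rows))

# Stand-in for a decoder whose data is exhausted after 3 scan lines.
fake_lines = iter([b"\x00", b"\xff", b"\x0f"])

lines = decode_rows(fake_lines, rows=6600)
print(len(lines))  # 3
```

Under this rule the real number of rows is only known once decoding has
finished, so it can be smaller than the /Rows value in the dictionary.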

If the above calculation is right, I think it's just a matter of checking
whether the dimensions of the image agree with the number of samples, and not
allocating a huge buffer at CairoOutputDev.cc:1526 (SVN HEAD).
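
To put numbers on this, here is a rough sketch of the buffer size and of the
proposed check. It assumes 4 bytes per pixel (a 32-bit cairo image surface),
which this report does not confirm for CairoOutputDev.cc. `buffer_bytes` and
`clamp_height` are hypothetical helpers, and the decoded row count of 1290 is
only the guess made earlier in this report.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit pixels in the cairo image buffer

def buffer_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size in bytes of an uncompressed width x height pixel buffer."""
    return width * height * bytes_per_pixel

def clamp_height(declared_height, decoded_rows):
    """Hypothetical check: never size the buffer for more rows than decoded."""
    return min(declared_height, decoded_rows)

# Buffer for the dimensions declared in the image dictionary.
full = buffer_bytes(5120, 6600)
print(full)                  # 135168000
print(round(full / 2**20))   # 129 (MiB)

# Buffer if only 1290 rows (guess from this report) were actually present.
clamped = buffer_bytes(5120, clamp_height(6600, 1290))
print(clamped)               # 26419200
```

Under that assumption the full-size buffer is about 129 MiB, the same order of
magnitude as the roughly 100MB per page seen here.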

Any hints on how to implement this (assuming that my guesses are right)?

[This kind of PDF seems to be very common in the astronomy community (in which
I include myself), as many older PDFs (and some not so old, as this one is from
1994) come with no OCR and only scanned images like this one]
