[poppler] Black boxes & Poppler

Sat Feb 18 08:09:33 PST 2012

On Wednesday, 15 February 2012, at 09:15:44, you wrote:
> On 14.02.2012 23:17, Albert Astals Cid wrote:
> > On Tuesday, 14 February 2012, at 20:35:29, Thomas Freitag wrote:
> >> Hi Albert!
> >> 
> >> On 14.02.2012 18:07, Ralph Gootee wrote:
> >>> Hi Thomas!
> >>> 
> >>> Thanks for the help!
> >>> 
> >>> Steps to reproduce
> >>> 
> >>> 1) split the PDF with pdfseparate
> >>> 2) use pdftoppm to convert the output to PNG
> >>> 
> >>> Also, the separated PDF makes Acrobat error out. It's a little
> >>> confusing, but there are already black boxes in the PDF (from
> >>> redaction); the ones to look for show up in the middle after
> >>> pdftoppm.
> >>> 
> >>> We're really, really happy with poppler; thanks for helping to make
> >>> such an awesome lib!
> >> 
> >> We have two problems with it; one is a general problem coming from the
> >> merge:
> >> 
> >> a) xRef->getNumObjects() will no longer work with the changes from
> >> our/my merge in PDFDoc::writePageObjects, because last is not set
> >> there. We need to use xRef->getSize().
> > 
> > We use getNumObjects in a lot of other places; aren't those affected
> > too? Shouldn't we just revert getNumObjects to what it did before, i.e.
> > kill the last variable and just return size? What's the benefit of
> > this last variable?
> In the other places getNumObjects() will work: last is filled while
> creating the XRef table in readXRefTable and holds the number of the
> last valid xref, whereas size is the number of allocated xrefs.
> This optimization comes from the merge. The problem in
> writePageObjects is that the xref table wasn't read but created
> indirectly, so we must use size there, also because the xrefs there
> are not necessarily created in ascending order.
> In short: in all other cases it is okay to use getNumObjects.

Right, but to be honest, having both getNumObjects and getSize doesn't make
much sense unless you can guarantee that both are always correct, and we are
not guaranteeing that with getNumObjects. So I think I am going back to the
code we had in poppler-0-18, where getNumObjects and getSize are the same,
because our API is already hard enough to use without such special corner
cases.

Anyway, it's not as if those functions are used in hot paths where a few
iterations more or less matter.

Cheers,
  Albert

> 
> Thomas
> 
> > Albert
> > 
> >> b) CCITTFaxStream and DCTStream are inherited by FlateStream, and
> >> FlateStream::reset "eats" the first two bytes. Therefore a call to
> >> unfilteredReset will not work in pdfseparate. As far as I can see,
> >> unfilteredReset is only called by PDFDoc::writeRawStream (or within
> >> Stream.cc itself), so I think the changes in my attached patch are
> >> safe.
> >> 
> >> a) is the reason why I am sending this patch immediately instead of
> >> waiting until the weekend: pdfseparate and pdfunite will no longer
> >> work on the HEAD revision.
> >> 
> >> @Ralph: You need to check out the HEAD revision and apply this patch
> >> if you want to test it immediately.
> >> 
> >> Cheers,
> >> Thomas
> >> 
> >>> Cheers,
> >>> Ralph G.
> > 
> > _______________________________________________
> > poppler mailing list
> > poppler at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/poppler
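
Point b) in Thomas's message can be modelled with a small C++ sketch. Assumptions: RawSource, FlateLike, the two unfilteredReset variants and copyRaw are hypothetical stand-ins, not poppler's Stream classes, and the filter's reset() is assumed to read the two-byte zlib header (CMF, FLG) from the underlying source, which is what "eats the first two bytes" suggests. Copying raw bytes after a reset that went through the filter loses the header; rewinding only the raw source keeps every byte. This illustrates the symptom only and does not reproduce Thomas's patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical byte source standing in for the raw (still encoded) stream data.
class RawSource {
public:
    explicit RawSource(std::vector<std::uint8_t> d) : data(std::move(d)) {}
    void reset() { pos = 0; }
    int getChar() { return pos < data.size() ? data[pos++] : -1; }
private:
    std::vector<std::uint8_t> data;
    std::size_t pos = 0;
};

// Hypothetical Flate-like filter: reset() rewinds the source and then reads
// the two-byte zlib header (CMF, FLG) to validate it, advancing the source.
class FlateLike {
public:
    explicit FlateLike(RawSource &s) : src(s) {}
    bool reset() {
        src.reset();
        int cmf = src.getChar();
        int flg = src.getChar();
        if (cmf < 0 || flg < 0) return false;
        // zlib header check: CMF * 256 + FLG must be a multiple of 31.
        return ((cmf << 8) | flg) % 31 == 0;
    }
    // Problematic variant: goes through the filter's reset(), so the raw
    // source is left positioned after the header.
    void unfilteredResetViaFilter() { reset(); }
    // Safe variant: only rewinds the raw source.
    void unfilteredResetRawOnly() { src.reset(); }
    RawSource &raw() { return src; }
private:
    RawSource &src;
};

// Copies the remaining raw bytes, as a raw-stream writer would.
std::vector<std::uint8_t> copyRaw(RawSource &s) {
    std::vector<std::uint8_t> out;
    for (int c; (c = s.getChar()) != -1;)
        out.push_back(static_cast<std::uint8_t>(c));
    return out;
}
```

With the bytes 0x78 0x9C 0x01 0x02 0x03 (0x789C is a multiple of 31, so the header validates), copying after unfilteredResetViaFilter() yields only the three payload bytes, while copying after unfilteredResetRawOnly() yields all five.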