[poppler] Some changes for util/pdftohtml

Warren Toomey poppler at tuhs.org
Mon Sep 29 17:01:35 PDT 2008


Albert Astals Cid <aacid at kde.org> wrote:
> 
> On Monday 29 September 2008, Warren Toomey wrote:
> > I found a bug in utils/pdftohtml which prevented it from extracting JPEGs
> > from PDF documents. Around line 231, this line:
> > 	virtual GBool needNonText() { return gFalse; }
> > needs to have gFalse changed to gTrue.
> 
> Right, wonder when it broke :-/
> Also I think pdftohtml should create images for all the non-DCT images; it
> should not be hard to create an uncompressed BMP or something similar with
> the raw RGB data.

The changes that I made include the code from utils/pdfimages to output PPM
images for the non-DCT images.
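
To illustrate why raw RGB data maps so easily onto PPM, here is a minimal
sketch of a binary PPM (P6) writer. It is not the code from utils/pdfimages:
the names encodePpm and writePpmFile are hypothetical, and it assumes the
image has already been decoded to packed 8-bit RGB samples.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: build a binary PPM (P6) image from packed 8-bit RGB
// samples. Returns an empty string if the dimensions are invalid or the
// buffer size does not equal width * height * 3.
std::string encodePpm(int width, int height,
                      const std::vector<std::uint8_t> &rgb)
{
    if (width <= 0 || height <= 0)
        return std::string();
    const std::size_t expected =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height) * 3;
    if (rgb.size() != expected)
        return std::string();

    // Header: magic number, dimensions, maximum sample value.
    std::string out = "P6\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n255\n";
    // Body: the raw samples, row by row, with no padding or compression.
    out.append(reinterpret_cast<const char *>(rgb.data()), rgb.size());
    return out;
}

// Hypothetical helper: write the encoded image to a file in binary mode.
bool writePpmFile(const std::string &path, int width, int height,
                  const std::vector<std::uint8_t> &rgb)
{
    const std::string data = encodePpm(width, height, rgb);
    if (data.empty())
        return false;
    std::ofstream file(path, std::ios::binary);
    if (!file)
        return false;
    file.write(data.data(), static_cast<std::streamsize>(data.size()));
    return file.good();
}
```

The format is just a short text header followed by the samples, which is why
no image library is needed for the non-DCT case.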

> > I've also made some other more significant changes to utils/pdftohtml,
> > but they significantly alter the output format.
> 
> Yeah please, do not put a tarball, a diff is much better.

I guess my changes are "proof of concept" at the moment. The output change
is significant enough that the poppler maintainers need to decide whether
to adopt it or not, or perhaps add a new run-time option so that the user
can select either output.

> Your patches come in a "bad moment" since we are releasing a new stable 
> version in one week or so, and such a big feature change is not a good idea, 
> but we may speak of including them for the next feature release.

Quite understandable. What I might do then is to package up fixes for any
bugs I've found into a diff that can be applied in the near future. Once the
next stable version is out and the maintainers have made a decision on the
output change (adopt/reject/selectable at run-time), I can work on a cleaner
version and then submit a diff which can be applied.

As an end-user of libpoppler up to now, can I ask if there is a test suite
of PDF documents that is used for testing purposes? I've been using only
a handful of documents for testing at this end, but if there is a standard
set of documents, then I should use them as well.

Many thanks for the feedback. I'll get a diff with the uncontroversial
changes in soon.

	Warren

