[poppler] Poppler as a library for parsing and manipulating PDF
files?
Kristian Høgsberg
krh at bitplanet.net
Tue Jan 24 11:51:18 PST 2006
Frank Küster wrote:
> Hi,
>
> HEADLINE: Should Poppler be used as a library for parsing and
> manipulating PDF files?
>
> I'm the maintainer of teTeX in Debian, a package that includes pdftex
> and therefore a copy of xpdf code. We've been bitten by the security
> issues in xpdf in the past, in particular because our xpdf version is
> usually older than the one the patches are made for.
>
> Therefore we always longed [1] for a shared library that we could use
> instead of our xpdf code copy, and in fact it is technically easy to use
> libpoppler[2]. However, I'm concerned whether you will support such
> uses in the future. The poppler website describes the software as "a
> PDF rendering library", but *rendering* is not what pdftex does (nor do
> pdftohtml, pdftk, and other candidates that face a similar situation).
> Instead, the functionality is used to analyse the PDF structure, to
> extract parts, and to even manipulate them (pdftk is designed for this).
>
> Therefore I'd like to know whether you plan any changes that might
> affect such uses of poppler, and whether you actually encourage such
> use.
Hi Frank,
I really like this idea, and it ties in with the original reasons for
creating the poppler library - to consolidate all the copies of the xpdf
codebase floating around into one shared library. The only reservation
I have is that we've been trying to wrap up the xpdf API that teTeX (and
CUPS and others) use in a smaller, simpler API with GLib and Qt
bindings. The motivation for this is that the xpdf API is basically an
internal API that wasn't designed to be installed as public headers.
It's a huge API (80+ C++ header files) and it exposes too much of the
xpdf internals. Thus, it is not really possible to give any kind of
API/ABI guarantee for this interface, since all but the smallest
changes are going to cause at least ABI breakage.
Having said that, if that is acceptable for teTeX, I don't see why this
shouldn't work. Throughout a stable branch of poppler I wouldn't expect
the xpdf API to change much, but I can't rule out that we might have
to because of a security patch.
Another issue is that libpoppler.so currently pulls in a number of X
libraries and the cairo stack if you enable that. That shouldn't be too
hard to work around, though; we could just move those dependencies to the
wrapper libraries (libpoppler-glib.so etc).
As for the discussion in http://bugs.debian.org/252104 , I think
everybody would have preferred a libxpdf.so maintained by the xpdf
author, but this had been proposed to Derek by a number of different
groups before the poppler project started. I don't think you would see
so many copies of the xpdf source in the open source landscape if that
had been an option.
cheers,
Kristian