[poppler] Poppler as a library for parsing and manipulating PDF files?

Kristian Høgsberg krh at bitplanet.net
Tue Jan 24 11:51:18 PST 2006


Frank Küster wrote:
> Hi,
> 
> HEADLINE: Should Poppler be used as a library for parsing and
>           manipulating PDF files?
> 
> I'm the maintainer of teTeX in Debian, a package that includes pdftex
> and therefore a copy of xpdf code.  We've been bitten by the security
> issues in xpdf in the past, in particular because our xpdf version is
> usually older than the one the patches are made for.
> 
> Therefore we always longed [1] for a shared library that we could use
> instead of our xpdf code copy, and in fact it is technically easy to use
> libpoppler[2].  However, I'm concerned whether you will support such
> uses in the future.  The poppler website describes the software as "a
> PDF rendering library", but *rendering* is not what pdftex does (nor do
> pdftohtml, pdftk, and other candidates that face a similar situation).
> Instead, the functionality is used to analyse the PDF structure, to
> extract parts, and to even manipulate them (pdftk is designed for this).
> 
> Therefore I'd like to know whether you plan any changes that might
> affect such uses of poppler, and whether you actually encourage such
> use.

Hi Frank,

I really like this idea, and it ties in with the original reasons for 
creating the poppler library - to consolidate all the copies of the xpdf 
codebase floating around into one shared library.  The only reservation 
I have is that we've been trying to wrap up the xpdf API that tetex (and 
cups and others) use in a smaller, simpler API with glib and qt 
bindings.  The motivation for this is that the xpdf API is basically an 
internal API that wasn't designed to be installed as public headers. 
It's a huge API (80+ C++ header files) and it exposes too much of the 
xpdf internals.  Thus, it is not really possible to give any kind of 
API/ABI guarantee for this interface, since anything but the 
smallest change is going to cause at least ABI breakage.
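
As a minimal sketch of the ABI concern (the class names below are 
hypothetical and are not taken from the xpdf or poppler headers): when a 
class's data members are visible in an installed header, client code 
compiles in its size and layout, so adding a single field between 
releases breaks already-built clients.  A wrapper that only stores an 
opaque pointer keeps that layout private to the library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical class whose members are visible in an installed header
// (illustration only; not a real xpdf or poppler type).
namespace v1 {
struct ExposedDoc {
    int numPages;
};
}

// The "next release" adds a member.  Clients built against v1 still
// assume the old size and member offsets.
namespace v2 {
struct ExposedDoc {
    int numPages;
    double pdfVersion;
};
}

// Opaque wrapper: the public header shows only a pointer to an
// incomplete type, so the library can change Impl freely.
class WrappedDoc {
public:
    WrappedDoc();
    ~WrappedDoc();
    int pages() const;

private:
    struct Impl;                 // defined only on the library side
    std::unique_ptr<Impl> impl;
};

// Library side: adding fields here does not change sizeof(WrappedDoc).
struct WrappedDoc::Impl {
    int numPages = 3;
    double pdfVersion = 1.4;
};

WrappedDoc::WrappedDoc() : impl(new Impl) {}
WrappedDoc::~WrappedDoc() = default;
int WrappedDoc::pages() const { return impl->numPages; }

// True when the exposed struct changed size between the two "releases".
bool exposed_layout_changed() {
    return sizeof(v1::ExposedDoc) != sizeof(v2::ExposedDoc);
}
```

Here exposed_layout_changed() reports the size difference that would 
force clients of the exposed header to be rebuilt, while code using 
WrappedDoc only ever calls pages() and never depends on the layout of 
Impl.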

Having said that, if that is acceptable for tetex, I don't see why this 
shouldn't work.  Throughout a stable branch of poppler I wouldn't expect 
the xpdf API to change much, but I can't rule out that we might have 
to because of a security patch.

Another issue is that libpoppler.so currently pulls in a number of X 
libraries and the cairo stack if you enable that.  That shouldn't be too 
hard to work around though, we could just move those dependencies to the 
wrapper libraries (libpoppler-glib.so etc).

As for the discussion in http://bugs.debian.org/252104, I think 
everybody would have preferred a libxpdf.so maintained by the xpdf 
author, but this had been proposed to Derek by a number of different 
groups before the poppler project started.  I don't think you would see 
so many copies of the xpdf source in the open source landscape if that 
had been an option.

cheers,
Kristian

