[poppler] Access to Poppler internal C++ API by GDAL
Even Rouault
even.rouault at spatialys.com
Sun Sep 10 13:41:13 UTC 2017
Hi,
I'm one of the developers of the GDAL library (http://gdal.org), which reads various raster &
vector formats, mostly geospatial, including PDF and its georeferencing extensions (either
expressed with the Adobe Supplement to ISO 32000 or with the Open Geospatial Consortium Best
Practice:
https://portal.opengeospatial.org/files/?artifact_id=40537 )
Currently we use the Poppler internal C++ API and must regularly adjust to changes in it.
Recently we had to make adjustments to accommodate the Poppler 0.58 changes. Supporting
multiple Poppler versions is beginning to make our code ugly. So packagers from Linux
distributions and I are wondering whether there would be a way to access a more stable C++ API.
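To illustrate the kind of version juggling involved, here is a minimal sketch of a compile-time version guard. It is not GDAL's actual code: the POPPLER_MAJOR_VERSION and POPPLER_MINOR_VERSION macros are assumed to be supplied by the build system (the defaults below are placeholders), and POPPLER_AT_LEAST() and selectedCodePath() are hypothetical names used only for this example.

```cpp
#include <string>

// Assumption: the build system defines these from the detected Poppler
// version. The fallback values are placeholders for this sketch only.
#ifndef POPPLER_MAJOR_VERSION
#define POPPLER_MAJOR_VERSION 0
#endif
#ifndef POPPLER_MINOR_VERSION
#define POPPLER_MINOR_VERSION 58
#endif

// Hypothetical helper: true when the detected version is >= major.minor.
#define POPPLER_AT_LEAST(major, minor)                                   \
    (POPPLER_MAJOR_VERSION > (major) ||                                  \
     (POPPLER_MAJOR_VERSION == (major) && POPPLER_MINOR_VERSION >= (minor)))

// Reports which branch was compiled in. In real code, each branch would
// hold the calls written for that generation of the internal API.
std::string selectedCodePath()
{
#if POPPLER_AT_LEAST(0, 58)
    return "0.58-or-later";
#else
    return "pre-0.58";
#endif
}
```

Each incompatible change in the internal API adds another such branch, which is why the code degrades as the range of supported versions grows.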
Besides rendering as an image, we really need low-level access to PDF objects, to be able to
parse georeferencing objects, retrieve layers, turn OCGs on/off, or even access streams to
decode drawing instructions so as to build vector objects.
I've tried to summarize below our current use of the Poppler C++ API. I probably missed a few
calls, but you should get the overall picture:
- Object class: getType(), getTypeName(), getBool(), getInt(), getReal(), getString(),
getName(), getStream(), getArray()
- Dict class: lookupNF(), lookup(), getLength(), getKey()
- Array class: getLength(), getNF(), get()
- Stream class: getDict(), reset(), getChar(), fillGooString()
- Catalog class: getPage(), getPageRef(), readMetadata()
- GooString: getCString(), getLength()
- Ref class: access to num and gen
- PDFDoc class: isOk(), displayPageSlice(), getCatalog(), getOptContentConfig(),
getNumPages(), getDocInfo(), getErrorCode(), str private member (accessed through an ugly
"#define private public" before including poppler! We need to access it to be able to delete it
with our heap, since we allocated the stream object provided to the PDFDoc() constructor. This is
to avoid potential cross-heap problems on Windows)
- Page class: isOk(), pageObj private member (accessed through an ugly "#define private
public" before including poppler!), getMediaBox()
- OCGs class: isOk(), getOCGs()
- GooList class: getLength(), get()
- OptionalContentGroup class: setState()
- SplashBitmap class: getBitmap(), getWidth(), getHeight(), getDataPtr(), getAlphaPtr(),
getAlphaRowSize(), getRowSize()
- SplashOutputDev class: we subclass this class and override all/most virtual methods to be
able to turn on/off rendering of various elements as we offer options to render selectively
vector, raster and/or text elements (so basically just a conditional test to decide whether to
return as a no-op or call the base implementation)
- BaseStream class: we subclass this class to use GDAL's own I/O abstraction layer (which
beyond regular files can read in .zip files, in-memory files, files available through HTTP, etc.).
So we implement copy(), makeSubStream(), getPos(), getStart(), setPos(), moveStart(),
getKind(), getFileName(), getChar(), lookChar(), reset(),
unfilteredReset(), close(), hasGetChars(), getChars()
- GlobalParams class: setPrintCommands()
- setErrorCallback() function
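As an illustration of the SplashOutputDev item above, here is a minimal sketch of the "no-op or call the base implementation" pattern. It uses stand-in classes only: BaseOutputDev, SelectiveOutputDev, their methods and renderWithoutText() are hypothetical names for this example and do not reproduce Poppler's real class hierarchy or signatures.

```cpp
#include <string>
#include <vector>

// Stand-in for a rendering device (NOT Poppler's SplashOutputDev).
// Each virtual method records what it rendered so the effect is observable.
class BaseOutputDev {
public:
    virtual ~BaseOutputDev() = default;
    virtual void drawText(const std::string &s) { log.push_back("text:" + s); }
    virtual void drawImage(const std::string &s) { log.push_back("image:" + s); }
    virtual void strokePath(const std::string &s) { log.push_back("vector:" + s); }

    std::vector<std::string> log;
};

// Subclass that overrides each drawing method with a conditional test:
// either return immediately (no-op) or defer to the base implementation.
class SelectiveOutputDev : public BaseOutputDev {
public:
    bool renderText = true;
    bool renderRaster = true;
    bool renderVector = true;

    void drawText(const std::string &s) override {
        if (!renderText) return;
        BaseOutputDev::drawText(s);
    }
    void drawImage(const std::string &s) override {
        if (!renderRaster) return;
        BaseOutputDev::drawImage(s);
    }
    void strokePath(const std::string &s) override {
        if (!renderVector) return;
        BaseOutputDev::strokePath(s);
    }
};

// Example use: render everything except text elements.
std::vector<std::string> renderWithoutText()
{
    SelectiveOutputDev dev;
    dev.renderText = false;
    BaseOutputDev &base = dev;   // the renderer only sees the base interface
    base.drawText("label");
    base.drawImage("ortho");
    base.strokePath("contour");
    return dev.log;
}
```

The point of the sketch is that such a subclass depends on the exact set and signatures of the base class's virtual methods, so any change to them in a new release breaks the overrides.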
If you want to glance at the code, the most relevant files are:
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/pdf/pdfobject.cpp
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/pdf/pdfio.cpp
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/pdf/pdfdataset.cpp
I'm not sure whether it would be feasible for Poppler to provide a more stable API for our use. At
least, this makes you aware of external users of this API.
Best regards,
Even
--
Spatialys - Geospatial professional services
http://www.spatialys.com