[poppler] Access to Poppler internal C++ API by GDAL

Even Rouault even.rouault at spatialys.com
Sun Sep 10 13:41:13 UTC 2017


Hi,

I'm one of the developers of the GDAL library (http://gdal.org), which reads various raster & 
vector formats, mostly geospatial, including PDF and its georeferencing extensions (expressed 
either with the Adobe Supplement to ISO 32000 or with the Open Geospatial Consortium Best 
Practice:
https://portal.opengeospatial.org/files/?artifact_id=40537 )

Currently we use the Poppler internal C++ API and regularly must adjust for changes in it. 
Recently we had to make adjustments to accommodate the Poppler 0.58 changes. Supporting 
multiple Poppler versions is beginning to make our code ugly. So packagers from Linux 
distributions and I are wondering if there would be a way to access a more stable C++ API.
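
As an illustration only, the kind of version-conditional code this leads to can be sketched 
as below. This is not real Poppler or GDAL code: SketchDict, lookupCompat() and the 
SKETCH_NEW_LOOKUP_API macro are hypothetical stand-ins, and the two call shapes 
(out-parameter versus return by value) are assumed purely for the sake of the example.

```cpp
#include <map>
#include <string>

// Hypothetical build-time switch standing in for a library version macro.
#define SKETCH_NEW_LOOKUP_API 1

// Hypothetical stand-in for a dictionary class; NOT the real Poppler Dict.
struct SketchDict {
    std::map<std::string, int> entries;
#if SKETCH_NEW_LOOKUP_API
    // "Newer" shape: result returned by value (-1 when the key is missing).
    int lookup(const std::string &key) const {
        auto it = entries.find(key);
        return it == entries.end() ? -1 : it->second;
    }
#else
    // "Older" shape: result written into an out-parameter.
    void lookup(const std::string &key, int *out) const {
        auto it = entries.find(key);
        *out = (it == entries.end()) ? -1 : it->second;
    }
#endif
};

// A single wrapper hides the difference, so callers need no #if of their own.
int lookupCompat(const SketchDict &dict, const std::string &key) {
#if SKETCH_NEW_LOOKUP_API
    return dict.lookup(key);
#else
    int value = 0;
    dict.lookup(key, &value);
    return value;
#endif
}
```

Every call that changes between versions needs a wrapper like this, which is why the 
number of supported versions quickly shows up in the code.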

Besides rendering as image, we need really low-level access to PDF objects, to be able to 
parse georeferencing objects, retrieve layers, turn OCGs on/off, or even access streams to 
decode drawing instructions so as to build vector objects.

I've tried to summarize below our current use of the Poppler C++ API. I probably missed a few 
calls, but you should get the overall picture:
- Object class: getType(), getTypeName(), getBool(), getInt(), getReal(), getString(), 
getName(), getStream(), getArray()
- Dict class: lookupNF(), lookup(), getLength(), getKey()
- Array class: getLength(), getNF(), get()
- Stream class: getDict(), reset(), getChar(), fillGooString()
- Catalog class: getPage(), getPageRef(), readMetadata()
- GooString: getCString(), getLength()
- Ref class: access to num and gen
- PDFDoc class: isOk(), displayPageSlice(), getCatalog(), getOptContentConfig(), 
getNumPages(), getDocInfo(), getErrorCode(), str private member (accessed through an ugly 
"#define private public" before including poppler! We need to access it to be able to delete it 
with our heap, since we allocated the stream object provided to the PDFDoc() constructor. This 
is to avoid potential cross-heap problems on Windows)
- Page class: isOk(), pageObj private member (accessed through an ugly "#define private 
public" before including poppler!), getMediaBox()
- OCGs class: isOk(), getOCGs()
- GooList class: getLength(), get()
- OptionalContentGroup class: setState()
- SplashBitmap class: getBitmap(), getWidth(), getHeight(), getDataPtr(), getAlphaPtr(), 
getAlphaRowSize(), getRowSize()
- SplashOutputDev class: we subclass this class and override all/most virtual methods to be 
able to turn on/off rendering of various elements as we offer options to render selectively 
vector, raster and/or text elements (so basically just a conditional test to decide whether to 
return as a no-op or call the base implementation)
- BaseStream class: we subclass this class to use GDAL's own I/O abstraction layer (which 
beyond regular files can read inside .zip files, in-memory files, files available through HTTP, etc.). 
So we implement copy(), makeSubStream(), getPos(), getStart(), setPos(), moveStart(), 
getKind(), getFileName(), getChar(), lookChar(), reset(), 
unfilteredReset(), close(), hasGetChars(), getChars()
- GlobalParams class: setPrintCommands()
- setErrorCallback() function
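
For readers unfamiliar with the SplashOutputDev subclassing pattern described above, here is 
a minimal self-contained sketch of the "conditional no-op or call the base implementation" 
idea. It does not use the real Poppler headers: SketchOutputDev, SelectiveOutputDev, their 
method names and the three flags are hypothetical stand-ins chosen for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a rendering back end; NOT the real SplashOutputDev.
class SketchOutputDev {
public:
    virtual ~SketchOutputDev() = default;
    virtual void drawChar(char c) { log.push_back(std::string("char:") + c); }
    virtual void fillPath(int pathId) { log.push_back("path:" + std::to_string(pathId)); }
    virtual void drawImage(int imageId) { log.push_back("image:" + std::to_string(imageId)); }

    std::vector<std::string> log;  // records what was actually rendered
};

// Subclass that forwards to the base class only for the enabled element kinds.
class SelectiveOutputDev : public SketchOutputDev {
public:
    bool renderText = true;
    bool renderVector = true;
    bool renderRaster = true;

    void drawChar(char c) override {
        if (renderText) SketchOutputDev::drawChar(c);  // otherwise a no-op
    }
    void fillPath(int pathId) override {
        if (renderVector) SketchOutputDev::fillPath(pathId);
    }
    void drawImage(int imageId) override {
        if (renderRaster) SketchOutputDev::drawImage(imageId);
    }
};
```

With renderText set to false, calls to drawChar() are silently dropped while paths and 
images still reach the base class. The price is that every relevant virtual method must be 
overridden, so any change to the base class signatures breaks the subclass.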

If you want to glance at the code, the most relevant files are:
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/pdf/pdfobject.cpp
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/pdf/pdfio.cpp
https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/pdf/pdfdataset.cpp

I'm not sure whether it would be feasible for Poppler to provide a more stable API for our use. 
At least, this makes you aware of external users of this API.

Best regards,

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com