PDF processing

Thorsten Behrens thb at libreoffice.org
Thu Mar 5 12:37:23 UTC 2020


Hi,

Michael Weghorn wrote:
> On 03/03/2020 12.26, Pietro Paolini wrote:
> > I wanted to have a look at the source code
> > to see if there is some sort of PDF "model" being built from the
> > original PDF document, for instance a set of objects each describing
> > the graphic meanings of a particular region within the page.
> > 
> 
> At a quick glance, 'sdext/source/pdfimport' looks like a good place to
> start with; I personally don't know more related to your more specific
> question.
>
Yep, that's the place - we currently use poppler to parse the PDF,
then generate a tree of quite basic drawing operations from it.

Check sdext/source/pdfimport/tree/genericelements.cxx for the types of
objects in that tree, and
sdext/source/pdfimport/tree/{draw|writer}treevisiting.cxx for
visitor-pattern style tree walking. For your use case, you could
e.g. test the boundaries of each visited object to see whether
they intersect with your region of interest.
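
To make the idea concrete, here is a rough, standalone sketch of walking
an element tree with a visitor and collecting everything that overlaps a
region of interest. All names in it (Rect, Element, ElementVisitor,
RegionCollector, collectInRegion) are made up for illustration and are
not the actual pdfimport classes, so treat it as an outline of the
technique and check genericelements.cxx for the real types and members.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the real pdfimport types.
struct Rect {
    double x, y, w, h;
};

// True if two axis-aligned rectangles overlap with non-zero area.
bool intersects(const Rect& a, const Rect& b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

struct Element;

// Visitor interface: one callback per visited tree node.
struct ElementVisitor {
    virtual ~ElementVisitor() = default;
    virtual void visit(const Element& elem) = 0;
};

// A tree node with a bounding box and owned children.
struct Element {
    std::string name;
    Rect bounds;
    std::vector<std::unique_ptr<Element>> children;

    Element(std::string n, Rect b) : name(std::move(n)), bounds(b) {
        assert(b.w >= 0 && b.h >= 0);  // sizes are assumed non-negative
    }

    // Pre-order walk: visit this node, then recurse into the children.
    void accept(ElementVisitor& visitor) const {
        visitor.visit(*this);
        for (const auto& child : children)
            child->accept(visitor);
    }
};

// Records the names of all visited elements whose bounds overlap `region`.
struct RegionCollector : ElementVisitor {
    Rect region;
    std::vector<std::string> hits;

    explicit RegionCollector(Rect r) : region(r) {}

    void visit(const Element& elem) override {
        if (intersects(elem.bounds, region))
            hits.push_back(elem.name);
    }
};

// Builds a tiny sample tree (a page holding a text box and an image)
// and returns the names of the elements that overlap `region`.
std::vector<std::string> collectInRegion(const Rect& region) {
    Element page("page", Rect{0.0, 0.0, 100.0, 100.0});
    page.children.push_back(
        std::make_unique<Element>("text", Rect{10.0, 10.0, 30.0, 10.0}));
    page.children.push_back(
        std::make_unique<Element>("image", Rect{60.0, 60.0, 20.0, 20.0}));

    RegionCollector collector(region);
    page.accept(collector);
    return collector.hits;
}
```

With this sample tree, calling collectInRegion(Rect{0.0, 0.0, 50.0, 50.0})
reports the page and the text box but not the image, since the image
starts at x = 60, outside the region. In the real code you would do the
equivalent check inside your own visitor rather than build a tree by hand.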

Cheers,

-- Thorsten
