[poppler] Analysing 3 pages from pdftohtml -xml at a time

Alec Taylor alec.taylor6 at gmail.com
Sat Oct 22 10:04:17 PDT 2011


Good morning,

I'm trying to figure out how to analyse, in memory, 3 pages at a time
from the output stream of pdftohtml -xml book.pdf (that is, before the
stream is written to the book.xml output file).

The enhancement I'm implementing in pdftohtml uses an algorithm that
requires analysis of 3 pages at a time.

[p1] R [p2] R [p3]
then
[p2] R [p3] R [p4]
and so on, until no pages are left

(where 'R' refers to the relation I'm running on each page trio)
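
The overlapping trios described above amount to a sliding window of
size 3. A minimal sketch in Python, purely to illustrate the windowing
(the name page_trios is hypothetical, and the strings stand in for
whatever in-memory page objects end up being used):

```python
from collections import deque
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def page_trios(pages: Iterable[T]) -> Iterator[Tuple[T, T, T]]:
    """Yield overlapping trios (p1, p2, p3), (p2, p3, p4), ... from pages."""
    window = deque(maxlen=3)  # appending a 4th page drops the oldest one
    for page in pages:
        window.append(page)
        if len(window) == 3:
            yield tuple(window)


# Placeholder page names; a document with fewer than 3 pages yields nothing.
for trio in page_trios(["p1", "p2", "p3", "p4"]):
    print(trio)
```

Only three pages are held at any time, so the whole document never
needs to be in memory at once.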

How do I run this relation? Preferably using some in-memory data
structure (e.g. an intermediate in-memory XML document that can be
analysed with the libxml2 libraries).
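
As a rough illustration of the in-memory XML idea, here is a sketch
that parses an XML stream incrementally and applies a relation to each
trio of pages. It uses Python's standard xml.etree.ElementTree rather
than libxml2, and it assumes the stream consists of <page> elements
carrying a number attribute. The sample data, the relation placeholder
and the function names are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical sample standing in for the pdftohtml -xml stream.
SAMPLE = b"""<pdf2xml>
<page number="1"><text top="10" left="5">one</text></page>
<page number="2"><text top="10" left="5">two</text></page>
<page number="3"><text top="10" left="5">three</text></page>
<page number="4"><text top="10" left="5">four</text></page>
</pdf2xml>"""


def relation(a, b, c):
    """Placeholder for R: it only reports which page numbers it was given."""
    return tuple(page.get("number") for page in (a, b, c))


def run_on_trios(stream):
    """Apply relation() to every overlapping trio of <page> elements."""
    window = deque(maxlen=3)  # holds at most the 3 most recent pages
    results = []
    # "end" events fire once an element, including its children, is complete.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "page":
            continue
        window.append(elem)
        if len(window) == 3:
            results.append(relation(*window))
    return results


print(run_on_trios(io.BytesIO(SAMPLE)))
```

The same shape should carry over to a C or C++ implementation: keep the
three most recent parsed pages in a small buffer and run R whenever the
buffer is full.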

Thanks for all suggestions,

Alec Taylor

