[poppler] Analysing 3 pages from pdftohtml -xml at a time
Alec Taylor
alec.taylor6 at gmail.com
Sat Oct 22 10:04:17 PDT 2011
Good morning,
I'm trying to figure out how to analyse 3 pages at a time, in memory,
from the XML stream produced by pdftohtml -xml book.pdf (that is,
before it is written to the book.xml output file).
The enhancement I'm implementing in pdftohtml uses an algorithm that
requires analysing 3 pages at a time:
[p1] R [p2] R [p3]
then
[p2] R [p3] R [p4]
and so on until no pages are left
(where 'R' refers to the relation I'm running on each page trio)
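
The windowing described above can be pictured with a minimal Python sketch. This only illustrates the overlapping-trio iteration, not pdftohtml code; the name sliding_trios is hypothetical, and the pages can be any objects.

```python
from collections import deque
from itertools import islice


def sliding_trios(pages):
    """Yield overlapping (p1, p2, p3) tuples from any iterable of pages."""
    it = iter(pages)
    # Prime the window with the first three pages, if there are that many.
    window = deque(islice(it, 3), maxlen=3)
    if len(window) == 3:
        yield tuple(window)
    # Each further page pushes the oldest one out of the window.
    for page in it:
        window.append(page)
        yield tuple(window)


if __name__ == "__main__":
    for trio in sliding_trios(["p1", "p2", "p3", "p4"]):
        print(trio)
```

With fewer than three pages the generator yields nothing, so the relation R would never run on a short document; whether that is the desired behaviour depends on R.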
How do I run this relation? Preferably using some data structure
(e.g. an intermediary in-memory XML document, analysed with libxml2).
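
To make the question concrete, here is a rough sketch of what I have in mind, written in Python with the standard xml.etree.ElementTree module rather than libxml2, purely for brevity. It assumes the XML arrives as a byte stream whose pages are <page> elements with a number attribute; the function name page_trios and the sample input are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import deque

# Made-up sample standing in for the pdftohtml -xml stream (assumption).
SAMPLE = b"""<pdf2xml>
<page number="1"><text>a</text></page>
<page number="2"><text>b</text></page>
<page number="3"><text>c</text></page>
<page number="4"><text>d</text></page>
</pdf2xml>"""


def page_trios(stream):
    """Parse the stream incrementally and yield trios of <page> elements."""
    window = deque(maxlen=3)
    # An "end" event fires once an element, including its children, is complete.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "page":
            continue
        window.append(elem)
        if len(window) == 3:
            yield tuple(window)


if __name__ == "__main__":
    for trio in page_trios(io.BytesIO(SAMPLE)):
        # The relation R would be applied here; this just prints page numbers.
        print([page.get("number") for page in trio])
```

Note that this sketch keeps every parsed page attached to the root element, so the whole document ends up in memory; only the window handed to R is limited to three pages.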
Thanks for all suggestions,
Alec Taylor