[poppler] Comparing geometric layout information across "pages"

Josh Richardson jric at chegg.com
Tue Oct 11 09:44:41 PDT 2011


Thanks for the pointer, Glad.

FYI, I am also interested in analyzing document structure.
Our first step is to put the text back together, since in many PDFs the
text is not stored in logical reading order.  pdf2html has a "coalesce"
function which is the starting point for us.  We have made some
improvements to it which have not yet been contributed back, so let me
know if you want the source and/or want to join forces.
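
To give a rough idea of what "putting the text back together" means, here
is a minimal sketch in Python. It is not the coalesce code mentioned above
and not our modified version; the Fragment type, the coalesce_lines name,
and the y_tol and space_gap thresholds are all hypothetical, and it assumes
you already have each text fragment's bounding box in page units. It groups
fragments that share roughly the same vertical position into a row, orders
each row left to right, and joins neighbours, inserting a space only when
the horizontal gap is large enough.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One run of text with its bounding box (hypothetical type)."""
    text: str
    x: float       # left edge
    y: float       # top edge, increasing down the page
    width: float
    height: float


def coalesce_lines(fragments, y_tol=2.0, space_gap=1.0):
    """Merge fragments into text lines in top-to-bottom, left-to-right order.

    y_tol and space_gap are illustrative thresholds in page units; real
    documents usually need them scaled by font size.
    """
    # Group fragments into rows by vertical position.
    rows = []
    for frag in sorted(fragments, key=lambda f: f.y):
        if rows and abs(frag.y - rows[-1][0].y) <= y_tol:
            rows[-1].append(frag)
        else:
            rows.append([frag])

    # Within each row, order by x and join, adding a space at larger gaps.
    lines = []
    for row in rows:
        row.sort(key=lambda f: f.x)
        text = row[0].text
        for prev, cur in zip(row, row[1:]):
            gap = cur.x - (prev.x + prev.width)
            text += (" " if gap > space_gap else "") + cur.text
        lines.append(text)
    return lines


if __name__ == "__main__":
    page = [
        Fragment("world", 40, 100.5, 30, 10),
        Fragment("Hel", 10, 100, 15, 10),
        Fragment("lo", 25, 100, 10, 10),
        Fragment("Next", 10, 115, 20, 10),
    ]
    for line in coalesce_lines(page):
        print(line)
```

This ignores columns, rotated text, and font changes, which is where most
of the real work is.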

--josh

On 10/11/11 12:31 AM, "Glad Deschrijver" <glad.deschrijver at gmail.com>
wrote:

>On Tuesday 11 October 2011, Alec Taylor wrote:
>> Good afternoon,
>> 
>> Do you have any recommendations and/or sample code for comparing
>> textual and geometric layout information across pages?
>> 
>> Basically I'm trying to recognise patterns within documents, e.g., page
>> numbers, headers and footers, titles, column information, etc., using
>> the capabilities of the Poppler PDF library.
>
>Not sure that it will help you much, but you can have a look at DiffPDF,
>which uses poppler to compare two PDF files page by page (both textually
>and visually):
>http://www.qtrac.eu/diffpdf.html
>
>Best regards,
>Glad
>
>-- 
> Everything that is really great and inspiring is created by
> the individual who can labor in freedom.
>      -- Albert Einstein, Out of My Later Years (1950)