[poppler] Comparing geometric layout information across "pages"

Alec Taylor alec.taylor6 at gmail.com
Tue Oct 11 21:08:56 PDT 2011


Thanks, Josh. I had actually been researching this quite heavily and
ended up on the #ghostscript channel on freenode.

They pointed me to MuPDF (one of their projects), and its "pdfdraw"
example project looks like something to work from, either directly or
by parsing its XML output.

However, if this doesn't suit your needs, please tell me why, since I
might run into the same problem, and then we can join forces! :]
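For the pattern-finding goal in the original question (headers, footers, page numbers), one simple approach, whichever library does the extraction, is to look for text that recurs at roughly the same vertical position on most pages. Here is a minimal sketch under stated assumptions: text fragments have already been extracted into a hypothetical `TextBox` structure (page index, text, top y in points), and the function `find_running_elements` with its `min_fraction` and `band_height` thresholds is illustrative only, not part of the Poppler or MuPDF API.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical extracted fragment: page index, text, top y in points."""
    page: int
    text: str
    y: float


def find_running_elements(boxes, num_pages, min_fraction=0.6, band_height=5.0):
    """Return (normalised text, approximate y) pairs that recur on many pages.

    Runs of digits are replaced by '#', so "Page 3" and "Page 4" share a key
    and page numbers are found along with fixed headers and footers.
    """
    pages_seen = defaultdict(set)
    for box in boxes:
        key = re.sub(r"\d+", "#", box.text.strip())
        band = round(box.y / band_height)  # bucket nearby vertical positions
        pages_seen[(key, band)].add(box.page)
    threshold = min_fraction * num_pages
    return sorted(
        (key, band * band_height)
        for (key, band), pages in pages_seen.items()
        if len(pages) >= threshold
    )


boxes = [
    TextBox(1, "Annual Report", 20.0), TextBox(1, "Page 1", 770.0),
    TextBox(1, "Introduction", 100.0),
    TextBox(2, "Annual Report", 20.3), TextBox(2, "Page 2", 770.2),
    TextBox(2, "Results", 100.0),
    TextBox(3, "Annual Report", 19.8), TextBox(3, "Page 3", 769.9),
    TextBox(3, "Discussion", 140.0),
]
print(find_running_elements(boxes, num_pages=3))
# [('Annual Report', 20.0), ('Page #', 770.0)]
```

Fixed-size bucketing can split a repeated element whose y position straddles a bucket boundary; clustering y values around a running mean is more robust, at the cost of a little more code.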

On Wed, Oct 12, 2011 at 3:44 AM, Josh Richardson <jric at chegg.com> wrote:
> Thanks for the pointer, Glad.
>
> FYI, I am also interested in being able to analyze document structure.
> Our first step is to put the text back together, since in many PDFs it is
> not stored in logical reading order.  pdf2html has a "coalesce"
> function, which is our starting point.  We have made some
> improvements to it that have not yet been contributed back, so let me know
> if you want the source and/or want to join forces.
>
> --josh
>
> On 10/11/11 12:31 AM, "Glad Deschrijver" <glad.deschrijver at gmail.com> wrote:
>
>>On Tuesday 11 October 2011, Alec Taylor wrote:
>>> Good afternoon,
>>>
>>> Do you have any recommendations and/or sample code for comparing textual
>>> and geometric layout information across pages?
>>>
>>> Basically, I'm trying to recognise patterns within documents, e.g. page
>>> numbers, headers and footers, titles, column information, etc., using the
>>> capabilities of the Poppler PDF library.
>>
>>Not sure that it will help you much, but you can have a look at DiffPDF,
>>which uses poppler to compare two PDF files page by page (both textually
>>and visually):
>>http://www.qtrac.eu/diffpdf.html
>>
>>Best regards,
>>Glad
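The coalescing step Josh describes can be illustrated with a minimal sketch. This is not the pdf2html code, which is not shown in this thread; it only assumes that each fragment has a left edge, right edge and baseline, stored in a hypothetical `Fragment` structure, and it uses made-up thresholds (`line_tol`, `space_gap`, `max_gap`, all in points). Fragments are grouped into lines by baseline, sorted left to right regardless of content-stream order, and joined: with a space when the gap looks like a word break, and as a separate run when it looks like a column gap.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical text fragment: left edge x0, right edge x1, baseline y."""
    text: str
    x0: float
    x1: float
    y: float


def coalesce(fragments, line_tol=2.0, space_gap=1.5, max_gap=20.0):
    """Merge fragments into text runs in top-to-bottom, left-to-right order."""
    # Group fragments into lines whose baselines lie within line_tol.
    lines = []
    for frag in sorted(fragments, key=lambda f: f.y):
        if lines and abs(frag.y - lines[-1][0].y) <= line_tol:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    runs = []
    for line in lines:
        line.sort(key=lambda f: f.x0)  # left to right within the line
        text, right = line[0].text, line[0].x1
        for frag in line[1:]:
            gap = frag.x0 - right
            if gap > max_gap:
                # Large gap: treat as a column break and start a new run.
                runs.append(text)
                text = frag.text
            else:
                # Small gap: same word; medium gap: insert a space.
                text += (" " if gap > space_gap else "") + frag.text
            right = frag.x1
        runs.append(text)
    return runs


# Fragments listed in a scrambled "content stream" order.
fragments = [
    Fragment("world", 40.0, 70.0, 100.3),
    Fragment("Hel", 10.0, 25.0, 100.0),
    Fragment("lo", 25.5, 36.0, 100.0),
    Fragment("Second", 10.0, 50.0, 114.0),
    Fragment("line", 53.0, 70.0, 114.0),
    Fragment("Column 2", 300.0, 350.0, 100.0),
]
print(coalesce(fragments))
# ['Hello world', 'Column 2', 'Second line']
```

Runs split by a column gap come back in line order, not column order; a fuller reading-order pass would also cluster runs into columns before emitting text.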
