[poppler] Hi All, I have a question about libpoppler and need your help, thanks in advance.

Tue Feb 21 01:35:43 PST 2012

But as far as I know, the poppler source code (TextOutputDev.cc) implements
concepts such as TextPage and TextFlow as classes. If a page has a subtitle
and a footer (a copyright notice, for example) besides the main content, they
will be mapped to different TextFlow objects. I just wonder how poppler knows
that they are different flows...
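
To make the question concrete, here is a minimal sketch of the kind of
page-location heuristic suggested in the quoted reply. It is not how poppler
builds its TextFlow objects. Box, Region, classifyByPosition and collect are
hypothetical names, the 8% margins are arbitrary guesses, and the sketch
assumes y is measured in points downward from the top of the page.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one block of text and its vertical extent.
// Assumption: y grows downward from the top edge of the page, in points.
struct Box {
    double yMin;       // top edge of the block
    double yMax;       // bottom edge of the block
    std::string text;  // text contained in the block
};

enum class Region { Header, Body, Footer };

// Classify a block purely by where it sits on the page. The margin
// fractions are guesses and would need tuning for each kind of document.
Region classifyByPosition(const Box &box, double pageHeight,
                          double headerFrac = 0.08,
                          double footerFrac = 0.08) {
    if (box.yMax <= pageHeight * headerFrac) {
        return Region::Header;  // lies entirely inside the top margin band
    }
    if (box.yMin >= pageHeight * (1.0 - footerFrac)) {
        return Region::Footer;  // lies entirely inside the bottom margin band
    }
    return Region::Body;
}

// Concatenate the text of every block that falls in the wanted region,
// one block per line, in the order the blocks were given.
std::string collect(const std::vector<Box> &boxes, double pageHeight,
                    Region wanted) {
    std::string out;
    for (const Box &box : boxes) {
        if (classifyByPosition(box, pageHeight) == wanted) {
            out += box.text;
            out += '\n';
        }
    }
    return out;
}
```

Calling collect() three times, once per Region, would give the header, body
and footer text separately. A heuristic like this breaks on pages whose body
text runs into the margins, so it is a starting point at best.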

On Tue, Feb 21, 2012 at 5:09 PM, Brad Hards <bradh at frogmouth.net> wrote:

> On Tuesday 21 February 2012 12:20:36 Zhenbang Xi wrote:
> > I am developing a program using libpoppler to convert PDF to plain text.
> > I want to distinguish the page header and page footer from the rest of a
> > page; in other words, I want to output them separately (the main content
> > included).
> > How can I do this? Is there any structure or class that holds them in
> > memory?
> There is no way to identify this reliably - PDF (and hence poppler) doesn't
> have any feature to interpret the intent of certain characters. It might be
> possible to come up with a good heuristic for some documents, based on page
> location.
>
> Brad