But as far as I know, poppler's source code (TextOutputDev.cc) has concepts such as TextPage and TextFlow, implemented as classes. If a page has a subtitle and a footer (something like a copyright notice) besides the main content, they are mapped to different TextFlow objects. I just wonder how poppler knows they are different flows...<div> <div> <br><br><div class="gmail_quote">On Tue, Feb 21, 2012 at 5:09 PM, Brad Hards <span dir="ltr">&lt;<a href="mailto:bradh@frogmouth.net">bradh@frogmouth.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> On Tuesday 21 February 2012 12:20:36 Zhenbang Xi wrote:<br> &gt; *I am developing a program using libpoppler to convert PDF to plain text.*<br> &gt; *And I want to distinguish the page header and page footer from a page, in<br> &gt; other words, I want to output them separately (including the main content).*<br> &gt; *How can I do this? Is there any structure or class that holds them in<br> &gt; memory?*<br> There is no way to identify this reliably - PDF (and hence poppler) doesn't<br> have any feature to interpret the intent of certain characters. It might be<br> possible to come up with a good heuristic for some documents, based on page<br> location.<br> <br> Brad<br> _______________________________________________<br> poppler mailing list<br> <a href="mailto:poppler@lists.freedesktop.org">poppler@lists.freedesktop.org</a><br> <a href="http://lists.freedesktop.org/mailman/listinfo/poppler" target="_blank">http://lists.freedesktop.org/mailman/listinfo/poppler</a><br> </blockquote></div><br></div></div>
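<div>To make the "heuristic based on page location" idea from the quoted reply concrete, here is a minimal sketch in Python. It does not use poppler's API and is not how poppler builds its flows. It assumes you already have text blocks with bounding boxes from somewhere, with the origin at the top-left of the page. The names Block, classify_block and split_page, and the 10% margin, are hypothetical and chosen only for illustration.</div>

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A text block with its vertical extent (hypothetical type, origin at top-left)."""
    text: str
    y_min: float  # top edge of the block
    y_max: float  # bottom edge of the block


def classify_block(block, page_height, margin=0.1):
    """Label a block 'header', 'footer' or 'body' purely by vertical position.

    margin is the assumed fraction of the page height reserved for the
    header band at the top and the footer band at the bottom.
    """
    if block.y_max <= page_height * margin:
        return "header"  # block lies entirely inside the top band
    if block.y_min >= page_height * (1.0 - margin):
        return "footer"  # block lies entirely inside the bottom band
    return "body"


def split_page(blocks, page_height, margin=0.1):
    """Group block texts into header, body and footer lists."""
    parts = {"header": [], "body": [], "footer": []}
    for block in blocks:
        parts[classify_block(block, page_height, margin)].append(block.text)
    return parts


if __name__ == "__main__":
    page = [
        Block("Chapter 1", 30, 45),
        Block("Main content ...", 100, 700),
        Block("Copyright 2012", 770, 785),
    ]
    print(split_page(page, page_height=792))
```

<div>As the reply says, this can only work for some documents: a page whose body text starts very close to the top edge, or a document with no header or footer at all, will be misclassified. Comparing the candidate blocks across several pages (repeated text at the same position) would probably make the guess more robust.</div>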