[poppler] Extract pdf

Leonard Rosenthol lrosenth at adobe.com
Thu Jan 28 05:23:59 PST 2010


Poppler doesn't expose the "low level" APIs that you would need to get access to these structure elements, if present.  You'll need to get down to the original Xpdf Object classes, and to use them properly you will also need an in-depth understanding of the PDF format and the relevant sections of the documentation.

But yes, it is possible.
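
To give a rough idea of what walking the logical structure involves, here is a minimal Python sketch. It does not use Poppler or Xpdf: it assumes the structure tree has already been read into nested dicts, with "S" (structure type), "T" (optional title) and "K" (kids) mirroring the PDF keys. The "text" key is a hypothetical stand-in for text already resolved from marked content, and the function name and sample data are made up for illustration.

```python
# Standard structure types of tagged PDF (ISO 32000-1, 14.8) that matter here.
HEADING_TYPES = {"H", "H1", "H2", "H3", "H4", "H5", "H6"}
CAPTIONED_TYPES = {"Figure", "Table"}


def collect_titles(elem, found=None):
    """Depth-first walk over a structure element modelled as a dict.

    Returns (structure type, title) pairs in document order.
    """
    if found is None:
        found = []
    stype = elem.get("S")
    if stype in HEADING_TYPES or stype in CAPTIONED_TYPES:
        # Prefer /T; fall back to the hypothetical pre-resolved "text" key.
        found.append((stype, elem.get("T") or elem.get("text", "")))
    kids = elem.get("K", [])
    if not isinstance(kids, list):
        kids = [kids]  # /K may hold a single kid instead of an array
    for kid in kids:
        if isinstance(kid, dict):  # integer kids are marked-content IDs
            collect_titles(kid, found)
    return found


# Made-up sample tree, not taken from a real document.
sample = {"S": "Document", "K": [
    {"S": "Sect", "K": [
        {"S": "H1", "text": "Introduction"},
        {"S": "P", "text": "Some body text."},
        {"S": "Figure", "T": "Figure 1: Overview", "K": [3]},
    ]},
    {"S": "Table", "T": "Table 1: Results"},
]}

for stype, title in collect_titles(sample):
    print(stype, title)
```

In a real file the hard part is everything this sketch skips: reading the structure tree out of the document catalog and resolving marked-content IDs to the text on the page.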

Leonard

From: amit aggarwal [mailto:amitcs06 at gmail.com]
Sent: Thursday, January 28, 2010 8:12 AM
To: Leonard Rosenthol
Cc: mpsuzuki at hiroshima-u.ac.jp; poppler at lists.freedesktop.org
Subject: Re: [poppler] Extract pdf


Ah, good. So is there any way we can get this optional info?
On Thu, Jan 28, 2010 at 6:19 PM, Leonard Rosenthol <lrosenth at adobe.com> wrote:
PDF DOES support rich semantic structure, including all of the things listed below (ISO 32000-1:2008, 14.7, 14.8 and 14.9). HOWEVER, it is optional, and therefore many PDF documents do not contain the necessary elements. And, as pointed out, if such elements are not already present in the PDF, the best you can do is GUESS.

-----Original Message-----
From: poppler-bounces at lists.freedesktop.org [mailto:poppler-bounces at lists.freedesktop.org] On Behalf Of mpsuzuki at hiroshima-u.ac.jp
Sent: Thursday, January 28, 2010 7:04 AM
To: amit aggarwal
Cc: poppler at lists.freedesktop.org
Subject: Re: [poppler] Extract pdf

Hi,

I think PDF is a page description language and defines
nothing for semantic structure, such as how to store the
titles of sections, subsections, figures and tables.
Therefore, I guess, poppler cannot extract them, because
PDF does not have them.

Is there any reliable framework that defines such structure
and that your target documents follow?

Regards,
mpsuzuki

On Thu, 28 Jan 2010 17:23:17 +0530
amit aggarwal <amitcs06 at gmail.com> wrote:

>Hi All,
>
>I want to extract the following information from a PDF:
>1) All chapter, section and subsection titles
>2) Names of the figures and tables
>
>Can anyone please help me with this?
>
>--
>Thanks
>Amit Aggarwal
>
_______________________________________________
poppler mailing list
poppler at lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler



--
Thanks
Amit Aggarwal