[poppler] Extract pdf

amit aggarwal amitcs06 at gmail.com
Thu Jan 28 05:11:43 PST 2010


Ah, good. So is there any way we can get this optional information?
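
A rough first step, before looking for an extraction API, is to check whether a given file carries the optional structures at all. Below is a minimal sketch in Python (standard library only) that scans the raw bytes of a PDF for the catalog keys /StructTreeRoot and /MarkInfo (logical structure and Tagged PDF, the ISO 32000-1 sections 14.7 and 14.8 cited below) and /Outlines (the document outline, i.e. bookmarks, which often holds chapter and section titles). The function names are made up for illustration and are not part of poppler. This is only a heuristic: from PDF 1.5 on, the catalog may sit inside a compressed object stream, so a negative result is inconclusive.

```python
import re

# Catalog keys for the optional structures. Matching them as literal
# byte strings is an assumption of this sketch; it does not parse the PDF.
KEYS = {
    "outline": rb"/Outlines\b",               # document outline (bookmarks)
    "structure_tree": rb"/StructTreeRoot\b",  # logical structure tree
    "mark_info": rb"/MarkInfo\b",             # Tagged PDF marker dictionary
}


def probe_pdf_bytes(data: bytes) -> dict:
    """Report which keys appear literally in the raw, uncompressed bytes."""
    return {
        name: re.search(pattern, data) is not None
        for name, pattern in KEYS.items()
    }


def probe_pdf_file(path: str) -> dict:
    """Convenience wrapper: read a file from disk and probe its bytes."""
    with open(path, "rb") as fh:
        return probe_pdf_bytes(fh.read())
```

If probe_pdf_file("some.pdf") reports the keys as present, the titles can in principle be read from the file instead of guessed. If it reports them as absent, the file either lacks them or stores its catalog compressed, and a real PDF parser is needed to tell which.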

On Thu, Jan 28, 2010 at 6:19 PM, Leonard Rosenthol <lrosenth at adobe.com> wrote:

> PDF DOES support rich semantic structure including all of the things listed
> below (ISO 32000-1:2008, 14.7, 14.8 and 14.9). HOWEVER, it is optional and
> therefore many PDF documents do not contain the necessary elements. And,
> as pointed out, without the presence of such elements already in the PDF,
> the best you can do is GUESS.
>
> -----Original Message-----
> From: poppler-bounces at lists.freedesktop.org [mailto:
> poppler-bounces at lists.freedesktop.org] On Behalf Of
> mpsuzuki at hiroshima-u.ac.jp
> Sent: Thursday, January 28, 2010 7:04 AM
> To: amit aggarwal
> Cc: poppler at lists.freedesktop.org
> Subject: Re: [poppler] Extract pdf
>
> Hi,
>
> I think PDF is a page description language and defines
> nothing for semantic structure, such as how to store the titles
> of sections, subsections, figures and tables. Therefore, I
> guess, poppler cannot extract them, because PDF does not have them.
>
> Is there any reliable framework that defines such structure
> and that your target documents follow?
>
> Regards,
> mpsuzuki
>
> On Thu, 28 Jan 2010 17:23:17 +0530
> amit aggarwal <amitcs06 at gmail.com> wrote:
>
> >Hi All,
> >
> >I want to extract the following information from a PDF:
> >1) All chapter, section and subsection titles
> >2) Names of the figures and tables
> >
> >Can anyone please help me with this?
> >
> >--
> >Thanks
> >Amit Aggarwal
> >



-- 
Thanks
Amit Aggarwal