[poppler] images in pdftohtml -xml mode
Albert Astals Cid
aacid at kde.org
Mon Nov 14 16:38:12 PST 2011
On Monday, 14 November 2011, Igor Slepchin wrote:
> I know that dumping images when running pdftohtml with -xml flag has
> been brought up before and it seems that the devs said they would accept
> a patch; however, it looks like nothing has made it into the source tree
> so far. I figured I could give this a try too, so please take a look at
> my proposed changes if there is still some interest in this
> functionality: https://github.com/igors/poppler/tree/xml_images
>
> The first commit in the above branch fixes up pdf2xml.dtd to match what
> pdftohtml generates; the second patch adds support for images in -xml
> mode. With this patch applied, pdftohtml -xml will dump all image files
> just like it does in html mode and will add image elements at the
> beginning of each page that has images, i.e., you'll see something like
> the following in the generated xml:
>
> <page number="51" position="absolute" top="0" left="0"
> height="896" width="572">
> <image top="45" left="26" width="523" height="373" src="filename.jpg"/>
> <text top="534" left="81" width="17" height="15" font="18">In </text>
>
> The default behavior with -xml switch is to process images now; adding
> -i option restores the old behavior.
>
> The change is small enough that I hope it won't be very controversial
> but comments are certainly appreciated.
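The proposal boils down to writing one `<image>` element per dumped image, with the attribute order shown in the sample above. The following is a minimal sketch of that formatting step only, not code from the patch: `ImageBox`, `xmlEscapeAttr` and `imageElement` are hypothetical names, and it assumes the coordinates have already been rounded to integers as in the sample.

```cpp
#include <sstream>
#include <string>

// Hypothetical description of one dumped image in page coordinates
// (the patch's real data structure may differ).
struct ImageBox {
    int top;
    int left;
    int width;
    int height;
    std::string src;  // file name of the dumped image
};

// Escape characters that may not appear verbatim inside a
// double-quoted XML attribute value.
std::string xmlEscapeAttr(const std::string &s) {
    std::string out;
    for (char c : s) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default: out += c;
        }
    }
    return out;
}

// Format a self-closing <image> element using the attribute order
// from the sample output: top, left, width, height, src.
std::string imageElement(const ImageBox &img) {
    std::ostringstream os;
    os << "<image top=\"" << img.top << "\" left=\"" << img.left
       << "\" width=\"" << img.width << "\" height=\"" << img.height
       << "\" src=\"" << xmlEscapeAttr(img.src) << "\"/>";
    return os.str();
}
```

Calling `imageElement(ImageBox{45, 26, 523, 373, "filename.jpg"})` yields the `<image .../>` line from the sample. Escaping `src` matters because dumped file names are derived from the output path, which the user controls.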
I'm a bit confused: you add encoding="US-ASCII" to the first line of pdf2xml.dtd
and then remove it altogether?
I'm wondering why you did not make GfxState *state a parameter of the
constructor. It seems to be mandatory for calling the transform method.
I'd prefer it if you made HtmlImage a class.
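To make the two review points concrete, here is a sketch of HtmlImage as a class whose constructor requires the graphics state, so the transform is always applied and an image without a position cannot be constructed. `GfxStateStub` is a hypothetical stand-in holding only a CTM, with a `transform()` method shaped like poppler's `GfxState::transform`; in poppler itself the constructor would take `GfxState *state`. The accessor names are also assumptions. It relies on the PDF convention that an image is painted into the unit square of user space, so mapping the four corners gives the image's bounding box.

```cpp
#include <algorithm>
#include <string>

// Hypothetical stand-in for poppler's GfxState: just an affine CTM
// [a b c d e f] and a transform() with the same shape.
class GfxStateStub {
public:
    explicit GfxStateStub(const double (&m)[6]) { std::copy(m, m + 6, ctm); }
    void transform(double x1, double y1, double *x2, double *y2) const {
        *x2 = ctm[0] * x1 + ctm[2] * y1 + ctm[4];
        *y2 = ctm[1] * x1 + ctm[3] * y1 + ctm[5];
    }
private:
    double ctm[6];
};

// HtmlImage as a class: the state is a constructor argument, so the
// bounding box is computed once, up front, and cannot be forgotten.
class HtmlImage {
public:
    HtmlImage(const std::string &fileName, const GfxStateStub *state)
        : fName(fileName) {
        // Map the corners of the unit square and keep their extremes.
        const double corners[4][2] = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
        double x, y;
        state->transform(corners[0][0], corners[0][1], &x, &y);
        xMin = xMax = x;
        yMin = yMax = y;
        for (int i = 1; i < 4; ++i) {
            state->transform(corners[i][0], corners[i][1], &x, &y);
            xMin = std::min(xMin, x);
            xMax = std::max(xMax, x);
            yMin = std::min(yMin, y);
            yMax = std::max(yMax, y);
        }
    }
    double left() const { return xMin; }
    double top() const { return yMin; }
    double width() const { return xMax - xMin; }
    double height() const { return yMax - yMin; }
    const std::string &fileName() const { return fName; }
private:
    std::string fName;
    double xMin, xMax, yMin, yMax;
};
```

For example, a CTM of {523, 0, 0, -373, 26, 418} (a y-flip into top-left page coordinates) gives left 26, top 45, width 523 and height 373, the values in Igor's sample element.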
It'd be cool if next time you attached the patches instead of making me
lose time trying to navigate github ;-)
Albert
>
> Thanks,
> Igor