[poppler] images in pdftohtml -xml mode

Albert Astals Cid aacid at kde.org
Mon Nov 14 16:38:12 PST 2011


On Monday, 14 November 2011, Igor Slepchin wrote:
> I know that dumping images when running pdftohtml with the -xml flag has
> been brought up before and it seems that the devs said they would accept
> a patch; however, it looks like nothing has made it into the source tree
> so far. I figured I could give this a try too so please take a look at
> my proposed changes if there is still some interest in this
> functionality: https://github.com/igors/poppler/tree/xml_images
> 
> The first commit in the above branch fixes up pdf2xml.dtd to match what
> pdftohtml generates; the second patch adds support for images in -xml
> mode. With this patch applied, pdftohtml -xml will dump all image files
> just like it does in html mode and will add image elements at the
> beginning of each page that has images, i.e., you'll see something like
> the following in the generated xml:
> 
> <page number="51" position="absolute" top="0" left="0"
>        height="896" width="572">
> <image top="45" left="26" width="523" height="373" src="filename.jpg"/>
> <text top="534" left="81" width="17" height="15" font="18">In </text>
> 
> The default behavior with the -xml switch is now to process images; adding
> the -i option restores the old behavior.
> 
> The change is small enough that I hope it won't be very controversial
> but comments are certainly appreciated.

I'm a bit confused: you add encoding="US-ASCII" to the first line of pdf2xml.dtd 
and then you remove it altogether?

I'm wondering why you did not make GfxState *state a parameter of the 
constructor. It seems to be mandatory to call the transform method.

I'd prefer if you make HtmlImage a class.

It'd be cool if next time you attached the patches instead of making me go and 
lose time trying to navigate GitHub ;-)

Albert

> 
> Thanks,
> Igor
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler

