[poppler] images in pdftohtml -xml mode

Igor Slepchin igor.slepchin at gmail.com
Mon Nov 14 19:59:37 PST 2011


On 11/14/2011 07:38 PM, Albert Astals Cid wrote:
 > On Monday, 14 November 2011, Igor Slepchin wrote:
 >> <...>
 >> The change is small enough that I hope it won't be very controversial
 >> but comments are certainly appreciated.
 >
 > I'm a bit confused you add encoding="US-ASCII" to the first line pdf2xml.dtd
 > and then you remove it altogether?

Oops, thanks for noticing - removing it was a typo. I added it back now 
- xmllint doesn't like the DTD without the encoding and it does no harm 
to have it there (encoding is theoretically required in external text 
entities that have the text declaration). I also changed the encoding 
there to UTF-8 just in case it matters to anyone (all XML processors are 
required to understand UTF-8).

 > I'm wondering why you did not make GfxState *state a parameter of the
 > constructor. Seems to be mandatory to call the transform method.

Yeah, could be done that way as well - I sorta had the idea that 
(0,0)-(1,1) user space coordinates could somehow be useful on their own 
but they are clearly not at the moment.
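
For illustration, here is a rough sketch of the "state as a constructor parameter" variant. It is not the code from the attached patch: StateStub is a made-up stand-in for GfxState that only mimics a transform() call applying a 6-element CTM, and ImageBox is a hypothetical name for the image record. The constructor maps the (0,0) and (1,1) corners of the unit square to device space, so the object never holds untransformed coordinates.

```cpp
#include <algorithm>
#include <string>

// Hypothetical stand-in for GfxState: maps user space to device space
// with a 6-element current transformation matrix (CTM).
struct StateStub {
    double ctm[6];

    void transform(double x1, double y1, double *x2, double *y2) const {
        *x2 = ctm[0] * x1 + ctm[2] * y1 + ctm[4];
        *y2 = ctm[1] * x1 + ctm[3] * y1 + ctm[5];
    }
};

// Hypothetical image record. Taking the state in the constructor means
// the device-space box is always filled in once the object exists.
class ImageBox {
public:
    ImageBox(const std::string &name, const StateStub &state) : fName(name) {
        double x0, y0, x1, y1;
        // An image occupies the unit square in user space.
        state.transform(0, 0, &x0, &y0);
        state.transform(1, 1, &x1, &y1);
        // Two opposite corners are enough only for unrotated images;
        // a rotated or skewed CTM would need all four corners.
        xMin = std::min(x0, x1);
        xMax = std::max(x0, x1);
        yMin = std::min(y0, y1);
        yMax = std::max(y0, y1);
    }

    // Everything stays public, as it was with the struct.
    std::string fName;
    double xMin, yMin, xMax, yMax;
};
```

With a CTM of {200, 0, 0, -100, 50, 400}, that is a 200x100 image flipped vertically and placed at (50, 400), the resulting box runs from (50, 300) to (250, 400).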

 > I'd prefer if you make HtmlImage a class.

Sure, I'll change that - I used struct since I wanted everything there 
to be public anyway.

 > It'd be cool if next time you attach the patches instead of making me go and
 > lose time trying to navigate github ;-)

Here you go, with your suggested changes - sorry, I assumed you would 
prefer github :p Let me know if you want me to rebase the branch there 
so that you could pull it without intermediate commits.

Igor
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: xml_images.diff
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20111114/56b38921/attachment.ksh>

