[poppler] Output images from pdftohtml in xml mode

Albert Astals Cid aacid at kde.org
Sat Apr 17 03:17:45 PDT 2010


On Thursday, 15 April 2010, Mike Tonks wrote:
> Hi,

Hi

> 
> I'm new here

welcome

> and I'm considering a patch to pdftohtml (well,
> HtmlOutputDev).  I'm coming from a Perl background so I may not get it
> right the first time, but I'll do my best!  It will be my first patch, so
> any help is appreciated.
> 
> Changes:
> 
> 1) Include the images in xml mode unless -i (ignore images) is specified.
> 
> 2) Include the top, left, width and height data in img tags, where
> appropriate for the mode.  This is not applicable to complex mode.  In
> html mode, height and width are probably useful; positioning would be
> great but can be added later if required, e.g. left, right or position
> relative to text.  In xml mode, just output all available data.
> 
> Use Case: I'm post processing the xml and I do need the image data to
> be output.  It's part of a workflow to produce epub ebook format from
> pdf.

Seems sane.
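
To make the two cases concrete, here is a rough sketch of what the per-mode output could look like. This is not existing poppler code: the helper name formatImageTag, the xml element name "image" and the attribute names are all assumptions for illustration, and src is assumed to be an already-safe file name (no escaping is done).

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, not part of poppler: builds the tag text for one image.
// xmlMode == true  -> all available geometry is written (element and
//                     attribute names are assumptions).
// xmlMode == false -> plain html mode, only width and height are written.
std::string formatImageTag(const std::string &src, int top, int left,
                           int width, int height, bool xmlMode)
{
    // Sizes are expected to be non-negative.
    assert(width >= 0 && height >= 0);

    std::ostringstream out;
    if (xmlMode) {
        out << "<image top=\"" << top << "\" left=\"" << left
            << "\" width=\"" << width << "\" height=\"" << height
            << "\" src=\"" << src << "\"/>";
    } else {
        out << "<img width=\"" << width << "\" height=\"" << height
            << "\" src=\"" << src << "\"/>";
    }
    return out.str();
}
```

Complex mode is left out on purpose, since the proposal says the extra attributes do not apply there.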

> 
> I've had a look at the code and it seems fairly straightforward, as
> the images are already output in other modes.  Currently only the
> image src attribute is passed through so I guess there needs to be a
> new HtmlImage class (plus HtmlImages / HtmlImageAccu to handle the
> iteration).  It looks like I can base this on the HtmlFont & HtmlLink
> modules, so I'll just follow the existing patterns there.
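
A minimal sketch of the kind of record and accumulator described above, in case it helps as a starting point. The class names come from the proposal, but every member and method name here is an assumption, and it uses std::vector rather than whatever containers HtmlFont and HtmlLink actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one extracted image; field names are assumptions.
struct HtmlImage {
    std::string src; // file name of the written image
    int top;
    int left;
    int width;
    int height;
};

// Hypothetical accumulator: collects images for a page and lets the
// output code iterate over them when writing the tags.
class HtmlImageAccu {
public:
    void addImage(HtmlImage img)
    {
        // Sizes are expected to be non-negative.
        assert(img.width >= 0 && img.height >= 0);
        images.push_back(std::move(img));
    }

    std::size_t size() const { return images.size(); }

    // Bounds-checked access; throws std::out_of_range on a bad index.
    const HtmlImage &at(std::size_t i) const { return images.at(i); }

    // begin()/end() allow a range-based for loop over the stored images.
    std::vector<HtmlImage>::const_iterator begin() const { return images.begin(); }
    std::vector<HtmlImage>::const_iterator end() const { return images.end(); }

private:
    std::vector<HtmlImage> images;
};
```

With this shape, the output device would call addImage() whenever it writes an image file, and the page dump would loop over the accumulator to emit one tag per image.
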
> 
> 
> Would you be likely to accept this patch once I get it working?  

If the code is ok, yes, we'll probably accept it.

Thanks and welcome again :-)

Albert

> Any suggestions?
> 
> cheers,
> 
> mike