[poppler] Output images from pdftohtml in xml mode
Albert Astals Cid
aacid at kde.org
Sat Apr 17 03:17:45 PDT 2010
On Thursday, 15 April 2010, Mike Tonks wrote:
> Hi,
Hi
>
> I'm new here
welcome
> and I'm considering a patch to pdftohtml (well
> HtmlOutputDev). I'm coming from a Perl background so I may not get it
> right first time but I'll do my best! It will be my first patch so
> any help appreciated.
>
> Changes:
>
> 1) Include the images in xml mode unless -ignore is specified.
>
> 2) Include the top, left, width, height data in img tags, where
> appropriate depending on mode. This is not applicable to complex mode.
> In html mode, height and width are probably useful; positioning would
> be great but can be added later if required, e.g. left, right or
> position relative to text. In xml mode, just output all available data.
>
> Use Case: I'm post processing the xml and I do need the image data to
> be output. It's part of a workflow to produce epub ebook format from
> pdf.
Seems sane.
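
For illustration only, the per-mode behaviour described in the two changes above could be sketched roughly as follows. Everything here is an assumption made for the example: OutputMode, ImageInfo, imageTag and the <image> element name are hypothetical and are not taken from the pdftohtml sources, and the ignoreImages flag merely stands in for the -ignore option.

```cpp
#include <sstream>
#include <string>

// Hypothetical output modes; the names are illustrative, not poppler's.
enum class OutputMode { Xml, Html, Complex };

// Hypothetical image record: file name plus position and size on the page.
struct ImageInfo {
    std::string src;
    int top, left, width, height;
};

// Build the tag for one image following the rules proposed in the mail.
// Note: src is written as-is; real code would need to escape it.
std::string imageTag(const ImageInfo &img, OutputMode mode, bool ignoreImages)
{
    if (ignoreImages) {
        return ""; // stands in for -ignore: emit nothing
    }
    std::ostringstream out;
    switch (mode) {
    case OutputMode::Xml:
        // xml mode: output all available data
        out << "<image top=\"" << img.top << "\" left=\"" << img.left
            << "\" width=\"" << img.width << "\" height=\"" << img.height
            << "\" src=\"" << img.src << "\"/>";
        break;
    case OutputMode::Html:
        // html mode: width and height only, positioning left for later
        out << "<img width=\"" << img.width << "\" height=\"" << img.height
            << "\" src=\"" << img.src << "\">";
        break;
    case OutputMode::Complex:
        // complex mode: geometry not applicable, src only
        out << "<img src=\"" << img.src << "\">";
        break;
    }
    return out.str();
}
```

Keeping the mode decision in one function like this would make it easy to add positioning to html mode later without touching the xml path.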
>
> I've had a look at the code and it seems fairly straightforward, as
> the images are already output in other modes. Currently only the
> image src attribute is passed through so I guess there needs to be a
> new HtmlImage class (plus HtmlImages / HtmlImageAccu to handle the
> iteration). It looks like I can base this on the HtmlFont & HtmlLink
> modules, so I'll just follow the existing patterns there.
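
As a rough sketch of the shape such a class pair might take: the class names below are the ones suggested in the mail, but the members, constructor and methods are assumptions for illustration and do not mirror the actual HtmlFont or HtmlLink interfaces.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one image on a page, stored with its bounding box.
class HtmlImage {
public:
    HtmlImage(std::string src, double xMin, double yMin, double xMax, double yMax)
        : src_(std::move(src)), xMin_(xMin), yMin_(yMin), xMax_(xMax), yMax_(yMax) {}

    const std::string &src() const { return src_; }
    double left() const { return xMin_; }
    double top() const { return yMin_; }
    double width() const { return xMax_ - xMin_; }
    double height() const { return yMax_ - yMin_; }

private:
    std::string src_;
    double xMin_, yMin_, xMax_, yMax_;
};

// Hypothetical accumulator: collects the images seen on the current page
// so the output code can iterate over them when the page is written.
class HtmlImageAccu {
public:
    void add(HtmlImage img) { images_.push_back(std::move(img)); }
    std::size_t size() const { return images_.size(); }
    const HtmlImage &get(std::size_t i) const { return images_.at(i); }
    void clear() { images_.clear(); } // e.g. at the start of a new page

private:
    std::vector<HtmlImage> images_;
};
```

Storing the bounding box rather than precomputed width and height keeps the class close to what an output device typically receives, and the derived values stay consistent by construction.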
>
>
> Would you be likely to accept this patch once I get it working?
If the code is ok, yes, we'll probably accept it.
Thanks and welcome again :-)
Albert
> Any suggestions?
>
> cheers,
>
> mike