[poppler] smaller HTML images
Albert Astals Cid
aacid at kde.org
Thu Jun 23 00:51:39 PDT 2011
On Thursday, June 23, 2011, Josh Richardson wrote:
> Currently, pdftohtml creates one large image for each rendered HTML
> page. To reduce the size of the HTML file bundles, and to improve the
> semantic value of the HTML, Stephen and I would like to extract and use
> only the portions of that background image that are not plain white
> background.
> To accomplish this, our idea is to add hooks into
> SplashOutputDevNoText that catch painting operations and record the
> bounding box of each one. After recording each bounding box, we'll
> compute a new, larger bounding box wherever regions are contiguous, so
> that they are combined. Once we have a list of non-contiguous bounding
> boxes representing all graphics operations that occurred on the page,
> we'll use them to extract only the relevant regions from the large
> background image, save each region as a separate file, and reference
> those files from the HTML.
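The merge-and-extract step described above could look roughly like the following minimal C++ sketch. It is not poppler code: `Box`, `touches`, `unite`, `mergeBoxes` and `crop` are hypothetical names, and it assumes the bounding boxes have already been collected in device-pixel coordinates (right/bottom edges exclusive) and that the background image is a packed row-major buffer with no row padding (a real Splash bitmap may pad its rows).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical axis-aligned box in device pixels; x1 and y1 are exclusive.
struct Box {
    int x0, y0, x1, y1;
};

// True if the boxes overlap or share an edge, i.e. they are "contiguous".
bool touches(const Box &a, const Box &b) {
    return a.x0 <= b.x1 && b.x0 <= a.x1 && a.y0 <= b.y1 && b.y0 <= a.y1;
}

// Smallest box that contains both inputs.
Box unite(const Box &a, const Box &b) {
    return {std::min(a.x0, b.x0), std::min(a.y0, b.y0),
            std::max(a.x1, b.x1), std::max(a.y1, b.y1)};
}

// Merge contiguous boxes until no two remaining boxes touch. After every
// merge the scan restarts, because the grown box may now touch boxes that
// were already checked.
std::vector<Box> mergeBoxes(std::vector<Box> boxes) {
    bool merged = true;
    while (merged) {
        merged = false;
        for (std::size_t i = 0; i < boxes.size() && !merged; ++i) {
            for (std::size_t j = i + 1; j < boxes.size(); ++j) {
                if (touches(boxes[i], boxes[j])) {
                    boxes[i] = unite(boxes[i], boxes[j]);
                    boxes.erase(boxes.begin() + static_cast<std::ptrdiff_t>(j));
                    merged = true;
                    break;
                }
            }
        }
    }
    return boxes;
}

// Copy the pixels inside box b out of a packed row-major image that has
// `bpp` bytes per pixel; the result is a packed (b.x1-b.x0) x (b.y1-b.y0) image.
std::vector<std::uint8_t> crop(const std::vector<std::uint8_t> &img, int width,
                               int height, int bpp, const Box &b) {
    assert(img.size() >= static_cast<std::size_t>(width) * height * bpp);
    assert(b.x0 >= 0 && b.y0 >= 0 && b.x1 <= width && b.y1 <= height);
    assert(b.x0 < b.x1 && b.y0 < b.y1);
    const std::size_t rowBytes = static_cast<std::size_t>(b.x1 - b.x0) * bpp;
    std::vector<std::uint8_t> out;
    out.reserve(rowBytes * static_cast<std::size_t>(b.y1 - b.y0));
    for (int y = b.y0; y < b.y1; ++y) {
        const std::uint8_t *row =
            img.data() + (static_cast<std::size_t>(y) * width + b.x0) * bpp;
        out.insert(out.end(), row, row + rowBytes);
    }
    return out;
}
```

Treating boxes that merely share an edge as contiguous is one possible choice; padding each box by a few pixels before merging would also fold nearby-but-separate drawings into one image, trading fewer files for more white pixels.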
> Since we're extending the output device, we'll rename it from
> SplashOutputDevNoText to a name that better captures its new role:
> SplashOutputDevHtmlImages. If you think we should retain the old behavior
> behind a switch, please let me know; I don't see a significant benefit to
How are you planning to make the text overlap the image correctly if the
image size changes?
> As always, any comments appreciated.