[poppler] smaller HTML images

Thu Jun 23 00:51:39 PDT 2011

On Thursday, June 23, 2011, Josh Richardson wrote:
> Currently pdftohtml is creating one large image for each HTML page
> rendered.  In order to reduce the size of the HTML file bundles, as well
> as to improve the semantic value of the HTML, Stephen and I would like to
> extract and use only the portions of that background image that are not
> background white.
> 
> In order to accomplish this, our idea is to add hooks into the
> SplashOutputDevNoText to catch painting operations, and record coordinates
> of the bounding box for any painting operations.  After recording each
> bounding box, we'll draw a new bounding box to combine any contiguous
> regions.  Once we have a list of non-contiguous bounding boxes
> representing all graphics operations that have occurred on the page, we'll
> use those bounding boxes to extract only the relevant regions from the
> large background image, save each region as a separate file, and reference
> the files from the HTML.
> 
> Since we're extending the output device, we'll rename it from
> SplashOutputDevNoText to better capture the new role: 
> SplashOutputDevHtmlImages.  If you think we should retain the old behavior
> with a switch, please let me know; I don't see a significant benefit to
> it.

How are you planning to make the text overlap the image correctly if the
image size changes?

Albert

> 
> As always, any comments appreciated.
> 
> --josh
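
For reference, the box-combining step described in the quoted proposal could
be sketched roughly as below. This is only an illustration, not code from
poppler: BBox, touches, unite and mergeBoxes are hypothetical names, and the
sketch assumes boxes are axis-aligned rectangles in page pixel coordinates
and that "contiguous" means overlapping or sharing an edge or corner.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical type: an axis-aligned box in page pixel coordinates,
// with (x0, y0) inclusive and (x1, y1) exclusive.
struct BBox {
    int x0, y0, x1, y1;
};

// True when two boxes overlap or touch along an edge or at a corner.
static bool touches(const BBox &a, const BBox &b) {
    return a.x0 <= b.x1 && b.x0 <= a.x1 &&
           a.y0 <= b.y1 && b.y0 <= a.y1;
}

// Smallest box that contains both inputs.
static BBox unite(const BBox &a, const BBox &b) {
    return { std::min(a.x0, b.x0), std::min(a.y0, b.y0),
             std::max(a.x1, b.x1), std::max(a.y1, b.y1) };
}

// Repeatedly merges touching boxes until no two remaining boxes touch.
// A merged box can grow to reach boxes it did not touch before, so the
// scan restarts after every merge. Simple but not fast for long lists.
std::vector<BBox> mergeBoxes(std::vector<BBox> boxes) {
    bool merged = true;
    while (merged) {
        merged = false;
        for (std::size_t i = 0; i < boxes.size() && !merged; ++i) {
            for (std::size_t j = i + 1; j < boxes.size(); ++j) {
                if (touches(boxes[i], boxes[j])) {
                    boxes[i] = unite(boxes[i], boxes[j]);
                    boxes.erase(boxes.begin() + j);
                    merged = true;
                    break;
                }
            }
        }
    }
    return boxes;
}
```

Each merged box keeps its page coordinates, so one possible way to keep text
and images aligned would be to place every cropped image at its (x0, y0)
offset in the HTML. Whether that fits how pdftohtml positions text is an
assumption that would need checking.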