[poppler] smaller HTML images

Josh Richardson jric at chegg.com
Thu Jun 23 08:17:43 PDT 2011

We will position the images relative to the white background div.  Each
image will be placed at the coordinates its region occupied in the
original large single image, at a z-index between the background div and
the overlying text.
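
A minimal sketch of that positioning, not actual pdftohtml code.  It
assumes the background div is the positioning context (position:relative)
at z-index 0 and the text sits at z-index 2; the Region struct, the
imgTag helper and the file name are hypothetical names used only for
illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical record for one cropped region: pixel coordinates inside the
// original full-page background image, plus the file the crop was saved to.
struct Region {
    int x, y, width, height;
    std::string file;
};

// Build an <img> tag positioned relative to the enclosing background div.
// Assumption: the div is at z-index 0 and the text at z-index 2, so
// z-index 1 places the image between them.
std::string imgTag(const Region &r) {
    std::ostringstream out;
    out << "<img src=\"" << r.file << "\" style=\"position:absolute;"
        << "left:" << r.x << "px;top:" << r.y << "px;"
        << "width:" << r.width << "px;height:" << r.height << "px;"
        << "z-index:1\"/>";
    return out.str();
}
```

Because each crop keeps its original offset and pixel size, text laid out
against the full-page coordinates should still line up with it.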

Let me know if that's enough info.

Thanks, --josh

On 6/23/11 12:51 AM, "Albert Astals Cid" <aacid at kde.org> wrote:

>On Thursday, June 23, 2011, Josh Richardson wrote:
>> Currently pdftohtml is creating one large image for each HTML page
>> rendered.  In order to reduce the size of the HTML file bundles, as well
>> as to improve the semantic value of the HTML, Stephen and I would like
>> to extract and use only the portions of that background image that are
>> not background white.
>> In order to accomplish this, our idea is to add hooks into the
>> SplashOutputDevNoText to catch painting operations, and record the
>> bounding box of each painting operation.  After recording each
>> bounding box, we'll draw a new bounding box to combine any contiguous
>> regions.  Once we have a list of non-contiguous bounding boxes
>> representing all graphics operations that have occurred on the page,
>> we'll use those bounding boxes to extract only the relevant regions
>> from the large background image, save each region as a separate file,
>> and reference the files from the HTML.
>> Since we're extending the output device, we'll rename it from
>> SplashOutputDevNoText to better capture the new role:
>> SplashOutputDevHtmlImages.  If you think we should retain the old one
>> with a switch, please let me know; I don't see a significant benefit
>> to it.
>How are you planning to make the text overlap the image correctly if
>the size is changed?
>> As always, any comments appreciated.
>> --josh
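
The bounding-box bookkeeping described in the quoted proposal could be
sketched roughly as follows.  This is an illustration under stated
assumptions, not the SplashOutputDev hooks themselves: Box, touches,
unite and addBox are hypothetical names, and "contiguous" is taken here
to mean that two boxes overlap or share an edge or corner.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned box covering [x0, x1) x [y0, y1) in pixels.
struct Box { int x0, y0, x1, y1; };

// Assumption: boxes count as contiguous when they overlap or touch,
// including along an edge or at a corner.
bool touches(const Box &a, const Box &b) {
    return a.x0 <= b.x1 && b.x0 <= a.x1 && a.y0 <= b.y1 && b.y0 <= a.y1;
}

// Smallest box enclosing both inputs.
Box unite(const Box &a, const Box &b) {
    return { std::min(a.x0, b.x0), std::min(a.y0, b.y0),
             std::max(a.x1, b.x1), std::max(a.y1, b.y1) };
}

// Record the box of one painting operation.  Any stored box it touches is
// absorbed into it; since the union may then reach boxes that were out of
// range before, the scan restarts after every merge.
void addBox(std::vector<Box> &boxes, Box b) {
    bool merged = true;
    while (merged) {
        merged = false;
        for (std::size_t i = 0; i < boxes.size(); ++i) {
            if (touches(boxes[i], b)) {
                b = unite(boxes[i], b);
                boxes.erase(boxes.begin() + i);
                merged = true;
                break;
            }
        }
    }
    boxes.push_back(b);
}
```

After the last painting operation, the vector holds only mutually
non-contiguous boxes, each of which could then be cropped out of the
full-page image and saved as its own file.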
