[poppler] something like an "image_list" API for cpp frontend

Mon Apr 2 08:22:51 UTC 2018

Hi,

I'm now thinking about adding an "image_list" API,
similar to the text_list API of the cpp frontend, which
would return a list of structures, each containing the
bounding rectangle and a pointer to the image data stream.
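For illustration only, here is a rough sketch of the kind of structure such an API could return, modeled loosely on text_list(). The names rect, image_box, image_data and place_image are all hypothetical. Nothing like them exists in poppler-cpp, and a real implementation would fill the boxes from an output device rather than from a hand-made list of placements.

```cpp
#include <memory>
#include <vector>

// Hypothetical types: poppler-cpp has text_box/text_list(), but nothing
// like image_box or image_list() exists; the names are illustrative only.
struct rect {
    double x, y, width, height;  // bounding box in page coordinates
};

using image_data = std::vector<unsigned char>;  // raw bytes of the image stream

struct image_box {
    rect bbox;                               // where this instance is drawn
    std::shared_ptr<const image_data> data;  // shared by all instances of one stream
};

// Hypothetical helper: an image drawn at several places (e.g. by tiling)
// becomes several boxes that all point to the same stream data.
std::vector<image_box> place_image(std::shared_ptr<const image_data> data,
                                   const std::vector<rect>& placements) {
    std::vector<image_box> boxes;
    boxes.reserve(placements.size());
    for (const rect& r : placements) {
        boxes.push_back(image_box{r, data});
    }
    return boxes;
}
```

One design choice in this sketch: the boxes hold a shared pointer to the stream, so an image repeated N times (e.g. by tiling) yields N boxes but only one copy of the data.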

The easiest idea would be to incorporate ImageOutputDev
into the cpp frontend. However, there is a known issue in
ImageOutputDev: images drawn by tiling operations are
not counted.

https://bugs.freedesktop.org/show_bug.cgi?id=91734

I should emphasize that this is not a marginal case. When I
make a PDF from an HTML page with many small images via Firefox
on GNU/Linux, the resulting PDF often draws the images with
tiling operations, even though the images are never repeated X-o.

I'm not sure whether the fix in the above bugzilla entry is
right or not (it seems that nobody has reviewed the quick-fix
patch), but this fix only makes it possible to list the images
(with their original metrics) and extract the image data; the
metrics in the drawn result are not available. So it is not a
sufficient basis for discussing the "image_list" API.

There may be a reason why the original author wrote such a
simple patch. Tiling operations are executed as follows:

1) create a new output (e.g. a Splash bitmap, a Cairo surface,
etc.) and draw a single image into it as a pattern

2) transfer the drawn image to the original output

To calculate the positions and metrics in the resulting image,
the chain of temporary outputs has to be kept.
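A minimal sketch of why the chain matters: if each temporary output records the matrix that maps it into its parent, the image's placement on the page is the product of all matrices along the chain. The matrix layout follows the PDF [a b c d e f] convention; the chain representation and the function names are assumptions for illustration, not how the Splash or Cairo output devices actually store their state.

```cpp
#include <algorithm>
#include <vector>

// PDF-style affine matrix [a b c d e f]:
//   x' = a*x + c*y + e,  y' = b*x + d*y + f
struct matrix { double a, b, c, d, e, f; };
struct point { double x, y; };
struct bbox { double x0, y0, x1, y1; };

// Product m x n in PDF's row-vector convention: apply m first, then n.
matrix concat(const matrix& m, const matrix& n) {
    return { m.a * n.a + m.b * n.c, m.a * n.b + m.b * n.d,
             m.c * n.a + m.d * n.c, m.c * n.b + m.d * n.d,
             m.e * n.a + m.f * n.c + n.e, m.e * n.b + m.f * n.d + n.f };
}

point transform_point(const matrix& m, point p) {
    return { m.a * p.x + m.c * p.y + m.e, m.b * p.x + m.d * p.y + m.f };
}

// Hypothetical chain layout: chain[0] maps the image unit square into the
// innermost temporary output (a pattern cell); each following matrix maps
// that output into the next one outwards; the last one maps into the page.
bbox image_bbox_on_page(const std::vector<matrix>& chain) {
    matrix total{1, 0, 0, 1, 0, 0};  // identity
    for (const matrix& m : chain) {
        total = concat(total, m);
    }
    const point corners[] = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
    point p = transform_point(total, corners[0]);
    bbox box{p.x, p.y, p.x, p.y};
    for (const point& c : corners) {
        p = transform_point(total, c);
        box.x0 = std::min(box.x0, p.x);
        box.y0 = std::min(box.y0, p.y);
        box.x1 = std::max(box.x1, p.x);
        box.y1 = std::max(box.y1, p.y);
    }
    return box;
}
```

If any link of the chain is dropped, the result lands in the wrong place, which is the problem with discarding the temporary outputs. For rotated or skewed transforms this yields the axis-aligned bounding box of the image, not its exact quadrilateral.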

The difficulties in handling images drawn by tiling are:

* it is not easy to count how many times the image is
repeated.

* to obtain the position and metrics, the chain of tiling
operations has to be preserved. We cannot assume that
rendering the image for the tile does not invoke yet
another tiling operation.
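For the first point, a rough count is possible in the simplest case. In a PDF tiling pattern, the cell given by BBox is repeated at multiples of XStep and YStep in pattern space, so the number of copies overlapping a filled area can be computed per axis. This sketch assumes positive steps and an axis-aligned rectangular area already expressed in pattern space; it ignores rotated pattern matrices, non-rectangular clips and nested tiling. It also counts a tile whenever its BBox overlaps the area, even if the image inside the cell does not.

```cpp
#include <algorithm>
#include <cmath>

struct box { double x0, y0, x1, y1; };

// Number of integers i such that the cell [b0, b1] shifted by i*step
// overlaps [lo, hi] with positive length. Assumes step > 0.
long long tiles_along_axis(double b0, double b1, double step, double lo, double hi) {
    long long first = static_cast<long long>(std::floor((lo - b1) / step)) + 1;
    long long last = static_cast<long long>(std::ceil((hi - b0) / step)) - 1;
    return std::max(0LL, last - first + 1);
}

// Tiles of a cell repeated every (xstep, ystep) that overlap the area.
long long count_tiles(const box& cell, double xstep, double ystep, const box& area) {
    return tiles_along_axis(cell.x0, cell.x1, xstep, area.x0, area.x1) *
           tiles_along_axis(cell.y0, cell.y1, ystep, area.y0, area.y1);
}
```

For example, a 10x10 cell with XStep = YStep = 10 filling a 100x50 area gives 10 * 5 = 50 copies. The estimate says nothing about where each copy ends up on the page, which is the second point.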

As an alternative, one possibility would be to parse the
SVG (or XML, or CairoScript) generated by CairoOutputDev.
The SVG generated by Cairo seems to have a flat structure
(no grouped coordinate transforms), so all position and
metric information could be retrieved from the neighboring
XML elements.
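Even with a flat structure, the SVG still has to be parsed. As a deliberately naive sketch of what that could look like without an XML library, the following collects the attributes of every element with a given tag name (for instance image or use, assuming those are what Cairo emits for images; which attributes carry the position is an assumption not verified against Cairo's output). The function name find_elements is hypothetical. It does not handle comments, CDATA, entities, namespaces or whitespace around '=', so it is not a substitute for a real parser.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using attributes = std::map<std::string, std::string>;

// Deliberately naive scanner: collects name="value" pairs of every <tag ...>
// element. No handling of comments, CDATA, entities, namespaces, or
// whitespace around '='.
std::vector<attributes> find_elements(const std::string& doc, const std::string& tag) {
    std::vector<attributes> found;
    const std::string open = "<" + tag;
    std::size_t pos = 0;
    while ((pos = doc.find(open, pos)) != std::string::npos) {
        std::size_t i = pos + open.size();
        pos = i;  // the next search starts after this match
        // "<imagefoo" is a different element; require a delimiter after the name
        if (i >= doc.size() || !(std::isspace(static_cast<unsigned char>(doc[i])) ||
                                 doc[i] == '/' || doc[i] == '>')) {
            continue;
        }
        attributes attrs;
        while (i < doc.size() && doc[i] != '>') {
            if (std::isspace(static_cast<unsigned char>(doc[i])) || doc[i] == '/') {
                ++i;  // skip separators and the '/' of "/>"
                continue;
            }
            std::size_t eq = doc.find('=', i);
            if (eq == std::string::npos) break;
            std::size_t quote = eq + 1;
            if (quote >= doc.size() || (doc[quote] != '"' && doc[quote] != '\'')) break;
            std::size_t end = doc.find(doc[quote], quote + 1);
            if (end == std::string::npos) break;
            attrs[doc.substr(i, eq - i)] = doc.substr(quote + 1, end - quote - 1);
            i = end + 1;
        }
        found.push_back(attrs);
    }
    return found;
}
```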

However, there are 3 concerns.

--

a) nobody guarantees that the flat structure of the SVG
(or CairoScript, or XML surface) output will be kept in
future versions.

b) poppler has no dependency on an XML parsing library,
except indirectly when fontconfig depends on libexpat.

c) tiling onto an SVG or XML surface can cause some
rasterization.

When I convert the pattern-tiling example at

https://developer.mozilla.org/en-US/docs/Web/SVG/Tutorial/Patterns

to PDF with librsvg, the result contains no raster data
(pattern.pdf.xz), but if I convert it back from PDF to SVG
with pdftocairo (pattern.re.svgz), the result contains
raster data X-o.

Therefore, this method may count images that do not exist
in the original document.

--

So, what is the right way? If it is not yet time to put
"image_list" into the cpp frontend, would it be acceptable
to add similar features to pdfimages or pdftocairo?

Regards,
mpsuzuki
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pattern.pdf.xz
Type: application/octet-stream
Size: 1344 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180402/b03a53fe/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pattern.re.svgz
Type: image/svg+xml-compressed
Size: 1707 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180402/b03a53fe/attachment.bin>