[poppler] something like an "image_list" API for cpp frontend
Albert Astals Cid
aacid at kde.org
Fri Oct 5 21:01:40 UTC 2018
On Monday, 2 April 2018, at 10:22:51 CEST, suzuki toshiya wrote:
> Hi,
Hi 6 months later :/
>
> Now I'm thinking about the possibility to add "image_list"
> API, which is similar to text_list API of cpp frontend,
> giving the list of the structures including the rectangle
> and the pointer to the image data stream.
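To make the idea concrete, here is a rough sketch of what one entry of such a list could carry. Everything here is hypothetical: `rect`, `image_box` and `images_within` are illustration names, not existing poppler-cpp types, and a real API would more likely mirror `text_box` and return an owning type rather than a raw pointer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical rectangle in page coordinates (not poppler::rectf).
struct rect {
    double x, y, width, height;
};

// Hypothetical list entry: where the image is drawn, plus a borrowed
// pointer to its (still encoded) data stream and the stream length.
struct image_box {
    rect bbox;
    const unsigned char *data;
    std::size_t size;
};

// Return the entries whose bounding box lies completely inside `area`.
std::vector<image_box> images_within(const std::vector<image_box> &all,
                                     const rect &area)
{
    std::vector<image_box> out;
    for (const image_box &b : all) {
        const bool inside =
            b.bbox.x >= area.x && b.bbox.y >= area.y &&
            b.bbox.x + b.bbox.width <= area.x + area.width &&
            b.bbox.y + b.bbox.height <= area.y + area.height;
        if (inside)
            out.push_back(b);
    }
    return out;
}
```

A caller would fill the vector from whatever the output device collects and then filter or iterate it, much like the result of text_list.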
>
> The easiest idea would be the incorporation of ImageOutputDev
> into cpp frontend. However, there is a known issue in
> ImageOutputDev; the images drawn by tiling operations are
> not counted.
>
> https://bugs.freedesktop.org/show_bug.cgi?id=91734
>
> I should emphasize that this is not such a marginal case. When
> I make a PDF from an HTML page with many small images, via
> Firefox on GNU/Linux, the resulting PDF often draws the images
> with the tiling operation, although the images are never
> repeated X-o.
You mean the images are in the pdf as a tile repeat of 1?
> I'm not sure whether the fix in the above bugzilla is right or
> not (it seems that nobody has reviewed the quick fix patch), but
> this fix only makes it possible to list the images (with their
> original metrics) and to extract the image data; the metrics in
> the drawn result are not available. So it is not a perfect
> solution to build the "image_list" API on.
>
> There is probably a rationale for the original author writing
> such a simple patch. The tiling operations are executed as:
>
> 1) create new output (e.g. splash bitmap, cairo surface,
> etc) to draw a single image as a pattern
>
> 2) transfer the drawn image to original output
>
> To calculate the positions & metrics in the resulting image,
> the chain of temporary outputs has to be kept.
>
> The difficulty to handle the images drawn by tiling would
> be:
>
> * it is not easy to count how many times the image is
> repeated.
>
> * to obtain the position & metrics, the chain of tiling
> operations should be preserved. We cannot assume that
> rendering the image for the tile does not invoke yet
> another tiling operation.
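A minimal sketch of what "preserving the chain" amounts to, assuming each nested output contributes one affine matrix in PDF order [a b c d e f]. The names `matrix`, `point`, `apply` and `map_through_chain` are made up for illustration and are not poppler code.

```cpp
#include <array>
#include <vector>

// Affine matrix in PDF order [a b c d e f]:
//   x' = a*x + c*y + e
//   y' = b*x + d*y + f
using matrix = std::array<double, 6>;

struct point {
    double x, y;
};

// Apply one affine matrix to a point.
point apply(const matrix &m, point p)
{
    return { m[0] * p.x + m[2] * p.y + m[4],
             m[1] * p.x + m[3] * p.y + m[5] };
}

// Map a point from the innermost tile space out to the page.
// chain[0] belongs to the innermost output, chain.back() to the page.
point map_through_chain(const std::vector<matrix> &chain, point p)
{
    for (const matrix &m : chain)
        p = apply(m, p);
    return p;
}
```

Each nested tiling operation would push one more matrix onto the chain, so dropping any level loses the final position. This only covers one placement of the tile; it does not answer how many times the tile is repeated.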
>
> Thinking about alternatives, one possibility would be
> parsing the SVG (or XML, or CairoScript) generated by
> CairoOutputDev. It seems that the SVG generated by Cairo has
> a flat structure (no grouped coordinate transforms), so all
> position & metric information could be retrieved from
> the neighboring XML elements.
>
> However, there are 3 concerns.
>
> --
>
> a) nobody guarantees the forward compatibility of the
> flat structure of SVG (or CairoScript, XML surface).
>
> b) poppler has no dependency on an XML parsing library,
> except indirectly when fontconfig depends on libexpat.
>
> c) tiling onto an SVG or XML surface can cause some
> rasterization.
>
> When I convert the pattern-tiling example at
>
> https://developer.mozilla.org/en-US/docs/Web/SVG/Tutorial/Patterns
>
> to PDF with librsvg, the result includes no raster data
> (pattern.pdf.xz), but if I convert it back from PDF to SVG
> with pdftocairo (pattern.re.svgz), the result includes
> raster data X-o.
>
> Therefore, there is a possibility that nonexistent images
> are counted with this method.
>
> --
>
> So, what is the right way?
I'd say keep ignoring tiles for the time being, and if you find lots of cases where a tile is "wrongly" used, ask the people that generate it to "fix" the pdf, since obviously it's not what they wanted.
> if it is not the time to put "image_list" into cpp frontend
It is ok, actually I know someone else who wanted to do that.
> , is it acceptable to add similar features to pdftimage or pdftocairo?
pdftoppm and pdftocairo have a different purpose; they just render a given page. What would you do with tiled images for them?
Cheers,
Albert
>
> Regards,
> mpsuzuki
>