[cairo] PDF Text Extraction: Future

Robert O'Callahan robert at ocallahan.org
Sun Oct 14 18:24:01 PDT 2007


On Sep 17, 2007 12:37 PM, Behdad Esfahbod <behdad at behdad.org> wrote:

> cairo_public void
> cairo_show_text_glyphs (cairo_t                    *cr,
>                        const char                 *utf8,
>                        int                         utf8_len,
>                        const cairo_glyph_t        *glyphs,
>                        int                         num_glyphs,
>                        const cairo_text_cluster_t *clusters,
>                        int                         num_clusters,
>                        cairo_bool_t                backward);


It would be useful to have an API to detect whether a surface can make use
of this extra information, because there's a cost to building 'utf8' and
'clusters', and this is performance-critical code, so we'd want to avoid that
cost when the information will not be used (which will be the vast majority
of the time...).
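
To make the request concrete, here is a rough sketch of the calling pattern I
have in mind. Everything named sketch_* is hypothetical and made up for this
mail, not existing cairo API; the surface struct and the counters only stand
in for a real backend and real drawing calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a surface; NOT a cairo type. */
typedef struct {
    bool wants_text_clusters;   /* e.g. true for a PDF-like backend */
} sketch_surface_t;

/* Counters standing in for the real work, so the two paths are visible. */
typedef struct {
    int utf8_builds;            /* times the costly utf8/clusters path ran */
    int rich_draws;             /* draws that passed utf8 + clusters */
    int plain_draws;            /* draws that passed glyphs only */
} sketch_stats_t;

/* Hypothetical query: can this surface use 'utf8' and 'clusters'? */
static bool
sketch_surface_wants_text_clusters (const sketch_surface_t *surface)
{
    assert (surface != NULL);
    return surface->wants_text_clusters;
}

/* Draw one text run, paying for 'utf8' and 'clusters' only when useful. */
static void
sketch_draw_run (const sketch_surface_t *surface, sketch_stats_t *stats)
{
    assert (stats != NULL);

    if (sketch_surface_wants_text_clusters (surface)) {
        /* Costly path: build 'utf8' and 'clusters', then call the
         * richer entry point (cairo_show_text_glyphs as proposed). */
        stats->utf8_builds++;
        stats->rich_draws++;
    } else {
        /* Cheap path: glyphs only, as with cairo_show_glyphs today. */
        stats->plain_draws++;
    }
}
```

With something like this, a screen surface would never trigger the utf8 and
cluster construction, while a PDF-like surface would get the full data.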

>    There is nothing preventing a library generating
>    glyphs that have a negative advance width and so go in the
>    logical order for right-to-left text, but it's not common
>    practice and most probably not very well supported.


If I understand you correctly, Gecko does this. For RTL runs we're calling
cairo_show_glyphs with a glyph array whose x-offsets decrease along the
array. I think this is technically necessary for CSS compliance, since CSS
says that, all other things being equal, content later in a document (i.e. in
logical order) is higher in z-order than content earlier in the document.
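
As an illustration only (this is not Gecko's actual code), the positioning
amounts to something like the sketch below. sketch_glyph_t is a stand-in with
the same three fields as cairo_glyph_t, and the function name, the
'right_edge' parameter and the per-glyph 'advances' array are assumptions made
up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with the same fields as cairo_glyph_t (index, x, y). */
typedef struct {
    unsigned long index;
    double        x;
    double        y;
} sketch_glyph_t;

/* Position the glyphs of an RTL run while keeping the array in logical
 * order: the first glyph ends at 'right_edge' and the pen moves left,
 * so the x-offsets decrease along the array. */
static void
sketch_place_rtl_run (sketch_glyph_t *glyphs,
                      const double   *advances,
                      size_t          num_glyphs,
                      double          right_edge,
                      double          baseline_y)
{
    double pen_x = right_edge;
    size_t i;

    assert (glyphs != NULL && advances != NULL);

    for (i = 0; i < num_glyphs; i++) {
        /* A glyph's origin is its left side, so step back by its
         * advance before recording the position. */
        pen_x -= advances[i];
        glyphs[i].x = pen_x;
        glyphs[i].y = baseline_y;
    }
}
```

Because the array stays in logical order, glyphs later in the document are
also painted later, which is what gives them the higher z-order.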

Rob
-- 
"Two men owed money to a certain moneylender. One owed him five hundred
denarii, and the other fifty. Neither of them had the money to pay him back,
so he canceled the debts of both. Now which of them will love him more?"
Simon replied, "I suppose the one who had the bigger debt canceled." "You
have judged correctly," Jesus said. [Luke 7:41-43]