[HarfBuzz] question regarding cluster indices

Behdad Esfahbod behdad.esfahbod at gmail.com
Wed Sep 30 14:46:29 PDT 2015


Hi Phil,

On 15-09-29 07:32 PM, Phil Race wrote:
> hb_buffer_add_XXX allows you to specify a subset of the text to shape
> with the remainder being used as context but is not shaped itself and is
> not part of the output.
> 
> This is useful for various cases, for example if you are using different
> fonts for different parts of the text.
> 
> I want to make sure I understand correctly how this impacts the
> assigned output cluster for the portion of the text being shaped.
> 
> The code below shows the initial assignment of clusters based on
> index of the code point in the full text.
> So on output the cluster of the text that was shaped will start
> at the offset within the overall text.
> i.e. if the full text is "ABCDEF" and we shape "DEF", then the
> output cluster indices will start at 3, so I can always just count
> characters if I want to know what the cluster index
> would have been without such context. Is this interpretation correct?

Correct.  Alternatively, if you would like the cluster values to ignore the
context, first add a zero-length item that carries the pre-context, and then
add the actual segment.  E.g., from icu-le-hb's LayoutEngine.cpp:

    hb_buffer_add_utf16 (fHbBuffer, (const uint16_t*)chars, max, offset, 0);
    hb_buffer_add_utf16 (fHbBuffer, (const uint16_t*)(chars + offset),
                         max - offset, 0, count);

This is equivalent to the following code:

    hb_buffer_add_utf16 (fHbBuffer, (const uint16_t*)chars, max, offset, count);

except that the cluster values in the former code are relative to the start of
the segment whereas in the latter the cluster values are relative to the start
of the full text.
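
To make the difference concrete, here is a minimal sketch in plain C that
mimics only the cluster bookkeeping of the loop quoted at the end of this
message (cluster = position of the code unit relative to the `text` pointer
passed in).  The names model_info_t, model_add_utf16, clusters_full_text and
clusters_segment_relative are hypothetical and are not part of HarfBuzz; the
model ignores surrogate pairs and does not track context at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one buffer entry: code unit plus cluster. */
typedef struct {
  uint16_t     codepoint;
  unsigned int cluster;
} model_info_t;

/* Simplified model of the add loop: appends the item's code units to `out`
 * starting at index `out_len`, and assigns each one a cluster equal to its
 * distance from `text`.  The caller must make sure `out` has room for
 * `item_length` more entries.  Returns the new number of entries. */
static size_t
model_add_utf16 (model_info_t *out, size_t out_len,
                 const uint16_t *text, size_t text_length,
                 size_t item_offset, size_t item_length)
{
  if (item_offset > text_length)
    return out_len;
  if (item_length > text_length - item_offset)
    item_length = text_length - item_offset;

  const uint16_t *next = text + item_offset;
  const uint16_t *end = next + item_length;
  while (next < end)
  {
    out[out_len].codepoint = *next;
    out[out_len].cluster = (unsigned int) (next - text);
    out_len++;
    next++;
  }
  return out_len;
}

/* Single call: clusters are relative to the start of the full text. */
static size_t
clusters_full_text (model_info_t *out, const uint16_t *chars,
                    size_t max, size_t offset, size_t count)
{
  return model_add_utf16 (out, 0, chars, max, offset, count);
}

/* Two calls: the first adds a zero-length item (in HarfBuzz it would only
 * supply context; in this model it adds nothing), the second adds the
 * segment with `text` pointing at its start, so clusters begin at 0. */
static size_t
clusters_segment_relative (model_info_t *out, const uint16_t *chars,
                           size_t max, size_t offset, size_t count)
{
  size_t n = model_add_utf16 (out, 0, chars, max, offset, 0);
  return model_add_utf16 (out, n, chars + offset, max - offset, 0, count);
}
```

With the full text "ABCDEF", offset 3 and count 3, clusters_full_text yields
clusters 3, 4, 5 for "DEF", while clusters_segment_relative yields 0, 1, 2 for
the same three characters.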

Does that make sense?

behdad


> hb_buffer_add_utf(hb_buffer_t  *buffer,
>                    const typename utf_t::codepoint_t *text,
>                    int           text_length,
>                    unsigned int  item_offset,
>                    int           item_length) {
> 
> .....
> while (next < end)
>   {
>     hb_codepoint_t u;
>     const T *old_next = next;
>     next = utf_t::next (next, end, &u, replacement);
>     buffer->add (u, old_next - (const T *) text);
>   }
> ...
> }

