[HarfBuzz] Different results when shaping sub-sections of text

Nikolay Sivov bunglehead at gmail.com
Wed Oct 7 06:50:25 PDT 2015


On 07.10.2015 14:41, Jamie Dale wrote:
> Thanks for the info. I actually started working with the extracted
> cluster information right before you sent this message, although that
> was to minimise the shaping requests I need to make by extracting the
> info (where possible) from the shaped data for the entire run of text.
>
> I'll admit that colour only was a bad example, but aside from also being
> able to change the font or font size, our rich-text can also contain
> completely user-defined widgets. This can make extracting out the style
> information... tricky, since I don't really know how it's being used
> (and may actually be part of a nested control, such as a button or
> hyperlink).

That style information should also include locale/language, since it
affects the feature selection process. On Windows it should also track
font style separately, because of things like bold/oblique simulations.
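
As a rough sketch of that idea (Python for brevity; the CharStyle
fields and both helper names are made up for illustration and are not
part of HarfBuzz or any toolkit), text could be split into shaping runs
using only the attributes that affect shaping, while paint-only
attributes such as colour are ignored:

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class CharStyle:
    # Hypothetical per-character attributes, not a HarfBuzz structure.
    font: str
    size: float
    language: str
    bold: bool      # kept separate: may be a simulated (synthetic) bold
    oblique: bool   # kept separate: may be a simulated oblique
    color: str      # paint-only, must not split a shaping run


def shaping_key(style):
    """Attributes assumed here to require a separate shaping call."""
    return (style.font, style.size, style.language, style.bold, style.oblique)


def shaping_runs(styles):
    """Return (start, end) character ranges that can each be shaped at once."""
    runs = []
    pos = 0
    for _, group in groupby(styles, key=shaping_key):
        length = sum(1 for _ in group)
        runs.append((pos, pos + length))
        pos += length
    return runs


# "Magnificent" with "Magnif" coloured differently: still one shaping run.
base = CharStyle("Serif", 12.0, "en", False, False, "black")
red = CharStyle("Serif", 12.0, "en", False, False, "red")
print(shaping_runs([red] * 6 + [base] * 5))  # [(0, 11)]
```

Which attributes belong in the key depends on the platform and the
renderer; the list above is only an assumption based on the ones
mentioned in this thread.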

>
> Rich-text itself is actually a secondary concern right now, my primary
> concern is selection highlighting (which uses a similar mechanism, as
> text is broken into runs where it is selected, since selection can
> change the text colour). That said, selection isn't allowed to change
> the font used so I can more easily combine the selected and non-selected
> text into a single shape, however I'm still unsure how ligatures would
> be handled in that case.
>
> I'll use English for simplicity since I can actually read it. Imagine I
> have the text "Magnificent", where the "fi" has been combined into a
> ligature. If I were to select "Magnif", then in order to change the
> colour of that portion of the text, the ligature would have to be split.
> This doesn't present a readability issue for English, but would it
> present issues for other languages?

If you want to be able to select the 'f' alone in an 'fi' ligature and
color it separately, you'll definitely have to break the ligature: it's
a single glyph, and there's no way to know where one half ends and the
other begins, so you can't draw part of it in a different color. For
complex scripts I can think of two different cases. If a single glyph
represents several codepoints, it's the same situation as with 'fi':
you have to break it if you want to render the parts separately. If a
glyph was instead substituted with another form of a different shape
based on context, you can color it separately without breaking its
shape. In general I think the foolproof solution is to treat clusters
as a whole and render each whole cluster with the same rendering style
(fill pattern, color, or whatever).
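
To make the "whole cluster" approach concrete, here is a minimal sketch
in Python. It assumes the shaper has already returned one cluster value
per glyph (the index of the first character the glyph covers, which is
how HarfBuzz reports clusters when you add text by character index).
The cluster list is hard-coded for "Magnificent" with an 'fi' ligature,
and all function names are hypothetical:

```python
def cluster_ranges(glyph_clusters, text_length):
    """Turn per-glyph cluster values into sorted (start, end) char ranges."""
    starts = sorted(set(glyph_clusters))  # also handles RTL (descending) output
    ends = starts[1:] + [text_length]
    return list(zip(starts, ends))


def snap_to_clusters(sel_start, sel_end, ranges):
    """Grow [sel_start, sel_end) so that it never cuts through a cluster."""
    if sel_start == sel_end:
        return sel_start, sel_end  # empty selection: nothing to style
    for start, end in ranges:
        if start < sel_start < end:
            sel_start = start
        if start < sel_end < end:
            sel_end = end
    return sel_start, sel_end


def glyph_is_selected(glyph_clusters, sel_start, sel_end):
    """One flag per glyph: draw it with the selected style or not."""
    return [sel_start <= c < sel_end for c in glyph_clusters]


# "Magnificent": the glyph with cluster 5 is the 'fi' ligature (chars 5 and 6).
clusters = [0, 1, 2, 3, 4, 5, 7, 8, 9, 10]
ranges = cluster_ranges(clusters, text_length=11)
start, end = snap_to_clusters(0, 6, ranges)   # user selected "Magnif"
print(start, end)                             # 0 7
print(glyph_is_selected(clusters, start, end))
```

Growing the selection is only one possible policy; shrinking it to
exclude the partially covered cluster would work the same way. Either
choice keeps the ligature intact at the cost of highlighting slightly
more or less than the user selected.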

I just tried that in LibreOffice Writer, and it seems that changing the
color within an Arabic string disables some advance adjustments, but
the overall shape stays intact. That's especially visible if you apply
a strikeout style to the whole text: this results in gaps in the
strikeout line. For English it does indeed break ligatures if you try
to color 'f' separately from 'i'.

>
> -Jamie.
>
> On 6 October 2015 at 22:45, Khaled Hosny <khaledhosny at eglug.org>
> wrote:
>
>     On Tue, Oct 06, 2015 at 08:08:00PM +0100, Jamie Dale wrote:
>     > I suspect that the first shape has used some ligatures, and the second
>     > shape was unable to do that due to being unable to combine the glyphs (I
>     > have previously seen this with the "fi" ligature in English).
>     >
>     > If both of these forms are considered acceptable, then I'm happy enough,
>
>     Shaping parts of text separately is generally a bad idea as you lose any
>     OpenType interaction between these parts, so you only do it when it is
>     absolutely necessary (e.g. due to font change). Though your second image
>     is still barely legible, it loses all the contextual substitutions
>     specified in the font and gives a very suboptimal result, but it can
>     make the text illegible in many other cases, for example when shaping
>     "لا". I expect Indic scripts to suffer more legibility-wise.
>
>     The proper way is to identify rich-text attributes that shouldn’t break
>     shaping (color, underline, overline, etc.) and apply them after shaping,
>     using cluster values to do the reverse glyph to character index mapping
>     (while at it, use HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS so that
>     you get a finer cluster mapping).
>
>     Regards,
>     Khaled