[HarfBuzz] Different results when shaping sub-sections of text

Sat Oct 10 09:39:30 PDT 2015

On Wed, Oct 07, 2015 at 12:41:56PM +0100, Jamie Dale wrote:
> I'll admit that colour only was a bad example, but aside from also being
> able to change the font or font size, our rich-text can also contain
> completely user-defined widgets. This can make extracting out the style
> information... tricky, since I don't really know how it's being used (and
> may actually be part of a nested control, such as a button or hyperlink).

That is your call, but I’d go for a solution that at least covers known
formatting properties. As I said, such shape splitting is bad and should
be avoided whenever possible.

> Rich-text itself is actually a secondary concern right now, my primary
> concern is selection highlighting (which uses a similar mechanism, as text
> is broken into runs where it is selected, since selection can change the
> text colour). That said, selection isn't allowed to change the font used so
> I can more easily combine the selected and non-selected text into a single
> shape, however I'm still unsure how ligatures would be handled in that case.
> 
> I'll use English for simplicity since I can actually read it. Imagine I
> have the text "Magnificent", where the "fi" has been combined into a
> ligature. If I were to select "Magnif", then in order to change the colour
> of that portion of the text, the ligature would have to be split. This
> doesn't present a readability issue for English, but would it present
> issues for other languages?

You would be getting completely different glyphs for selected and
unselected text, which strikes me as a rather bad user experience. I
have never used an application that does anything like this. What I have
seen is that naive applications either color the whole ligature or none
of it, while more sophisticated applications use clipping to color just
the part of the glyph they think belongs to the highlighted characters
(determining this can be done either by evenly distributing the
ligature advance width over its components or by using
hb_ot_layout_get_ligature_carets(), with the former method as a
fallback). Also note that splitting the text is not only about
ligatures: in the Amiri case you showed, no ligatures were involved at
all, so you should have no problem coloring the highlighted part without
playing any tricks, and there are Latin fonts that also handle
f-ligatures by using contextual forms and no actual ligatures.
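
To make the clipping idea concrete, here is a rough sketch in plain
Python (no HarfBuzz calls). It assumes left-to-right text and that you
already know the ligature glyph's x position, its advance and its number
of components. The function names and the optional "carets" argument are
made up for illustration; "carets" stands in for whatever
hb_ot_layout_get_ligature_carets() gave you, if anything.

```python
def component_boundaries(advance, num_components, carets=None):
    """Return x offsets (from the glyph origin) of every component edge.

    If the font supplied exactly one caret per inner boundary, use those;
    otherwise fall back to splitting the advance evenly.
    """
    inner_count = num_components - 1
    if carets is not None and len(carets) == inner_count:
        inner = sorted(carets)
    else:
        # Fallback: distribute the advance evenly over the components.
        inner = [advance * i / num_components for i in range(1, num_components)]
    return [0] + inner + [advance]


def highlight_clip(glyph_x, advance, num_components, first, last, carets=None):
    """Return the (x_start, x_end) range covering components [first, last).

    Assumes left-to-right layout, so component 0 is the leftmost one.
    """
    edges = component_boundaries(advance, num_components, carets)
    return glyph_x + edges[first], glyph_x + edges[last]


# "Magnificent" with "Magnif" selected: only the "f" half (component 0)
# of a two-component "fi" ligature placed at x=100 with advance 600.
clip = highlight_clip(100, 600, 2, 0, 1)
```

You would then draw the ligature twice, once per color, each time
clipped to its own x range.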

Regards,
Khaled

> 
> -Jamie.
> 
> On 6 October 2015 at 22:45, Khaled Hosny <khaledhosny at eglug.org> wrote:
> 
> > On Tue, Oct 06, 2015 at 08:08:00PM +0100, Jamie Dale wrote:
> > > I suspect that the first shape has used some ligatures, and the second
> > > shape was unable to do that due to being unable to combine the glyphs (I
> > > have previously seen this with the "fi" ligature in English).
> > >
> > > If both of these forms are considered acceptable, then I'm happy enough,
> >
> > Shaping parts of text separately is generally a bad idea as you lose any
> > OpenType interaction between these parts, so you only do it when it is
> > absolutely necessary (e.g. due to font change). Though your second image
> > is still barely legible, it loses all the contextual substitutions
> > specified in the font and gives a very suboptimal result, but it can
> > make the text illegible in many other cases, for example when shaping
> > "لا". I expect Indic scripts to suffer more legibility-wise.
> >
> > The proper way is to identify rich-text attributes that shouldn’t break
> > shaping (color, underline, overline, etc.) and apply them after shaping,
> > using cluster values to do the reverse glyph to character index mapping
> > (while at it, use HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS so that
> > you get a finer cluster mapping).
> >
> > Regards,
> > Khaled
> >
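
The reverse mapping from glyphs to characters described in the quoted
message could look roughly like the sketch below. It is plain Python
working on a list of cluster values as read back from the shaped buffer,
not actual HarfBuzz code. It assumes the cluster values are character
indices into the original text (in practice they are in whatever units
the text was added to the buffer with) and the helper names are
hypothetical.

```python
def glyph_char_spans(clusters, text_len):
    """For each glyph, return the [start, end) character span of its cluster.

    Works on the cluster values in visual glyph order, so it handles both
    left-to-right and right-to-left runs.
    """
    starts = sorted(set(clusters))
    ends = starts[1:] + [text_len]
    end_of = dict(zip(starts, ends))
    return [(c, end_of[c]) for c in clusters]


def classify_glyphs(clusters, text_len, sel_start, sel_end):
    """Label each glyph 'full', 'partial' or 'none' for selection [sel_start, sel_end)."""
    labels = []
    for start, end in glyph_char_spans(clusters, text_len):
        if sel_start <= start and end <= sel_end:
            labels.append("full")      # every character of the cluster is selected
        elif start < sel_end and sel_start < end:
            labels.append("partial")   # e.g. a ligature cut by the selection edge
        else:
            labels.append("none")
    return labels


# "Magnificent" shaped with an "fi" ligature: characters 5 and 6 share one glyph.
clusters = [0, 1, 2, 3, 4, 5, 7, 8, 9, 10]
labels = classify_glyphs(clusters, 11, 0, 6)   # selection covers "Magnif"
```

Glyphs labelled "full" or "none" can simply be drawn in the matching
color; only the "partial" ones need the clipping treatment.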