[HarfBuzz] Sinhala split matra
Harshula
harshula at gmail.com
Tue Nov 20 17:39:16 PST 2012
Hi Behdad,
On Fri, 2012-11-16 at 17:19 +1100, Harshula wrote:
> On Thu, 2012-11-15 at 00:12 -0800, Behdad Esfahbod wrote:
<snip>
> > we decided that we want to support both
> > categories of fonts: those that work fine with Uniscribe, and those that used
> > to work with old HarfBuzz / Pango. I don't see how that can be a limitation
> > to a user.
>
> Really? Would both those groups of fonts work at the same time in, for
> example, a Libre Office (using HarfBuzz) document?
>
> > As such, I have no interest in arguing about how that decision was
> > made, and I don't think it's relevant to any user of HarfBuzz. If you have a
> > font that is not addressed correctly with HarfBuzz as is, let us know and we
> > will try to accommodate that.
>
> Did you notice that fonts made for Uniscribe now have a subtle rendering
> error without the environment variable? Try out කෝ with a font made for
> Uniscribe. For example, try:
>
> http://www.icta.lk/attachments/1090_winnie.ttf
> http://www.icta.lk/attachments/1090_WARNA.ttf
>
> If you can accommodate both groups of fonts at the same time by default,
> that would be great.
Thanks for undoing commit 0736915b8ed789a209205fec762997af3a8af89c
([Indic] Decompose Sinhala split matras the way old HarfBuzz / Pango
did)! I did a quick test of HarfBuzz with the following commit:
------------------------------------------------------
commit 43b653150081a2f9dc6b7481229ac4cd952575dc
Author: Behdad Esfahbod <behdad at behdad.org>
Date:   Fri Nov 16 13:12:35 2012 -0800

    [Indic] Another try to unbreak Sinhala split matras

    Just read the comments...

diff --git a/src/hb-ot-shape-complex-indic.cc b/src/hb-ot-shape-complex-indic.cc
index b185824..d924d1a 100644
--- a/src/hb-ot-shape-complex-indic.cc
+++ b/src/hb-ot-shape-complex-indic.cc
@@ -1317,15 +1317,42 @@ decompose_indic (const hb_ot_shape_normalize_context_t *c,
 #endif
   }
 
-  if (indic_options ().uniscribe_bug_compatible)
-    switch (ab)
+  if ((ab == 0x0DDA || hb_in_range<hb_codepoint_t> (ab, 0x0DDC, 0x0DDE)))
   {
-    /* These Sinhala ones have Unicode decompositions, but Uniscribe
-     * decomposes them "Khmer-style". */
-    case 0x0DDA : *a = 0x0DD9; *b= 0x0DDA; return true;
-    case 0x0DDC : *a = 0x0DD9; *b= 0x0DDC; return true;
-    case 0x0DDD : *a = 0x0DD9; *b= 0x0DDD; return true;
-    case 0x0DDE : *a = 0x0DD9; *b= 0x0DDE; return true;
+    /*
+     * Sinhala split matras... Let the fun begin.
+     *
+     * These four characters have Unicode decompositions. However, Uniscribe
+     * decomposes them "Khmer-style", that is, it uses the character itself to
+     * get the second half. The first half of all four decompositions is always
+     * U+0DD9.
+     *
+     * Now, there are buggy fonts, namely, the widely used lklug.ttf, that are
+     * broken with Uniscribe. But we need to support them. As such, we only
+     * do the Uniscribe-style decomposition if the character is transformed into
+     * its "sec.half" form by the 'pstf' feature. Otherwise, we fall back to
+     * Unicode decomposition.
+     *
+     * Note that we can't unconditionally use Unicode decomposition. That would
+     * break some other fonts, that are designed to work with Uniscribe, and
+     * don't have positioning features for the Unicode-style decomposition.
+     *
+     * Argh...
+     */
+
+    const indic_shape_plan_t *indic_plan = (const indic_shape_plan_t *) c->plan->data;
+
+    hb_codepoint_t glyph;
+
+    if (indic_options ().uniscribe_bug_compatible ||
+        (c->font->get_glyph (ab, 0, &glyph) &&
+         indic_plan->pstf.would_substitute (&glyph, 1, true, c->font->face)))
+    {
+      /* Ok, safe to use Uniscribe-style decomposition. */
+      *a = 0x0DD9;
+      *b = ab;
+      return true;
+    }
   }
 
   return c->unicode->decompose (ab, a, b);
------------------------------------------------------
Now both groups of fonts appear to render correctly by default. That is
a fantastic outcome for Sinhala script users. BTW, the lklug.ttf font
has been deprecated/unmaintained for some time since FreeSerif's Sinhala
support improved.
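
For anyone following along, the decision described in the patch comment can be sketched outside HarfBuzz roughly as below. This is only an illustration, not the actual HarfBuzz code: `codepoint_t`, `is_sinhala_split_matra`, `unicode_decompose`, `decompose_sketch` and the `font_has_pstf_form` callback are hypothetical names, and the callback merely stands in for the real glyph lookup plus 'pstf' would-substitute check. The small table holds the one-step canonical decompositions of the four characters as listed in UnicodeData.txt.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using codepoint_t = uint32_t; // hypothetical stand-in for hb_codepoint_t

// True for the four Sinhala split matras the patch singles out:
// U+0DDA, U+0DDC, U+0DDD, U+0DDE.
static bool is_sinhala_split_matra (codepoint_t ab)
{
  return ab == 0x0DDA || (ab >= 0x0DDC && ab <= 0x0DDE);
}

// One-step canonical (Unicode) decomposition, covering only these four
// characters. A real shaper would consult the full Unicode data instead.
static bool unicode_decompose (codepoint_t ab, codepoint_t *a, codepoint_t *b)
{
  switch (ab)
  {
    case 0x0DDA: *a = 0x0DD9; *b = 0x0DCA; return true;
    case 0x0DDC: *a = 0x0DD9; *b = 0x0DCF; return true;
    case 0x0DDD: *a = 0x0DDC; *b = 0x0DCA; return true;
    case 0x0DDE: *a = 0x0DD9; *b = 0x0DDF; return true;
    default:     return false;
  }
}

// Assumption: font_has_pstf_form (ab) answers "does this font map ab to a
// glyph that its 'pstf' feature would substitute?". It is a placeholder for
// the font-dependent check made in the patch.
static bool decompose_sketch (codepoint_t ab,
                              bool uniscribe_bug_compatible,
                              const std::function<bool (codepoint_t)> &font_has_pstf_form,
                              codepoint_t *a, codepoint_t *b)
{
  assert (a && b);

  if (is_sinhala_split_matra (ab) &&
      (uniscribe_bug_compatible || font_has_pstf_form (ab)))
  {
    // Uniscribe-style ("Khmer-style"): first half is always U+0DD9 and the
    // character itself is kept as the second half.
    *a = 0x0DD9;
    *b = ab;
    return true;
  }

  // Otherwise fall back to the Unicode decomposition.
  return unicode_decompose (ab, a, b);
}
```

With a font that has the 'pstf' form (the Uniscribe-targeted case), U+0DDD comes out as U+0DD9 followed by U+0DDD itself. With a font that lacks it (the lklug.ttf case), the same character falls back to U+0DDC followed by U+0DCA.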
Thanks again,