[HarfBuzz] Documenting OpenType shaping
Jonathan Kew
jfkthame at gmail.com
Fri Jun 15 23:48:58 UTC 2018
On 15/06/2018 15:53, Nathan Willis wrote:
> It seems like this is what is used (the same regexps being used for all
> scripts in HarfBuzz's Indic shaper):
>
> matra_group = z{0,3}.M.N?.(H | forced_rakar)?;
> [...]
> halant_or_matra_group = (final_halant_group | (H.ZWJ)? matra_group{0,4});
>
> ... and that only permits four matras (total) per syllable.
>
> I vaguely recall seeing a commit message or comment or something
> indicating that this limit was there to maintain compatibility with how
> Uniscribe matches syllables, but I searched around and couldn't find it
> today. It was something along the lines of the Microsoft docs saying
> "one matra for each type [L,R,T,B] is permitted," but it isn't clear
> whether that's justified by orthography at all or is just a practical
> concession that they made for some reason.
>
> Others with more Uniscribe knowledge may know.
Indeed, the spec at
https://docs.microsoft.com/en-us/typography/script-development/devanagari#analyze-the-text
says "matra (up to one of each type: pre-, above-, below- or post- base)".
However, I'm not sure it's a good idea to enforce this restriction.
While "normal" spelling may abide by it, in casual writing people
sometimes like to use repeated matras, just as an English speaker might
write "Helloooooooo!"
E.g. see https://www.xossip.com/showthread.php?t=1498145, where the
writer uses a number of "stretched-out" spellings (search in the page
for आाााााााााााााह, for example).
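To make the concern concrete, here is a toy Python sketch of what a cap on matras does to a stretched spelling of that shape. It is not HarfBuzz's actual Ragel grammar: the character classes are rough simplifications, the helper name syllable_length is hypothetical, and the repeat count in the test string is arbitrary. It only shows how many characters a "base plus up to N matras" pattern would consume.

```python
import re

# Simplified, assumed character classes (not the real shaper categories):
BASE = "[\u0904-\u0939]"    # Devanagari independent vowels and consonants
MATRA = "[\u093E-\u094C]"   # Devanagari dependent vowel signs


def syllable_length(text, max_matras=None):
    """Return how many characters a toy 'base + matras' pattern consumes
    at the start of text. max_matras=None means no upper limit."""
    quantifier = "*" if max_matras is None else "{0,%d}" % max_matras
    match = re.match(BASE + MATRA + quantifier, text)
    return match.end() if match else 0


# AA (U+0906) followed by an arbitrary run of 13 aa-matras (U+093E), then HA.
stretched = "\u0906" + "\u093E" * 13 + "\u0939"

capped = syllable_length(stretched, max_matras=4)
uncapped = syllable_length(stretched)

print(capped)    # 5: the base plus only the first four matras
print(uncapped)  # 14: the base plus all thirteen matras
```

With the cap, the remaining nine matras fall outside the syllable and would have to be handled as something other than part of it, which is the behaviour I'd be wary of enforcing.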
JK