[HarfBuzz] Documenting OpenType shaping

Jonathan Kew jfkthame at gmail.com
Fri Jun 15 23:48:58 UTC 2018


On 15/06/2018 15:53, Nathan Willis wrote:

> It seems like this is what is used (the same regexps being used for all 
> scripts in HarfBuzz's Indic shaper):
> 
> matra_group = z{0,3}.M.N?.(H | forced_rakar)?;
> [...]
> halant_or_matra_group = (final_halant_group | (H.ZWJ)? matra_group{0,4});
> 
> ... and that only permits four matras (total) per syllable.
> 
> I vaguely recall seeing a commit message or comment or something 
> indicating that this limit was there to maintain compatibility with how 
> Uniscribe matches syllables, but I searched around and couldn't find it 
> today. It was something along the lines of the Microsoft docs saying 
> "one matra for each type [L,R,T,B] is permitted," but it isn't clear 
> whether it's justified by orthography at all or is just a practical 
> concession that they made for some reason.
> 
> Others with more Uniscribe knowledge may know.

Indeed, the spec at 
https://docs.microsoft.com/en-us/typography/script-development/devanagari#analyze-the-text 
says "matra (up to one of each type: pre-, above-, below- or post- base)"

However, I'm not sure it's a good idea to enforce this restriction. 
While "normal" spelling may abide by it, in casual writing people 
sometimes like to use repeated matras, just as an English speaker might 
write "Helloooooooo!"

E.g. see https://www.xossip.com/showthread.php?t=1498145, where the 
writer uses a number of "stretched-out" spellings (search in the page 
for आाााााााााााााह, for example).
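
To make the effect of the cap concrete, here is a rough Python sketch. It is 
not HarfBuzz's actual syllable machine: the character classes are simplified 
stand-ins that I made up for illustration (plain Unicode ranges for Devanagari 
consonants, independent vowels and dependent vowel signs), and the helper 
names are hypothetical. It only shows how a "{0,4}" bound on matras cuts a 
stretched-out spelling short, compared with an unbounded repeat.

```python
import re

# Simplified stand-ins (assumptions), not HarfBuzz's real categories.
CONSONANT = "[\u0915-\u0939]"   # Devanagari KA..HA
VOWEL = "[\u0904-\u0914]"       # independent vowels
MATRA = "[\u093E-\u094C]"       # dependent vowel signs AA..AU


def syllable_pattern(max_matras):
    """Build a toy syllable regex: base + matras (capped, or unbounded if None)."""
    quant = "*" if max_matras is None else "{0,%d}" % max_matras
    return re.compile("(?:%s|%s)%s%s" % (CONSONANT, VOWEL, MATRA, quant))


def matched_prefix_length(text, max_matras):
    """Return how many code points at the start of text form one toy syllable."""
    m = syllable_pattern(max_matras).match(text)
    return m.end() if m else 0


# A constructed "stretched" spelling: AA followed by six AA vowel signs.
stretched = "\u0906" + "\u093E" * 6

capped = matched_prefix_length(stretched, 4)       # base + 4 matras only
uncapped = matched_prefix_length(stretched, None)  # base + all 6 matras
print(capped, uncapped, len(stretched))
```

With the cap, the last two vowel signs fall outside the matched syllable; what 
a shaper then does with the leftover marks is not modelled here.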

JK

