Tagging text as being in arbitrary complex-script languages

Richard Wordingham richard.wordingham at ntlworld.com
Wed Apr 17 21:11:58 UTC 2019

On Wed, 17 Apr 2019 13:53:25 +0200
Eike Rathke <erack at redhat.com> wrote:

> > > On 4/15/19 12:26 PM, Eike Rathke wrote:  
> > > > Adding arbitrary dictionary languages (as long as they strictly
> > > > follow the BCP 47 language tag specification) works since quite
> > > > a while (2014?) already.  

> > An interesting experiment would be to try adding a language to both
> > Western and CTL (as with Mongolian and some minor SEA languages) or
> > Western and CJK (various Zhuang writing systems), though I suppose
> > it won't hurt to simply disambiguate by script.  
> In fact you have to, or use an ISO 639-1/2/3 language code that
> implies a default script for one and specify an ISO 15924 script code
> for the other, which I was referring with "correct BCP 47 language
> tags".

Is there a pointer as to which tag sequences that "strictly follow the
BCP 47 language tag specification" are "correct"?

As far as I can tell, the following all strictly follow the BCP 47
language tag specification:

"sa" Sanskrit, with no specification of the script or spelling

"sa-IN" Sanskrit as used in India - so far as I can tell, that could be
in, for example, Devanagari, Grantha or even the Tamil script!  For
Devanagari at least, I understand that this implies that homorganic
nasals may be written using U+0902 DEVANAGARI SIGN ANUSVARA.

"sa-150" Sanskrit written using European conventions - so, any script,
but, at least for Devanagari, the anusvara sign is not used for
homorganic nasals.

"sa-Deva-150" Sanskrit written in Devanagari in the manner used in
Europe.

"sa-Latn" Sanskrit written in the Roman script.

"sa-Latf" Sanskrit written in Fraktur (I'm not sure that this exists.
It might need a hint as to where to find a Fraktur script with a
combining candrabindu.)
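To make the point concrete: each tag above is syntactically well-formed, yet the syntax alone says nothing about which script is "correct". Below is a minimal Python sketch, assuming only the simple language[-script][-region] subset of the RFC 5646 (BCP 47) grammar. The function name split_tag is hypothetical, and the sketch does not consult the IANA Language Subtag Registry, so it checks form only, not whether a subtag is registered or sensible.

```python
import re

# Simplified subset of the RFC 5646 (BCP 47) "langtag" grammar:
#   language (2-3 letters) ["-" script (4 letters)] ["-" region (2 letters | 3 digits)]
# Extlang, variant, extension and private-use subtags are deliberately not handled.
_SIMPLE_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def split_tag(tag: str) -> dict:
    """Split a simple BCP 47 tag into language, script and region subtags.

    An absent subtag is returned as None. Raises ValueError if the tag does
    not match the simplified grammar above (hypothetical helper, syntax only).
    """
    m = _SIMPLE_TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a simple language[-script][-region] tag: {tag!r}")
    parts = m.groupdict()
    # Case is not significant in BCP 47; normalize to the conventional forms:
    # lowercase language, title-case script, uppercase region.
    return {
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
    }


if __name__ == "__main__":
    for tag in ["sa", "sa-IN", "sa-150", "sa-Deva-150", "sa-Latn", "sa-Latf"]:
        print(tag, split_tag(tag))
```

All six Sanskrit tags parse, and only "sa-Deva-150", "sa-Latn" and "sa-Latf" carry an explicit script subtag; for "sa", "sa-IN" and "sa-150" the script remains unspecified, which is the ambiguity raised above.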

The only Sanskrit tag sequence I can find in isolang.cxx is "sa-IN".

