Tagging text as being in arbitrary complex-script languages
richard.wordingham at ntlworld.com
Wed Apr 17 21:11:58 UTC 2019
On Wed, 17 Apr 2019 13:53:25 +0200
Eike Rathke <erack at redhat.com> wrote:
> > > On 4/15/19 12:26 PM, Eike Rathke wrote:
> > > > Adding arbitrary dictionary languages (as long as they strictly
> > > > follow the BCP 47 language tag specification) works since quite
> > > > a while (2014?) already.
> > An interesting experiment would be to try adding a language to both
> > Western and CTL (as with Mongolian and some minor SEA languages) or
> > Western and CJK (various Zhuang writing systems), though I suppose
> > it won't hurt to simply disambiguate by script.
> In fact you have to, or use an ISO 639-1/2/3 language code that
> implies a default script for one and specify an ISO 15924 script code
> for the other, which I was referring with "correct BCP 47 language
Is there a pointer as to which tag sequences that "strictly follow the
BCP 47 language tag specification" are "correct"?
As far as I can tell, the following all strictly follow the BCP 47
language tag specification:
"sa" Sanskrit, with no specification of the script or spelling
"sa-IN" Sanskrit as used in India - so far as I can tell, that could be
in, for example, Devanagari, Grantha or even the Tamil script! For
Devanagari at least, I understand that this implies that homorganic
nasals may be written using U+0902 DEVANAGARI SIGN ANUSVARA.
"sa-150" Sanskrit written using European conventions - so, any script,
but, at least for Devanagari, the anusvara sign is not used for
homorganic nasals.
"sa-Deva-150" Sanskrit written in Devanagari in the manner used in
Europe.
"sa-Latn" Sanskrit written in the Roman script.
"sa-Latf" Sanskrit written in Fraktur (I'm not sure that this exists.
It might need a hint as to where to find a Fraktur script with a
The only Sanskrit tag sequence I can find in isolang.cxx is "sa-IN".
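
To make the distinction between "well-formed" and "known to the application" concrete, here is a minimal sketch in Python. It only checks the syntactic shape of the language, script and region subtags as I read RFC 5646; it ignores variants, extensions and private-use subtags, and it does not consult the IANA subtag registry or isolang.cxx. The names LANGTAG and split_tag are my own, purely for illustration.

```python
import re

# Simplified shape of a BCP 47 tag: language, optional script, optional region.
# Assumption: variants, extensions, extlang and private-use subtags are ignored.
LANGTAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"             # ISO 639 code, e.g. "sa"
    r"(?:-(?P<script>[A-Za-z]{4}))?"           # ISO 15924 code, e.g. "Deva"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"  # ISO 3166-1 or UN M.49, e.g. "IN", "150"
)

def split_tag(tag):
    """Return the subtags as a dict, or None if the shape does not match."""
    m = LANGTAG.fullmatch(tag)
    return m.groupdict() if m else None

# The Sanskrit tags discussed above.
for tag in ["sa", "sa-IN", "sa-150", "sa-Deva-150", "sa-Latn", "sa-Latf"]:
    print(tag, split_tag(tag))
```

All six tags pass this shape check, which is the point: syntactic conformance alone says nothing about whether an application has an entry for the tag or knows which spelling conventions it implies.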