Tagging text as being in arbitrary complex-script languages

Richard Wordingham richard.wordingham at ntlworld.com
Thu Apr 18 19:40:01 UTC 2019

On Thu, 18 Apr 2019 12:25:11 +0200
Eike Rathke <erack at redhat.com> wrote:

> What I usually did is, lookup the language at SIL and the Ethnologue
> and use the most prevalent script as implied default script. Which
> here https://www.ethnologue.com/language/san would lead to
> Devanagari, but in this case more important is also what MS assigned
> the LCID for.

So I shouldn't be misled by the fact that the CTL script I most
frequently write Sanskrit in is Thai -:)  Seriously, though, I believe
the script of sa-TH is Thai rather than Devanagari, and I am quite
sure that the script of sa-MM is Mymr.

It sounds as though one has to specify the script where there is doubt
as to what type of script will dominate. Is it an issue if there are
two competing scripts of the same type, e.g. Thai v. Lanna for Northern
Thai?  A dual-script dictionary would make for inefficient correction.
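
To make the point about implied scripts concrete, here is a minimal
sketch in Python.  It only handles the simple language[-script][-region]
shapes used in this thread, not the full BCP 47 grammar, and the
DEFAULT_SCRIPT table and both function names are hypothetical: the
entries merely record the opinions voiced in this thread (Devanagari as
the general default for "sa", Thai for sa-TH, Mymr for sa-MM) and are
not taken from CLDR or from LibreOffice.

```python
import re

# Hypothetical defaults that only mirror the opinions in this thread;
# this is NOT CLDR likely-subtags data or LibreOffice's actual table.
DEFAULT_SCRIPT = {
    ("sa", None): "Deva",   # most prevalent script per the Ethnologue argument
    ("sa", "TH"): "Thai",   # believed above to be Thai
    ("sa", "MM"): "Mymr",   # believed above to be Myanmar
}

# language, optional 4-letter script, optional region (2 letters or 3 digits).
TAG_RE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Split a simple tag into (language, script or None, region or None)."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError("unsupported tag shape: %r" % tag)
    script = m.group("script")
    region = m.group("region")
    return (
        m.group("lang").lower(),
        script.title() if script else None,
        region.upper() if region else None,
    )


def effective_script(tag):
    """Return the explicit script, else the assumed default, else None."""
    lang, script, region = split_tag(tag)
    if script:
        return script
    # Try the region-specific guess first, then the language-wide one.
    return DEFAULT_SCRIPT.get((lang, region), DEFAULT_SCRIPT.get((lang, None)))


if __name__ == "__main__":
    for tag in ("sa", "sa-TH", "sa-MM", "sa-Latn", "sa-Deva-150"):
        print(tag, "->", effective_script(tag))
```

An explicit script subtag always wins, so tagging as sa-Latn or sa-Thai
removes any dependence on whichever default table an application
happens to ship.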

> > "sa-150" Sanskrit written using European conventions - so, any
> > script, but, at least for Devanagari, the anusvara sign is not used
> > for homorganic nasals.  
> Though valid, LibreOffice doesn't use the numeric UN M.49 code, it may
> be accepted but might not work everywhere.
> > "sa-Deva-150" Sanskrit written in Devanagari in the manner used in
> > Europe.  
> Same here.
> > "sa-Latn" Sanskrit written in the Roman script.
> > 
> > "sa-Latf" Sanskrit written in Fraktur (I'm not sure that this
> > exists. It might need a hint as to where to find a Fraktur script
> > with a combining candrabindu.)  
> Both perfectly valid, if they serve any purpose. Though with sa-Latn
> I doubt there's a use case, so I wouldn't call that "correct" in
> common sense.

So how do you suggest we tag Sanskrit in Latin script?  Within English
works, it's not uncommon for any Sanskrit quoted precisely to be in the
Latin script; about half the English-language articles in the
'International Journal of Sanskrit Research'
(http://www.anantaajournal.com/) that quote Sanskrit passages quote
them in the Latin script.  Several papers would benefit from the
application of sa-Latn proofing tools, though I don't deny that
proofing Sanskrit may be difficult.

Moreover, I've only ever seen U+0310 COMBINING CANDRABINDU in examples
of Sanskrit in Latin text. 

> I also just learned that sa-Latf somehow exists..

That example is in the same spirit as en-Thai (which I've successfully
used for privacy) and notes I've seen kept in en-Runr on a publicly
accessible whiteboard.

I was wondering whether Sanskrit was printed in Antiqua or Fraktur in
early 20th-century Germany.  You seem to think neither.


More information about the LibreOffice mailing list