Tagging text as being in arbitrary complex-script languages

Eike Rathke erack at redhat.com
Tue Apr 23 15:35:10 UTC 2019


Hi Richard,

On Thursday, 2019-04-18 20:40:01 +0100, Richard Wordingham wrote:

> On Thu, 18 Apr 2019 12:25:11 +0200
> Eike Rathke <erack at redhat.com> wrote:
> 
> > What I usually did was look up the language at SIL and the Ethnologue
> > and use the most prevalent script as implied default script. Which
> > here https://www.ethnologue.com/language/san would lead to
> > Devanagari, but in this case more important is also what MS assigned
> > the LCID for.
> 
> So I shouldn't be misled by the fact that the CTL script I most
> frequently write Sanskrit in is Thai -:)  Seriously, though, I believe
> the script of sa-TH is Thai rather than Devanagari, and I am quite
> sure that the script of sa-MM is Mymr.

Your expertise is welcome!
If the IANA Language Subtag Registry doesn't list a Suppress-Script
field for a specific language, then nowadays it is indeed better practice
to state the script subtag explicitly for languages that could otherwise
be ambiguous. So that would be sa-Thai-TH and sa-Mymr-MM. Deducing the
script from the language-country combination is deprecated, but for
backwards and MS compatibility it is unavoidable for existing tags.
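
To illustrate the rule, here is a minimal Python sketch. The
SUPPRESS_SCRIPT table is only a hand-copied excerpt for illustration, and
make_tag is a hypothetical helper, not an existing LibreOffice function;
real code should load the values from the IANA registry. Sanskrit is
assumed to have no Suppress-Script entry, as discussed above.

```python
# Illustrative excerpt only; the authoritative Suppress-Script values
# live in the IANA Language Subtag Registry.
SUPPRESS_SCRIPT = {
    "en": "Latn",
    "de": "Latn",
    "th": "Thai",
}


def make_tag(language, script=None, region=None):
    """Build a language tag, omitting the script only when it is suppressed.

    Hypothetical helper: the script subtag is written out unless it equals
    the Suppress-Script value recorded for the language.
    """
    language = language.lower()
    parts = [language]
    if script is not None:
        script = script.title()  # script subtags are written in title case
        if SUPPRESS_SCRIPT.get(language) != script:
            parts.append(script)
    if region is not None:
        parts.append(region.upper())  # region subtags are written in upper case
    return "-".join(parts)


print(make_tag("sa", "Thai", "TH"))
print(make_tag("sa", "Mymr", "MM"))
print(make_tag("th", "Thai", "TH"))
```

With this table, Sanskrit keeps its script subtag while Thai written in
Thai script collapses to the plain language-region form.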


> It sounds as though one has to specify the script where there is doubt
> as to what type of script will dominate. Is it an issue if there are
> two competing scripts of the same type, e.g. Thai v. Lanna for Northern
> Thai?  A dual script dictionary would correct inefficiently.

Competing in the sense of two different scripts under one language tag?
I wouldn't do that, and IMHO it would be wrong.
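
As a rough sketch of what I mean, assuming proofing data is keyed by the
full tag: each script gets its own tag and its own dictionary. The
regular expression is deliberately simplified (no extended language
subtags, variants or extensions), and the function name, table and file
names are all made up for illustration. "nod" and "Lana" are the codes
for Northern Thai and the Lanna (Tai Tham) script.

```python
import re

# Simplified tag shape: language[-Script][-REGION]. Full BCP 47 syntax
# allows more subtags than this pattern handles.
_TAG = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def script_of(tag):
    """Return the explicit script subtag of a tag, or None if it has none."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError("unsupported language tag: " + tag)
    script = match.group("script")
    return script.title() if script else None


# Hypothetical dictionary files: one per language-script pair, never one
# file that mixes two scripts under a single tag.
DICTIONARIES = {
    "nod-Thai": "nod_Thai.dic",
    "nod-Lana": "nod_Lana.dic",
}

for tag in DICTIONARIES:
    print(tag, script_of(tag), DICTIONARIES[tag])
```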


> > Though with sa-Latn
> > I doubt there's a use case, so I wouldn't call that "correct" in
> > common sense.
> 
> So how do you suggest we tag Sanskrit in Latin script?  Within English
> works, it's not uncommon for any Sanskrit quoted precisely to be in the
> Latin script; about half the English language articles in the
> 'International Journal of Sanskrit
> Research' (http://www.anantaajournal.com/) that quote Sanskrit passages
> quote them in the Latin script.  Several papers would benefit from the
> application of sa-Latn proofing tools, though I don't deny that
> proofing Sanskrit may be difficult.

I wasn't aware that there is indeed Sanskrit transcribed to Latin ... so
then, sa-Latn might make sense.

  Eike

-- 
GPG key 0x6A6CD5B765632D3A - 2265 D7F3 A7B0 95CC 3918  630B 6A6C D5B7 6563 2D3A