Tagging text as being in arbitrary complex-script languages

Richard Wordingham richard.wordingham at ntlworld.com
Tue Apr 23 21:08:54 UTC 2019


On Tue, 23 Apr 2019 17:35:10 +0200
Eike Rathke <erack at redhat.com> wrote:

> Hi Richard,
> 
> On Thursday, 2019-04-18 20:40:01 +0100, Richard Wordingham wrote:

> > It sounds as though one has to specify the script where there is
> > doubt as to what type of script will dominate. Is it an issue if
> > there are two competing scripts of the same type, e.g. Thai v. Lanna
> > for Northern Thai?  A dual script dictionary would correct
> > inefficiently.  

> Competing in the sense two different scripts under one language tag?
> I wouldn't do that and IMHO it would be wrong.

It's worse than that.  The spoken language nod-TH resolves, ignoring
subregional variations, into the three written groups:

nod-Lana-TH
nod-Thai-etymo-TH (name but not concept declared unsuitable on 10 Jan)
nod-Thai-phonetic-TH (ditto)

The scheme 'nod-Thai-etymo-TH' often accompanies material published in
nod-Lana-TH. The New Testament is published in nod-Lana-TH and
'nod-Thai-phonetic-TH'.
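For readers less familiar with BCP 47 (RFC 5646), here is a minimal sketch of how such tags decompose. It covers only a simplified subset of the grammar: language, optional script, optional region, then variants. It does not handle extlang, extension or private-use subtags. The 'etymo' variant in the usage lines is a placeholder, not a registered subtag. The standard grammar puts variants after the region, so a well-formed spelling of the Thai-script forms would look like nod-Thai-TH-etymo. As written, nod-Thai-etymo-TH does not parse.

```python
import re

# Simplified subset of the RFC 5646 langtag grammar (an assumption, not the
# full grammar): language (2-3 letters), optional script (4 letters),
# optional region (2 letters or 3 digits), then zero or more variants
# (5-8 alphanumerics, or a digit followed by 3 alphanumerics).
LANGTAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def parse_tag(tag):
    """Split a tag into subtags, or return None if it falls outside the subset."""
    m = LANGTAG.match(tag)
    if m is None:
        return None
    variants = [v for v in m.group("variants").split("-") if v]
    # Normalise case the way the registry conventionally writes subtags.
    return {
        "language": m.group("language").lower(),
        "script": m.group("script").title() if m.group("script") else None,
        "region": m.group("region").upper() if m.group("region") else None,
        "variants": [v.lower() for v in variants],
    }
```

For example, parse_tag("nod-Lana-TH") yields language "nod", script "Lana", region "TH" and no variants. parse_tag("nod-Thai-etymo-TH") returns None because a two-letter region cannot follow a variant.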

Until I can find names for the Thai-script variants more specific to
Northern Thai, my plan is to handle the difference by letting the user
choose the dictionary, if I ever get round to Thai-script Northern Thai
dictionaries.  The biggest need I see for the variant tags is in user
interfaces.
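The plan of letting the user pick the dictionary can be sketched roughly as follows. This is a hypothetical illustration, not LibreOffice's actual lookup. The registry, the function name choose_dictionary and the dictionary names are all made up. The tag's language and script select the candidates. When the script subtag leaves more than one spelling convention open, as with Thai-script Northern Thai, the function returns None so that the user interface can ask.

```python
# Hypothetical registry keyed by (language, script). The dictionary names
# are illustrative only; they are not real LibreOffice dictionary IDs.
DICTIONARIES = {
    ("nod", "Lana"): ["nod-Lana"],
    ("nod", "Thai"): ["nod-Thai-etymological", "nod-Thai-phonetic"],
}


def choose_dictionary(language, script, user_choice=None):
    """Return a dictionary name, or None when the user must be asked.

    The tag's language and script pick the candidate list; an explicit
    user choice settles cases where the tag cannot tell two spelling
    conventions apart.
    """
    candidates = DICTIONARIES.get((language, script), [])
    if user_choice is not None:
        if user_choice not in candidates:
            raise ValueError(f"{user_choice!r} is not available for {language}-{script}")
        return user_choice
    if len(candidates) == 1:
        return candidates[0]
    return None  # no dictionary, or several: defer to the user
```

With this design the Lanna-script case resolves automatically, while the Thai-script case waits for the user until suitable variant subtags exist.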

The Lana-script dictionary is highly desirable for handling the visual
ambiguities of the script as used for the vernacular languages, and it
has high priority.  Eyeballs are probably good enough for
the Thai script.

Richard.

