Tagging text as being in arbitrary complex-script languages

Tue Apr 23 20:27:06 UTC 2019

On Tue, 23 Apr 2019 18:00:22 +0200
Eike Rathke <erack at redhat.com> wrote:

> On Friday, 2019-04-19 03:32:34 +0100, Richard Wordingham wrote:

> > In answer to what was intended to be a rhetorical question, I
> > suppose und-Latn-t-sa-m0-iast and und-Latn-t-sa-m0-iso would work
> > for the normative forms.  
> 
> Seems so. At least when entered at https://r12a.github.io/app-subtags/ in
> the Check form, it doesn't complain much.

It seems that some people think that IAST also defines a Cyrillic
representation, so I think the 'Latn' is justified.

> However, I'd avoid 'und': to me it reads as "can't determine what
> this could be", and in fact it is listed as Undetermined.

Well, as the two systems are international standards (the 'i' in
'iast' and 'iso'), it would be hard to tell whether the intended
audience is English, German, Japanese or whatever.  The language of the
underlying content is recorded in the extension, in this case the
'sa' (Sanskrit).
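
For illustration only: the tags above use the RFC 6497 't' (transformed
content) extension. The subtags before 't' describe the text as written
(und-Latn), the subtags after 't' up to the first field key give the
source language ('sa'), and the 'm0' field names the transformation
mechanism. The sketch below splits a tag along those lines. The function
names are hypothetical, it lowercases everything, and it does not check
whether 'iast' or 'iso' are actually registered m0 values.

```python
def is_field_key(subtag: str) -> bool:
    # RFC 6497 field keys are one letter followed by one digit, e.g. "m0".
    return len(subtag) == 2 and subtag[0].isalpha() and subtag[1].isdigit()


def parse_t_extension(tag: str):
    """Return (subtags_before_t, source_subtags, fields) for a BCP 47 tag.

    source_subtags is None when the tag has no 't' extension.
    Simplified sketch: no validation against the IANA registry or CLDR.
    """
    subtags = tag.lower().split("-")
    # Locate the 't' singleton, ignoring anything after private use 'x'.
    t_index = None
    for i, s in enumerate(subtags):
        if s == "x":
            break
        if s == "t":
            t_index = i
            break
    if t_index is None:
        return subtags, None, {}

    main = subtags[:t_index]
    rest = subtags[t_index + 1:]
    j = 0
    source = []
    # The optional source language tag runs up to the first field key
    # (or the next singleton, which would start another extension).
    while j < len(rest) and len(rest[j]) > 1 and not is_field_key(rest[j]):
        source.append(rest[j])
        j += 1

    # Each field is a key such as "m0" followed by one or more values.
    fields = {}
    key = None
    while j < len(rest) and len(rest[j]) > 1:
        s = rest[j]
        if is_field_key(s):
            key = s
            fields[key] = []
        else:
            fields[key].append(s)
        j += 1
    return main, source, fields


print(parse_t_extension("und-Latn-t-sa-m0-iast"))
# (['und', 'latn'], ['sa'], {'m0': ['iast']})
print(parse_t_extension("sa-Latn"))
# (['sa', 'latn'], None, {})
```

With plain sa-Latn there is no extension at all, so nothing records which
romanization scheme was used.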

<snip>
> Yes, that's ugly, but unavoidable. For that, sa-Latn would be a better
> solution.

And allow for mixtures of the two schemes!

Richard.