Tagging text as being in arbitrary complex-script languages

Eike Rathke erack at redhat.com
Tue Apr 23 16:00:22 UTC 2019


Hi Richard,

On Friday, 2019-04-19 03:32:34 +0100, Richard Wordingham wrote:

> In answer to what was intended to be a rhetorical question, I suppose
> und-Latn-t-sa-m0-iast and und-Latn-t-sa-m0-iso would work for the
> normative forms.

Seems so. At least when entered in the Check form at
https://r12a.github.io/app-subtags/, neither tag draws much complaint.

However, I'd avoid 'und'. To me it reads as "can't determine what this
could be", and in fact it is listed as Undetermined.
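To make the structure of those tags concrete, here is a simplified Python sketch that splits them into subtags. It assumes only what the tags themselves show: a primary language ('und'), a script ('Latn'), and a '-t-' transformed-content extension (RFC 6497) carrying a source language ('sa') followed by key/value fields such as 'm0' (the transform mechanism, here 'iast' or 'iso'). The names parse_tag, is_tkey and ParsedTag are hypothetical; variants, other extensions and registry validation are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def is_tkey(subtag: str) -> bool:
    # RFC 6497 field keys are one letter followed by one digit, e.g. "m0"
    return len(subtag) == 2 and subtag[0].isalpha() and subtag[1].isdigit()


@dataclass
class ParsedTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    t_source: Optional[str] = None  # source language tag inside "-t-"
    t_fields: Dict[str, List[str]] = field(default_factory=dict)


def parse_tag(tag: str) -> ParsedTag:
    """Simplified split of a BCP 47 tag (hypothetical helper).

    Handles language, optional script, optional region and a trailing
    "-t-" extension only; anything else is silently ignored.
    """
    subtags = tag.split("-")
    parsed = ParsedTag(language=subtags[0].lower())
    i = 1
    # Optional script subtag: four letters, e.g. "Latn"
    if i < len(subtags) and len(subtags[i]) == 4 and subtags[i].isalpha():
        parsed.script = subtags[i].title()
        i += 1
    # Optional region subtag: two letters or three digits
    if i < len(subtags) and (
        (len(subtags[i]) == 2 and subtags[i].isalpha())
        or (len(subtags[i]) == 3 and subtags[i].isdigit())
    ):
        parsed.region = subtags[i].upper()
        i += 1
    # "-t-" extension: source language tag first, then key/value fields
    if i < len(subtags) and subtags[i].lower() == "t":
        i += 1
        source = []
        while i < len(subtags) and not is_tkey(subtags[i]):
            source.append(subtags[i])
            i += 1
        parsed.t_source = "-".join(source) or None
        key = None
        while i < len(subtags):
            sub = subtags[i].lower()
            if is_tkey(sub):
                key = sub
                parsed.t_fields[key] = []
            elif key is not None:
                parsed.t_fields[key].append(sub)
            i += 1
    return parsed


for tag in ("und-Latn-t-sa-m0-iast", "und-Latn-t-sa-m0-iso", "sa-Latn"):
    print(tag, "->", parse_tag(tag))
```

The split shows the point above: in the transformed tags Sanskrit only appears as the *source* of the transliteration, while the primary language of the text itself stays 'und'.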

Also, my guess is that most applications would not support these tags at
all. Of course it depends on what you want to use them for: in-house
tagging where you control the tools that consume these tags, or publicly
available classification of languages. For the latter, some
standardization among the parties involved would come in handy.

> I've successfully loaded a mocked up extension for the
> former (as explicitly using a Western script), though I don't much like
> the consequent tagging <style:text-properties ... fo:language="und"> in
> the document's content.xml.

Yes, that's ugly, but unavoidable with an 'und' tag, which is why sa-Latn
would be a better solution.
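Why the 'und' ends up in content.xml can be illustrated with a minimal sketch. It assumes, based only on the fo:language="und" observed above and not on LibreOffice's actual source, that the primary language subtag is written to fo:language and a four-letter script subtag to fo:script; the function odf_language_attrs is hypothetical.

```python
from typing import Dict, Optional


def odf_language_attrs(tag: str) -> Dict[str, Optional[str]]:
    """Hypothetical mapping of a BCP 47 tag to ODF-style attributes.

    Assumption: primary language subtag -> fo:language,
    four-letter script subtag (if present) -> fo:script.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    script = None
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        script = subtags[1].title()
    return {"fo:language": language, "fo:script": script}


for tag in ("und-Latn-t-sa-m0-iast", "sa-Latn"):
    print(tag, odf_language_attrs(tag))
```

Under that assumption, sa-Latn puts Sanskrit itself into the language attribute instead of 'und', while the script is still recorded as Latn.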

  Eike

-- 
GPG key 0x6A6CD5B765632D3A - 2265 D7F3 A7B0 95CC 3918  630B 6A6C D5B7 6563 2D3A


More information about the LibreOffice mailing list