[Libreoffice] A thought about fdo#38095: Character classification for Western or Asian text font

Mon Jul 18 09:14:13 PDT 2011

Hi all,

According to a discussion on the Japanese local mailing list,
the following issue may prevent some users from migrating
to the 3.4.x series:
https://bugs.freedesktop.org/show_bug.cgi?id=38095

I am not yet sure how it works in 3.3.x, but at least for Calc
on master, there seems to be a gap in how a font is determined from
the Unicode Script property of a cell's content.

In ScDocument::GetStringScriptType() at sc/source/core/data/documen6.cxx,
a break iterator result of type i18n::ScriptType::WEAK is simply ignored,
while BreakIteratorImpl::getScriptClass()
at i18npool/source/breakiterator/breakiteratorImpl.cxx
maps characters whose Script property value is "Common" to WEAK.
Worse still, "DIGIT ZERO..DIGIT NINE" and
"FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE" are both in
"Common" [2], and both "LATIN SMALL LETTER A..LATIN SMALL LETTER Z"
and "FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z"
are in "Latin", so the Script property alone cannot distinguish
the fullwidth characters from their ASCII counterparts.
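
To make the ambiguity concrete, here is a minimal Python sketch (not
LibreOffice code). The SCRIPT_RANGES table and the helpers script_of()
and is_fullwidth() are hypothetical names for illustration only; the
table is copied by hand from the four ranges quoted above and is not a
full Script property lookup. East_Asian_Width is used merely as an
example of a property that does differ between the two forms; whether
it is suitable for choosing between the Western and Asian font is an
open question.

```python
import unicodedata

# Hand-copied from the Scripts.txt ranges discussed above.
# Illustrative only: this is not a complete Script property table.
SCRIPT_RANGES = [
    (0x0030, 0x0039, "Common"),  # DIGIT ZERO..DIGIT NINE
    (0x0061, 0x007A, "Latin"),   # LATIN SMALL LETTER A..Z
    (0xFF10, 0xFF19, "Common"),  # FULLWIDTH DIGIT ZERO..NINE
    (0xFF41, 0xFF5A, "Latin"),   # FULLWIDTH LATIN SMALL LETTER A..Z
]


def script_of(ch):
    """Return the Script value for ch from the small table above."""
    cp = ord(ch)
    for lo, hi, script in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return script
    return "Unknown"


def is_fullwidth(ch):
    """True if the East_Asian_Width property of ch is F (Fullwidth)."""
    return unicodedata.east_asian_width(ch) == "F"


if __name__ == "__main__":
    # ASCII and fullwidth forms of the digit one and the letter a.
    for ch in ["1", "\uFF11", "a", "\uFF41"]:
        print(f"U+{ord(ch):04X} script={script_of(ch)} "
              f"fullwidth={is_fullwidth(ch)}")
```

Each ASCII/fullwidth pair reports the same script value, while the
East_Asian_Width check separates them.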

I found that [1] mentions a similar situation:
> Note that while it is necessary to include Latin in the preceding expression
> to ensure that it can cover the typical script use found in many Japanese
> texts, doing so would make it difficult to isolate a run of Japanese inside
> an English document, for example.

Of course, my next step is to check how 3.3.x handles this. In any
case, it would be great if someone could give me a hint on how to
deal with this case.

[1] http://unicode.org/reports/tr24/
[2] http://unicode.org/Public/UNIDATA/Scripts.txt

Cheers,
-- Takeshi Abe