[utf-8] Aspell and Unicode Normalization
Kevin Atkinson
kevin at atkinson.dhs.org
Tue Mar 23 08:30:45 PST 2004
I thought a bit more about Unicode Normalization:
Because Unicode contains a large number of precomposed characters, there
are multiple ways a character can be represented. For example, the letter
a* can be represented either as
U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
or
U+0061 LATIN SMALL LETTER A + U+030A COMBINING RING ABOVE
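As an illustration only (this uses Python's standard unicodedata module, not Aspell's own code), the two representations above compare unequal as raw strings, but both map to a single form once they are normalized:

```python
import unicodedata

# The two representations of the letter from the example above.
precomposed = "\u00e5"        # U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "\u0061\u030a"   # U+0061 + U+030A COMBINING RING ABOVE

# Without normalization the strings differ, so a spell checker would
# treat them as two different words.
print(precomposed == decomposed)  # False

# NFC composes and NFD decomposes; either way both inputs end up identical.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# List the code points of the decomposed form.
for ch in unicodedata.normalize("NFD", precomposed):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```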
By performing normalization first, Aspell will only see one of these
representations. The exact form of normalization depends on the
language. Given the choice of
1. Precomposed character
2. Base letter + combining character(s)
3. Base letter only
if the precomposed character is in the target character set then (1); if
both the base and the combining character(s) are present then (2); otherwise (3).
[From the manual. Please excuse the a*. Texinfo is too stupid to know that
the character is supported in iso-8859-1.]
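The three-way rule from the manual can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Aspell's implementation: it is written in Python with the standard unicodedata module, the names codec_contains and normalize_char are hypothetical, and a Python codec (or a plain set) stands in for Aspell's "target character set".

```python
import unicodedata
from typing import Callable

def codec_contains(codec: str) -> Callable[[str], bool]:
    """Hypothetical helper: a predicate telling whether text is encodable in `codec`."""
    def contains(text: str) -> bool:
        try:
            text.encode(codec)
            return True
        except UnicodeEncodeError:
            return False
    return contains

def normalize_char(ch: str, in_target: Callable[[str], bool]) -> str:
    """Hypothetical: apply the three-way rule to a single non-empty character."""
    composed = unicodedata.normalize("NFC", ch)
    # (1) The precomposed character is in the target set.
    if in_target(composed):
        return composed
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    # (2) The base letter and every combining character are in the target set.
    if in_target(base) and all(in_target(m) for m in marks):
        return decomposed
    # (3) Otherwise keep only the base letter (even if it is itself missing).
    return base

def codepoints(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)

# Assumed target set with the combining ring but no precomposed a-ring.
custom = {"a", "\u030a"}.__contains__

print(codepoints(normalize_char("\u00e5", codec_contains("latin-1"))))  # U+00E5
print(codepoints(normalize_char("\u00e5", custom)))                     # U+0061 U+030A
print(codepoints(normalize_char("\u00e5", codec_contains("ascii"))))    # U+0061
```

Because the input is first run through NFC or NFD, the decomposed input "a\u030a" gives the same result as the precomposed one, which is the point of normalizing before Aspell sees the text.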
--
http://kevin.atkinson.dhs.org