[utf-8] Aspell and Unicode Normalization

Kevin Atkinson kevin at atkinson.dhs.org
Tue Mar 23 08:30:45 PST 2004


I thought a bit more about Unicode Normalization:

Because Unicode contains a large number of precomposed characters, there
are multiple ways a character can be represented.  For example, the letter
a* can be represented either as

     U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
or
     U+0061 LATIN SMALL LETTER A + U+030A COMBINING RING ABOVE
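
As a quick illustration (not part of the manual text), here is a minimal
Python sketch using the standard unicodedata module.  It shows that the two
sequences above differ as code points but become equal after normalization.
The variable names are only for illustration.

```python
import unicodedata

precomposed = "\u00e5"   # U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "a\u030a"   # U+0061 + U+030A COMBINING RING ABOVE

# The raw strings are different code point sequences.
raw_equal = (precomposed == decomposed)

# NFC composes to the single precomposed character;
# NFD decomposes to base letter + combining character.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
```

After either normalization the two inputs compare equal, which is the
property a spell checker needs before looking a word up.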

By performing normalization first, Aspell will only see one of these
representations.  The exact form of normalization depends on the
language.  Given the choice of

  1. Precomposed character

  2. Base letter + combining character(s)

  3. Base letter only

if the precomposed character is in the target character set then (1), if
both the base and combining characters are present then (2), otherwise (3).
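
The selection rule above could be sketched roughly as follows.  This is
only an illustration in Python, not Aspell's actual code: the function name
normalize_char and the idea of passing the target character set as a plain
Python set of characters are assumptions made for the example.  What to do
when even the base letter is missing from the character set is not covered
by the text above, so the sketch simply returns the base letter unchanged
in that case.

```python
import unicodedata

def normalize_char(ch, charset):
    """Pick a representation of ch for a target character set.

    ch      -- a single character, possibly followed by combining marks
    charset -- hypothetical: a set of the characters the language supports
    """
    # (1) Precomposed character, if the target set contains it.
    composed = unicodedata.normalize("NFC", ch)
    if len(composed) == 1 and composed in charset:
        return composed

    # (2) Base letter + combining character(s), if all are present.
    decomposed = unicodedata.normalize("NFD", ch)
    if all(c in charset for c in decomposed):
        return decomposed

    # (3) Base letter only: drop every combining mark.
    return "".join(c for c in decomposed if not unicodedata.combining(c))

latin1 = {chr(i) for i in range(256)}      # contains U+00E5
combining_only = {"a", "\u030a"}           # no precomposed form
ascii_only = {chr(i) for i in range(128)}  # no ring at all

r1 = normalize_char("a\u030a", latin1)
r2 = normalize_char("\u00e5", combining_only)
r3 = normalize_char("\u00e5", ascii_only)
```

With these three sets, the same letter comes out as the precomposed
character, as base plus combining ring, and as a plain "a" respectively.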

[From the manual.  Please excuse the a*.  Texinfo is too stupid to know that
the character is supported in iso-8859-1]

-- 
http://kevin.atkinson.dhs.org



