[UTF-8] Aspell and UTF-8/Unicode

Elias Martenson elias-m@algonet.se
Sun, 15 Feb 2004 16:31:03 +0100


On Sun 2004-02-15 at 16:07, Kevin Atkinson wrote:
> On Sun, 15 Feb 2004, Elias Martenson wrote:
>
> > On Sun 2004-02-15 at 10:54, Kevin Atkinson wrote:
> >
> > >    The bottom line is that keeping Aspell 8-bit internally is a very
> > > well thought out decision that is not likely to change any time soon.
> > > Feel free to challenge me on it, but don't expect me to change my mind
> > > unless you can bring up some point that I have not thought of before
> > > and quite possibly a patch to cleanly convert Aspell to Unicode
> > > internally without a serious performance loss OR serious memory usage
> > > increase.
> >
> > Thanks for the explanation. I only have one question regarding the above
> > quoted section:
> >
> > How do you intend to deal with Asian languages? I just cannot understand
> > how you handle Japanese, based on the explanation you just gave.
>
> Can Japanese be spell checked in the traditional fashion, or at all?

Yes, it can. I don't speak Japanese myself, but the traditional way would
use Hiragana and Katakana, which are two phonetic syllabaries. I am unsure
about Kanji. I suppose even Kanji has many multi-character words, just
like Chinese does.

However, there are other interesting languages. As an example, the Ethiopic
script resides in Unicode at U+1200 to U+137F. These code points do not fit
inside one byte.
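
To make the size argument concrete, here is a minimal sketch in Python. It
is not Aspell code; the helper names fits_in_one_byte and utf8_length are
made up for this illustration. It only checks the two ends of the range
quoted above against the 8-bit limit and counts their UTF-8 bytes.

```python
# Range boundaries as quoted above: U+1200 to U+137F.
ETHIOPIC_FIRST = 0x1200
ETHIOPIC_LAST = 0x137F


def fits_in_one_byte(code_point: int) -> bool:
    """Return True if the code point can be stored as one 8-bit value."""
    return code_point <= 0xFF


def utf8_length(code_point: int) -> int:
    """Return the number of bytes the code point needs in UTF-8."""
    return len(chr(code_point).encode("utf-8"))


if __name__ == "__main__":
    for cp in (ETHIOPIC_FIRST, ETHIOPIC_LAST):
        print(f"U+{cp:04X}: one byte = {fits_in_one_byte(cp)}, "
              f"UTF-8 bytes = {utf8_length(cp)}")
```

Both ends of the range need three bytes in UTF-8, so no single 8-bit
character set shared with Latin text can hold them without a dedicated
mapping.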

Would performance really be such a problem with full Unicode support? I
realise some algorithms would have to be redesigned, but wouldn't it be
worth it to enjoy the greater flexibility?
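
As a rough way to put numbers on the memory side of that question, the
sketch below compares how many bytes the same string takes in an 8-bit
encoding, in UTF-8 and in UTF-32. It is only an illustration in Python: the
function name encoded_sizes is hypothetical, Latin-1 stands in for "an
8-bit character set", and nothing here reflects Aspell's real data
structures or says anything about algorithm speed.

```python
from typing import Dict, Optional


def encoded_sizes(text: str) -> Dict[str, Optional[int]]:
    """Return the byte length of text in each encoding.

    The value is None when the encoding cannot represent the text.
    """
    sizes: Dict[str, Optional[int]] = {}
    for name in ("latin-1", "utf-8", "utf-32-le"):
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes


if __name__ == "__main__":
    # A Swedish name with one non-ASCII letter, then one Ethiopic letter.
    for sample in ("M\u00e5rtenson", "\u1200"):
        print(repr(sample), encoded_sizes(sample))
```

For mostly Latin text, UTF-8 costs little more than an 8-bit set while a
fixed 32-bit form costs four times as much; for Ethiopic, the 8-bit set
cannot represent the text at all.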

Regards

Elias Mårtenson