[UTF-8] Aspell and UTF-8/Unicode

Elias Martenson elias-m@algonet.se
Mon, 16 Feb 2004 03:29:07 +0100


On Mon, 2004-02-16 at 02:20, Kevin Atkinson wrote:

> So does the curses library use LC_CTYPE to determine what encoding the
> incoming string is?

Yes, that's what the empty string part of setlocale(LC_ALL,"") means:
"read the locale settings from the environment variables".

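As a minimal sketch of that call (the helper name active_codeset() is
made up for illustration, and nl_langinfo(CODESET) is POSIX rather than
ISO C, so this assumes a POSIX system):

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: adopt the locale named by the environment
 * (LC_ALL, LC_CTYPE, LANG) and report the encoding it implies. */
const char *active_codeset(void)
{
    /* "" means "read the settings from the environment". setlocale()
     * returns NULL if the environment names a locale this system does
     * not have; fall back to the default "C" locale explicitly. */
    if (setlocale(LC_ALL, "") == NULL)
        setlocale(LC_ALL, "C");

    /* Name of the character encoding for the current LC_CTYPE. */
    return nl_langinfo(CODESET);
}
```

With something like LANG=en_US.UTF-8 in the environment this would
typically report "UTF-8"; in the plain "C" locale the name depends on
the C library.
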
> Is it more portable to use wchar_t?  Or do most curses libraries that
> support the wchar_t support strings in UTF-8?

In fact, I would dare to guess that the UTF-8 method is more portable.
I'm not sure exactly how standard the wchar_t versions are, and I don't
have any systems nearby other than my FC1 (Fedora Core 1) box to check.

The UTF-8 method is more portable because you can take your code, make
sure you have the setlocale() call, make sure you do all the magic
needed (like converting with mbstowcs() and using wcslen() instead of
calling strlen() on the raw bytes, and making sure you never grab
individual chars from the strings) and then just run the code. It will
work correctly on a modern system using ncursesw, and if you happen to
have a legacy system (without a Unicode-aware curses) the exact same
code will still run. It'll get severely confused if faced with
non-ASCII (multibyte) characters of course, but in no case are you any
worse off than you would have been if you hadn't adopted UTF-8.
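
To illustrate the "don't use strlen(), don't grab individual chars"
part, here is a rough sketch that counts characters rather than bytes
using the standard multibyte functions. The names init_locale() and
count_chars() are made up for this example, and it assumes the locale
has already been set from the environment as described above:

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns nonzero if the environment's locale
 * could be activated. Call this once at program start. */
int init_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: number of characters (not bytes) in s under the
 * current LC_CTYPE, or (size_t)-1 if s is not valid in that encoding. */
size_t count_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);   /* bytes left to decode */
    size_t count = 0;

    memset(&state, 0, sizeof state);
    while (remaining > 0) {
        /* Decode one multibyte character; n is its length in bytes. */
        size_t n = mbrtowc(NULL, s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;      /* invalid or truncated sequence */
        if (n == 0)
            break;                  /* embedded NUL; not expected here */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

In a UTF-8 locale, a two-byte character such as "å" counts as one
character here, whereas strlen() would report two bytes.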

Just remember to do some testing with Unicode characters (at the very
least copy and paste from gucharmap or some Chinese web page or
something). This matters especially if you use the UTF-8 method, since
the program will compile and run happily if you accidentally link
against ncurses instead of ncursesw, but as soon as you put some
non-ASCII characters in there it will give some very interesting visual
results. :-)  Have a go and play with it, UTF-8 is really fun! If
nothing else you can join our IRC channel and watch it in action. :-)
(XChat and other IRC clients support UTF-8).

Regards

Elias Mårtenson