[UTF-8] Aspell and UTF-8/Unicode

Kevin Atkinson kevina@gnu.org
Sun, 15 Feb 2004 21:42:43 -0500 (EST)


On Mon, 16 Feb 2004, Elias Martenson wrote:

> On Mon, 2004-02-16 at 02:20, Kevin Atkinson wrote:
>
> > So does the curses library use LC_CTYPE to determine what encoding the
> > incoming string is?
>
> Yes, that's what the empty string part of setlocale(LC_ALL,"") means:
> "read the locale settings from the environment variables".

OK, thanks.  I knew what setlocale() does; I just wanted to make sure that
curses was using it.
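
For reference, a minimal sketch of how a program might check whether the
locale picked up by setlocale(LC_ALL, "") uses UTF-8. It assumes a POSIX
system that provides nl_langinfo(CODESET). The helper names
codeset_is_utf8() and locale_is_utf8() are made up for illustration, and
the accepted spellings ("UTF-8", "UTF8", any case) are an assumption
about what platforms report:

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: return 1 if the codeset name denotes UTF-8.
 * Platforms spell the name differently, so compare case-insensitively
 * against the two common forms (an assumption, not an exhaustive list). */
int codeset_is_utf8(const char *codeset)
{
    return codeset != NULL &&
           (strcasecmp(codeset, "UTF-8") == 0 ||
            strcasecmp(codeset, "UTF8") == 0);
}

/* Hypothetical helper: initialise the locale from the environment and
 * report whether its character encoding is UTF-8. */
int locale_is_utf8(void)
{
    /* "" means: read LC_ALL, LC_CTYPE, LANG, ... from the environment. */
    if (setlocale(LC_ALL, "") == NULL)
        return 0;   /* the environment names a locale that is unavailable */
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A program would call locale_is_utf8() once at startup and choose its
8-bit or UTF-8 code path from the result.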

> The UTF-8 method is more standard because you can take your code, make
> sure you have the setlocale() call, make sure you do all the magic
> needed (like using wcslen() instead of strlen() and making sure you
> never grab individual char's from the strings)

Do you know of any code samples for efficient UTF-8 manipulation?  I
figure that if I support 8-bit character sets and UTF-8, that will be
enough.  That way I can detect when UTF-8 is in use and simply handle the
UTF-8 strings more carefully, which is more efficient than converting to
wchar_t just to get the length.  What I really need are things like:

  - length of utf-8 strings
  - length of the current utf-8 character
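
Both operations in the list above can be done directly on the bytes,
without converting to wchar_t. The following is only a sketch: the names
utf8_char_len() and utf8_strlen() are hypothetical, the sequence lengths
follow RFC 3629 (at most 4 bytes per character), and utf8_strlen()
assumes its input is already valid UTF-8.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 character whose first byte is `c`.
 * Returns 0 if `c` cannot start a character (a continuation byte, or a
 * value that RFC 3629 never allows as a lead byte). */
size_t utf8_char_len(unsigned char c)
{
    if (c < 0x80)              return 1;  /* 0xxxxxxx: plain ASCII      */
    if (c >= 0xC2 && c <= 0xDF) return 2; /* 110xxxxx, minus overlongs  */
    if (c >= 0xE0 && c <= 0xEF) return 3; /* 1110xxxx                   */
    if (c >= 0xF0 && c <= 0xF4) return 4; /* 11110xxx, up to U+10FFFF   */
    return 0;
}

/* Number of characters (code points) in a NUL-terminated UTF-8 string.
 * Every character has exactly one byte that is not a continuation byte
 * (10xxxxxx), so counting those bytes counts the characters.
 * Assumes `s` is valid UTF-8; no validation is done here. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

To step to the next character, advance the pointer by
utf8_char_len((unsigned char)*p) and treat a return value of 0 as
malformed input. Note that this counts code points, not display columns.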

--
http://kevin.atkinson.dhs.org