COMPOUND_TEXT versus UTF8_STRING

Roland Mainz roland.mainz at nrubsig.org
Wed Sep 22 19:34:07 EEST 2004


Markus Kuhn wrote:
> Sebastien wrote on 2004-09-22 09:07 UTC:
> > Where can I find a converter function or library which supports the
> > following string conversions:
> > - COMPOUND_TEXT to local encoding (defined by $LANG)
> > - local encoding (defined by $LANG) to COMPOUND_TEXT
> > - COMPOUND_TEXT to UTF8
> > - UTF8 to COMPOUND_TEXT
> 
> Which reminds me to bring up the underlying more fundamental question,
> namely the future of COMPOUND_TEXT.
> 
> COMPOUND_TEXT is an implementation of ISO 2022, a horrendously complex
> and impractical way of switching between multiple character sets within
> the same string, that clearly failed on the market place, and is no
> longer used today except for some CJK email. Mule Emacs used something
> similar for a while, but they are now moving to UTF-8 as the sole
> internal encoding for Emacs 23. All major web browsers have done the
> same long ago.
> 
> COMPOUND_TEXT is in my opinion obsolete, and we should start thinking
> about a way to smoothly deprecate it from the standard, and make the way
> free for universally replacing it with the so much simpler and more
> practical UTF8_STRING. ISO 2022 is dead, and so should COMPOUND_TEXT be.

Getting rid of COMPOUND_TEXT is not easy, since many X11 specs would
have to be rewritten or updated to do that. Backwards compatibility with
existing (binary) applications is required, too.

What about adding an extension to ISO 2022 which defines a new sequence
to mark the following string as UTF-8? That way backwards compatibility
with COMPOUND_TEXT is maintained and existing applications still work
without any changes. Is that possible? (To be honest, I don't know much
about COMPOUND_TEXT.)
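
To make the idea above concrete, here is a minimal sketch in Python. It
relies on ISO 2022's "designate other coding system" mechanism, where
ESC % G is the registered sequence for switching to UTF-8 and ESC % @
switches back. Whether COMPOUND_TEXT consumers may legally contain or
accept such a segment is an assumption here, not something this mail
establishes. The helper names (wrap_utf8, split_segments, decode) are
hypothetical, and everything outside a UTF-8 segment is simply treated
as ISO 8859-1, which ignores all other COMPOUND_TEXT designations.

```python
ESC = b"\x1b"
UTF8_START = ESC + b"%G"  # ISO 2022: switch to UTF-8
UTF8_END = ESC + b"%@"    # ISO 2022: return to the ISO 2022 coding system


def wrap_utf8(text: str) -> bytes:
    """Encode text as UTF-8 and bracket it with the switch sequences."""
    return UTF8_START + text.encode("utf-8") + UTF8_END


def split_segments(data: bytes):
    """Yield (is_utf8, payload) pairs for each run of bytes in data."""
    pos = 0
    while pos < len(data):
        start = data.find(UTF8_START, pos)
        if start == -1:
            # No further UTF-8 segment: the rest is legacy text.
            yield (False, data[pos:])
            return
        if start > pos:
            yield (False, data[pos:start])
        body = start + len(UTF8_START)
        end = data.find(UTF8_END, body)
        if end == -1:
            raise ValueError("unterminated UTF-8 segment")
        yield (True, data[body:end])
        pos = end + len(UTF8_END)


def decode(data: bytes) -> str:
    """Decode a mixed string; non-UTF-8 runs are assumed to be ISO 8859-1
    (a simplification for this sketch)."""
    return "".join(
        payload.decode("utf-8" if is_utf8 else "latin-1")
        for is_utf8, payload in split_segments(data)
    )


if __name__ == "__main__":
    mixed = b"caf\xe9 " + wrap_utf8("\u65e5\u672c\u8a9e")
    print(decode(mixed))
```

An old client that does not know the sequence would still see an
ordinary escape sequence it can skip or reject, which is the point of
the proposal; a client that does know it can decode the bracketed bytes
directly as UTF-8.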

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)


