COMPOUND_TEXT versus UTF8_STRING

Thu Sep 23 19:37:13 EEST 2004

Around 9 o'clock on Sep 23, Markus Kuhn wrote:

> There is certainly no harm done by encouraging in the next ICCCM version
> the recipients of all the properties where STRING and COMPOUND_TEXT are
> allowed today to also accept UTF8_STRING, in addition to the existing
> STRING and COMPOUND_TEXT ones.

Yes, it seems reasonable to prepare applications to accept UTF-8 in these
strings; that's well within the scope of the existing ICCCM wording, and
given the current Xlib implementation, it's already supported by most
applications today.

> In addition, there is little harm done in using UTF8_STRING whenever the
> text to be transmitted contains at least one character for which STRING
> and COMPOUND_TEXT provide no encoding (think of Ethiopian or Vietnamese
> window titles).

Owen Taylor pointed out that Bruno Haible added UTF-8 support to the Xlib 
COMPOUND_TEXT code.  However, this does not mean that applications will 
actually understand the resulting UTF-8 sequences; it requires that the 
receiving application be running a compatible version of Xlib.

> Unless the recipient understands UTF-8 (and therefore
> probably also implements already UTF8_STRING), the data will be
> meaningless to them anyway.

I note that the existing Xlib property conversion functions handle
UTF8_STRING in parallel with COMPOUND_TEXT, meaning that any application
using the X.org Xlib to handle COMPOUND_TEXT will transparently handle
UTF8_STRING already.

It appears to me that we have two reasonable directions to go:

 1)	Assume all applications use X.org Xlib functions to handle text
	values, in which case they will handle UTF-8 as either 
	COMPOUND_TEXT or UTF8_STRING (and in which case we should
	just use UTF8_STRING).

 2)	Accept that some applications are not using the X.org Xlib 
	text handling functions, in which case we cannot use UTF-8
	in any form for property values (neither as UTF8_STRING nor
	even COMPOUND_TEXT with UTF-8 sequences).

In case 1), we're all set -- just start using UTF8_STRING values for the 
standard ICCCM properties and expect that applications will "just work".
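
As a rough illustration of the sending side in case 1, here is a minimal
sketch in Python. It assumes the sender keeps STRING whenever the text fits
(ISO Latin-1 plus TAB and NEWLINE, per the ICCCM) and switches to UTF8_STRING
otherwise, as Markus suggests above. The function name choose_text_property
is hypothetical and not part of Xlib; the sketch only computes the type name
and payload bytes, and leaves the actual property write to whatever toolkit
is in use.

```python
def choose_text_property(text):
    """Return (type_atom_name, payload_bytes) for a text property value.

    Hypothetical helper, not an Xlib API: it only decides the encoding.
    """
    try:
        payload = text.encode("latin-1")
    except UnicodeEncodeError:
        # At least one character has no Latin-1 encoding.
        return "UTF8_STRING", text.encode("utf-8")

    # STRING allows Latin-1 graphic characters plus TAB and NEWLINE only,
    # so other C0/C1 control characters also rule it out.
    for b in payload:
        is_c0 = b < 0x20 and b not in (0x09, 0x0A)
        is_c1 = 0x7F <= b <= 0x9F
        if is_c0 or is_c1:
            return "UTF8_STRING", text.encode("utf-8")

    return "STRING", payload
```

With this, a plain "xterm" title stays a STRING, while a Vietnamese or
Ethiopian title would be sent as UTF8_STRING.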

In case 2), I see the EWMH as the obvious solution -- set the existing 
ICCCM properties using STRING (or, if you must, COMPOUND_TEXT without 
UTF-8 sequences) and place the actual data in the EWMH properties.
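
A minimal sketch of what case 2 might look like, again in Python and again
only computing values rather than talking to the X server. It assumes the
legacy WM_NAME is filled with a lossy Latin-1 rendering (unencodable
characters become "?") and the real title goes in the EWMH _NET_WM_NAME
property as UTF8_STRING. The function name fallback_properties and the
choice of "?" as the substitute are illustrative assumptions, not anything
either specification mandates.

```python
def fallback_properties(title):
    """Return {property_name: (type_atom_name, payload_bytes)}.

    Hypothetical helper: WM_NAME carries a degraded STRING fallback,
    _NET_WM_NAME carries the actual data as UTF-8.
    """
    # errors="replace" substitutes "?" for anything outside Latin-1.
    legacy = title.encode("latin-1", errors="replace")
    return {
        "WM_NAME": ("STRING", legacy),
        "_NET_WM_NAME": ("UTF8_STRING", title.encode("utf-8")),
    }
```

An application that knows nothing of UTF-8 then sees a readable, if
degraded, WM_NAME, while an EWMH-aware window manager reads the full title
from _NET_WM_NAME.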

What I don't see the need for is support for UTF-8 sequences in 
COMPOUND_TEXT format strings -- given the necessary Xlib support exists 
only in a library which transparently handles UTF8_STRING format 
properties, there's little reason to add the COMPOUND_TEXT wrapper.

Unfortunately, once we go with the EWMH as the standard, I see no way of 
getting out of that; we've essentially said that there are only two 
possible encodings for TEXT properties -- STRING and COMPOUND_TEXT 
(without UTF-8 sequences).

-keith
