[PATCH 0/5] Improve text protocol

Bill Spitzak spitzak at gmail.com
Tue Apr 16 18:19:47 PDT 2013


Jan Arne Petersen wrote:

> I completely agree that editing UTF-8 text as UTF-8 is fine.
> 
> I am just wondering if we should have offsets in "Unicode code points"
> (with the addition that for invalid byte sequences each byte counts as
> one code point) or offsets in bytes.

The reason for offsets in bytes is that a byte offset is unambiguous
about which position it means. Though I think errors should count as one
code point, using bytes avoids the need to define that at all, because
the client does not have to agree with the input method about how to
count them.
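
To illustrate the ambiguity, here is a rough Python sketch (not from the
protocol; the function names and the two counting policies are my own
assumptions). It counts "code points" in a buffer containing invalid bytes
in two plausible ways. The counts differ, while the byte offset of the same
position does not depend on any policy.

```python
def valid_sequence_length(data: bytes, i: int) -> int:
    """Length of the valid UTF-8 sequence starting at i, or 0 if none."""
    for n in (1, 2, 3, 4):
        chunk = data[i:i + n]
        if len(chunk) < n:
            break
        try:
            chunk.decode("utf-8")  # strict decoding rejects bad sequences
        except UnicodeDecodeError:
            continue
        return n
    return 0


def count_code_points(data: bytes, merge_error_runs: bool) -> int:
    """Count code points; invalid bytes count per byte or per run (hypothetical policies)."""
    count = 0
    i = 0
    in_error_run = False
    while i < len(data):
        n = valid_sequence_length(data, i)
        if n:
            count += 1
            i += n
            in_error_run = False
        else:
            if not (merge_error_runs and in_error_run):
                count += 1
            in_error_run = True
            i += 1
    return count


sample = b"a\xe2\x82b"  # 'a', a truncated 3-byte sequence, then 'b'
print(count_code_points(sample, merge_error_runs=False))  # 4
print(count_code_points(sample, merge_error_runs=True))   # 3
print(sample.index(b"b"))                                 # 3, under either policy
```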

> And when we use offsets in byte how should the toolkit and input method
> handle offsets in bytes which do not match code points.
> 
> For example we have a surrounding text of "€–" (cursor is at offset 0):
> 0xe2 0x82 0xac 0xe2 0x80 0x93
> 
> What should the toolkit do with such requests like the following?
> * delete_surrounding_text(index: 1, length: 3)

I would delete the bytes indicated and show the resulting string, with 
error boxes for the now-bad bytes.
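
A minimal Python sketch of that behaviour, assuming the index is a plain
byte offset into the surrounding text (the cursor is at 0) and using
U+FFFD as a stand-in for an error box; delete_bytes and render are
hypothetical helper names, not protocol calls. As far as I can tell, the
exact request quoted above happens to leave a valid sequence
(0xe2 0x80 0x93), so the second call uses a length of 2 to show a stray
byte.

```python
def delete_bytes(text: bytes, index: int, length: int) -> bytes:
    """Remove exactly the bytes [index, index + length), valid UTF-8 or not."""
    return text[:index] + text[index + length:]


def render(text: bytes) -> str:
    """Decode for display; each undecodable piece becomes U+FFFD."""
    return text.decode("utf-8", errors="replace")


surrounding = bytes([0xE2, 0x82, 0xAC, 0xE2, 0x80, 0x93])

# delete_surrounding_text(index: 1, length: 3) leaves e2 80 93
print(render(delete_bytes(surrounding, 1, 3)))

# a length of 2 leaves e2 e2 80 93: one bad byte, then a valid character
print(render(delete_bytes(surrounding, 1, 2)))
```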

> * cursor_position(index: 2)

I would place the cursor at the position of the glyph produced by that
byte, which could include some bytes on either side of it. Note that for
combining characters this is a problem that needs to be solved even for
valid UTF-8 (i.e. what does it mean if you point between the letter and
the accent?).
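
One way to find the bytes that belong with a given byte, sketched in
Python under a simplifying assumption: a "glyph" here is just one code
point, or a single invalid byte, so combining characters are not handled.
unit_span is a hypothetical name.

```python
def unit_span(data: bytes, index: int) -> tuple[int, int]:
    """Return (start, end) of the code point containing data[index].

    A byte that is not part of any valid sequence is its own unit.
    """
    # A UTF-8 sequence is at most 4 bytes, so its lead byte is at most
    # 3 bytes to the left of index.
    for start in range(index, max(index - 4, -1), -1):
        for n in range(index - start + 1, 5):
            chunk = data[start:start + n]
            if len(chunk) < n:
                break
            try:
                if len(chunk.decode("utf-8")) == 1:
                    return (start, start + n)
            except UnicodeDecodeError:
                pass
    return (index, index + 1)


text = bytes([0xE2, 0x82, 0xAC, 0xE2, 0x80, 0x93])
print(unit_span(text, 2))  # (0, 3): byte 2 is the last byte of the first character
```

The cursor could then be drawn at either edge of that span; which edge
is a toolkit decision this sketch does not make.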

> or
> * preedit_style(index: 2, length: 3, style: underline) with above text
> as preedit.

I would remember the positions of styling as bytes. However, the
renderer can draw the style as though each position were moved left to
the nearest break between glyphs (i.e. it will preedit-highlight the
character the first byte is in, and if the preedit region ends in the
middle of a glyph then that glyph will not be highlighted). Again, this
problem needs to be solved for combining characters anyway, so this is
not any more difficult.
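
A sketch of that rounding in Python, again treating each code point (or
each invalid byte) as one glyph, which is a simplification; unit_starts,
snap_left and rendered_style_range are hypothetical names. The stored
style keeps its byte offsets; only the drawn range is rounded.

```python
def unit_starts(data: bytes) -> list[int]:
    """Byte offsets where a code point (or a lone invalid byte) starts, plus len(data)."""
    starts = []
    i = 0
    while i < len(data):
        starts.append(i)
        step = 1  # an invalid byte is a unit of its own
        for n in (1, 2, 3, 4):
            chunk = data[i:i + n]
            if len(chunk) < n:
                break
            try:
                chunk.decode("utf-8")
            except UnicodeDecodeError:
                continue
            step = n
            break
        i += step
    starts.append(len(data))
    return starts


def snap_left(starts: list[int], pos: int) -> int:
    """Move pos left to the nearest unit boundary."""
    return max(s for s in starts if s <= pos)


def rendered_style_range(data: bytes, index: int, length: int) -> tuple[int, int]:
    """Byte range actually drawn for a style stored as (index, length)."""
    starts = unit_starts(data)
    end = min(index + length, len(data))
    return snap_left(starts, index), snap_left(starts, end)


preedit = bytes([0xE2, 0x82, 0xAC, 0xE2, 0x80, 0x93])
# preedit_style(index: 2, length: 3) covers bytes 2..4
print(rendered_style_range(preedit, 2, 3))  # (0, 3): only the first character
```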

In all cases the client can potentially detect that the input method is 
screwing up, and perhaps report this as a warning message.
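
For example, a client could run a check along these lines on every
offset it receives. This is a Python sketch that assumes the text itself
is valid UTF-8, so that any offset landing on a continuation byte is
misaligned; check_offset and the logger name are made up for
illustration.

```python
import logging

log = logging.getLogger("text-input")  # hypothetical logger name


def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def check_offset(text: bytes, index: int) -> bool:
    """Return True if index is on a character boundary; otherwise warn."""
    aligned = 0 <= index <= len(text) and (
        index == len(text) or not is_continuation(text[index])
    )
    if not aligned:
        log.warning("input method sent misaligned byte offset %d", index)
    return aligned


text = bytes([0xE2, 0x82, 0xAC, 0xE2, 0x80, 0x93])
print(check_offset(text, 3))  # True: start of the second character
print(check_offset(text, 2))  # False, and a warning is logged
```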


More information about the wayland-devel mailing list