[PATCH 0/5] Improve text protocol

Jan Arne Petersen jpetersen at openismus.com
Thu May 2 13:06:04 PDT 2013


On 05/02/2013 09:56 PM, Kristian Høgsberg wrote:
> On Tue, Apr 16, 2013 at 06:19:47PM -0700, Bill Spitzak wrote:
>> Jan Arne Petersen wrote:
>>
>>> I completely agree that editing UTF-8 text as UTF-8 is fine.
>>>
>>> I am just wondering if we should have offsets in "Unicode code points"
>>> (with the addition that for invalid byte sequences each byte counts as
>>> one code point) or offsets in bytes.
>>
>> The reason for the offset in bytes is that it is unambiguous about
>> what position it means. Though I think errors should count as one
>> code point, this avoids the need to define it at all, because the
>> client does not have to agree with the input method about how to
>> count them.
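To make the trade-off concrete: under the code-point rule, an offset can only be computed by walking the bytes, and the client and the input method must agree exactly on what counts as an invalid sequence. The C sketch below is an illustration only. `byte_to_cp_offset()` and `utf8_valid_seq_len()` are hypothetical helpers, and the validity rules used here (no overlong forms, no surrogates, nothing above U+10FFFF) are an assumption about what both sides would have to agree on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length of the valid UTF-8 sequence starting at
 * s[i], or 0 if the bytes there are invalid (assumed rules: shortest
 * form only, no surrogates, nothing above U+10FFFF). */
static size_t utf8_valid_seq_len(const uint8_t *s, size_t len, size_t i)
{
    uint8_t b = s[i];
    size_t n;
    uint32_t cp, min;

    if (b < 0x80)
        return 1;
    else if ((b & 0xe0) == 0xc0) { n = 2; cp = b & 0x1f; min = 0x80; }
    else if ((b & 0xf0) == 0xe0) { n = 3; cp = b & 0x0f; min = 0x800; }
    else if ((b & 0xf8) == 0xf0) { n = 4; cp = b & 0x07; min = 0x10000; }
    else
        return 0; /* stray continuation byte or 0xf8..0xff */

    if (i + n > len)
        return 0; /* sequence truncated by the end of the buffer */
    for (size_t k = 1; k < n; k++) {
        if ((s[i + k] & 0xc0) != 0x80)
            return 0; /* expected a continuation byte */
        cp = (cp << 6) | (s[i + k] & 0x3f);
    }
    if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
        return 0; /* overlong, out of range, or surrogate */
    return n;
}

/* Hypothetical conversion from a byte offset to a code-point offset in
 * which every byte of an invalid sequence counts as one code point.
 * Returns (size_t)-1 if the byte offset falls inside a valid multi-byte
 * sequence (or past the end), where no code-point offset exists. */
size_t byte_to_cp_offset(const uint8_t *s, size_t len, size_t byte_off)
{
    size_t i = 0, cps = 0;

    while (i < byte_off && i < len) {
        size_t n = utf8_valid_seq_len(s, len, i);
        i += n ? n : 1; /* an invalid byte advances by exactly one */
        cps++;
    }
    return i == byte_off ? cps : (size_t)-1;
}
```

With the "€–" text used in this thread, byte offset 3 maps to code point 1, while byte offset 2 has no code-point equivalent at all, which is exactly the ambiguity byte offsets avoid.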
>>
>>> And when we use offsets in byte how should the toolkit and input method
>>> handle offsets in bytes which do not match code points.
>>>
>>> For example we have a surrounding text of "€–" (cursor is at offset 0):
>>> 0xe2 0x82 0xac 0xe2 0x80 0x93
>>>
>>> What should the toolkit do with such requests like the following?
>>> * delete_surrounding_text(index: 1, length: 3)
>>
>> I would delete the bytes indicated and show the resulting string,
>> with error boxes for the now-bad bytes.
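A minimal C sketch of the byte-wise deletion described above. `delete_bytes()` is a hypothetical client-side helper, not part of the protocol. It assumes the cursor is at offset 0 as in the example, so `index` can be used as an absolute byte offset, and it does no UTF-8 validation: displaying any broken bytes is left to the renderer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler for delete_surrounding_text on a NUL-terminated
 * UTF-8 buffer: removes `length` bytes starting at byte `index`,
 * clamped to the end of the text. Returns the new length in bytes. */
size_t delete_bytes(char *text, size_t index, size_t length)
{
    size_t len = strlen(text);

    if (index > len)
        return len;
    if (length > len - index)
        length = len - index;
    /* Shift the tail, including the terminating NUL, over the hole. */
    memmove(text + index, text + index + length, len - index - length + 1);
    return len - length;
}
```

For delete_surrounding_text(index: 1, length: 3) on "€–", bytes 0x82 0xac 0xe2 are removed and 0xe2 0x80 0x93 remain. In this particular case that happens to be a valid "–", so nothing is left to draw as an error box.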
>>
>>> * cursor_position(index: 2)
>>
>> I would place the cursor at the position of the glyph produced by
>> that byte, which could include some bytes on either side of it. Note
>> that for combining characters this is a problem that needs to be
>> solved even for valid UTF-8 (i.e. what does it mean if you point
>> between the letter and the accent?).
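One simple way to place the cursor is to snap the byte offset back to the start of the code point that contains it. The C sketch below does only that. `snap_to_cp_start()` and `lead_len()` are hypothetical names, and the sketch deliberately ignores combining characters; for those, a toolkit would snap to grapheme-cluster boundaries instead.

```c
#include <stddef.h>

/* Number of bytes a UTF-8 lead byte announces; 1 for ASCII and for
 * bytes that cannot start a multi-byte sequence. */
static size_t lead_len(unsigned char b)
{
    if ((b & 0xe0) == 0xc0) return 2;
    if ((b & 0xf0) == 0xe0) return 3;
    if ((b & 0xf8) == 0xf0) return 4;
    return 1;
}

/* Hypothetical cursor_position helper: returns the byte offset of the
 * start of the code point containing byte `offset`. Stray continuation
 * bytes that belong to no sequence are left where they are. */
size_t snap_to_cp_start(const unsigned char *text, size_t len, size_t offset)
{
    if (offset >= len)
        return len;
    size_t start = offset;
    /* Walk back over at most three continuation bytes (10xxxxxx). */
    while (start > 0 && offset - start < 3 && (text[start] & 0xc0) == 0x80)
        start--;
    /* Snap only if we reached a lead byte whose sequence covers offset. */
    if (start != offset && offset < start + lead_len(text[start]))
        return start;
    return offset;
}
```

For cursor_position(index: 2) on "€–" this returns 0, so the cursor is drawn before "€".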
>>
>>> or
>>> * preedit_style(index: 2, length: 3, style: underline) with above text
>>> as preedit.
>>
>> I would remember the positions of styling as bytes. However the
>> renderer can render as though they are moved left to the first break
>> between glyphs (i.e. it will preedit-highlight the character the first
>> byte is in, and if the preedit region ends in the middle of a glyph
>> then that glyph will not be preedited). Again this problem needs to
>> be solved for combining characters anyway so this is not any more
>> difficult.
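The rendering rule for preedit_style can be sketched the same way: the byte range is stored as sent, and only the drawn range is adjusted. In this C sketch `cp_start()` and `preedit_render_range()` are hypothetical helpers, and the text is assumed to be valid UTF-8 apart from the misaligned offsets.

```c
#include <stddef.h>

/* Start of the code point containing byte `off`, assuming valid UTF-8:
 * step back over at most three continuation bytes (10xxxxxx). */
static size_t cp_start(const unsigned char *t, size_t off)
{
    size_t s = off;
    while (s > 0 && off - s < 3 && (t[s] & 0xc0) == 0x80)
        s--;
    return s;
}

/* Hypothetical mapping from a stored byte range to the drawn range.
 * The start moves left so the character holding the first byte is
 * styled; the end also moves left, so a character that is only partly
 * covered at the end is left unstyled. */
void preedit_render_range(const unsigned char *t, size_t len,
                          size_t index, size_t length,
                          size_t *draw_start, size_t *draw_end)
{
    if (index > len)
        index = len;
    if (length > len - index)
        length = len - index;
    size_t end = index + length;

    *draw_start = index < len ? cp_start(t, index) : len;
    *draw_end = end < len ? cp_start(t, end) : len;
    if (*draw_end < *draw_start)
        *draw_end = *draw_start;
}
```

For preedit_style(index: 2, length: 3) over "€–", the drawn range is bytes 0 to 3, so "€" is underlined and "–" is not.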
>>
>> In all cases the client can potentially detect that the input method
>> is screwing up, and perhaps report this as a warning message.
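Detecting a misbehaving input method can be as cheap as checking that each received range lies inside the text and starts and ends on character boundaries. In this C sketch `check_im_range()` and `is_boundary()` are hypothetical, and treating stray continuation bytes as misaligned is a simplifying assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* True if `off` is a code point boundary: either end of the buffer, or
 * a byte that is not a continuation byte (10xxxxxx). Stray continuation
 * bytes are treated as misaligned, which is a simplification. */
static bool is_boundary(const unsigned char *t, size_t len, size_t off)
{
    return off == 0 || off == len || (off < len && (t[off] & 0xc0) != 0x80);
}

/* Hypothetical check on an (index, length) byte range received from the
 * input method. The caller still honours the request; this only logs a
 * warning so that a buggy input method can be spotted. */
bool check_im_range(const char *request, const unsigned char *t, size_t len,
                    size_t index, size_t length)
{
    bool ok = index <= len && length <= len - index &&
              is_boundary(t, len, index) &&
              is_boundary(t, len, index + length);

    if (!ok)
        fprintf(stderr, "warning: input method sent %s(index: %zu, length: %zu) "
                "outside the text or not on UTF-8 character boundaries\n",
                request, index, length);
    return ok;
}
```

With the "€–" text, delete_surrounding_text(index: 1, length: 3) would log a warning, while index 0, length 3 would not.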
> 
> I think the consensus is that we leave the offsets as bytes.  I agree with
> that, considering that: 1) it shouldn't happen, 2) when it does, the
> toolkit will have to deal with it.

Yes, that is how it is handled in the new version of this series:
http://lists.freedesktop.org/archives/wayland-devel/2013-April/008670.html


-- 
Jan Arne Petersen
Openismus GmbH
http://www.openismus.com


More information about the wayland-devel mailing list