[PATCH 0/5] Improve text protocol

Jan Arne Petersen jpetersen at openismus.com
Tue Apr 16 12:49:20 PDT 2013


On 04/16/2013 06:06 PM, Bill Spitzak wrote:
> On 04/16/2013 01:16 AM, Jan Arne Petersen wrote:
> 
>> But we still need to think about how to handle invalid byte sequences
>> anyways. What do we expect a toolkit to do when text with invalid byte
>> sequences is inserted with commit_string? How to handle
>> delete_surrounding_text with the byte offsets not matching code points?
>> Should the toolkit ignore such requests or should we leave that as
>> undefined behavior?
> 
> You seem to be under the impression that it is impossible to edit text
> unless it is converted from UTF-8 to some other form? You do know that
> there can be encoding errors in UTF-16, right?
> 
> My recommendation is that the editor store UTF-8 and preserve error
> bytes. Handling of errors is a *DISPLAY* problem, not a storage problem.
> 
> Errors should show a single error glyph for each byte in the error. For
> instance the sequence 0xE0,0xC0,0x20 is two error bytes followed by a
> space (not a single error followed by a space or a single error as some
> systems will do). The reason for this rule is to allow bi-directional
> parsing of text with errors in it without having to look ahead more than
> 4 bytes and to match the UTF-16 encoding I describe below.
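
A minimal Python sketch of the per-byte error rule quoted above. It assumes the rule means this: wherever no well-formed UTF-8 sequence (per the RFC 3629 byte table) starts, emit one error token for that single byte and resume at the next byte. The names `valid_seq_len` and `tokenize` are hypothetical. This deliberately differs from the Unicode Standard's recommended "maximal subpart" practice for U+FFFD substitution, which would turn a truncated sequence such as F0 90 80 into a single U+FFFD rather than three errors.

```python
def valid_seq_len(buf: bytes, i: int) -> int:
    """Length of the well-formed UTF-8 sequence starting at buf[i], or 0 if none."""
    b0 = buf[i]
    if b0 < 0x80:
        return 1
    # Sequence length and allowed range of the second byte, by lead byte (RFC 3629).
    if 0xC2 <= b0 <= 0xDF:
        n, lo, hi = 2, 0x80, 0xBF
    elif b0 == 0xE0:
        n, lo, hi = 3, 0xA0, 0xBF  # excludes overlong forms
    elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
        n, lo, hi = 3, 0x80, 0xBF
    elif b0 == 0xED:
        n, lo, hi = 3, 0x80, 0x9F  # excludes UTF-16 surrogates
    elif b0 == 0xF0:
        n, lo, hi = 4, 0x90, 0xBF
    elif 0xF1 <= b0 <= 0xF3:
        n, lo, hi = 4, 0x80, 0xBF
    elif b0 == 0xF4:
        n, lo, hi = 4, 0x80, 0x8F  # nothing above U+10FFFF
    else:
        return 0  # 0x80-0xC1 and 0xF5-0xFF never start a sequence
    if i + n > len(buf) or not lo <= buf[i + 1] <= hi:
        return 0
    if any(not 0x80 <= buf[j] <= 0xBF for j in range(i + 2, i + n)):
        return 0
    return n


def tokenize(buf: bytes) -> list[tuple[str, int]]:
    """Split bytes into ("char", code point) and ("error", byte) tokens."""
    out = []
    i = 0
    while i < len(buf):
        n = valid_seq_len(buf, i)
        if n == 0:
            out.append(("error", buf[i]))  # one error glyph for this byte only
            i += 1
        else:
            out.append(("char", ord(buf[i:i + n].decode("utf-8"))))
            i += n
    return out


# Two error tokens (0xE0, 0xC0) followed by the space.
print(tokenize(bytes([0xE0, 0xC0, 0x20])))
```

Because every error token covers exactly one byte, the same text can be scanned backwards as well as forwards and both directions agree on where each token starts.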

I completely agree that editing UTF-8 text as UTF-8 is fine.

I am just wondering if we should have offsets in "Unicode code points"
(with the addition that for invalid byte sequences each byte counts as
one code point) or offsets in bytes.

And if we use offsets in bytes, how should the toolkit and input method
handle byte offsets that do not fall on code point boundaries?
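
To make the two options concrete, here is a sketch of converting between byte offsets and "code point" offsets in which each invalid byte counts as one unit. It assumes Python's standard `surrogateescape` error handler (PEP 383), which decodes every undecodable byte to its own lone surrogate (U+DC80 to U+DCFF) and re-encodes it to the same single byte. The helper names are hypothetical. A byte offset that is missing from the boundary table falls inside a character, which is exactly the case in question.

```python
from typing import Optional


def unit_boundaries(buf: bytes) -> list[int]:
    """Byte offset at which each unit starts, followed by len(buf).

    A unit is one code point, or one byte of an invalid sequence.
    """
    text = buf.decode("utf-8", errors="surrogateescape")
    offsets = [0]
    for ch in text:
        # Escaped error bytes re-encode to exactly one byte; code points to 1-4 bytes.
        offsets.append(offsets[-1] + len(ch.encode("utf-8", errors="surrogateescape")))
    return offsets


def byte_to_unit(buf: bytes, byte_off: int) -> Optional[int]:
    """Unit offset for a byte offset, or None if the offset splits a code point."""
    bounds = unit_boundaries(buf)
    return bounds.index(byte_off) if byte_off in bounds else None


def unit_to_byte(buf: bytes, unit_off: int) -> int:
    """Byte offset for a unit offset in the range 0..number of units."""
    bounds = unit_boundaries(buf)
    if not 0 <= unit_off < len(bounds):
        raise IndexError("unit offset out of range")
    return bounds[unit_off]


text = "€–".encode("utf-8")  # e2 82 ac e2 80 93
print(unit_boundaries(text))  # [0, 3, 6]
print(byte_to_unit(text, 2))  # None: byte 2 is inside the euro sign
print(unit_to_byte(text, 1))  # 3
```

With unit offsets, a misaligned position simply cannot be expressed; with byte offsets, every receiver needs a check like `byte_to_unit` and a policy for the `None` case.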

For example, take the surrounding text "€–" (cursor at offset 0), encoded as:
0xe2 0x82 0xac 0xe2 0x80 0x93

What should the toolkit do with requests such as the following?
* delete_surrounding_text(index: 1, length: 3)
* cursor_position(index: 2)
or
* preedit_style(index: 2, length: 3, style: underline), with the above
text as the preedit.
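
Assuming the surrounding text is well-formed UTF-8 (true for this example), a toolkit can tell whether a byte offset splits a character by checking whether it points at a continuation byte (bit pattern 10xxxxxx). Rust's `str::is_char_boundary` uses the same test. The sketch applies a hypothetical "reject unless both ends are boundaries" policy to the three requests. Rejecting is only one possible answer; clamping to the nearest boundary or declaring the behaviour undefined are the alternatives raised above.

```python
def is_boundary(buf: bytes, off: int) -> bool:
    """True if byte offset `off` lies between code points (assumes valid UTF-8)."""
    if off == 0 or off == len(buf):
        return True
    # Inside the buffer: a boundary unless the byte is a continuation byte.
    return 0 < off < len(buf) and (buf[off] & 0xC0) != 0x80


def check_range(buf: bytes, index: int, length: int) -> bool:
    """Hypothetical policy: accept a request only if both ends are boundaries."""
    return is_boundary(buf, index) and is_boundary(buf, index + length)


text = "€–".encode("utf-8")  # e2 82 ac e2 80 93
print(check_range(text, 1, 3))  # delete_surrounding_text(1, 3): False
print(check_range(text, 2, 0))  # cursor_position(2): False
print(check_range(text, 2, 3))  # preedit_style(2, 3): False
print(check_range(text, 3, 3))  # just the en dash: True

# Blindly applying the delete request keeps bytes e2 80 93:
print((text[:1] + text[4:]).decode("utf-8"))  # "–"
```

The last line shows why silently applying such a request is risky: the result is valid UTF-8, yet it looks as if only the euro sign had been deleted, even though the request covered bytes from both characters.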

Thanks
Jan Arne

-- 
Jan Arne Petersen
Openismus GmbH
http://www.openismus.com


More information about the wayland-devel mailing list