[PATCH RFC] Add language and text-direction to text protocol

Thu Jan 31 13:19:00 PST 2013

Yichao Yu wrote:
> On Thu, Jan 31, 2013 at 2:20 PM, Bill Spitzak <spitzak at gmail.com> wrote:
>> On 01/31/2013 10:42 AM, Michael Hasselmann wrote:
>>
>>> You need to place your cursor at the left or right end of a text entry
>>> (actually, right or left of the possibly already existing text, on
>>> focus-in). An explicit text direction event makes it easier than having
>>> to wait and parse for the input method's text
>>
>> It sounds like this "language direction" is for placing the cursor in the
>> preedit/commit string? I'm a bit confused as to what you are talking about.
>>
>> I think the commit/preedit string should contain an actual cursor position,
>> it can be between any two bytes in it. There is no reason to limit it to the
>> start/end.
>>
> 
> It already does (although it doesn't make much sense to put the cursor
> inside a multi-byte character...),

I agree. I just want to make sure that all offsets are measured in bytes,
not in "characters", which can cause huge difficulties when the client
and server do not agree on what a "character" is.
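
To make the byte-versus-character point concrete, here is a small Python sketch. The function name is made up for illustration and is not part of the proposed protocol; it only shows that a byte offset into UTF-8 is unambiguous while a "character" count depends on how the text happens to be encoded.

```python
def is_char_boundary(data: bytes, offset: int) -> bool:
    """Return True if offset does not fall inside a multi-byte UTF-8 sequence."""
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):
        return True
    # UTF-8 continuation bytes always look like 0b10xxxxxx.
    return (data[offset] & 0xC0) != 0x80


# The same visible text, once with a precomposed letter and once with a
# base letter followed by a combining mark.
precomposed = "a\u00f1b".encode("utf-8")  # 3 code points, 4 bytes
decomposed = "an\u0303b".encode("utf-8")  # 4 code points, 5 bytes

# A receiver can cheaply reject an offset that splits a code point.
for offset in range(len(precomposed) + 1):
    print(offset, is_char_boundary(precomposed, offset))
```

Both sides can validate a byte offset with one mask and compare, without having to agree on code points, combining sequences or anything larger.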

> the input method notifies the client
> for preedit changes, cursor moves and commit strings, and the client
> will then draw the preedit text or commit string and move the cursor
> accordingly; there is no need to parse any text in order to do it
> right. Neither surrounding text nor the string being committed/set as
> preedit is directly related (the input method should already know
> enough about where the cursor should be when doing that).

Are there any input methods which don't replace the selected text? For 
what reason?
> 
>>> (which you can only really
>>> do after receiving the commit string, not while showing the preedit
>>> string). Without the extra event you have to accept annoying UI
>>> adjustments while already typing in text.
>>
>> I sure hope the client can do *exactly* the same thing with the preedit
>> versus the commit string. There is no reason for the data to be different.
>> The only difference is that the client remembers what text was placed by the
>> preedit string so it can delete it on the next preedit or commit or if the
>> input method aborts.
>>
>>
>>>> This might require the client to send the input method a large block of
>>>> "context", such as the entire current paragraph, so that it can figure
>>>> out the best insertion text so the direction indicators are right.
>>>
>>> We already do send surrounding text, no?
>>
>> Yes I guessed that was happening. I thought this would contain enough
>> information in Unicode for the input method to determine the direction.
>>
>> My concern is that a stupid client, and input methods who have to insert
>> them to make sure, would result in large numbers of direction indicator
>> characters being inserted into the text. If the input method as part of the
>> preedit/commit strings is allowed to delete some of the bytes of the
>> surrounding text, it could then remove redundant direction indicators.
>>
>> As I see it interaction would be something like this:
>>
>> 1. Client sends the entire surrounding text to the input method as a single
>> block of UTF-8 bytes, plus two byte offsets into it indicating the selected
> 
> Well, it is only practical to send a limited length of text for
> performance reasons. And I think the selected region is not really
> related to input method. It may be helpful to tell the input method if
> some text is selected at the cursor (to be replaced by the commit
> string) but that's a separate issue.

I thought the selected text would be useful for determining an initial 
input method state, for instance if it is a LTR number inside a RTL 
piece of text. If you don't think that is necessary then I think sending 
the paragraph with the selected text deleted would be ok. The client is 
free to send less than an entire paragraph if it thinks it is too large, 
but I'm hoping this definition will mean the input method will get any 
surrounding direction and language controls that Pango is using to 
render the text, so they agree.
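
As a rough sketch of what I mean by "the paragraph with the selected text deleted", something like the following Python would do. The function name, the return shape and the 4096-byte default are all assumptions of mine for illustration; nothing here is taken from the actual protocol.

```python
def surrounding_without_selection(paragraph: bytes, sel_start: int,
                                  sel_end: int, max_bytes: int = 4096):
    """Return (text, cursor): the paragraph minus the selection, plus the
    byte offset in the result where the selection used to be."""
    before = paragraph[:sel_start]
    after = paragraph[sel_end:]
    half = max_bytes // 2  # hypothetical budget: half on each side of the cursor

    if len(before) > half:
        cut = len(before) - half
        # Move forward so we do not start in the middle of a UTF-8 sequence.
        while cut < len(before) and (before[cut] & 0xC0) == 0x80:
            cut += 1
        before = before[cut:]

    if len(after) > half:
        cut = half
        # Move backward so we do not end in the middle of a UTF-8 sequence.
        while cut > 0 and (after[cut] & 0xC0) == 0x80:
            cut -= 1
        after = after[:cut]

    return before + after, len(before)


# Hebrew text with the LTR number "123" selected (byte offsets 9..12).
paragraph = "\u05e9\u05dc\u05d5\u05dd 123 \u05e2".encode("utf-8")
text, cursor = surrounding_without_selection(paragraph, 9, 12)
print(text.decode("utf-8"), cursor)
```

Note that truncating can drop direction or language controls that sit earlier in the paragraph, which is why I would prefer the client to send the whole paragraph when it reasonably can.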

> If what you mean by "selected region" is actually preedit, that should
> be a temporary text that gets cleared when the text model loses focus.

Are there any input methods where the selected region is not replaced by 
the new text? The client certainly can know that the current selected 
region is due to a preedit and color it differently, but I don't see how 
there can be both a selected region and a preedit, therefore I was 
assuming the same code would be reused and thus calling this a selected 
region.

I was also under the impression that it is important for the input
method to highlight only a portion of the preedit, not all of it. I.e. it
does not highlight characters on the edge that are "fixed" even if they
will be deleted on an abort or modified to different combining
characters as the user keeps editing the "varying part". Therefore it
seemed the input method must send the region as well as the preedit
text. It also needs to be able to place the cursor at an arbitrary
position inside the preedit string (though I believe it is ok if this is
required to be one end of the selected region).
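
Here is a minimal Python sketch of the client-side model I have in mind, where preedit and commit go through exactly the same insertion code and the only extra state is the remembered preedit range. The class and method names are hypothetical and all offsets are byte offsets relative to the start of the string being inserted; this is not the actual protocol.

```python
class TextField:
    def __init__(self, text: bytes = b"", cursor: int = 0):
        self.text = bytearray(text)
        self.cursor = cursor
        self.preedit_start = None  # byte offset of the current preedit, if any
        self.preedit_len = 0
        self.highlight = None      # (start, end) byte offsets in self.text

    def _remove_preedit(self):
        """Delete whatever the last preedit inserted."""
        if self.preedit_start is not None:
            end = self.preedit_start + self.preedit_len
            del self.text[self.preedit_start:end]
            self.cursor = self.preedit_start
            self.preedit_start = None
            self.preedit_len = 0
            self.highlight = None

    def _insert(self, data: bytes, cursor_offset: int) -> int:
        """Shared by preedit and commit: insert at the cursor, then move it."""
        start = self.cursor
        self.text[start:start] = data
        self.cursor = start + cursor_offset
        return start

    def preedit(self, data: bytes, cursor_offset: int, hl_start: int, hl_end: int):
        self._remove_preedit()
        start = self._insert(data, cursor_offset)
        self.preedit_start = start
        self.preedit_len = len(data)
        self.highlight = (start + hl_start, start + hl_end)

    def commit(self, data: bytes, cursor_offset=None):
        self._remove_preedit()
        self._insert(data, len(data) if cursor_offset is None else cursor_offset)

    def abort(self):
        self._remove_preedit()


field = TextField(b"hello ", 6)
field.preedit(b"wor", 3, 1, 3)   # only "or" is highlighted as the varying part
field.preedit(b"worl", 4, 2, 4)  # replaces the previous preedit
field.commit(b"world")           # replaces the preedit with the final text
print(bytes(field.text), field.cursor)
```

The highlighted region is a sub-range of the preedit, and the cursor offset is independent of both, so the cursor can sit anywhere inside the inserted string.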

> The only similarity between preedit and commit is that they should be
> rendered the same. Unless requested or on losing focus, the client should
> never change the preedit string or move the cursor. And preedit is
> totally different from selected region.

My initial question was because another poster said that something could
not be figured out from the preedit, but only from the commit. That seems
wrong: the client must be able to figure out exactly the same stuff from
both. Here is the quote from michaelh at openismus.com:

"An explicit text direction event makes it easier than having
to wait and parse for the input method's text (which you can only really
do after receiving the commit string, not while showing the preedit
string)."

What is he talking about? It seems that whatever it is can be done
after either a commit or a preedit.

It may be silly, but instead of a direction event the input method
could immediately send a preedit that consists of an RTL indicator or
something else that forces Pango to right-justify the cursor.
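
A sketch of that idea in Python, under the assumption that the client (or its text layout library) picks the base direction from the first strong bidirectional character it sees. The function names are made up, and I am not claiming this is exactly how Pango decides; it just shows that a preedit containing nothing but U+200F RIGHT-TO-LEFT MARK carries the direction.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def direction_preedit(rtl: bool) -> bytes:
    """Preedit string the input method could send right after focus-in."""
    return (RLM if rtl else LRM).encode("utf-8")


def first_strong_direction(data: bytes, default: str = "ltr") -> str:
    """Base direction from the first strong bidi character, else the default."""
    for ch in data.decode("utf-8"):
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "rtl"
        if bidi == "L":
            return "ltr"
    return default


print(first_strong_direction(direction_preedit(True)))   # rtl
print(first_strong_direction(direction_preedit(False)))  # ltr
```

Because it is a preedit, the mark is deleted again on the next preedit or commit, so it does not add to the pile of redundant direction indicators I was worried about above.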