[PATCH RFC] Add language and text-direction to text protocol

Thu Jan 31 14:23:49 PST 2013

On Thu, Jan 31, 2013 at 4:19 PM, Bill Spitzak <spitzak at gmail.com> wrote:
>
>
> Yichao Yu wrote:
>>
>> On Thu, Jan 31, 2013 at 2:20 PM, Bill Spitzak <spitzak at gmail.com> wrote:
>>>
>>> On 01/31/2013 10:42 AM, Michael Hasselmann wrote:
>>>
>>>> You need to place your cursor at the left or right end of a text entry
>>>> (actually, right or left of the possibly already existing text, on
>>>> focus-in). An explicit text direction event makes it easier than having
>>>> to wait and parse for the input method's text
>>>
>>>
>>> It sounds like this "language direction" is for placing the cursor in the
>>> preedit/commit string? I'm a bit confused as to what you are talking
>>> about.
>>>
>>> I think the commit/preedit string should contain an actual cursor
>>> position,
>>> it can be between any two bytes in it. There is no reason to limit it to
>>> the
>>> start/end.
>>>
>>
>> It already does (although it doesn't make much sense to put the cursor
>> inside a multi-byte character...),
>
>
> I agree I just want to make sure that all offsets are measured in bytes, not
> in "characters" which can cause huge difficulties when the client and server
> do not agree on what a "character" is.

Agreed. I cannot remember which one it is, but I'm sure there is an agreement. =)
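
To make the byte-offset point concrete, here is a minimal Python sketch. It is
not taken from the protocol; the helper names is_char_boundary and
snap_to_boundary are made up for illustration. It shows that bytes and code
points give different counts for the same string, and how a receiver could
check that a byte offset does not land inside a multi-byte UTF-8 sequence.

```python
def is_char_boundary(data: bytes, offset: int) -> bool:
    """Return True if a cursor at this byte offset does not split a UTF-8 sequence."""
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):
        return True
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (data[offset] & 0xC0) != 0x80


def snap_to_boundary(data: bytes, offset: int) -> int:
    """Clamp the offset to the buffer, then move it left onto a code point boundary."""
    offset = max(0, min(offset, len(data)))
    while not is_char_boundary(data, offset):
        offset -= 1
    return offset


# 'e' + COMBINING ACUTE ACCENT + one CJK character:
# two user-visible characters, three code points, six bytes.
text = "e\u0301\u4e2d"
data = text.encode("utf-8")
print(len(text), len(data))  # 3 6
print([i for i in range(len(data) + 1) if is_char_boundary(data, i)])
```

Note that offset 1 is a valid code point boundary even though it separates the
'e' from its combining accent, which is exactly the kind of "character"
disagreement that byte offsets avoid having to settle in the protocol.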

>
>
>> the input method notifies the client
>> of preedit changes, cursor moves and commit strings, and the client
>> will then draw the preedit text or commit string and move the cursor
>> accordingly; there is no need to parse any text in order to do it
>> right. Neither the surrounding text nor the string being committed/set as
>> preedit is directly related (the input method should already know
>> enough about where the cursor should be when doing that).
>
>
>
> Are there any input methods which don't replace the selected text? For what
> reason?
>
>>
>>>> (which you can only really
>>>> do after receiving the commit string, not while showing the preedit
>>>> string). Without the extra event you have to accept annoying UI
>>>> adjustments while already typing in text.
>>>
>>>
>>> I sure hope the client can do *exactly* the same thing with the preedit
>>> versus the commit string. There is no reason for the data to be
>>> different.
>>> The only difference is that the client remembers what text was placed by
>>> the
>>> preedit string so it can delete it on the next preedit or commit or if
>>> the
>>> input method aborts.
>>>
>>>
>>>>> This might require the client to send the input method a large block of
>>>>> "context", such as the entire current paragraph, so that it can figure
>>>>> out the best insertion text so the direction indicators are right.
>>>>
>>>>
>>>> We already do send surrounding text, no?
>>>
>>>
>>> Yes I guessed that was happening. I thought this would contain enough
>>> information in Unicode for the input method to determine the direction.
>>>
>>> My concern is that a stupid client, and input methods who have to insert
>>> them to make sure, would result in large numbers of direction indicator
>>> characters being inserted into the text. If the input method as part of
>>> the
>>> preedit/commit strings is allowed to delete some of the bytes of the
>>> surrounding text, it could then remove redundant direction indicators.
>>>
>>> As I see it interaction would be something like this:
>>>
>>> 1. Client sends the entire surrounding text to the input method as a
>>> single
>>> block of UTF-8 bytes, plus two byte offsets into it indicating the
>>> selected
>>
>>
>> Well, it is only practical to send a limited length of text, for
>> performance reasons. And I think the selected region is not really
>> related to the input method. It may be helpful to tell the input method if
>> some text is selected at the cursor (to be replaced by the commit
>> string) but that's a separate issue.
>
>
> I thought the selected text would be useful for determining an initial input
> method state, for instance if it is a LTR number inside a RTL piece of text.
> If you don't think that is necessary then I think sending the paragraph with
> the selected text deleted would be ok. The client is free to send less than
> an entire paragraph if it thinks it is too large, but I'm hoping this
> definition will mean the input method will get any surrounding direction and
> language controls that Pango is using to render the text, so they agree.

Agreed on setting an initial state from the selected text, if the
protocol really lets the input method know about that. But it should be
just an implementation detail of the input method.

>
>
>> If what you mean by "selected region" is actually the preedit, that should
>> be temporary text that gets cleared when the text model loses focus.
>
>
> Are there any input methods where the selected region is not replaced by the
> new text? The client certainly can know that the current selected region is
> due to a preedit and color it differently, but I don't see how there can be
> both a selected region and a preedit, therefore I was assuming the same code
> would be reused and thus calling this a selected region.

Selection is the text you, err.., select (PRIMARY selection in X),
while preedit is an intermediate composition result, often shown with
an underline.
This[1] is a selection you are familiar with, while this[2] is a
preedit (note the word "preedit" in the input region with an underline).
You are right that normally they will not exist at the same time in
the same input region, since the preedit will often be shown as
replacing the selected region, and yes, the client should restore what
was in the input region if nothing was committed, but it's just a little
bit confusing to mix the two concepts.

[1] http://wstaw.org/m/2013/01/31/plasma-desktopql2231.png
[2] http://wstaw.org/m/2013/01/31/plasma-desktopAx2231.png
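
A minimal sketch of how a client-side text model could keep the two concepts
apart, assuming the behaviour described above: the preedit is only displayed in
place of the selection, the committed text is untouched until a commit arrives,
and an abort restores what was there. The TextModel class and its method names
are hypothetical, and it uses Python string indices instead of byte offsets
for brevity.

```python
class TextModel:
    """Toy text entry that tracks selection and preedit separately."""

    def __init__(self, text=""):
        self.text = text          # committed text only
        self.cursor = len(text)   # index into self.text
        self.selection = None     # (start, end) into self.text, or None
        self.preedit = ""         # uncommitted composition text
        self.preedit_cursor = 0   # cursor position inside the preedit

    def select(self, start, end):
        self.selection = (start, end)
        self.cursor = end

    def _insertion_range(self):
        # The preedit or commit replaces the selection if there is one,
        # otherwise it is inserted at the cursor.
        return self.selection if self.selection else (self.cursor, self.cursor)

    def set_preedit(self, string, cursor):
        # Each preedit update replaces the previous preedit completely.
        self.preedit = string
        self.preedit_cursor = cursor

    def abort(self):
        # Dropping the preedit is enough: self.text was never modified.
        self.set_preedit("", 0)

    def commit(self, string):
        start, end = self._insertion_range()
        self.text = self.text[:start] + string + self.text[end:]
        self.cursor = start + len(string)
        self.selection = None
        self.set_preedit("", 0)

    def display(self):
        # Brackets stand in for the underline a real toolkit would draw.
        if not self.preedit:
            return self.text
        start, end = self._insertion_range()
        return self.text[:start] + "[" + self.preedit + "]" + self.text[end:]


model = TextModel("hello world")
model.select(6, 11)
model.set_preedit("ni", 2)
print(model.display())  # preedit shown in place of the selection
model.abort()
print(model.display())  # original text is back
```

Because the selection is never overwritten until commit, restoring the input
region after an abort needs no extra bookkeeping in this design.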

>
> I was also under the impression that it is important for the input method to
> highlight only a portion of the preedit, not all of it. I.e. it does not
> highlight characters on the edge that are "fixed" even if they will be
> deleted on an abort or modified to different combining characters as the
> user keeps editing the "varying part". Therefore it seemed the input method
> must send the region as well as the preedit text. It also needs to be able
> to place the cursor at an arbitrary position inside the preedit string
> (though I believe it is ok if this is required to be one end of the selected
> region).

It is not really OK (to only allow putting the cursor at one end of
the preedit string). Or at least it would be a big regression (from X,
or from what we have now in the protocol).
It may be helpful to highlight part of the preedit, although I don't
really think it's a must. The problem you mentioned is usually solved
by placing the cursor between the parts of the preedit that are "fixed"
and the parts that are not[3].

[3] http://wstaw.org/m/2013/01/31/plasma-desktopCE2231.png
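
As a rough illustration of that convention, the sketch below splits a preedit
string at the cursor into the "fixed" part and the part still being edited. It
assumes the cursor arrives as a byte offset into the UTF-8 preedit string;
split_preedit is a made-up helper name, not a protocol request.

```python
def split_preedit(preedit: str, cursor_bytes: int):
    """Split a preedit string at a UTF-8 byte offset into (fixed, pending)."""
    data = preedit.encode("utf-8")
    if not 0 <= cursor_bytes <= len(data):
        raise ValueError("cursor is outside the preedit string")
    # Decoding raises UnicodeDecodeError (a ValueError subclass) if the
    # offset falls inside a multi-byte character.
    fixed = data[:cursor_bytes].decode("utf-8")
    pending = data[cursor_bytes:].decode("utf-8")
    return fixed, pending


# Two converted CJK characters (3 bytes each) followed by raw input "ma".
fixed, pending = split_preedit("\u4f60\u597dma", 6)
print(repr(fixed), repr(pending))
```

A client that wants to style the two parts differently can do so from the
cursor position alone, without a separate highlight region.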

>
>
>> The only similarity between preedit and commit is that they should be
>> rendered the same. Unless requested, or unless it loses focus, the client
>> should never change the preedit string or move the cursor. And preedit is
>> totally different from the selected region.
>
>
> My initial question was because another poster said that something could not
> be figured out from the preedit, but only from the commit. That seems wrong;
> the client must be able to figure out exactly the same stuff from both. Here
> is the quote from michaelh at openismus.com:
>
>
> "An explicit text direction event makes it easier than having
> to wait and parse for the input method's text (which you can only really
> do after receiving the commit string, not while showing the preedit
> string)."
>
> What is he talking about? It seems that whatever it is can be done
> after either a commit or a preedit.
>
> It may be silly but it is possible that instead of a direction, the input
> method immediately sends a preedit that consists of a RTL indicator or
> something that forces Pango to right-justify the cursor.

I misread it at first. Now I think it makes sense both to let
the client initialize the cursor according to the text direction and
to have a separate request for that. Empty preedit strings can already
cause some weird problems in existing programs, and I don't think it
is really necessary to embed this information (which shouldn't
change on each commit/preedit update) in commit/preedit requests.
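
A small sketch of the difference, under stated assumptions: the Direction enum
and both function names are hypothetical and only mirror the idea of the
proposed request, they are not its actual definition. With an explicit
direction the client can place the cursor in an empty entry right away;
without one it can only guess from the first strong character once some text
has arrived (a simplified first-strong heuristic in the spirit of the Unicode
bidirectional algorithm).

```python
import unicodedata
from enum import Enum


class Direction(Enum):
    AUTO = 0
    LTR = 1
    RTL = 2


def first_strong_direction(text: str) -> Direction:
    """Guess the direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return Direction.LTR
        if bidi in ("R", "AL"):
            return Direction.RTL
    return Direction.AUTO  # no strong character seen (empty, digits, spaces)


def initial_cursor_edge(direction: Direction) -> str:
    """Edge of an empty entry where the cursor should start."""
    return "right" if direction is Direction.RTL else "left"


# With an explicit direction the empty entry can be set up immediately.
print(initial_cursor_edge(Direction.RTL))

# Without it, an empty string or a bare number gives the client nothing to go on.
print(first_strong_direction(""))
print(first_strong_direction("123"))
print(first_strong_direction("\u05e9\u05dc\u05d5\u05dd"))  # Hebrew text
```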