[PATCH wayland-protocols] text: Create second version of text input protocol

Jan Arne Petersen janarne at gmail.com
Tue Feb 2 12:58:40 UTC 2016

On 29/01/16 00:33, Bill Spitzak wrote:
> On Wed, Jan 27, 2016 at 11:52 PM, Jan Arne Petersen
> <janarne at gmail.com> wrote:
>     +      Text is generally UTF-8 encoded, indices and lengths are in
>     bytes.
> Remove the word "generally". *All* text in your api's are UTF-8.


>     +    <request name="activate">
>     +      <description summary="request activation">
>     +       Requests to activate a surface for text input (typically when a
>     +       text entry in it gets focus).
>     +
>     +       There can only be one active surface per client and seat.
>     When surface is
>     +       null all surfaces of the client get deactivated.
>     +      </description>
> I think clients should be allowed to send activate more than once per
> surface, it is to indicate the input focus switching between widgets.
> This removes the need for another api to indicate the widget is
> switching. Sending null for the surface, or a different event, would
> indicate that the keyboard focus is no longer on a text input widget.

We use enable/disable requests now. That should be a bit easier to handle
on the client side.

>     +    <request name="reset">
>     +      <description summary="reset">
>     +       Should be called by an editor widget when the input state
>     should be
>     +       reset, for example after the text was changed outside of the
>     normal
>     +       input method flow.
>     +      </description>
>     +    </request>
> I believe this request can be replaced by redundant activate requests.

It got integrated into update_state.

>     +    <request name="set_surrounding_text">
>     +      <description summary="sets the surrounding text">
>     +       Sets the plain surrounding text around the input position.
>     Text is
>     +       UTF-8 encoded. Cursor is the byte offset within the
>     +       surrounding text. Anchor is the byte offset of the
>     +       selection anchor within the surrounding text. If there is no
>     selected
>     +       text anchor is the same as cursor.
>     +      </description>
>     +      <arg name="text" type="string"/>
>     +      <arg name="cursor" type="uint"/>
>     +      <arg name="anchor" type="uint"/>
>     +    </request>
> The anchor could be very far away from the cursor, much farther than any
> small limit on the request size. I think this means the anchor position
> could be negative or larger than the text length. Sending this without
> clamping would be a good idea, the input method would then have a hint
> where the anchor is, and it can clamp the value itself.
> If the client wants new text to replace the selected text, the text
> between anchor+cursor will be deleted, and any input method decisions
> would depend on the text outside the anchor..cursor range. This may be
> larger than the allowed buffer size, so I think clients have to send the
> text as though the selection was deleted.
> A client that does not want to replace the selected text could still
> send an anchor, but then I am not clear if the input method can take
> advantage of knowing what characters are selected to modify the results.
> So it is possible the anchor is not needed at all.

OK, I use an int for anchor now. It still makes sense for an input method
to know what is in the selection even when it gets replaced by newly
entered text.
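
As an illustration of byte offsets with a signed, unclamped anchor, here is
a minimal Python sketch. It is not part of the protocol or of any toolkit:
the function name, the character-index inputs and the max_bytes budget are
all hypothetical.

```python
def surrounding_window(text, cursor, anchor, max_bytes):
    """Return (window_bytes, cursor_offset, anchor_offset).

    cursor and anchor are character indices into text (a hypothetical
    toolkit-side representation). The returned offsets are byte offsets
    relative to the start of the window. The anchor offset is signed and
    not clamped, so it may be negative or larger than len(window_bytes).
    """
    data = text.encode("utf-8")
    cursor_b = len(text[:cursor].encode("utf-8"))
    anchor_b = len(text[:anchor].encode("utf-8"))

    # Spend roughly half of the byte budget on each side of the cursor.
    start = max(0, cursor_b - max_bytes // 2)
    end = min(len(data), start + max_bytes)

    # Never cut a UTF-8 sequence: move both ends off continuation bytes.
    while start < cursor_b and (data[start] & 0xC0) == 0x80:
        start += 1
    while cursor_b < end < len(data) and (data[end] & 0xC0) == 0x80:
        end -= 1

    return data[start:end], cursor_b - start, anchor_b - start
```

With the text "héllo wörld", the cursor at the end and the anchor at the
start, a budget of 6 bytes yields the window b"rld", a cursor offset of 3
and an anchor offset of -10, which the input method can clamp itself.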

>     +      <entry name="auto_completion" value="0x1" summary="suggest
>     word completions"/>
>     +      <entry name="auto_correction" value="0x2" summary="suggest
>     word corrections"/>
> I think you might want non-zero bits to *disable* features. This allows
> zero to be the default, and means that if some new correction is
> invented in the future then they can default to on without you having to
> modify the numerical value of default.

Toolkits need to handle the defaults on the client side now.
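
A rough Python sketch of what handling the defaults on the client side
could look like. The three hint values are the ones quoted from the patch
above; the NONE member, the choice of default and the helper function are
assumptions made only for this example.

```python
from enum import IntFlag


class ContentHint(IntFlag):
    # Values as quoted from the patch; other entries are omitted here.
    NONE = 0x0  # assumed name for "no hints"
    AUTO_COMPLETION = 0x1
    AUTO_CORRECTION = 0x2
    HIDDEN_TEXT = 0x40


# Hypothetical toolkit default: completion and correction are on unless
# the application turns them off.
DEFAULT_HINTS = ContentHint.AUTO_COMPLETION | ContentHint.AUTO_CORRECTION


def hints_for_field(is_password=False, disable=ContentHint.NONE):
    """Compute the hint mask a toolkit might send for one text field."""
    if is_password:
        # Password fields only ask for hidden text in this sketch.
        return ContentHint.HIDDEN_TEXT
    return ContentHint(int(DEFAULT_HINTS) & ~int(disable))
```

Since the default lives in the toolkit, a hint added to the protocol later
only needs a change to DEFAULT_HINTS in this sketch, not to the meaning of
zero on the wire.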

>     +      <entry name="hidden_text" value="0x40" summary="characters
>     should be hidden"/>
> Can you explain this? The client still has to do something to display
> dots instead of characters, right? My guess is that this is an indicator
> that the input method should not show any of the surrounding text in
> popup controls. For instance maybe it can suggest word completion for
> your password, but the display must show stars for the already-typed
> characters. Is this correct?
> Hopefully how clients and the input methods show hidden characters can
> be agreed by convention, rather than adding more api to communicate that.

Yes, you do not want to show already typed text on a virtual keyboard in
a password field. Usually the keyboard would show nothing then; there is
no reason to show dots/stars on the virtual keyboard.

>     +      <entry name="latin" value="0x100" summary="just latin
>     characters should be entered"/>
> Do you mean ASCII? It seems unlikely you mean "the subset of Unicode
> that they have declared are Latin characters" since that excludes
> numbers and space and includes vast numbers of characters that some
> software will not handle.

I guess we can remove it. Instead clients should just use

>     +      <entry name="normal" value="0" summary="default input,
>     allowing all characters"/>
>     +      <entry name="alpha" value="1" summary="allow only alphabetic
>     characters"/>
>     +      <entry name="digits" value="2" summary="allow only digits"/>
>     +      <entry name="number" value="3" summary="input a number
>     (including decimal separator and sign)"/>
>     +      <entry name="phone" value="4" summary="input a phone number"/>
>     +      <entry name="url" value="5" summary="input an URL"/>
>     +      <entry name="email" value="6" summary="input an email address"/>
> I think it will work a lot better if the client can indicate in
> set_surrounding_text that only a certain class of character is
> acceptable right now. So if it is a date of for nn/nn/nn, then it will
> indicate that slash or digit is required, depending on what is to the
> left of the cursor. And the meaning of "phone number" and "date" and
> "time" and "url" (and "zip code" and "ssn" and "inventory part number")
> are all thus moved to the client.

Toolkits do not support that (in contrast to the content hints we
implemented here).

It is also useful to have a virtual keyboard adapted for entering urls
with keys for '.' '/' or '.com' or for emails with '@'. The virtual
keyboard can also provide completion based on url/email for example.
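
For example, a virtual keyboard could pick its extra keys from the content
purpose roughly like this (Python sketch). The enum values are the ones
quoted from the patch; the key table and the function are hypothetical and
do not describe any existing keyboard.

```python
from enum import IntEnum


class ContentPurpose(IntEnum):
    # Values as quoted from the patch above.
    NORMAL = 0
    ALPHA = 1
    DIGITS = 2
    NUMBER = 3
    PHONE = 4
    URL = 5
    EMAIL = 6
    NAME = 7


# Hypothetical table of additional keys per purpose.
EXTRA_KEYS = {
    ContentPurpose.URL: [".", "/", ".com"],
    ContentPurpose.EMAIL: ["@"],
}


def extra_keys(purpose):
    """Return the extra keys to show for a purpose received on the wire."""
    return EXTRA_KEYS.get(ContentPurpose(purpose), [])
```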

>     +      <entry name="name" value="7" summary="input a name of a person"/>
> This gets more interesting because it may change the spelling corrector.

Yes, and it can use the address book for completion.

>     +    <enum name="update_state">
>     +      <entry name="request" value="1" summary="sent state after
>     request"/>
> Confusing. Actually your event is called "request", not to be confused
> with what Wayland calls a "request". At least change this to the name of
> the event (which is "request_state"). Better yet rename the event so it
> does not have the word "request" in it.

Yes, changed it; the event is called demand_full_state now.

>     +    <request name="invoke_action">
>     +      <arg name="button" type="uint"/>
>     +      <arg name="index" type="uint"/>
>     +    </request>
> Missing documentation? I would guess this is how clients send the
> keyboard events to the input method?

Added description. It is about clicking/touching within the
composing/pre-edit text.

>     +    <event name="enter">
>     +      <description summary="enter event">
>     +       Notification that this seat's text-input focus is on a
>     certain surface.
>     +
>     +       When the seat has one or more keyboards the text-input focus
>     follows the
>     +       keyboard focus.
>     +      </description>
>     +      <arg name="serial" type="uint"/>
>     +      <arg name="surface" type="object" interface="wl_surface"/>
>     +    </event>
> I don't understand how this is used. It seems like the client will get
> the keyboard focus events, and then tells the input method, not the
> other way around.

There might be no hardware keyboards present, so we provide enter/leave
for text focus. If there is a hardware keyboard, text focus is the same
as keyboard focus.

Clients just enable/disable a surface for text input; which surface has
text focus (keyboard focus) is handled by the compositor.

>     +    <enum name="preedit_style">
>     +      <entry name="default" value="0" summary="default style for
>     composing text"/>
>     +      <entry name="none" value="1" summary="style should be the
>     same as in non-composing text"/>
>     +      <entry name="active" value="2"/>
>     +      <entry name="inactive" value="3"/>
>     +      <entry name="highlight" value="4"/>
>     +      <entry name="underline" value="5"/>
>     +      <entry name="selection" value="6"/>
>     +      <entry name="incorrect" value="7"/>
>     +    </enum>
> It is not clear from this if "default" is different from all the others,
> or is it equal to "active"?

Added more description to v3:

default: "default style for composing text"
none: "composing text should be shown the same as non-composing text"
active: "composing text might be bold"
inactive: "composing text might be cursive"
highlight: "composing text might have a different background color"
underline: "composing text might be underlined"
selection: "composing text should be shown the same as selected text"
incorrect: "composing text might be underlined with a red wavy line"

>     +    <event name="preedit_styling">
>     +      <description summary="pre-edit styling">
>     +       Sets styling information on composing text. The style is
>     applied for
>     +       length bytes from index relative to the beginning of the
>     composing
>     +       text (as byte offset). Multiple styles can
>     +       be applied to a composing text by sending multiple
>     preedit_styling
>     +       events.
> Can more than one of these be applied to the same bytes? If so, are they
> allowed to intersect arbitrarily? I think it is reasonable to require
> the input method to require any ranges to be entirely inside or outside
> each previously-sent ranges. This makes it easier for a client to
> mindlessly insert <b> and </b> tags into pango input.

It might make sense to combine some, so I do not want to exclude that.

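
A client that wants to cope with arbitrarily overlapping ranges could split
the composing text into runs that each carry a constant set of styles. A
minimal Python sketch of that idea follows; the function name and the tuple
layout are assumptions for illustration, and the style names are simply the
enum entry names as strings.

```python
def style_segments(preedit, stylings):
    """Split preedit (bytes) into runs with a constant set of styles.

    stylings is a list of (index, length, style) tuples, one per
    preedit_styling event, with index and length in bytes. Ranges may
    overlap arbitrarily. Returns a list of (start, end, styles) tuples
    where styles is a frozenset.
    """
    n = len(preedit)
    cuts = {0, n}
    for index, length, _ in stylings:
        # Clamp range boundaries to the composing text.
        cuts.add(min(max(index, 0), n))
        cuts.add(min(max(index + length, 0), n))
    bounds = sorted(cuts)

    segments = []
    for start, end in zip(bounds, bounds[1:]):
        styles = frozenset(
            style
            for index, length, style in stylings
            if index <= start and end <= index + length
        )
        segments.append((start, end, styles))
    return segments
```

For b"abcdef" with "underline" on bytes 0..4 and "highlight" on bytes 2..6
this gives three runs: underline only, both styles, highlight only. Each
run can then be wrapped in markup without producing crossing tags.
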
>     +    <event name="preedit_cursor">
>     +      <description summary="pre-edit cursor">
>     +       Sets the cursor position inside the composing text (as byte
>     +       offset) relative to the start of the composing text. When
>     index is a
>     +       negative number no cursor is shown.
>     +
>     +       This event is handled as part of a following preedit_string
>     event.
>     +      </description>
> Might want to say what happens if no cursor is sent. My guess is that
> the cursor is put at the end of the preedit string.

Yes, exactly. Added that to the documentation.
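
On the client side the rule could be applied like this (Python sketch; the
function is hypothetical, and clamping an index beyond the end of the
string is an assumption of this example, not something the protocol text
states).

```python
def preedit_cursor_position(preedit, index=None):
    """Return the byte offset at which to draw the cursor, or None to hide it.

    preedit is the composing text as UTF-8 bytes. index is the value from
    a preceding preedit_cursor event, or None if no such event was sent.
    """
    if index is None:
        # No preedit_cursor event: cursor goes to the end of the preedit.
        return len(preedit)
    if index < 0:
        # Negative index: no cursor is shown.
        return None
    # Assumption: out-of-range indices are clamped to the end.
    return min(index, len(preedit))
```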

>     +    <event name="keysym">
>     +      <description summary="keysym">
>     +       Notify when a key event was sent. Key events should not be used
>     +       for normal text input operations, which should be done with
>     +       commit_string, delete_surrounding_text, etc. The key event
>     follows
>     +       the wl_keyboard key event convention. Sym is a XKB keysym,
>     state a
>     +       wl_keyboard key_state. Modifiers are a mask for effective
>     modifiers
>     +       (where the modifier indices are set by the modifiers_map event)
>     +      </description>
>     +      <arg name="serial" type="uint" summary="serial of the latest
>     known text input state"/>
>     +      <arg name="time" type="uint"/>
>     +      <arg name="sym" type="uint"/>
>     +      <arg name="state" type="uint"/>
>     +      <arg name="modifiers" type="uint"/>
>     +    </event>
> Is this for keysyms that the input method did not handle? Or for fake
> key strokes? Or both?

For synthetic key events (fake key strokes). For unhandled real hardware
key events wl_keyboard should be used (keyboard focus is the same as text
focus).

> This provides a method for a client to avoid using libxkbd. Though I
> still do not understand the problem, there seems to be a lot of claims
> here that it is not possible, because of something to do with different
> keymaps.

There might be no hardware keyboard (so no wl_keyboard) and we do not
have keycodes or the whole xkb state for synthetic key events but just a
sym and modifiers.
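
To show how a client might interpret the modifiers mask without any xkb
state, here is a small Python sketch. It assumes, as in the first version
of the protocol, that modifiers_map carries an array of 0-terminated
modifier names whose position gives the bit index; both function names are
made up for this example.

```python
def parse_modifiers_map(raw):
    """Split a modifiers_map payload (bytes) into a list of modifier names."""
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # drop the final terminator
    if not raw:
        return []
    return [name.decode("utf-8") for name in raw.split(b"\0")]


def effective_modifiers(mask, names):
    """Return the names of the modifiers whose bit is set in mask."""
    return [name for i, name in enumerate(names) if mask & (1 << i)]
```

With the map b"Shift\0Control\0Mod1\0", a keysym event with modifiers 0x5
would mean Shift and Mod1 are effective.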

>     +    <event name="text_direction">
>     +      <description summary="text direction">
>     +       Sets the text direction of input text.
>     +
>     +       It is mainly needed for showing input cursor on correct side
>     of the
>     +       editor when there is no input yet done and making sure neutral
>     +       direction text is laid out properly.
>     +      </description>
> I'm not sure what the purpose of this is, but I feel like this should be
> part of the preedit events. The preedit may in fact be an empty string,
> and then the client will need to know what end to put the cursor on.

I do not see the need to fold it into pre-edit. That seems more
complicated than just sending an extra event when text direction matters.

>     +    <event name="request_surrounding_text">
>     +      <description summary="request surrounding text from client">
>     +       Request to get the surrounding text and cursor position sent
>     from the client.
>     +      </description>
>     +      <arg name="serial" type="uint" summary="serial of the latest
>     known text input state"/>
>     +      <arg name="flags" type="uint"/>
>     +      <arg name="before_cursor" type="int"/>
>     +      <arg name="after_cursor" type="int"/>
>     +    </event>
> Would prefer that there not be two events that cause the same response.
> Maybe this should be an indicator of how much text the input method
> wants, when the request_state is done.

Yes, done; it is called configure_surrounding_text now.

>     +    <event name="request_state">
>     +      <description summary="request state from client">
>     +       Request to get the surrounding text and cursor position sent
>     from the client.
>     +      </description>
>     +      <arg name="serial" type="uint" summary="serial of the latest
>     known text input state"/>
>     +      <arg name="flags" type="uint"/>
>     +    </event>
>     +  </interface>
> I do not believe this event is necessary. Clients should instead always
> send the surrounding text any time it changes, except if the change was
> due to a commit. This avoids a round trip.

Yes, that is more for use cases such as switching the input method while
a text field is focused, where the new input method needs the state.
(Otherwise we would need to cache state on the compositor side, which I
do not like.)

> Putting "request" into the event name is confusing, as "request" has a
> defined meaning in Wayland api.

Renamed it to demand_full_state.

Jan Arne

Jan Arne Petersen | jan.petersen at kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
