[PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol

Mon Jul 23 12:26:04 UTC 2018

Hi Carlos,

thanks for reviewing!

On Tue, 17 Jul 2018 19:18:36 +0200
Carlos Garnacho <carlosg at gnome.org> wrote:

> Hi!,
> 
> (Way way late, trying to revive the conversation...)
> 
> On Thu, May 3, 2018 at 9:22 PM, Dorota Czaplejewicz
> <dorota.czaplejewicz at puri.sm> wrote:
> > On Thu, 3 May 2018 20:47:27 +0200
> > Silvan Jegen <s.jegen at gmail.com> wrote:
> >  
> >> Hi Dorota
> >>
> >> Some comments and typo fixes below.
> >>
> >> On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz wrote:  
> >> > This new protocol description is a simplification over v2.
> >> >
> >> > - All pre-edit text styling is gone.
> >> > - Pre-edit cursor can span characters.
> >> > - No events regarding input panel (OSK) state or covered rectangle.
> >> >   Compositors are still free to handle situations where the keyboard
> >> >   focus rectangle is covered by the input panel.
> >> > - No set_preferred_language request for clients.
> >> > - There is no event to send keysyms. Compositors can use the wl_keyboard
> >> >   interface instead.
> >> > - All state is double-buffered, with specified state.
> >> > - Use Unicode codepoints to measure strings.
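The double-buffered model in the changelog above can be illustrated with a small client-side mirror of the state. This is a minimal sketch, not the protocol's own definition: the struct and function names (`ti_state`, `ti_set_cursor_rect`, `ti_commit`, ...) are hypothetical, and treating 0 as the initial "no hint" / "normal purpose" value is an assumption. Setters touch only the pending copy; nothing becomes current until a commit. Whether pending values persist or are reset after a commit is a protocol detail this sketch does not try to settle; here they persist.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of a few text-input properties. */
struct ti_props {
    uint32_t content_hint;    /* 0 assumed to mean "no hint" */
    uint32_t content_purpose; /* 0 assumed to mean "normal" */
    int32_t cursor_x, cursor_y, cursor_w, cursor_h;
};

/* Pending state collects requests; current state is what the
 * other side acts upon after a commit. */
struct ti_state {
    struct ti_props pending;
    struct ti_props current;
    uint32_t serial; /* number of commits applied so far */
};

static void ti_init(struct ti_state *s)
{
    memset(s, 0, sizeof(*s)); /* every property starts at its initial value */
}

static void ti_set_content_type(struct ti_state *s, uint32_t hint, uint32_t purpose)
{
    s->pending.content_hint = hint;
    s->pending.content_purpose = purpose;
}

static void ti_set_cursor_rect(struct ti_state *s, int32_t x, int32_t y,
                               int32_t w, int32_t h)
{
    assert(w >= 0 && h >= 0);
    s->pending.cursor_x = x;
    s->pending.cursor_y = y;
    s->pending.cursor_w = w;
    s->pending.cursor_h = h;
}

/* Apply everything requested since the previous commit in one step. */
static void ti_commit(struct ti_state *s)
{
    s->current = s->pending;
    s->serial++;
}
```

The point of the split is that a reader of `current` never observes a half-applied update, e.g. a new cursor rectangle paired with the old content type.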
> >> >
> >> > Signed-off-by: Dorota Czaplejewicz <dorota.czaplejewicz at puri.sm>
> >> > Signed-off-by: Carlos Garnacho <carlosg at gnome.org>
> >> > ---
> >> > This is the next update coming from Purism to perfect the text input protocol.
> >> >
> >> > The following changes were added on top of PATCHv3:
> >> >
> >> > - Fixed whitespace.
> >> > - Removed enable flags - the same information can be gathered from the first requests after enter.
> >> > - Changed offsets inside UTF-8 strings to use Unicode character counts in order to remove the possibility of communicating invalid state.
> >> > - Specified the exact lifetime of double-buffered state, and initial values.
> >> > - Made changes requested by the IM double-buffered.
> >> >
> >> > Some questions remain open. One is: how to specify how much text to capture in set_surrounding_text, and how often to update?  
> 
> IMHO the only reason to state it here is that it's more likely that a
> lazy implementation will try to squeeze a full book here than, e.g., an
> application setting an insanely long title. But certainly other
> messages across protocols may hit this limit (the long title issue
> wasn't made up :).
> 
> As for how much, I think it ultimately depends on the IM behind. Text
> correction probably just wants the current word, any sort of
> prediction will probably require phrases to paragraphs, char
> composition can probably do without. Sounds like this could be some
> sort of hint, but I don't think IMs can tell you today how much text
> they want...
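Carlos notes that a lazy implementation might try to squeeze a whole book into set_surrounding_text. One way a client might stay within whatever size limit applies is to send only a window around the cursor, trimmed to a byte budget and snapped to UTF-8 code point boundaries. This is a sketch under stated assumptions: offsets are byte offsets into UTF-8 text, the cursor already sits on a code point boundary, and both the function name `surrounding_window` and the budget are hypothetical, not taken from the protocol.

```c
#include <assert.h>
#include <stddef.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_cont(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Pick [*start, *end) inside text[0, len) that contains the cursor,
 * spans at most `budget` bytes, and never cuts a code point in half. */
static void surrounding_window(const char *text, size_t len, size_t cursor,
                               size_t budget, size_t *start, size_t *end)
{
    /* The cursor must sit on a code point boundary. */
    assert(cursor <= len);
    assert(cursor == len || !is_cont((unsigned char)text[cursor]));

    /* Split the budget roughly evenly around the cursor. */
    size_t half = budget / 2;
    size_t s = cursor > half ? cursor - half : 0;
    size_t e = (len - s > budget) ? s + budget : len;

    /* Near the end of the text, spend the unused budget before the cursor. */
    if (e - s < budget)
        s = e > budget ? e - budget : 0;

    /* Snap both ends inwards so no code point is split. */
    while (s < cursor && is_cont((unsigned char)text[s]))
        s++;
    while (e > cursor && e < len && is_cont((unsigned char)text[e]))
        e--;

    *start = s;
    *end = e;
}
```

The client would then send text[start, end) and shift its cursor and anchor by subtracting start. An even split is only one policy; as Carlos says, a prediction-oriented IM may want more preceding text than a correction-oriented one.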
> 
> >> >
> >> > A possible change that I decided against for now is to replace the enable/disable requests with creating/destroying a new object, which would encode more state lifetimes in the protocol itself.
> >> >
> >> > After reading a blog post on fcitx [0], I got the impression that letting the compositor know some persistent ID of a text edit instance could be useful; however, I'm not sure what the use cases are.
> >> >
> >> > As always, I'm happy to hear feedback.
> >> >
> >> > Cheers,
> >> > Dorota Czaplejewicz
> >> >
> >> > [0] https://www.csslayer.info/wordpress/fcitx-dev/gaps-between-wayland-and-fcitx-or-all-input-methods/
> >> >
> >> >  Makefile.am                                    |   1 +
> >> >  unstable/text-input/text-input-unstable-v3.xml | 362 +++++++++++++++++++++++++
> >> >  2 files changed, 363 insertions(+)
> >> >  create mode 100644 unstable/text-input/text-input-unstable-v3.xml
> >> >
> >> > diff --git a/Makefile.am b/Makefile.am
> >> > index 4b9a901..86d7ca9 100644
> >> > --- a/Makefile.am
> >> > +++ b/Makefile.am
> >> > @@ -3,6 +3,7 @@ unstable_protocols =                                                                \
> >> >     unstable/fullscreen-shell/fullscreen-shell-unstable-v1.xml              \
> >> >     unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml                      \
> >> >     unstable/text-input/text-input-unstable-v1.xml                          \
> >> > +   unstable/text-input/text-input-unstable-v3.xml                          \
> >> >     unstable/input-method/input-method-unstable-v1.xml                      \
> >> >     unstable/xdg-shell/xdg-shell-unstable-v5.xml                            \
> >> >     unstable/xdg-shell/xdg-shell-unstable-v6.xml                            \
> >> > diff --git a/unstable/text-input/text-input-unstable-v3.xml b/unstable/text-input/text-input-unstable-v3.xml
> >> > new file mode 100644
> >> > index 0000000..ed5204f
> >> > --- /dev/null
> >> > +++ b/unstable/text-input/text-input-unstable-v3.xml
> >> > @@ -0,0 +1,362 @@
> >> > +<?xml version="1.0" encoding="UTF-8"?>
> >> > +
> >> > +<protocol name="text_input_unstable_v3">
> >> > +  <copyright>
> >> > +    Copyright © 2012, 2013 Intel Corporation
> >> > +    Copyright © 2015, 2016 Jan Arne Petersen
> >> > +    Copyright © 2017, 2018 Red Hat, Inc.
> >> > +    Copyright © 2018 Purism SPC
> >> > +
> >> > +    Permission to use, copy, modify, distribute, and sell this
> >> > +    software and its documentation for any purpose is hereby granted
> >> > +    without fee, provided that the above copyright notice appear in
> >> > +    all copies and that both that copyright notice and this permission
> >> > +    notice appear in supporting documentation, and that the name of
> >> > +    the copyright holders not be used in advertising or publicity
> >> > +    pertaining to distribution of the software without specific,
> >> > +    written prior permission.  The copyright holders make no
> >> > +    representations about the suitability of this software for any
> >> > +    purpose.  It is provided "as is" without express or implied
> >> > +    warranty.
> >> > +
> >> > +    THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
> >> > +    SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
> >> > +    FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
> >> > +    SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> >> > +    WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
> >> > +    AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
> >> > +    ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
> >> > +    THIS SOFTWARE.
> >> > +  </copyright>
> >> > +
> >> > +  <interface name="zwp_text_input_v3" version="1">
> >> > +    <description summary="text input">
> >> > +      The zwp_text_input_v3 interface represents text input and input methods
> >> > +      associated with a seat. It provides enter/leave events to follow the
> >> > +      text input focus for a seat.
> >> > +
> >> > +      Requests are used to enable/disable the text-input object and set
> >> > +      state information like surrounding and selected text or the content type.
> >> > +      The information about the entered text is sent to the text-input object
> >> > +      via the pre-edit and commit_string events.
> >> > +
> >> > +      Text is valid UTF-8 encoded, indices and lengths are in code points. If a
> >> > +      grapheme is made up of multiple code points, an index pointing to any of
> >> > +      them should be interpreted as pointing to the first one.  
> >>
> >> That way we make sure we don't put the cursor/anchor between bytes that
> >> belong to the same UTF-8 encoded Unicode code point, which is nice. It
> >> also means that the client has to parse all the UTF-8 encoded strings
> >> into Unicode code points up to the desired cursor/anchor position
> >> on each "preedit_string" event. For each "delete_surrounding_text" event
> >> the client has to parse the UTF-8 sequences before and after the cursor
> >> position up to the requested Unicode code point.
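The per-event parsing Silvan describes amounts to a linear scan from the start of the string. A minimal C sketch, assuming valid UTF-8 input as the protocol text requires; the helper name `utf8_cp_to_byte` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a code point index into a byte offset within a UTF-8 string.
 * Returns (size_t)-1 if the string holds fewer than `index` code points.
 * Assumes the input is valid UTF-8. */
static size_t utf8_cp_to_byte(const char *s, size_t len, size_t index)
{
    assert(s != NULL || len == 0);
    size_t byte = 0;
    while (index > 0) {
        if (byte >= len)
            return (size_t)-1;
        byte++; /* step past the lead byte */
        /* ...and past any continuation bytes (10xxxxxx). */
        while (byte < len && ((unsigned char)s[byte] & 0xC0) == 0x80)
            byte++;
        index--;
    }
    return byte;
}
```

Because the scan restarts from the beginning on every event, its cost grows with the cursor position, which is the overhead being questioned here.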
> >>
> >> I feel like we are processing the UTF-8 string already in the
> >> input-method. So I am not sure that we should parse it again on the
> >> client side. Parsing it again would also mean that the client would need
> >> to know about UTF-8, which would be nice to avoid.
> >>
> >> Thoughts?  
> >
> > The client needs to know about Unicode, but not necessarily about UTF-8. Specifying code points is actually an advantage here, because byte offsets are inherently expressed relative to UTF-8. By counting in code points, the client's internal representation can be UTF-16 or maybe even something else.
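To illustrate this point, a client that stores its text as UTF-16 (a hypothetical choice for this example) can map a code point index onto its own buffer without ever touching UTF-8; only surrogate pairs need special handling. A minimal sketch, assuming well-formed UTF-16; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True for a UTF-16 high (leading) surrogate, 0xD800..0xDBFF. */
static int is_high_surrogate(uint16_t u)
{
    return u >= 0xD800 && u <= 0xDBFF;
}

/* Map a code point index onto a code unit offset in a UTF-16 buffer.
 * Returns (size_t)-1 if the buffer holds fewer than `index` code points. */
static size_t utf16_cp_to_unit(const uint16_t *buf, size_t len, size_t index)
{
    assert(buf != NULL || len == 0);
    size_t unit = 0;
    while (index > 0) {
        if (unit >= len)
            return (size_t)-1;
        /* A surrogate pair encodes one code point in two code units. */
        if (is_high_surrogate(buf[unit]) && unit + 1 < len)
            unit += 2;
        else
            unit += 1;
        index--;
    }
    return unit;
}
```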
> 
> I personally think byte offsets are handier than codepoints:
> pointer math is O(1) and str*() functions are "sensible" (on UTF-8 at
> least, and past the bytes!=chars gotchas); it's relatively simple to
> find out whether you are in the middle of a UTF-8 char; it seems
> simpler to deal with than the other way around if UTF-16/codepoints are
> used on either side; and this might even be moot, as all parties are
> interested in chopping strings at word/char boundaries.
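Detecting whether a byte offset falls inside a multi-byte character comes down to one bit test on the byte at that offset, and snapping back to the start of the character is a short loop. A minimal sketch; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* An offset is mid-character iff the byte there is a UTF-8
 * continuation byte, i.e. its top two bits are 10. */
static int utf8_is_mid_char(const char *s, size_t len, size_t offset)
{
    assert(offset <= len);
    return offset < len && ((unsigned char)s[offset] & 0xC0) == 0x80;
}

/* Move an offset back to the start of the code point containing it. */
static size_t utf8_snap_back(const char *s, size_t len, size_t offset)
{
    while (offset > 0 && utf8_is_mid_char(s, len, offset))
        offset--;
    return offset;
}
```

Since a UTF-8 code point is at most four bytes, the snap loop runs at most three times on valid input.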
> 
> As for using UTF-8 specifically, other protocols do use it for
> exchanging strings (e.g. xdg_surface.set_title). It's the perfect fit
> for glib/pango/etc, so it wouldn't be me who objects, either :).
> 
> Cheers,
>   Carlos

I think you're tipping the scales here. In the interest of moving the protocol forward, I'm changing code points to bytes, since I don't think the choice makes a huge difference in practice. v5 incoming!
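With byte offsets, handling the delete_surrounding_text event mentioned earlier in the thread becomes plain pointer arithmetic plus a boundary check. This sketch assumes its two lengths are byte counts before and after the cursor, following the switch to bytes announced here; the exact v5 wording is not part of this message, the function name is hypothetical, and selection handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove `before` bytes preceding and `after` bytes following the cursor
 * from a NUL-terminated UTF-8 buffer, in place.
 * Returns the new cursor position, or (size_t)-1 if the request would
 * reach outside the text or split a code point. */
static size_t delete_surrounding(char *text, size_t cursor,
                                 size_t before, size_t after)
{
    size_t len = strlen(text);
    assert(cursor <= len);
    if (before > cursor || after > len - cursor)
        return (size_t)-1;

    size_t start = cursor - before;
    size_t end = cursor + after;
    /* Refuse ranges whose ends land on continuation bytes. */
    if ((start < len && ((unsigned char)text[start] & 0xC0) == 0x80) ||
        (end < len && ((unsigned char)text[end] & 0xC0) == 0x80))
        return (size_t)-1;

    /* Shift the tail, including the terminating NUL, over the gap. */
    memmove(text + start, text + end, len - end + 1);
    return start;
}
```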

Cheers,
Dorota