[PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
Dorota Czaplejewicz
dorota.czaplejewicz at puri.sm
Mon Jul 23 12:26:04 UTC 2018
Hi Carlos,
thanks for reviewing!
On Tue, 17 Jul 2018 19:18:36 +0200
Carlos Garnacho <carlosg at gnome.org> wrote:
> Hi!
>
> (Way way late, trying to revive the conversation...)
>
> On Thu, May 3, 2018 at 9:22 PM, Dorota Czaplejewicz
> <dorota.czaplejewicz at puri.sm> wrote:
> > On Thu, 3 May 2018 20:47:27 +0200
> > Silvan Jegen <s.jegen at gmail.com> wrote:
> >
> >> Hi Dorota
> >>
> >> Some comments and typo fixes below.
> >>
> >> On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz wrote:
> >> > This new protocol description is a simplification over v2.
> >> >
> >> > - All pre-edit text styling is gone.
> >> > - Pre-edit cursor can span characters.
> >> >> > - No events regarding the input panel (OSK) state or the covered rectangle.
> >> > Compositors are still free to handle situations where the keyboard
> >> > focus rectangle is covered by the input panel.
> >> > - No set_preferred_language request for clients.
> >> >> > - There is no event to send keysyms. Compositors can use the wl_keyboard
> >> >> > interface instead.
> >> >> > - All state is double-buffered, with its lifetime and initial values specified.
> >> > - Use Unicode codepoints to measure strings.
> >> >
> >> > Signed-off-by: Dorota Czaplejewicz <dorota.czaplejewicz at puri.sm>
> >> > Signed-off-by: Carlos Garnacho <carlosg at gnome.org>
> >> > ---
> >> > This is the next update coming from Purism to perfect the text input protocol.
> >> >
> >> >> > The following changes were added on top of PATCHv3:
> >> >> >
> >> >> > - Fixed whitespace.
> >> >> > - Removed the enable flags; the same information can be gathered from the first requests after enter.
> >> >> > - Changed offsets inside UTF-8 strings to use Unicode character counts in order to remove the possibility of communicating invalid state.
> >> >> > - Specified the exact lifetime and the initial values of double-buffered state.
> >> > - Made changes requested by the IM double-buffered.
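For readers less familiar with the pattern, here is a minimal sketch of what "double-buffered state" usually means on the compositor side of a Wayland protocol: requests only write a pending copy, and a commit applies it in one step. The struct and function names are hypothetical and not taken from the patch. The sketch also assumes that pending values return to their initial values after each commit; the lifetime actually specified in the XML is what counts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-text-input state; names are illustrative only. */
struct ti_state {
    char *surrounding_text; /* owned copy; NULL means "unset" (initial value) */
    uint32_t cursor;
    uint32_t anchor;
};

struct ti_object {
    struct ti_state pending; /* written by requests */
    struct ti_state current; /* what the compositor acts on */
};

static void ti_object_init(struct ti_object *ti) {
    ti->pending = (struct ti_state){0};
    ti->current = (struct ti_state){0};
}

/* Free owned data and return a state to its initial values. */
static void ti_state_reset(struct ti_state *s) {
    free(s->surrounding_text);
    *s = (struct ti_state){0};
}

static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *out = malloc(n);
    if (out != NULL)
        memcpy(out, s, n);
    return out;
}

/* A set_surrounding_text-like request only changes pending state. */
static void ti_set_surrounding_text(struct ti_object *ti, const char *text,
                                    uint32_t cursor, uint32_t anchor) {
    assert(text != NULL);
    free(ti->pending.surrounding_text);
    ti->pending.surrounding_text = copy_string(text);
    ti->pending.cursor = cursor;
    ti->pending.anchor = anchor;
}

/* Commit: pending becomes current atomically, then pending goes back to
 * its initial values (an assumption of this sketch). */
static void ti_commit(struct ti_object *ti) {
    ti_state_reset(&ti->current);
    ti->current = ti->pending;          /* ownership of the string moves */
    ti->pending = (struct ti_state){0}; /* no free: current owns it now */
}
```

Nothing the client sends is visible in `current` until `ti_commit` runs, which is what lets the compositor see a consistent snapshot of text, cursor and anchor.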
> >> >
> >> >> > Some questions remain open. One is: how much text should be captured in set_surrounding_text, and how often should it be updated?
>
> IMHO the only reason to state it here is that it's more likely that a
> lazy implementation will try to squeeze a full book in here than, e.g., an
> application setting an insanely long title. But certainly other
> messages across protocols may hit this limit (the long title issue
> wasn't made up :).
>
> As for how much, I think it ultimately depends on the IM behind. Text
> correction probably just wants the current word, any sort of
> prediction will probably require phrases to paragraphs, char
> composition can probably do without. Sounds like this could be some
> sort of hint, but I don't think IMs can tell you today how much text
> they want...
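One way a client could avoid sending "a full book" is to cap the surrounding text to a window around the cursor. This is a minimal sketch, not something the patch specifies: it assumes byte offsets, valid UTF-8, and a caller-chosen byte budget (hypothetical; in practice it would have to stay below the message size limit mentioned above, leaving room for the other arguments). It only guarantees that no UTF-8 sequence is cut in half; it does not look for word boundaries.

```c
#include <assert.h>
#include <stddef.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int utf8_is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

/* Pick a window [*start, *end) of at most `budget` bytes of `text`
 * (`len` bytes of valid UTF-8) containing the byte offset `cursor`.
 * About half of the budget is spent before the cursor. Both ends are
 * moved inward so no UTF-8 sequence is split. `cursor` must itself be
 * on a character boundary. Function name is hypothetical. */
static void surrounding_window(const char *text, size_t len, size_t cursor,
                               size_t budget, size_t *start, size_t *end) {
    assert(text != NULL && start != NULL && end != NULL);
    assert(cursor <= len);

    size_t half = budget / 2;
    size_t s = cursor > half ? cursor - half : 0;
    /* Move the start forward onto the next character boundary. */
    while (s < cursor && utf8_is_continuation((unsigned char)text[s]))
        s++;

    size_t e = s + budget < len ? s + budget : len;
    /* Move the end back so it does not land inside a sequence. */
    while (e > cursor && e < len && utf8_is_continuation((unsigned char)text[e]))
        e--;

    *start = s;
    *end = e;
}
```

The client would then send the bytes in `[start, end)` and report the cursor as `cursor - start` relative to that trimmed text. How large the budget should be is exactly the open question above; an IM-provided hint could feed into it.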
>
> >> >
> >> >> > A possible change that I decided against for now is to replace the enable/disable requests with the creation/destruction of a new object, which would encode more state lifetimes in the protocol.
> >> >
> >> >> > After reading a blog post on fcitx [0], I got the impression that letting the compositor know some persistent ID of a text edit instance could be useful; however, I'm not sure what the use cases are.
> >> >
> >> > As always, I'm happy to hear feedback.
> >> >
> >> > Cheers,
> >> > Dorota Czaplejewicz
> >> >
> >> > [0] https://www.csslayer.info/wordpress/fcitx-dev/gaps-between-wayland-and-fcitx-or-all-input-methods/
> >> >
> >> > Makefile.am | 1 +
> >> > unstable/text-input/text-input-unstable-v3.xml | 362 +++++++++++++++++++++++++
> >> > 2 files changed, 363 insertions(+)
> >> > create mode 100644 unstable/text-input/text-input-unstable-v3.xml
> >> >
> >> > diff --git a/Makefile.am b/Makefile.am
> >> > index 4b9a901..86d7ca9 100644
> >> > --- a/Makefile.am
> >> > +++ b/Makefile.am
> >> > @@ -3,6 +3,7 @@ unstable_protocols = \
> >> > unstable/fullscreen-shell/fullscreen-shell-unstable-v1.xml \
> >> > unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml \
> >> > unstable/text-input/text-input-unstable-v1.xml \
> >> > + unstable/text-input/text-input-unstable-v3.xml \
> >> > unstable/input-method/input-method-unstable-v1.xml \
> >> > unstable/xdg-shell/xdg-shell-unstable-v5.xml \
> >> > unstable/xdg-shell/xdg-shell-unstable-v6.xml \
> >> > diff --git a/unstable/text-input/text-input-unstable-v3.xml b/unstable/text-input/text-input-unstable-v3.xml
> >> > new file mode 100644
> >> > index 0000000..ed5204f
> >> > --- /dev/null
> >> > +++ b/unstable/text-input/text-input-unstable-v3.xml
> >> > @@ -0,0 +1,362 @@
> >> > +<?xml version="1.0" encoding="UTF-8"?>
> >> > +
> >> > +<protocol name="text_input_unstable_v3">
> >> > + <copyright>
> >> > + Copyright © 2012, 2013 Intel Corporation
> >> > + Copyright © 2015, 2016 Jan Arne Petersen
> >> > + Copyright © 2017, 2018 Red Hat, Inc.
> >> > + Copyright © 2018 Purism SPC
> >> > +
> >> > + Permission to use, copy, modify, distribute, and sell this
> >> > + software and its documentation for any purpose is hereby granted
> >> > + without fee, provided that the above copyright notice appear in
> >> > + all copies and that both that copyright notice and this permission
> >> > + notice appear in supporting documentation, and that the name of
> >> > + the copyright holders not be used in advertising or publicity
> >> > + pertaining to distribution of the software without specific,
> >> > + written prior permission. The copyright holders make no
> >> > + representations about the suitability of this software for any
> >> > + purpose. It is provided "as is" without express or implied
> >> > + warranty.
> >> > +
> >> > + THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
> >> > + SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
> >> > + FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
> >> > + SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> >> > + WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
> >> > + AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
> >> > + ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
> >> > + THIS SOFTWARE.
> >> > + </copyright>
> >> > +
> >> > + <interface name="zwp_text_input_v3" version="1">
> >> > + <description summary="text input">
> >> > + The zwp_text_input_v3 interface represents text input and input methods
> >> > + associated with a seat. It provides enter/leave events to follow the
> >> > + text input focus for a seat.
> >> > +
> >> > + Requests are used to enable/disable the text-input object and set
> >> > + state information like surrounding and selected text or the content type.
> >> > + The information about the entered text is sent to the text-input object
> >> > + via the pre-edit and commit_string events.
> >> > +
> >> > + Text is valid UTF-8 encoded, indices and lengths are in code points. If a
> >> > + grapheme is made up of multiple code points, an index pointing to any of
> >> > + them should be interpreted as pointing to the first one.
> >>
> >> That way we make sure we don't put the cursor/anchor between bytes that
> >> belong to the same UTF-8-encoded Unicode code point, which is nice. It
> >> also means that the client has to parse all the UTF-8 encoded strings
> >> into Unicode code points up to the desired cursor/anchor position
> >> on each "preedit_string" event. For each "delete_surrounding_text" event
> >> the client has to parse the UTF-8 sequences before and after the cursor
> >> position up to the requested Unicode code point.
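To make the client-side cost described here concrete: with code point offsets, a client that stores text as UTF-8 has to walk the string from the start to turn an index into a byte offset. A minimal sketch, assuming the input is already valid UTF-8 (no validation is done); the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a code point index into a byte offset within `text`
 * (`len` bytes of valid UTF-8). This is O(n) in the index, since every
 * preceding code point has to be stepped over. An index past the end
 * yields `len`. */
static size_t utf8_codepoint_to_byte(const char *text, size_t len, size_t index) {
    assert(text != NULL || len == 0);
    size_t byte = 0;
    while (index > 0 && byte < len) {
        byte++; /* lead byte of the current code point */
        /* Skip its continuation bytes (10xxxxxx). */
        while (byte < len && ((unsigned char)text[byte] & 0xC0) == 0x80)
            byte++;
        index--;
    }
    return byte;
}
```

For "a", "é", "€" (1, 2 and 3 bytes in UTF-8), code point indices 0 through 3 map to byte offsets 0, 1, 3 and 6.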
> >>
> >> I feel like we are already processing the UTF-8 string in the
> >> input method, so I am not sure that we should parse it again on the
> >> client side. Parsing it again would also mean that the client would need
> >> to know about UTF-8, which would be nice to avoid.
> >>
> >> Thoughts?
> >
> > The client needs to know about Unicode, but not necessarily about UTF-8. Specifying code points is actually an advantage here, because byte offsets are inherently expressed relative to UTF-8. By counting with code points, the client's internal representation can be UTF-16 or maybe even something else.
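The UTF-16 case can be illustrated the same way. With code point offsets, a client storing text as UTF-16 still has to convert, because code points above U+FFFF occupy two code units (a surrogate pair). A minimal sketch, assuming well-formed UTF-16 in native byte order; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a code point index into a UTF-16 code unit index.
 * A high surrogate (0xD800..0xDBFF) followed by another unit starts a
 * two-unit sequence; everything else is one unit. An index past the
 * end yields `len`. */
static size_t utf16_codepoint_to_unit(const uint16_t *text, size_t len,
                                      size_t index) {
    assert(text != NULL || len == 0);
    size_t unit = 0;
    while (index > 0 && unit < len) {
        if (text[unit] >= 0xD800 && text[unit] <= 0xDBFF && unit + 1 < len)
            unit += 2; /* surrogate pair: one code point, two units */
        else
            unit += 1;
        index--;
    }
    return unit;
}
```

Whichever unit the protocol picks, one side or the other ends up doing a linear walk like this whenever its internal encoding differs from the wire convention.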
>
> I personally think byte offsets are handier than code points:
> pointer math is O(1) and str*() functions are "sensible" (on UTF-8 at
> least, and past the bytes!=chars gotchas); it's relatively simple to
> find out whether you are in the middle of a UTF-8 char; and it seems
> simpler to deal with than the other way around if UTF-16/code points are
> used on either side. This might even be moot, as all parties are
> interested in chopping strings at word/char boundaries.
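The "relatively simple" check for byte offsets looks like this: because UTF-8 continuation bytes always match the bit pattern 10xxxxxx, a single byte lookup tells whether an offset falls inside a multi-byte sequence. A minimal sketch with hypothetical function names; it assumes valid UTF-8 and does not consider grapheme or word boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* O(1): an offset is a character boundary unless the byte there is a
 * continuation byte. The end of the string is always a boundary. */
static bool utf8_is_boundary(const char *text, size_t len, size_t offset) {
    assert(offset <= len);
    if (offset == len)
        return true;
    return ((unsigned char)text[offset] & 0xC0) != 0x80;
}

/* Move an offset back to the start of the character containing it.
 * For valid UTF-8 this takes at most three steps. */
static size_t utf8_snap_back(const char *text, size_t len, size_t offset) {
    while (offset > 0 && !utf8_is_boundary(text, len, offset))
        offset--;
    return offset;
}
```

A receiver could use `utf8_snap_back` to repair an offset that points into the middle of a character instead of rejecting it outright; whether the protocol should allow or forbid such offsets is a separate decision.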
>
> As for using UTF-8 specifically, other protocols do use it for
> the exchange of strings (e.g. xdg_surface.set_title). It's the perfect fit
> for glib/pango/etc, so it wouldn't be me who objects, either :).
>
> Cheers,
> Carlos
I think you're tipping the scales here. In the interest of moving the protocol forward, I'm changing code points to bytes, since I don't think the choice makes a huge difference in practice. v5 incoming!
Cheers,
Dorota
More information about the wayland-devel mailing list