Protocol backwards compatibility requirements?

Peter Hutterer peter.hutterer at who-t.net
Tue Apr 21 05:01:14 UTC 2020


On Mon, Apr 20, 2020 at 03:05:32PM +0300, Pekka Paalanen wrote:
> On Thu, 16 Apr 2020 17:47:56 +1000
> Christopher James Halse Rogers <chris at cooperteam.net> wrote:
> 
> > On Wed, Apr 15, 2020 at 14:27, Simon Ser <contact at emersion.fr> wrote:
> > > Hi,
> > > 
> > > On Monday, April 13, 2020 1:59 AM, Peter Hutterer 
> > > <peter.hutterer at who-t.net> wrote:  
> > >>  Hi all,
> > >> 
> > >>  This is a request for comments on the exact requirements for protocol
> > >>  backwards compatibility for clients binding to new versions of an
> > >>  interface.
> > >>  The reason for this is the high-resolution wheel scrolling patches:
> > >>  https://gitlab.freedesktop.org/wayland/wayland/-/merge_requests/72
> > >> 
> > >>  Specifically, the question is: do we **change** protocol elements or
> > >>  behaviour as the interface versions increase? A few random examples:  
> > > 
> > > What we can't do is:
> > > 
> > > - Change existing messages' signature
> > > - Completely remove a message  
> 
> Indeed.
> 
> > 
> > It should be relatively easy to modify wayland-scanner to support both 
> > of these things, *if* we decide that it's a reasonable thing to do. 
> > (You'd do something like add support for <request name="foo" 
> > removed_in="5"/> and the like)  
> 
> How would that work, given the version is negotiated at runtime?
> 
> The message signature structs are now ABI as well, and we have no room
> for alternate signatures, do we?
> 
> > >   
> > >>  - event wl_foo.bar introduced in version N sends a wl_fixed in
> > >>    surface coordinates. version N+1 changes this to a normalized
> > >>    [-10000, +10000] range.  
> > > 
> > > Argument types can't be changed. This would be a breaking change for
> > > the generated code, we can't do that.
> > 
> > But this isn't changing the argument type; it's changing the 
> > interpretation of the argument.
> > In both cases the type is wl_fixed; in the first you interpret this 
> > wl_fixed as being in surface coordinates, in the second you interpret 
> > it differently.
> > 
> > This doesn't require any changes to code generation; I don't think this 
> > is (in principle) any more disruptive than changing “wl_foo.baz is 
> > sent exactly once” to “wl_foo.baz is sent zero or more times”, 
> > which you're happy with.
> 
> Something we rarely consider is if you pass Wayland protocol objects
> into a library without negotiating the object version with the library
> first. For example, we pass wl_surface into the EGL Wayland wrapper
> library. If wl_surface would get a version bump breaking backwards
> compatibility, meaning that version N+1 changes something that existed
> in version N, the library handling only version N would fall apart.

ftr, this seems like something that should be noted in the protocol's xml
directly to avoid future accidents.

> I sincerely hope this is the only case of a library taking a ready-made
> Wayland object in. Getting the version negotiation right needs
> inconvenient additions to the library API that I don't think many would
> bother or even realize it's needed.
> 
> You can query the version of a wl_proxy, sure, but that does not help
> you if it returns a number larger than what your code knows about.
> 
> Btw. this is also a problem in the opposite direction. Let's say you
> use a toolkit and the toolkit allows you access to the Wayland protocol
> objects. Then the toolkit gains support for new interface versions and
> uses them, but your app code is not updated. If the protocol change is
> backwards incompatible, your app code may break even if only behaviour
> changes and not signatures.
> 
> > >>  - request wl_foo.bar introduced in version N takes an int. version
> > >>    N+1 changes wl_foo.bar to take a wl_fixed and an enum.
> > > 
> > > Ditto.
> > >   
> > >>  - request wl_foo.bar introduced in version N guaranteed to generate
> > >>    a single event wl_foo.baz. if the client binds to version N+1 that
> > >>    event may be sent zero, one or multiple times.
> > > 
> > > This is fine.
> > >   
> > >>  I think these examples cover a wide-enough range of the possible
> > >>  changes.
> > >>
> > >>  My assumption was that we only ever add new requests/events but
> > >>  never change existing behaviour. So wl_foo.bar introduced in version
> > >>  N will always have the same behaviour for any interface N+m.
> > > 
> > > We can change existing requests' behaviour. This has already been
> > > done a number of times, see e.g. wl_data_offer.accept or
> > > xdg_output.description.
> > >
> > > Clients should always have a max-version, ie. they should never
> > > blindly bind to the compositor's version.
> > >
> > > What is also fine is marking a message as "deprecated from version
> > > N". Such a message wouldn't be sent anymore starting from this version.
> > >   
> > >>  I've seen some pushback for above linked patchset because it gets
> > >>  complicated and suggestions to just change the current interface.
> > >>  The obvious advantage is being able to clean up any mess in the
> > >>  protocol.
> > >>
> > >>  The disadvantages are the breakage of backwards compatibility with
> > >>  older versions. You're effectively forcing every compositor/client
> > >>  to change the code based on the version number, even where it's not
> > >>  actually needed. Or, IOW, a client may want a new feature in N+2 but
> > >>  now needs to implement all changes from N+1 since they may change
> > >>  the behaviour significantly.
> > 
> > This is the meat of the question - all of the changes described are 
> > technically fairly simple to implement.
> 
> Breaking stuff is simple, sure. Or what do you mean?
> 
> > To some extent this is a question of self-limitations. As has been 
> > mentioned, protocols have *already* been deliberately broken in this 
> > way, and people are happy enough with that. As long as we're mindful of 
> > the cost such changes impose, I think that having the technical 
> > capability to make such changes is of benefit - for example, rather 
> > than marking a message as “deprecated from version N” I think it 
> > would be preferable to just not have the message in the listener 
> > struct. (Note that I'm not volunteering to *implement* that capability, 
> > and there are probably more valuable things to work on, but if it 
> > magically appeared without any effort it'd be nice to have that 
> > capability).
> 
> We cannot do this.
> 
> The simple reason is that the protocol object version is negotiated at
> runtime. The code must always be generated for all versions from 1 up
> to max version wanted. It is always possible that the program on the
> other end of the Wayland connection implements only version 1.
> 
> > The status quo is that we're happy (perhaps accidentally) with 
> > requiring a client to implement all changes from N+1 in order to get 
> > something from N+2. I think whether or not that's ok is a case-by-case 
> > decision. How difficult is it for clients to implement N+1? How much 
> > simpler does the break make protocol version N+1? If it's trivial for 
> > clients to handle and makes the protocol significantly simpler, I think 
> > it's obvious that we *should* make the break; likewise, if it's likely 
> > to be difficult for clients to handle and doesn't make N+1 much 
> > simpler, it's obvious that we *shouldn't*.
> 
> Likewise it is not possible to cherry-pick features from version N+2
> without also implementing version N+1 fully, because at runtime the
> negotiation may end up with version N+1.

Note that there are different grades of "implementing N+1 fully".
Right now I can claim to support wl_seat version 7 and have axis_discrete
and axis_source as noops and everything will still work because
wl_pointer.axis never changed. So while my client may support the version at
the protocol level, it doesn't do anything with it. And that's fine if it
doesn't need the new data.

Once we change the behaviour of existing events that is not true anymore.
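
To make the noop approach concrete, here is a minimal sketch. It assumes a
cut-down, hypothetical listener struct (mock_pointer_listener) standing in for
the generated wl_pointer_listener, and it models only the three events
mentioned above. The real generated struct has more members and takes a
struct wl_pointer * argument.

```c
#include <stdint.h>

/* Hypothetical stand-in for the generated wl_pointer_listener. */
struct mock_pointer_listener {
    void (*axis)(void *data, uint32_t time, uint32_t axis, int32_t value);
    void (*axis_source)(void *data, uint32_t source);
    void (*axis_discrete)(void *data, uint32_t axis, int32_t discrete);
};

struct scroll_state {
    int32_t total; /* accumulated axis value (wl_fixed_t on the wire) */
};

/* The only handler this client cares about: unchanged since version 1. */
static void handle_axis(void *data, uint32_t time, uint32_t axis, int32_t value)
{
    struct scroll_state *s = data;
    (void)time;
    (void)axis;
    s->total += value;
}

/* Newer events are accepted and ignored. */
static void noop_axis_source(void *data, uint32_t source)
{
    (void)data;
    (void)source;
}

static void noop_axis_discrete(void *data, uint32_t axis, int32_t discrete)
{
    (void)data;
    (void)axis;
    (void)discrete;
}

static const struct mock_pointer_listener pointer_listener = {
    .axis = handle_axis,
    .axis_source = noop_axis_source,
    .axis_discrete = noop_axis_discrete,
};
```

The scroll handling is exactly what a version 1 client would have written.
That only holds as long as the meaning of the axis value stays fixed across
versions.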

> > For the specific case at hand, it doesn't seem like it would be 
> > particularly difficult for clients to handle axis events changing 
> > meaning in version 8, and it looks like the protocol would be 
> > substantially simpler without the interaction between axis_v120, axis, 
> > and axis_discrete.
> 
> Since we're talking about wl_pointer specifically, let me remind us about
> the interface hierarchy:
> 
> - wl_seat
>   - wl_pointer
>   - wl_touch
>   - wl_keyboard
> 
> Wayland uses inheritance to determine protocol object versions, when an
> explicit version is not provided. The only relevant interface here that
> can create objects with an explicit version number is wl_registry.bind.
> This can be used to set the wl_seat version only. Then wl_pointer,
> wl_touch, wl_keyboard all get their version from the wl_seat object.
> 
> If you want to have wl_pointer version N+2 and wl_touch version N+1,
> you have to create two different wl_seat objects for the same wl_seat
> with different version numbers. Most clients do not do this, though.
> It's simplest to just have one wl_seat object negotiated with the
> highest possible version.
> 
> So the requirement to implement all earlier versions is not even
> limited to the interface itself, it applies to the whole interface tree
> starting from the global.
> 
> If you decide that wl_pointer version < N+2 is unsupported, you do so
> for wl_seat, wl_touch, and wl_keyboard as well.
> 
> It is possible to simplify the messaging sequence for an interface tree
> by saying that starting from version N, things behave differently. But,
> then you need two implementations in both servers and clients: one
> for < N and one for >= N.
> 
> If you're asking if the implementation for version < N could be
> deleted or avoided, then I'd say no. Definitely no for desktop
> compositors, probably no for anything else public.

The sub-interfaces are inseparable from the seats, that's set in stone. The
question here is less about mixing versions within the seat but more
about skipping versions without harm. Let's say wl_pointer version 9
introduces wl_pointer.pressure, something independent of anything else
in the wl_pointer interface. Qt (iirc) never implemented axis_discrete
handling, but let's say it wants support for pressure.

Guaranteed backwards compatibility means Qt can bump to version 9, implement
noop functions for axis_source and axis_discrete, and be done.

Allowing events to change between versions means that Qt now also needs to
update its handling of whatever changed between those versions, e.g.
wl_pointer.axis. A direct jump past versions you don't care about isn't
possible.
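
A rough sketch of why the jump isn't possible, assuming the usual pattern of
binding to the lower of the advertised version and the client's own maximum.
CLIENT_MAX_SEAT_VERSION and negotiate_version() are names made up for this
example. In a real client the result would be the version argument to
wl_registry_bind().

```c
#include <stdint.h>

/* Hypothetical: the highest wl_seat version this client was written for. */
#define CLIENT_MAX_SEAT_VERSION 9u

/* Version to bind: never higher than what the compositor advertises,
 * never higher than what the client code knows about. */
static uint32_t negotiate_version(uint32_t advertised, uint32_t client_max)
{
    return advertised < client_max ? advertised : client_max;
}
```

Against a compositor advertising version 8 the result is 8, not 9. A client
written for 9 therefore still runs at every version below it, and has to get
the behaviour of each of them right.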

Also, having written the patches to change wl_pointer.axis_discrete to a 120
base value, there's another issue: there is no auto-generated
FOO_SINCE_VERSION because the change doesn't show up in the protocol itself.
So this really flies under the radar and you just have to know about it by
reading the documentation.
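
A sketch of the difference. wayland-scanner emits
WL_POINTER_AXIS_DISCRETE_SINCE_VERSION for the event itself (it is redefined
here only so the snippet compiles without the Wayland headers). A behaviour
change gets no such macro. AXIS_DISCRETE_V120_SINCE_VERSION and the version
number 8 are assumptions for illustration, as is discrete_to_v120().

```c
#include <stdint.h>

/* Normally provided by the wayland-scanner generated header. */
#ifndef WL_POINTER_AXIS_DISCRETE_SINCE_VERSION
#define WL_POINTER_AXIS_DISCRETE_SINCE_VERSION 5
#endif

/* Hypothetical and hand-maintained: nothing generates this constant,
 * the author has to find the version in the documentation. */
#define AXIS_DISCRETE_V120_SINCE_VERSION 8

/* Whether the event exists at all: answered by the generated macro. */
static int has_axis_discrete(uint32_t bound_version)
{
    return bound_version >= WL_POINTER_AXIS_DISCRETE_SINCE_VERSION;
}

/* Normalise an axis_discrete value to 120ths of a wheel click, assuming
 * the event switched to a 120 base at the hypothetical version above. */
static int32_t discrete_to_v120(uint32_t bound_version, int32_t value)
{
    if (bound_version >= AXIS_DISCRETE_V120_SINCE_VERSION)
        return value;       /* already in 120ths */
    return value * 120;     /* older versions send whole clicks */
}
```

A client that omits the second check still compiles and still receives
events. It just scrolls by the wrong amount.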

Cheers,
   Peter

