Protocol backwards compatibility requirements?

Tue Apr 21 00:57:34 UTC 2020

On Mon, Apr 20, 2020 at 15:05, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> On Thu, 16 Apr 2020 17:47:56 +1000
> Christopher James Halse Rogers <chris at cooperteam.net> wrote:
> 
>>  On Wed, Apr 15, 2020 at 14:27, Simon Ser <contact at emersion.fr> wrote:
>>  > Hi,
>>  >
>>  > On Monday, April 13, 2020 1:59 AM, Peter Hutterer
>>  > <peter.hutterer at who-t.net> wrote:
>>  >>  Hi all,
>>  >>
>>  >>  This is a request for comments on the exact requirements for protocol
>>  >>  backwards compatibility for clients binding to new versions of an
>>  >>  interface.
>>  >>  The reason for this is the high-resolution wheel scrolling patches:
>>  >>  https://gitlab.freedesktop.org/wayland/wayland/-/merge_requests/72
>>  >>
>>  >>  Specifically, the question is: do we **change** protocol elements or
>>  >>  behaviour as the interface versions increase? A few random examples:
>>  >
>>  > What we can't do is:
>>  >
>>  > - Change existing messages' signature
>>  > - Completely remove a message
> 
> Indeed.
> 
>> 
>>  It should be relatively easy to modify wayland-scanner to support both
>>  of these things, *if* we decide that it's a reasonable thing to do.
>>  (You'd do something like add support for <request name="foo"
>>  removed_in="5"/> and the like)
> 
> How would that work, given the version is negotiated at runtime?
> 
> The message signature structs are now ABI as well, and we have no room
> for alternate signatures, do we?

Sure we do. Internally we can just give them different names, with 
different contents, and switch based on the version requested at 
runtime.
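
Roughly what I have in mind, as a sketch only. None of this is existing
wayland-scanner output: struct msg_desc is a simplified stand-in for the
generated message tables, and the _v1/_v5 names and the version 5 cut-off
are made up for illustration. The signature letters follow the usual
convention (i = int, u = uint, f = wl_fixed).

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a generated message description. */
struct msg_desc {
    const char *name;
    const char *signature; /* "i" = int, "u" = uint, "f" = wl_fixed */
};

/* wl_foo.bar as defined for versions 1..4: takes a single int. */
static const struct msg_desc wl_foo_bar_v1 = { "bar", "i" };

/* wl_foo.bar as redefined from version 5: takes a wl_fixed and an enum
 * (enums travel as uint on the wire). */
static const struct msg_desc wl_foo_bar_v5 = { "bar", "fu" };

/* Select the description that matches the version negotiated at bind
 * time. Both tables stay compiled in, so a version 1 peer still works. */
static const struct msg_desc *
wl_foo_bar_desc(uint32_t bound_version)
{
    return bound_version >= 5 ? &wl_foo_bar_v5 : &wl_foo_bar_v1;
}
```

The point is only that both descriptions live side by side under
different internal names; which one is used is decided per object, at
runtime.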

From the client API side it's more difficult (at least for requests), 
because we can't remove any symbols - we *can* make it a client error 
with a good error message, though.

On the events side it's easier, as we can add a wl_foo_listener_v5 
struct and wl_foo_add_listener_v5.
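
A sketch of how that might look, under some loud assumptions: struct
wl_foo here is a toy with a version field rather than a real wl_proxy,
the "baz" event being dropped in version 5 is invented, and the version
check inside the add_listener functions is something I'm imagining, not
something today's generated code does.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy proxy object; a real wl_foo is an opaque wl_proxy. */
struct wl_foo {
    uint32_t version;      /* version negotiated at bind time */
    const void *listener;  /* whichever listener struct was installed */
    void *data;
};

/* Listener for versions 1..4: both events exist. */
struct wl_foo_listener {
    void (*bar)(void *data, struct wl_foo *foo, int32_t value);
    void (*baz)(void *data, struct wl_foo *foo);
};

/* Listener for version 5 and up: "baz" is gone (hypothetical). */
struct wl_foo_listener_v5 {
    void (*bar)(void *data, struct wl_foo *foo, int32_t value);
};

/* Each entry point refuses objects bound at a version it does not
 * describe, and refuses to replace an existing listener. */
static int
wl_foo_add_listener(struct wl_foo *foo,
                    const struct wl_foo_listener *listener, void *data)
{
    if (foo->version >= 5 || foo->listener != NULL)
        return -1;
    foo->listener = listener;
    foo->data = data;
    return 0;
}

static int
wl_foo_add_listener_v5(struct wl_foo *foo,
                       const struct wl_foo_listener_v5 *listener, void *data)
{
    if (foo->version < 5 || foo->listener != NULL)
        return -1;
    foo->listener = listener;
    foo->data = data;
    return 0;
}
```

A client that can end up with either version at runtime would keep both
listener structs around and pick the matching add_listener call after
binding.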

This does add a new sharp edge to the raw wl_proxy_* interface, but 
client code isn't expected to be using that and this doesn't seem 
particularly hard for language bindings to adapt to.

> 
>>  >
>>  >>  - event wl_foo.bar introduced in version N sends a wl_fixed in
>>  >>    surface coordinates. version N+1 changes this to a normalized
>>  >>    [-10000, +10000] range.
>>  >
>>  > Argument types can't be changed. This would be a breaking change for
>>  > the generated code, we can't do that.
>> 
>>  But this isn't changing the argument type; it's changing the
>>  interpretation of the argument.
>>  In both cases the type is wl_fixed; in the first you interpret this
>>  wl_fixed as being in surface coordinates, in the second you interpret
>>  it differently.
>> 
>>  This doesn't require any changes to code generation; I don't think this
>>  is (in principle) any more disruptive than changing “wl_foo.baz is
>>  sent exactly once” to “wl_foo.baz is sent zero or more times”,
>>  which you're happy with.
> 
> Something we rarely consider is if you pass Wayland protocol objects
> into a library without negotiating the object version with the library
> first. For example, we pass wl_surface into the EGL Wayland wrapper
> library. If wl_surface would get a version bump breaking backwards
> compatibility, meaning that version N+1 changes something that existed
> in version N, the library handling only version N would fall apart.
> 
> I sincerely hope this is the only case of a library taking a ready-made
> Wayland object in. Getting the version negotiation right needs
> inconvenient additions to the library API that I don't think many would
> bother with or even realize are needed.
> 
> You can query the version of a wl_proxy, sure, but that does not help
> you if it returns a number larger than what your code knows about.
> 
> Btw. this is also a problem in the opposite direction. Let's say you
> use a toolkit and the toolkit allows you access to the Wayland protocol
> objects. Then the toolkit gains support for new interface versions and
> uses them, but your app code is not updated. If the protocol change is
> backwards incompatible, your app code may break even if only behaviour
> changes and not signatures.

This is an additional cost that should be considered for types that may 
be transferred across library boundaries like this; we should also try 
to make it clear to toolkits that this is a fraught API.

> 
>>  >>  - request wl_foo.bar introduced in version N takes an int. version
>>  >>    N+1 changes wl_foo.bar to take a wl_fixed and an enum.
>>  >
>>  > Ditto.
>>  >
>>  >>  - request wl_foo.bar introduced in version N is guaranteed to
>>  >>    generate a single event wl_foo.baz. If the client binds to version
>>  >>    N+1 that event may be sent zero, one or multiple times.
>>  >
>>  > This is fine.
>>  >
>>  >>  I think these examples cover a wide-enough range of the possible
>>  >>  changes.
>>  >>
>>  >>  My assumption was that we only ever add new requests/events but
>>  >>  never change existing behaviour. So wl_foo.bar introduced in version
>>  >>  N will always have the same behaviour for any interface N+m.
>>  >
>>  > We can change existing requests' behaviour. This has already been
>>  > done a number of times, see e.g. wl_data_offer.accept or
>>  > xdg_output.description.
>>  >
>>  > Clients should always have a max-version, ie. they should never
>>  > blindly bind to the compositor's version.
>>  >
>>  > What is also fine is marking a message as "deprecated from version
>>  > N". Such a message wouldn't be sent anymore starting from this
>>  > version.
>>  >
>>  >>  I've seen some pushback for above linked patchset because it gets
>>  >>  complicated and suggestions to just change the current interface.
>>  >>  The obvious advantage is being able to clean up any mess in the
>>  >>  protocol.
>>  >>
>>  >>  The disadvantages are the breakage of backwards compatibility with
>>  >>  older versions. You're effectively forcing every compositor/client
>>  >>  to change the code based on the version number, even where it's not
>>  >>  actually needed. Or, IOW, a client may want a new feature in N+2 but
>>  >>  now needs to implement all changes from N+1 since they may change
>>  >>  the behaviour significantly.
>> 
>>  This is the meat of the question - all of the changes described are
>>  technically fairly simple to implement.
> 
> Breaking stuff is simple, sure. Or what do you mean?

Making breaking changes to version N+1 of a protocol in a way that 
preserves the ability for version 1…N clients to continue to function 
unchanged is technically fairly simple. The question is when we 
*should*, and how much effort we should invest in capitalising on such 
changes.

> 
>>  To some extent this is a question of self-limitations. As has been
>>  mentioned, protocols have *already* been deliberately broken in this
>>  way, and people are happy enough with that. As long as we're mindful of
>>  the cost such changes impose, I think that having the technical
>>  capability to make such changes is of benefit - for example, rather
>>  than marking a message as “deprecated from version N” I think it
>>  would be preferable to just not have the message in the listener
>>  struct. (Note that I'm not volunteering to *implement* that capability,
>>  and there are probably more valuable things to work on, but if it
>>  magically appeared without any effort it'd be nice to have that
>>  capability).
> 
> We cannot do this.
> 
> The simple reason is that the protocol object version is negotiated at
> runtime. The code must always be generated for all versions from 1 up
> to max version wanted. It is always possible that the program on the
> other end of the Wayland connection implements only version 1.

As far as I can tell there's nothing *stopping* us from having 
different listener structs for different versions. That preserves the 
ability to select a lower version than the maximum at runtime, while 
allowing version n+1 events to not be a superset of version n events.

Likewise for requests, but that's less interesting because we can't 
remove the relevant entrypoints.

> 
>>  The status quo is that we're happy (perhaps accidentally) with
>>  requiring a client to implement all changes from N+1 in order to get
>>  something from N+2. I think whether or not that's ok is a case-by-case
>>  decision. How difficult is it for clients to implement N+1? How much
>>  simpler does the break make protocol version N+1? If it's trivial for
>>  clients to handle and makes the protocol significantly simpler, I think
>>  it's obvious that we *should* make the break; likewise, if it's likely
>>  to be difficult for clients to handle and doesn't make N+1 much
>>  simpler, it's obvious that we *shouldn't*.
> 
> Likewise it is not possible to cherry-pick features from version N+2
> without also implementing version N+1 fully, because at runtime the
> negotiation may end up with version N+1.

I don't think that's actually true, though? If we had a protocol where 
a client could handle version N or version N+2 but *not* version N+1 
and the compositor advertises version N+1 then the client could simply 
bind version N and go about its business.
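
To make that concrete, a minimal sketch, assuming a hypothetical client
that speaks versions 3 and 5 of some interface (so N = 3) but not 4.
pick_bind_version() and the supported_versions table are made-up names,
not anything libwayland provides.

```c
#include <stddef.h>
#include <stdint.h>

/* Versions this hypothetical client implements: N and N+2, with N = 3. */
static const uint32_t supported_versions[] = { 3, 5 };

/* Return the highest version the client implements that does not exceed
 * what the compositor advertised, or 0 if there is no usable version. */
static uint32_t
pick_bind_version(uint32_t advertised,
                  const uint32_t *supported, size_t count)
{
    uint32_t best = 0;

    for (size_t i = 0; i < count; i++) {
        if (supported[i] <= advertised && supported[i] > best)
            best = supported[i];
    }
    return best;
}
```

With a compositor advertising version 4 this yields 3, which the client
would then pass as the version argument to wl_registry_bind(); a result
of 0 means the global should not be bound at all.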

But I think we might be talking about different things here? This is 
not about a compositor *not supporting* a version of a protocol.

> 
>>  For the specific case at hand, it doesn't seem like it would be
>>  particularly difficult for clients to handle axis events changing
>>  meaning in version 8, and it looks like the protocol would be
>>  substantially simpler without the interaction between axis_v120, axis,
>>  and axis_discrete.
> 
> Since we are talking about wl_pointer specifically, let me remind us about
> the interface hierarchy:
> 
> - wl_seat
>   - wl_pointer
>   - wl_touch
>   - wl_keyboard
> 
> Wayland uses inheritance to determine protocol object versions, when an
> explicit version is not provided. The only relevant interface here that
> can create objects with an explicit version number is wl_registry.bind.
> This can be used to set the wl_seat version only. Then wl_pointer,
> wl_touch, wl_keyboard all get their version from the wl_seat object.
> 
> If you want to have wl_pointer version N+2 and wl_touch version N+1,
> you have to create two different wl_seat objects for the same wl_seat
> with different version numbers. Most clients do not do this, though.
> It's simplest to just have one wl_seat object negotiated with the
> highest possible version.
> 
> So the requirement to implement all earlier versions is not even
> limited to the interface itself, it applies to the whole interface tree
> starting from the global.
> 
> If you decide that wl_pointer version < N+2 is unsupported, you do so
> for wl_seat, wl_touch, and wl_keyboard as well.
> 
> It is possible to simplify the messaging sequence for an interface tree
> by saying that starting from version N, things behave differently. But,
> then you need two implementations in both servers and clients: one
> for < N and one for >= N.
> 

Indeed. But, again, this is the status quo. By my understanding there 
exist current protocol versions where this applies.
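
For the "same wire type, different interpretation" example from earlier
in the thread, the client-side cost of carrying both implementations
could be as small as the sketch below. Everything here is hypothetical:
the version 5 cut-off, struct bar_value and interpret_bar() are invented
for illustration, and fixed_to_double() is a local helper that gives the
same result as wl_fixed_to_double() for the 24.8 fixed-point format.

```c
#include <stdint.h>

/* wl_fixed is signed 24.8 fixed point: divide by 256 to get a double. */
static double
fixed_to_double(int32_t f)
{
    return f / 256.0;
}

/* Hypothetical: from this version on, wl_foo.bar carries a normalized
 * value instead of surface coordinates. */
#define WL_FOO_BAR_NORMALIZED_SINCE_VERSION 5

struct bar_value {
    int normalized; /* 0: surface coordinates, 1: [-10000, +10000] range */
    double value;
};

/* One event handler body, branching on the version the object was bound
 * at. The wire format is identical in both branches; only the meaning
 * attached to the number changes. */
static struct bar_value
interpret_bar(uint32_t bound_version, int32_t raw)
{
    struct bar_value out;

    out.normalized = bound_version >= WL_FOO_BAR_NORMALIZED_SINCE_VERSION;
    out.value = fixed_to_double(raw);
    return out;
}
```

The bound version would come from wl_proxy_get_version() on the object
the event arrived on.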

> If you're asking if the implementation for version < N could be
> deleted or avoided, then I'd say no. Definitely no for desktop
> compositors, probably no for anything else public.

It might be nice to have some mechanism to do this *eventually*, but 
yeah. That's not what this is about.