Undefined behavior

Sat Aug 16 00:51:51 PDT 2014

On Fri, 15 Aug 2014 10:11:59 -0700
Jason Ekstrand <jason at jlekstrand.net> wrote:

> On Fri, Aug 15, 2014 at 12:18 AM, Pekka Paalanen <ppaalanen at gmail.com>
> wrote:
> 
> > On Thu, 14 Aug 2014 11:31:27 -0700
> > Jason Ekstrand <jason at jlekstrand.net> wrote:
> >

> > > I think the better thing to do would be to clarify the protocol to say
> > how
> > > it should be used.  Any time a client does something that's outside of
> > what
> > > the protocol says, it's liable to get killed.  We should be more vigilant
> > > when writing protocols to specify how it should properly be used so that
> > we
> > > aren't left with strange ambiguity.  There a couple of reasons for this.
> > >
> > > 1) It will make better protocols.  If protocols are written as "this is
> > > what you should do" it forces the writer to think about all of the valid
> > > ways to use it.  Most likely, those are the only ways of using it that
> > will
> > > be properly implemented in the compositor anyway, so clients doing
> > anything
> > > else are liable to run into problems.
> > >
> > > 2) It's better for backwards-compatibility.  It's easier to define a
> > > certain behaviour in a later version if the result in earlier versions
> > was
> > > "don't do this, or I'll kill your client".  If there were strange,
> > > poorly-defined, behaviors before, we have to keep those working in future
> > > versions.
> > >
> > > 3) If there is something that should result in non-fatal undefined
> > > behaviour, we want to identify and document it.  This way we make sure
> > that
> > > compositors do something sensible and, at the very least, don't crash.
> > >
> > > That said, I think it's better to clean up the protocol docs than to try
> > to
> > > make some blanket statement about undefined behaviour.  Anywhere weston
> > > currently kills the client without it being particularly
> > well-documented, I
> > > think we can feel free to put that in the "don't do this" category.  But
> > > I'd rather see us have those rules written down where they are relevant
> > > than have another nebulous statement to help us interpret the other
> > > nebulous statements.
> >
> > Yeah, agreed.
> >
> > However, the protocol specifications are not only instructions on how to
> > use it (from client perspective), there should we also intructions on
> > how a compositor should behave when it matters. That is why I would
> > encourage explicitly specifying that a protocol error is raised if a
> > client does not do what is expected or does what is explicitly defined
> > as an error (e.g. specifying a negative width when there is no sense
> > in that).
> >
> > One reason is that error codes are interface-specific, and it is much
> > better to raise a protocol error on the object involved, than just a
> > wl_display message "You fail!", because we have an async protocol and
> > it can be hard to track the error back to the request that caused it.
> >
> > Another reason is that testing apps between compositors would be a
> > little more consistent. I don't expect everyone to test on every
> > compositor out there, or even everyone to test even on Weston, before
> > releasing an app.
> >
> 
> Yeah, it would be good if compositors tried to kill apps for bad behavior.
> We have to have those error-check paths anyway to keep the compositor from
> crashing.  That said, I don't know that we always need to be more specific
> than something like XDG_SHELL_USAGE_ERROR.  Specific things should have
> specific errors, but enumerating all possible errors with their own error
> enum is a pain.  The big thing is that we shouldn't abuse the wl_display
> error enum like we have in the past; we should use the error system
> properly.

Exactly. In your example, it should be XDG_SHELL_ERROR,
XDG_SURFACE_ERROR and XDG_POPUP_ERROR, (at least) one per
interface, so we can send the most specific protocol object with
the error. We also have an arbitrary string to go with an error, so
that can tell more, no need to define enums for every single case...

Except for one reason: testing. If you can imagine that testing
would benefit from an explicit error code, I totally recommend
adding one. Comparing error strings in tests is something I would
avoid.

> > > Yeah, feel free to kill the client for that.  But we should add a line to
> > > wl_pointer that says something like "Once a surface has been used in
> > > set_cursor, it is considered a cursor surface and should never be used
> > for
> > > anything other than set_cursor".
> > >
> > > Make sense?
> >
> > Yes, we should definitely add that sentence, because right now there is
> > no good point in the protocol that would actually take away the cursor
> > role from a surface. I think we should for all cases which don't have
> > an explicit clear way for the client to remove the role.
> >
> > (Hmm, actually there is wl_pointer.leave event...)
> >
> > But that still leaves open the question: if a role has been removed, is
> > it allowed to set a different role?
> >
> > As I've said, the (Weston) code is cleaner and more symmetric when you
> > write it as if you could switch roles, but from the protocol point of
> > view it is not useful.
> >
> > Therefore I think I am willing agree, that changing a wl_surface role
> > is an error. We can just add a boolean to weston_surface for "has role
> > ever been set", and check it as needed. Or maybe use
> > weston_surface::configure != NULL for that, if it does not lead to
> > strange cases if the client is still committing on that surface after
> > the role is gone.
> >
> > Why changing roles can never be useful is that all role-specific state
> > is lost when the role is lost. That leaves only the wl_surface state,
> > and that state is dispensable: it is easy for a client to recreate the
> > state on a new wl_surface without a role. Right?
> >
> > There is a slight chance that making this change retro-actively will
> > cause some apps to fail. The bug should be in toolkits in that case. Is
> > this a serious concern?
> >
> 
> I think it's probably ok, but it's possible there are surface-recycling
> clients out there.  I just looked at Weston and we don't actually prevent
> surface re-use.  What we prevent is a surface being a cursor surface and
> another type of surface at the same time.  I'm kind of a fan of changing it
> and seeing if anything breaks.  If we have to, we can bump the wl_pointer
> version when we make the change.

I didn't mean only wl_pointer, but everything. Cursors, drag icons,
sub-surfaces, wl_shell surfaces...

It would be quite of a pain to make it version dependent.

How should we try this? First change Weston and wait for some time
to see if anyone complains, and then fix the protocol spec? On what
timescale? Or just do it all in one go?

Good thing this is a backwards compatible change in the sense, that
if a client works with the new rules, it also works with the old
ones.

Thanks,
pq