Undefined behavior

Pekka Paalanen ppaalanen at gmail.com
Fri Aug 15 00:18:45 PDT 2014


On Thu, 14 Aug 2014 11:31:27 -0700
Jason Ekstrand <jason at jlekstrand.net> wrote:

> On Tue, Aug 12, 2014 at 1:34 PM, Jasper St. Pierre <jstpierre at mecheye.net>
> wrote:
> 
> > In the xdg-shell thread recently, Pekka had a lot of concerns of the kind
> > "what happens when the client makes an unexpected or illegal request.
> > Should it be an error?"
> >
> 
> Yes and no.  More on that in a line or two.
> 
> 
> > I have an answer for this: Any behavior not defined is undefined behavior.
> > If it's not explicitly mentioned in the protocol, it's undefined behavior.
> >
> > Undefined behavior means that anything can happen. The compositor can send
> > you a fatal error. It can mess up your objects in undetectable ways. The
> > surface contents you present might now be garbage. Obviously, compositors
> > should try *very* hard to catch all edge cases and send protocol errors,
> > but it might be difficult or impractical to catch all cases.
> >
> 
> I'm a little hesitant to agree with the above fully.  I would rather
> reserve the term
> "undefined behavior" to mean things that may not be what the client intends
> but which are not fatal.  That said, most things shouldn't be this kind of
> undefined behavior;  most client errors should be the kind that potentially
> lead to death.

I'm fully with Jason here. I do want a category of behaviours that are
not fatal, but also quite likely not to do what you expect or what is
good for the user looking at the screen or using the program. The term
"undefined behaviour" seems to fit well, no? And we have already used
the term in the core spec, IIRC.

One example is destroying a wl_buffer while it is reserved by the
compositor. It's not a fatal error, but it can cause a glitch on
screen. It's not too different from a client rendering badly, or
rendering into a wl_buffer that is reserved by the compositor.
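
For illustration, here is a minimal client-side sketch of the safe
pattern, built on the real wl_buffer.release event; the buffer_slot
bookkeeping struct is hypothetical:

#include <wayland-client.h>

/* Hypothetical bookkeeping: one slot per buffer the client owns. */
struct buffer_slot {
	struct wl_buffer *buffer;
	int busy;	/* set on attach+commit, cleared on release */
};

/* wl_buffer.release fires when the compositor no longer needs the
 * buffer contents; only after that is it safe to reuse or destroy
 * the buffer without risking the glitch described above. */
static void
buffer_handle_release(void *data, struct wl_buffer *buffer)
{
	struct buffer_slot *slot = data;

	slot->busy = 0;
}

static const struct wl_buffer_listener buffer_listener = {
	.release = buffer_handle_release,
};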

> I think the better thing to do would be to clarify the protocol to say how
> it should be used.  Any time a client does something that's outside of what
> the protocol says, it's liable to get killed.  We should be more vigilant
> when writing protocols to specify how it should properly be used so that we
> aren't left with strange ambiguity.  There are a couple of reasons for this.
> 
> 1) It will make better protocols.  If protocols are written as "this is
> what you should do" it forces the writer to think about all of the valid
> ways to use it.  Most likely, those are the only ways of using it that will
> be properly implemented in the compositor anyway, so clients doing anything
> else are liable to run into problems.
> 
> 2) It's better for backwards-compatibility.  It's easier to define a
> certain behaviour in a later version if the result in earlier versions was
> "don't do this, or I'll kill your client".  If there were strange,
> poorly-defined, behaviors before, we have to keep those working in future
> versions.
> 
> 3) If there is something that should result in non-fatal undefined
> behaviour, we want to identify and document it.  This way we make sure that
> compositors do something sensible and, at the very least, don't crash.
> 
> That said, I think it's better to clean up the protocol docs than to try to
> make some blanket statement about undefined behaviour.  Anywhere weston
> currently kills the client without it being particularly well-documented, I
> think we can feel free to put that in the "don't do this" category.  But
> I'd rather see us have those rules written down where they are relevant
> than have another nebulous statement to help us interpret the other
> nebulous statements.

Yeah, agreed.

However, the protocol specifications are not only instructions on how
to use the protocol (from the client's perspective); there should also
be instructions on how a compositor should behave when it matters.
That is why I would encourage explicitly specifying that a protocol
error is raised if a client does not do what is expected, or does what
is explicitly defined as an error (e.g. specifying a negative width
where that makes no sense).

One reason is that error codes are interface-specific, and it is much
better to raise a protocol error on the object involved than just a
generic wl_display message saying "You fail!", because we have an
async protocol and it can be hard to track the error back to the
request that caused it.
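
As a compositor-side sketch (the interface and its error enum here are
hypothetical, but wl_resource_post_error() is the real
libwayland-server entry point):

#include <wayland-server.h>

/* Hypothetical error code for a hypothetical interface; real
 * interfaces get such enums generated from their protocol XML. */
enum my_shell_error {
	MY_SHELL_ERROR_INVALID_SIZE = 0,
};

static void
my_shell_set_size(struct wl_client *client, struct wl_resource *resource,
		  int32_t width, int32_t height)
{
	if (width < 0 || height < 0) {
		/* Posted on the offending object, so the client can
		 * trace it back to the request that caused it. */
		wl_resource_post_error(resource,
				       MY_SHELL_ERROR_INVALID_SIZE,
				       "invalid size %dx%d",
				       width, height);
		return;
	}

	/* ... normal handling ... */
}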

Another reason is that testing apps across compositors would be a
little more consistent. I don't expect everyone to test on every
compositor out there, or even on Weston, before releasing an app.

> > As a quick example, we have been discussing "surface roles" in the past
> > week with xdg_surface. Unfortunately, Weston doesn't properly mark cursor
> > or DND surfaces as having roles, and there's no easy way to check for it in
> > the codebase.
> >
> > From my reading of the code, it seems what would happen is that the cursor
> > would disappear as the desktop-shell code changes the surface's position,
> > and on the next mouse move, the cursor would reappear as a wl_pointer.enter
> > event is sent out over the new surface.
> >
> 
> Deleting the cursor surface without wl_pointer.set_cursor(null) is
> something that should probably count as "undefined behavior" in the sense
> that it would be strange but not fatal.  For instance, the compositor could
> just hang on to the last buffer and keep using it.  There are all sorts of
> ways to gracefully handle that.  However, the client shouldn't expect
> anything in particular.  It could, for instance, result in no cursor.
> 
> 
> > Obviously, no valid program should ever try to reuse a cursor surface as
> > an xdg_surface.
> >
> 
> Yeah, feel free to kill the client for that.  But we should add a line to
> wl_pointer that says something like "Once a surface has been used in
> set_cursor, it is considered a cursor surface and should never be used for
> anything other than set_cursor".
> 
> Make sense?

Yes, we should definitely add that sentence, because right now there is
no good point in the protocol that would actually take away the cursor
role from a surface. I think we should do this for all cases which
don't have an explicit, clear way for the client to remove the role.

(Hmm, actually there is wl_pointer.leave event...)
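
For reference, the graceful client-side order Jason describes would
look something like this (pointer, enter_serial and cursor_surface are
assumed to come from the surrounding application state):

#include <stddef.h>
#include <wayland-client.h>

/* Detach the cursor surface before destroying it.  A NULL surface in
 * wl_pointer.set_cursor means "no cursor", so the compositor is never
 * left holding a destroyed surface. */
static void
drop_cursor(struct wl_pointer *pointer, uint32_t enter_serial,
	    struct wl_surface *cursor_surface)
{
	wl_pointer_set_cursor(pointer, enter_serial, NULL, 0, 0);
	wl_surface_destroy(cursor_surface);
}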

But that still leaves open the question: if a role has been removed, is
it allowed to set a different role?

As I've said, the (Weston) code is cleaner and more symmetric when you
write it as if you could switch roles, but from the protocol point of
view it is not useful.

Therefore I think I am willing to agree that changing a wl_surface role
is an error. We can just add a boolean to weston_surface for "has the
role ever been set", and check it as needed. Or maybe use
weston_surface::configure != NULL for that, if it does not lead to
strange cases when the client is still committing on that surface after
the role is gone.
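
A minimal sketch of the boolean variant; role_ever_set is the proposed
new field, not something weston_surface has today:

#include <stdbool.h>

/* Stand-in for the relevant slice of weston_surface; the real struct
 * lives in Weston's compositor.h.  role_ever_set is the proposed
 * addition. */
struct weston_surface {
	bool role_ever_set;
	/* ... the rest of the surface state ... */
};

/* Returns 0 on success, -1 if the surface already had a role; the
 * caller would then post a protocol error on the wl_surface
 * resource. */
static int
weston_surface_take_role(struct weston_surface *surface)
{
	if (surface->role_ever_set)
		return -1;

	surface->role_ever_set = true;
	return 0;
}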

The reason changing roles can never be useful is that all role-specific
state is lost when the role is lost. That leaves only the wl_surface
state, and that state is dispensable: it is easy for a client to
recreate the state on a new wl_surface without a role. Right?

There is a slight chance that making this change retroactively will
cause some apps to fail. The bug would be in the toolkits in that case.
Is this a serious concern?


Thanks,
pq

