Undefined behavior

Fri Aug 15 10:11:59 PDT 2014

On Fri, Aug 15, 2014 at 12:18 AM, Pekka Paalanen <ppaalanen at gmail.com>
wrote:

> On Thu, 14 Aug 2014 11:31:27 -0700
> Jason Ekstrand <jason at jlekstrand.net> wrote:
>
> > On Tue, Aug 12, 2014 at 1:34 PM, Jasper St. Pierre <
> jstpierre at mecheye.net>
> > wrote:
> >
> > > In the xdg-shell thread recently, Pekka had a lot of concerns of the
> kind
> > > "what happens when the client makes an unexpected or illegal request.
> > > Should it be an error?"
> > >
> >
> > Yes and no.  More on that in a line or two.
> >
> >
> > > I have an answer for this: Any behavior not defined is undefined
> behavior.
> > > If it's not explicitly mentioned in the protocol, it's undefined
> behavior.
> > >
> > > Undefined behavior means that anything can happen. The compositor can
> send
> > > you a fatal error. It can mess up your objects in undetectable ways.
> The
> > > surface contents you present might now be garbage. Obviously,
> compositors
> > > should try *very* hard to catch all edge cases and send protocol
> errors,
> > > but it might be difficult or impractical that we can catch all cases.
> > >
> >
> > I'm a little hesitant to above fully.  I would rather reserve the term
> > "undefined behavior" to mean things that may not be what the client
> intends
> > but which are not fatal.  That said, most things shouldn't be this kind
> of
> > undefined behavior;  most client errors should be the kind that
> potentially
> > lead to death.
>
> I'm fully with Jason here. I do want a category of behaviours that are
> not fatal, but also quite likely to not do what you except or what is
> good for the user looking at the screen or using the program. The term
> "undefined behaviour" seems to fit well, no? And we have already used
> the term in the core spec, IIRC.
>
> One example of it is destroying wl_buffer while it is reserved by the
> compositor. It's not a fatal error, but can cause a glitch on screen.
> It's not too different from a client rendering badly, or rendering into
> a wl_buffer that is reserved by the compositor.
>
> > I think the better thing to do would be to clarify the protocol to say
> how
> > it should be used.  Any time a client does something that's outside of
> what
> > the protocol says, it's liable to get killed.  We should be more vigilant
> > when writing protocols to specify how it should properly be used so that
> we
> > aren't left with strange ambiguity.  There a couple of reasons for this.
> >
> > 1) It will make better protocols.  If protocols are written as "this is
> > what you should do" it forces the writer to think about all of the valid
> > ways to use it.  Most likely, those are the only ways of using it that
> will
> > be properly implemented in the compositor anyway, so clients doing
> anything
> > else are liable to run into problems.
> >
> > 2) It's better for backwards-compatibility.  It's easier to define a
> > certain behaviour in a later version if the result in earlier versions
> was
> > "don't do this, or I'll kill your client".  If there were strange,
> > poorly-defined, behaviors before, we have to keep those working in future
> > versions.
> >
> > 3) If there is something that should result in non-fatal undefined
> > behaviour, we want to identify and document it.  This way we make sure
> that
> > compositors do something sensible and, at the very least, don't crash.
> >
> > That said, I think it's better to clean up the protocol docs than to try
> to
> > make some blanket statement about undefined behaviour.  Anywhere weston
> > currently kills the client without it being particularly
> well-documented, I
> > think we can feel free to put that in the "don't do this" category.  But
> > I'd rather see us have those rules written down where they are relevant
> > than have another nebulous statement to help us interpret the other
> > nebulous statements.
>
> Yeah, agreed.
>
> However, the protocol specifications are not only instructions on how to
> use it (from client perspective), there should we also intructions on
> how a compositor should behave when it matters. That is why I would
> encourage explicitly specifying that a protocol error is raised if a
> client does not do what is expected or does what is explicitly defined
> as an error (e.g. specifying a negative width when there is no sense
> in that).
>
> One reason is that error codes are interface-specific, and it is much
> better to raise a protocol error on the object involved, than just a
> wl_display message "You fail!", because we have an async protocol and
> it can be hard to track the error back to the request that caused it.
>
> Another reason is that testing apps between compositors would be a
> little more consistent. I don't expect everyone to test on every
> compositor out there, or even everyone to test even on Weston, before
> releasing an app.
>

Yeah, it would be good if compositors tried to kill apps for bad behavior.
We have to have those error-check paths anyway to keep the compositor from
crashing.  That said, I don't know that we always need to be more specific
than something like XDG_SHELL_USAGE_ERROR.  Specific things should have
specific errors, but enumerating all possible errors with their own error
enum is a pain.  The big thing is that we shouldn't abuse the wl_display
error enum like we have in the past; we should use the error system
properly.

>
> > > As a quick example, we have been discussing "surface roles" in the past
> > > week with xdg_surface. Unfortunately, Weston doesn't properly mark
> cursor
> > > or DND surfaces as having roles, and there's no easy way to check for
> it in
> > > the codebase.
> > >
> > > From my reading of the code, it seems what would happen is that the
> cursor
> > > would disappear as the desktop-shell code changes the surface's
> position,
> > > and on the next mouse move, the cursor would reappear as a
> wl_pointer.enter
> > > event is sent out over the new surface.
> > >
> >
> > Deleting the cursor surface without wl_pointer.set_cursor(null) is
> > something that should probably count as "undefined behavior" in the sense
> > that it would be strange but not fatal.  For instance, the compositor
> could
> > just hang on to the last buffer and keep using it.  There are all sorts
> of
> > ways to gracefully handle that.  However, the client shouldn't expect
> > anything in particular.  It could, for instance, result in no cursor.
> >
> >
> > > Obviously, no valid program should ever try to reuse a cursor surface
> as
> > > an xdg_surface.
> > >
> >
> > Yeah, feel free to kill the client for that.  But we should add a line to
> > wl_pointer that says something like "Once a surface has been used in
> > set_cursor, it is considered a cursor surface and should never be used
> for
> > anything other than set_cursor".
> >
> > Make sense?
>
> Yes, we should definitely add that sentence, because right now there is
> no good point in the protocol that would actually take away the cursor
> role from a surface. I think we should for all cases which don't have
> an explicit clear way for the client to remove the role.
>
> (Hmm, actually there is wl_pointer.leave event...)
>
> But that still leaves open the question: if a role has been removed, is
> it allowed to set a different role?
>
> As I've said, the (Weston) code is cleaner and more symmetric when you
> write it as if you could switch roles, but from the protocol point of
> view it is not useful.
>
> Therefore I think I am willing agree, that changing a wl_surface role
> is an error. We can just add a boolean to weston_surface for "has role
> ever been set", and check it as needed. Or maybe use
> weston_surface::configure != NULL for that, if it does not lead to
> strange cases if the client is still committing on that surface after
> the role is gone.
>
> Why changing roles can never be useful is that all role-specific state
> is lost when the role is lost. That leaves only the wl_surface state,
> and that state is dispensable: it is easy for a client to recreate the
> state on a new wl_surface without a role. Right?
>
> There is a slight chance that making this change retro-actively will
> cause some apps to fail. The bug should be in toolkits in that case. Is
> this a serious concern?
>

I think it's probably ok, but it's possible there are surface-recycling
clients out there.  I just looked at Weston and we don't actually prevent
surface re-use.  What we prevent is a surface being a cursor surface and
another type of surface at the same time.  I'm kind of a fan of changing it
and seeing if anything breaks.  If we have to, we can bump the wl_pointer
version when we make the change.
--Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/wayland-devel/attachments/20140815/619b8c94/attachment-0001.html>