Undefined behavior

Jasper St. Pierre jstpierre at mecheye.net
Sat Aug 16 06:27:35 PDT 2014


It seems people got a bit caught up on the words "undefined behavior" here,
especially as the very similar "undefined surface contents" appears in
wl_surface.destroy / wl_surface.release. Let's try this again, with a
different word.

Any behavior that is not explicitly defined is illegal behavior. A
perfectly built compositor MUST send protocol errors for illegal behavior,
but most compositors are not perfectly built, and it is exceedingly
difficult to enumerate the undefined parts of the spec. Compositors SHOULD
attempt to send protocol errors for illegal behavior.

Clients cannot rely on error conditions being the same forever. New
features and new additions to the protocol may mean that what was illegal
behavior before is now defined.



On Fri, Aug 15, 2014 at 3:18 AM, Pekka Paalanen <ppaalanen at gmail.com> wrote:

> On Thu, 14 Aug 2014 11:31:27 -0700
> Jason Ekstrand <jason at jlekstrand.net> wrote:
>
> > On Tue, Aug 12, 2014 at 1:34 PM, Jasper St. Pierre <
> jstpierre at mecheye.net>
> > wrote:
> >
> > > In the xdg-shell thread recently, Pekka had a lot of concerns of the
> kind
> > > "what happens when the client makes an unexpected or illegal request.
> > > Should it be an error?"
> > >
> >
> > Yes and no.  More on that in a line or two.
> >
> >
> > > I have an answer for this: Any behavior not defined is undefined
> behavior.
> > > If it's not explicitly mentioned in the protocol, it's undefined
> behavior.
> > >
> > > Undefined behavior means that anything can happen. The compositor can
> send
> > > you a fatal error. It can mess up your objects in undetectable ways.
> The
> > > surface contents you present might now be garbage. Obviously,
> compositors
> > > should try *very* hard to catch all edge cases and send protocol
> errors,
> > > but it might be difficult or impractical that we can catch all cases.
> > >
> >
> > I'm a little hesitant to above fully.  I would rather reserve the term
> > "undefined behavior" to mean things that may not be what the client
> intends
> > but which are not fatal.  That said, most things shouldn't be this kind
> of
> > undefined behavior;  most client errors should be the kind that
> potentially
> > lead to death.
>
> I'm fully with Jason here. I do want a category of behaviours that are
> not fatal, but also quite likely to not do what you except or what is
> good for the user looking at the screen or using the program. The term
> "undefined behaviour" seems to fit well, no? And we have already used
> the term in the core spec, IIRC.
>
> One example of it is destroying wl_buffer while it is reserved by the
> compositor. It's not a fatal error, but can cause a glitch on screen.
> It's not too different from a client rendering badly, or rendering into
> a wl_buffer that is reserved by the compositor.
>
> > I think the better thing to do would be to clarify the protocol to say
> how
> > it should be used.  Any time a client does something that's outside of
> what
> > the protocol says, it's liable to get killed.  We should be more vigilant
> > when writing protocols to specify how it should properly be used so that
> we
> > aren't left with strange ambiguity.  There a couple of reasons for this.
> >
> > 1) It will make better protocols.  If protocols are written as "this is
> > what you should do" it forces the writer to think about all of the valid
> > ways to use it.  Most likely, those are the only ways of using it that
> will
> > be properly implemented in the compositor anyway, so clients doing
> anything
> > else are liable to run into problems.
> >
> > 2) It's better for backwards-compatibility.  It's easier to define a
> > certain behaviour in a later version if the result in earlier versions
> was
> > "don't do this, or I'll kill your client".  If there were strange,
> > poorly-defined, behaviors before, we have to keep those working in future
> > versions.
> >
> > 3) If there is something that should result in non-fatal undefined
> > behaviour, we want to identify and document it.  This way we make sure
> that
> > compositors do something sensible and, at the very least, don't crash.
> >
> > That said, I think it's better to clean up the protocol docs than to try
> to
> > make some blanket statement about undefined behaviour.  Anywhere weston
> > currently kills the client without it being particularly
> well-documented, I
> > think we can feel free to put that in the "don't do this" category.  But
> > I'd rather see us have those rules written down where they are relevant
> > than have another nebulous statement to help us interpret the other
> > nebulous statements.
>
> Yeah, agreed.
>
> However, the protocol specifications are not only instructions on how to
> use it (from client perspective), there should we also intructions on
> how a compositor should behave when it matters. That is why I would
> encourage explicitly specifying that a protocol error is raised if a
> client does not do what is expected or does what is explicitly defined
> as an error (e.g. specifying a negative width when there is no sense
> in that).
>
> One reason is that error codes are interface-specific, and it is much
> better to raise a protocol error on the object involved, than just a
> wl_display message "You fail!", because we have an async protocol and
> it can be hard to track the error back to the request that caused it.
>
> Another reason is that testing apps between compositors would be a
> little more consistent. I don't expect everyone to test on every
> compositor out there, or even everyone to test even on Weston, before
> releasing an app.
>
> > > As a quick example, we have been discussing "surface roles" in the past
> > > week with xdg_surface. Unfortunately, Weston doesn't properly mark
> cursor
> > > or DND surfaces as having roles, and there's no easy way to check for
> it in
> > > the codebase.
> > >
> > > From my reading of the code, it seems what would happen is that the
> cursor
> > > would disappear as the desktop-shell code changes the surface's
> position,
> > > and on the next mouse move, the cursor would reappear as a
> wl_pointer.enter
> > > event is sent out over the new surface.
> > >
> >
> > Deleting the cursor surface without wl_pointer.set_cursor(null) is
> > something that should probably count as "undefined behavior" in the sense
> > that it would be strange but not fatal.  For instance, the compositor
> could
> > just hang on to the last buffer and keep using it.  There are all sorts
> of
> > ways to gracefully handle that.  However, the client shouldn't expect
> > anything in particular.  It could, for instance, result in no cursor.
> >
> >
> > > Obviously, no valid program should ever try to reuse a cursor surface
> as
> > > an xdg_surface.
> > >
> >
> > Yeah, feel free to kill the client for that.  But we should add a line to
> > wl_pointer that says something like "Once a surface has been used in
> > set_cursor, it is considered a cursor surface and should never be used
> for
> > anything other than set_cursor".
> >
> > Make sense?
>
> Yes, we should definitely add that sentence, because right now there is
> no good point in the protocol that would actually take away the cursor
> role from a surface. I think we should for all cases which don't have
> an explicit clear way for the client to remove the role.
>
> (Hmm, actually there is wl_pointer.leave event...)
>
> But that still leaves open the question: if a role has been removed, is
> it allowed to set a different role?
>
> As I've said, the (Weston) code is cleaner and more symmetric when you
> write it as if you could switch roles, but from the protocol point of
> view it is not useful.
>
> Therefore I think I am willing agree, that changing a wl_surface role
> is an error. We can just add a boolean to weston_surface for "has role
> ever been set", and check it as needed. Or maybe use
> weston_surface::configure != NULL for that, if it does not lead to
> strange cases if the client is still committing on that surface after
> the role is gone.
>
> Why changing roles can never be useful is that all role-specific state
> is lost when the role is lost. That leaves only the wl_surface state,
> and that state is dispensable: it is easy for a client to recreate the
> state on a new wl_surface without a role. Right?
>
> There is a slight chance that making this change retro-actively will
> cause some apps to fail. The bug should be in toolkits in that case. Is
> this a serious concern?
>
>
> Thanks,
> pq
>



-- 
  Jasper
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/wayland-devel/attachments/20140816/77fc6003/attachment-0001.html>


More information about the wayland-devel mailing list