Wayland triggering automated bug reports mechanism? (Re: [RFC wayland] wayland-server: assert instead of posting events with wrong client objects)

Fri Dec 9 08:52:20 UTC 2016

On Thu, 8 Dec 2016 16:01:55 -0600
Derek Foreman <derekf at osg.samsung.com> wrote:

> On 08/12/16 02:54 PM, Pekka Paalanen wrote:
> > On Thu, 8 Dec 2016 13:28:41 -0600
> > Derek Foreman <derekf at osg.samsung.com> wrote:
> >  
> >> On 02/12/16 02:07 AM, Pekka Paalanen wrote:  

> >>> IMO, your check should not be an assert, it should be an unconditional
> >>> abort() with no way to disable it and a clear log message. Like a
> >>> segfault is, only better.
> >>>
> >>> A compositor that handles SEGVs and ABRTs will then also provide a
> >>> backtrace that should be very helpful in squashing the bug.
> >>>
> >>> However, it seems our existing convention is to just kill the client on
> >>> compositor programming errors (e.g. wl_closure_marshal() failing on
> >>> nullable violation). I don't really like that, but it's a precedent, and
> >>> at the very least we can detect the problem and yell rather than have
> >>> clients mysteriously fail.  
> >>
> >> I think it would be trivial to catch nullable violation from the
> >> compositor in the same place where I'm testing for accidental mixing of
> >> client objects.
> >>
> >> I don't think we should remove any client side sanity checks, but adding
> >> that check on the compositor side seems like a win for debugging to me.
> >> Should I do this too?  
> >
> > Sure. The nullable violation is already checked; it might only lack
> > a good log message. Server makes a mistake, the client is
> > disconnected when the error is detected...
> 
> Ah, ok, that's tested in wl_closure_marshal - so it tests for nullable 
> violation in both client and server in the same place.
> 
> The log message is already concise, IMHO.
> 
> I am left wondering if I should have a client object mix-up react in the 
> same way - log something and disconnect the client.

I suspect the idea has been something like this:

- if a client makes a programming error, just abort()

- if a server makes a programming error, yell and try to recover by
  disconnecting the client if possible

The rationale for the latter is that if there is a rarely hit
programming error in the server, users would be very annoyed by the
whole desktop vanishing at random. It's a trade-off between making
people report bugs vs. making the server "stable".
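
To make the two policies concrete, here is a rough sketch in C. It is not
libwayland code: every sketch_* name is hypothetical, and the structs only
stand in for a client connection and an object owned by a client. It just
shows "abort on client bugs" next to "yell and disconnect the one client
on server bugs".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a client connection; not a libwayland type. */
struct sketch_client {
	int id;
	bool connected;
};

/* Hypothetical stand-in for a protocol object owned by one client. */
struct sketch_resource {
	struct sketch_client *owner;
};

/* Client-side policy: a programming error in the client is fatal. */
void
sketch_client_bug(const char *msg)
{
	fprintf(stderr, "client bug: %s\n", msg);
	abort();
}

/* Server-side policy: yell, then drop only the affected client.
 * Returns true if the event may be sent, false if the client was
 * disconnected because the argument belongs to another client. */
bool
sketch_server_check_owner(struct sketch_client *target,
			  const struct sketch_resource *arg)
{
	if (arg == NULL || arg->owner == target)
		return true;

	fprintf(stderr, "server bug: event for client %d references an "
		"object of client %d; disconnecting client %d\n",
		target->id, arg->owner->id, target->id);
	target->connected = false;
	return false;
}
```

In this sketch the server and all other clients keep running after the
mix-up is detected, which is exactly the "stable" side of the trade-off.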

Is there some interface we could offer or support in libwayland
that could connect to automated bug reporting systems used by
distributions? Using SIGABRT for that is a bit harsh, I agree.

Or would *any* message through the libwayland log facilities count as a
reportable bug? That seems a bit... much.
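
For illustration only, this is roughly what "every log message is a
report" could look like from the compositor side. The handler has the same
shape as libwayland's wl_log_func_t, so a compositor could pass it to
wl_log_set_handler_server(); the sketch_* names, the counter and the
buffer are my assumptions, and the variadic wrapper exists only so the
handler can be exercised without linking libwayland.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state: the last message seen and how many were seen.
 * A real hook would hand these to a bug-reporting tool instead. */
char sketch_last_msg[256];
int sketch_msg_count;

/* Same shape as libwayland's wl_log_func_t: (const char *, va_list). */
void
sketch_log_handler(const char *fmt, va_list args)
{
	vsnprintf(sketch_last_msg, sizeof sketch_last_msg, fmt, args);
	sketch_msg_count++;
	/* Still print to stderr so the message stays visible. */
	fprintf(stderr, "libwayland: %s", sketch_last_msg);
}

/* Hypothetical variadic wrapper, used only to drive the handler here. */
void
sketch_log(const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	sketch_log_handler(fmt, args);
	va_end(args);
}
```

Note that such a hook cannot tell a server programming error from any
other logged message, which is why treating all of them as reportable
looks excessive.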

> I'm surprised that dup failure and unhandled arg type abort, but 
> nullable violation, too many args, and malloc failure keep marching on.
> 
> Were these prioritized this way intentionally?

I don't think so. :-)

> >>> libwayland-client does call wl_abort() on such failures.

Thanks,
pq