Compositor crashes when switching tty

Pekka Paalanen ppaalanen at gmail.com
Fri May 31 08:42:47 UTC 2019


On Fri, 31 May 2019 04:39:44 +0100
adlo <adloconwy at gmail.com> wrote:

> On Fri, 2019-05-31 at 01:22 +0100, adlo wrote:
> > On Thu, 2019-05-30 at 13:39 +0300, Pekka Paalanen wrote:  
> > > 
> > > Hi,
> > > 
> > > as always, look at the very first problem reported. Other problems
> > > may be fallout from the first one, so fix the first one, and
> > > repeat.
> > > 
> > > It is quite easy to corrupt a list based on struct wl_list, which
> > > will then result in more errors all over the place.
> > >   
> > 
> > The first problem is this:
> > 
> > ==13998== Invalid write of size 8
> > ==13998==    at 0x4884ADB: wl_list_remove (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x48A2585: weston_view_set_output (in /usr/lib64/libweston-6.so.0.0.0)
> > ==13998==    by 0x48A41AD: weston_view_unmap (in /usr/lib64/libweston-6.so.0.0.0)
> > ==13998==    by 0x48A5587: weston_view_destroy (in /usr/lib64/libweston-6.so.0.0.0)
> > ==13998==    by 0x48A5664: weston_surface_destroy (in /usr/lib64/libweston-6.so.0.0.0)
> > ==13998==    by 0x4880927: ??? (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x4884A7F: ??? (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x4884FC3: ??? (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x4880AA1: wl_client_destroy (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x4880EDD: wl_display_flush_clients (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x4880F17: wl_display_run (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x403A57: main (main-wayland.c:625)
> > ==13998==  Address 0x9fcda10 is 96 bytes inside a block of size 120 free'd
> > ==13998==    at 0x4839A0C: free (vg_replace_malloc.c:540)
> > ==13998==    by 0x48DD073: ??? (in /usr/lib64/libweston-desktop-6.so.0.0.0)
> > ==13998==    by 0x48D8E53: ??? (in /usr/lib64/libweston-desktop-6.so.0.0.0)
> > ==13998==    by 0x4880927: ??? (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x4880993: wl_resource_destroy (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x5984B27: ffi_call_unix64 (in /usr/lib64/libffi.so.6.0.2)
> > ==13998==    by 0x5984338: ffi_call (in /usr/lib64/libffi.so.6.0.2)
> > ==13998==    by 0x48841B6: ??? (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x4880D31: ??? (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x4882369: wl_event_loop_dispatch (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x4880F24: wl_display_run (in /usr/lib64/libwayland-server.so.0.1.0)
> > ==13998==    by 0x403A57: main (main-wayland.c:625)
> > 
> > However, this doesn't seem to call back into my compositor's code at
> > any point. I see a call to weston_surface_destroy (), which suggests
> > a surface was destroyed. However, if a surface was destroyed, I would
> > expect to see a call to surface_removed () in src/shell.c. How do I
> > interpret this?
> >   
> 
> How do I debug something that isn't even part of my code? It goes
> straight from main to wl_display_run to library code without calling
> any of my callbacks.

Hi,

what likely happens here is that even the first Valgrind error is
just fallout from an earlier bug. You corrupt a list, free memory,
continue happily, then something else tries to use the list and hits
memory access errors.

This is how you get errors in code that is nowhere near the code you
wrote. You also do not see it in a stack trace, because the bug happens
in one call from the main event loop, and causes problems in another
call from the main event loop.

Often the Valgrind error report can point you to which list is
corrupted. Then you will have to debug the use of that list the hard
way: gdb, added printf's, whatever lets you make sense of it, to find
the list operation that is illegal but does not cause any visible
problem right on the spot.
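
One way to narrow this down is to call a small consistency checker on
the suspect list from several places and see where it first fails. The
following is only a sketch: 'struct list' and the list_*() helpers are
local stand-ins written to mirror how wl_list behaves, so that the
example compiles without libwayland, and list_check() is a hypothetical
helper, not a libwayland function. It cannot detect links that point
into freed memory; Valgrind remains the tool for that.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring struct wl_list (assumption: not libwayland). */
struct list {
	struct list *prev;
	struct list *next;
};

static void list_init(struct list *l)
{
	l->prev = l;
	l->next = l;
}

/* Insert elm right after 'after', like wl_list_insert(). */
static void list_insert(struct list *after, struct list *elm)
{
	elm->prev = after;
	elm->next = after->next;
	after->next = elm;
	elm->next->prev = elm;
}

/*
 * Hypothetical debugging helper: walk the list forwards and verify that
 * every next pointer is matched by the neighbour's prev pointer.
 * Returns the number of elements, or -1 if a broken link is found or
 * more than max_nodes elements are seen (which suggests a cycle that
 * no longer passes through head).
 */
static int list_check(const struct list *head, int max_nodes)
{
	const struct list *pos = head;
	int count = 0;

	do {
		if (!pos->next || pos->next->prev != pos)
			return -1;
		pos = pos->next;
		if (pos != head && ++count > max_nodes)
			return -1;
	} while (pos != head);

	return count;
}
```

Calling list_check() before and after each suspect operation shows
which one breaks the list, for example inserting a link that is
already in it.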

Some usual mistakes with wl_list are:
- wl_list_insert() of a 'link' that is already in some list
- forgetting to wl_list_remove() before freeing the item's memory
- removing an item from a list you are iterating through (this has
  several sub-cases though, one of which is safe)
- trying to use wl_list_empty() to figure out if wl_list_remove() is
  safe

There is no function that can always tell you whether a 'struct
wl_list' variable is initialized or not. You have to design your code
such that you know: either it is guaranteed by the code, determined
from another variable, or you make sure your variable is always
initialized so that wl_list_remove() is always safe.
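
The last option could look roughly like the sketch below: the link is
initialized when the object is created and re-initialized after every
removal, so removing it is valid in any state. 'struct list' and the
list_*() helpers are local stand-ins mirroring wl_list so the example
compiles without libwayland, and 'struct thing' with its functions is
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring struct wl_list (assumption: not libwayland). */
struct list {
	struct list *prev;
	struct list *next;
};

static void list_init(struct list *l)
{
	l->prev = l;
	l->next = l;
}

static void list_insert(struct list *after, struct list *elm)
{
	elm->prev = after;
	elm->next = after->next;
	after->next = elm;
	elm->next->prev = elm;
}

/* Like wl_list_remove(): leaves the removed link with NULL pointers. */
static void list_remove(struct list *elm)
{
	elm->prev->next = elm->next;
	elm->next->prev = elm->prev;
	elm->prev = NULL;
	elm->next = NULL;
}

static int list_length(const struct list *head)
{
	const struct list *pos;
	int n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/* Hypothetical object that is sometimes in a list and sometimes not. */
struct thing {
	struct list link;
};

/* Invariant: thing->link is always either in a list or self-linked. */
static void thing_init(struct thing *t)
{
	list_init(&t->link);
}

static void thing_attach(struct thing *t, struct list *head)
{
	/* Detach first, so the link is never inserted twice. */
	list_remove(&t->link);
	list_insert(head, &t->link);
}

static void thing_detach(struct thing *t)
{
	list_remove(&t->link);
	/* Re-initialize to restore the invariant after removal. */
	list_init(&t->link);
}
```

With this invariant thing_detach() can be called any number of times,
attached or not, and a destructor can call it unconditionally before
freeing the object.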

Of course, all this is assuming it is the usual kind of list
corruption. It could as well be just some bit of code overwriting
arbitrary memory due to a bug. That is much harder to track down, but
also less common.


Thanks,
pq