new test hangs in test-compositor.c at waitid - any clues?

Fri Oct 6 16:22:48 UTC 2023

I have a new test that is supposed to trigger an error in
the server, causing the server to abort the client and end the test.
At that point the client is sleeping, waiting to be aborted.

Instead, the test hangs (and eventually times out).

If I run it under gdb, and Ctrl-C break during the hang, I get:

(gdb) bt
#0  0x00007ffff7e72ac6 in __waitid (idtype=P_PID, id=10135, infop=0x7fffffffdd70, options=4)
    at ../sysdeps/unix/sysv/linux/waitid.c:29
#1  0x000055555555de10 in handle_client_destroy (data=0x555555567730)
    at ../tests/test-compositor.c:110
#2  0x00007ffff7fa20fe in wl_event_loop_dispatch_idle (loop=0x555555567440)
    at ../src/event-loop.c:969
#3  0x00007ffff7fa256c in wl_event_loop_dispatch (loop=0x555555567440, timeout=-1)
    at ../src/event-loop.c:1109
#4  0x00007ffff7f9ea81 in wl_display_run (display=0x555555567350)
    at ../src/wayland-server.c:1493
#5  0x000055555555e814 in display_run (d=0x555555567300)
    at ../tests/test-compositor.c:401
#6  0x000055555555cc36 in server_needs_zombies ()
    at ../tests/display-test.c:1884
#7  0x000055555555cf80 in run_test (t=0x5555555666e0 <testserver_needs_zombies>)
    at ../tests/test-runner.c:159
#8  0x000055555555d559 in main (argc=2, argv=0x7fffffffe328)
    at ../tests/test-runner.c:345

[server_needs_zombies is the name of the new test, which I'm using to
establish that the server needs zombie resources like the client
needs zombie proxies]
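
For reference, frame #0 shows waitid called with idtype=P_PID and
options=4. On Linux, 4 is the value of WEXITED, and without WNOHANG
such a call blocks until the child actually terminates. Below is a
minimal standalone sketch of a non-blocking probe for the same
condition. It is not code from test-compositor.c; the helper names
spawn_sleeping_child and child_has_exited are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the sleeping test client: fork a child
 * that blocks in pause() until a signal terminates it.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
static pid_t
spawn_sleeping_child(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;
}

/* Non-blocking probe. Returns 1 if the child has terminated, 0 if it
 * is still running, -1 on error. WNOHANG keeps the call from blocking;
 * WNOWAIT leaves a terminated child waitable, so a later blocking
 * waitid(..., WEXITED) can still reap it. */
static int
child_has_exited(pid_t pid)
{
	siginfo_t info;

	assert(pid > 0);

	/* With WNOHANG, si_pid stays 0 when no child is waitable yet,
	 * so clear the struct before the call. */
	memset(&info, 0, sizeof info);
	if (waitid(P_PID, pid, &info, WEXITED | WNOHANG | WNOWAIT) < 0)
		return -1;
	return info.si_pid != 0;
}
```

Calling child_has_exited(pid) while the client is still sleeping
returns 0 without blocking, which is the same state that 'ps xf'
reports from outside the process.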

Using 'ps xf' I can see that the child client is not a zombie (in the
Linux process sense this time, not the Wayland object sense) until the
Ctrl-C in gdb, at which point it immediately becomes one.
Continuing in gdb allows the test to terminate with the expected error
result:

Continuing.
Client 'snz_client_loop' was killed by signal 2
Client 'snz_client_loop' failed
1 child(ren) failed

In other words, for some reason, the abort signal sent to the client was
not delivered until the server (parent process of the client) got
interrupted itself.
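
To make the "killed by signal 2" line easier to interpret: on Linux,
signal 2 is SIGINT, whereas abort() raises SIGABRT (6). The sketch
below shows how a parent can read that information from the siginfo_t
filled in by waitid. It is a standalone illustration, not the test
suite's code; describe_exit and kill_and_reap are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Format how a reaped child ended. Returns 1 if the child should be
 * counted as failed, 0 otherwise. */
static int
describe_exit(const siginfo_t *info, char *buf, size_t len)
{
	switch (info->si_code) {
	case CLD_KILLED:
	case CLD_DUMPED:
		/* For signal deaths, si_status is the signal number. */
		snprintf(buf, len, "killed by signal %d", info->si_status);
		return 1;
	case CLD_EXITED:
		/* For normal exits, si_status is the exit code. */
		snprintf(buf, len, "exited with status %d", info->si_status);
		return info->si_status != 0;
	default:
		snprintf(buf, len, "unexpected si_code %d", info->si_code);
		return 1;
	}
}

/* Fork a child that sleeps in pause(), send it `sig`, then block in
 * waitid until it has terminated. Returns 0 on success, -1 on error. */
static int
kill_and_reap(int sig, siginfo_t *info)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Make sure the signal has its default (fatal) action. */
		signal(sig, SIG_DFL);
		for (;;)
			pause();
	}
	if (kill(pid, sig) < 0)
		return -1;
	memset(info, 0, sizeof *info);
	return waitid(P_PID, pid, info, WEXITED);
}
```

With this, kill_and_reap(SIGINT, &info) followed by describe_exit
yields "killed by signal 2", while a child that died from abort()
would report signal 6 instead.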

Has anyone else observed this inability of the test server to deliver
the abort signal to its client until it is itself interrupted?  Is
there a bug in the test-compositor.c code (or maybe even
wayland-server.c)?

As a workaround, I had the client exit instead of sleeping. But in that
case the test passes even though the server encounters the expected
error.  Is there a way to configure the server so that, if it
encounters an error, it terminates the test as a failure?