[igt-dev] [PATCH i-g-t 3/9] lib: Don't leak children in igt_waitchildren_timeout

Daniel Vetter daniel at ffwll.ch
Tue Feb 12 08:45:32 UTC 2019


On Mon, Feb 11, 2019 at 11:01:16PM +0000, Chris Wilson wrote:
> Quoting Daniel Vetter (2019-02-11 22:38:25)
> > On Mon, Feb 11, 2019 at 09:03:12PM +0000, Chris Wilson wrote:
> > > Quoting Daniel Vetter (2019-02-11 18:02:02)
> > > > Instead of cleaning up the mess in igt_exit make sure we don't even
> > > > let it out of the container. See also
> > > > 
> > > > commit 754876378d6c9b2775e8c07b4d16f9878c55949f
> > > > Author: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Date:   Fri Feb 26 22:11:10 2016 +0000
> > > > 
> > > >     igt/gem_sync: Enforce a timeout of 20s
> > > > 
> > > > which added this helper.
> > > > 
> > > > To make sure that everyone follows the rules, add an assert.
> > > > 
> > > > We're keeping the cleanup code as a failsafe, and because it speeds
> > > > up the testcase I'm following up with.
> > > > 
> > > > v2: Chris pointed out that my original patch did nothing. Which I
> > > > didn't catch because my testcase was also broken. Unfortunately this
> > > > means we need to open code part of the waiting.
> > > > 
> > > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> > > > ---
> > > >  lib/igt_core.c | 39 ++++++++++++++++++++++++++++++++++-----
> > > >  1 file changed, 34 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/lib/igt_core.c b/lib/igt_core.c
> > > > index cbbe79f88070..3053697da58c 100644
> > > > --- a/lib/igt_core.c
> > > > +++ b/lib/igt_core.c
> > > > @@ -1525,6 +1525,7 @@ void igt_exit(void)
> > > >  
> > > >         for (int c = 0; c < num_test_children; c++)
> > > >                 kill(test_children[c], SIGKILL);
> > > > +       assert(!num_test_children);
> > > >  
> > > >         if (!test_with_subtests) {
> > > >                 struct timespec now;
> > > > @@ -1832,20 +1833,48 @@ void igt_waitchildren(void)
> > > >                 igt_fail(err);
> > > >  }
> > > >  
> > > > +static bool igt_killchidren_timed_out;
> > > > +
> > > > +static void igt_alarm_killchildren(int signal)
> > > > +{
> > > > +       igt_info("Timed out waiting for children\n");
> > > 
> > > We used to print the caller supplied reason. Although that appears to
> > > have never been used, so might as well drop it from the API.
> > > 
> > > > +
> > > > +       igt_killchidren_timed_out = true;
> > > > +
> > > > +       for (int c = 0; c < num_test_children; c++)
> > > > +               kill(test_children[c], SIGKILL);
> > > 
> > > Ok, kill() is signal-safe.
> > > 
> > > > +}
> > > > +
> > > >  /**
> > > >   * igt_waitchildren_timeout:
> > > >   * @seconds: timeout in seconds to wait
> > > >   * @reason: debug string explaining what timedout
> > > >   *
> > > > - * Wait for all children forked with igt_fork, for a maximum of @seconds.
> > > > - *
> > > > - * Wraps igt_waitchildren() and igt_set_timeout()
> > > > + * Wait for all children forked with igt_fork, for a maximum of @seconds. If the
> > > > + * timeout expires, kills all children and cleans them up.
> > > >   */
> > > >  void igt_waitchildren_timeout(int seconds, const char *reason)
> > > >  {
> > > > -       igt_set_timeout(seconds, reason);
> > > > -       igt_waitchildren();
> > > > +       struct sigaction sa;
> > > > +       int ret;
> > > > +
> > > > +       sa.sa_handler = igt_alarm_killchildren;
> > > > +       sigemptyset(&sa.sa_mask);
> > > > +       sa.sa_flags = 0;
> > > > +
> > > > +       igt_killchidren_timed_out = false;
> > > > +
> > > > +       sigaction(SIGALRM, &sa, NULL);
> > > > +
> > > > +       alarm(seconds);
> > > > +
> > > > +       ret = __igt_waitchildren();
> > > > +       if (!igt_killchidren_timed_out && ret)
> > > > +               igt_fail(ret);
> > > >         igt_reset_timeout();
> > > > +       __igt_waitchildren();
> > > > +       if (igt_killchidren_timed_out)
> > > > +               igt_fail(IGT_EXIT_FAILURE);
> > > 
> > > Double __igt_waitchildren()? My reading of __igt_waitchildren() is that
> > > it will continue on after the alarm() wakes wait() up with SIGINT and
> > > repeat the wait() until all children die. And now those children will
> > > die, rather than the parent as before.
> > 
> > igt_waitchildren_timeout before this patch wouldn't work if it did that.
> 
> Before this patch, the alarm handler did igt_fail -> exit(1) in the
> parent.
> 
> > The pid == wait(); if (pid == -1) continue; bails out to the next child in
> > case of interrupts. Hence the wait; kill; wait sequence here. I discovered
> > this through testcases :-)
> 
> Now alarm -> wait() returning -1 + errno=EINTR, right? Then the
> sighandler does killall -9, so on the next wait it sees that all the
> children are dead.
> 
> __igt_waitchildren() even sets num_test_children = 0 on return, so the
> second __igt_waitchildren() must do nothing. Or I am very confused.

Hm right I think I got myself totally confused with some older version of
this (which was broken), in combination with also broken testcases.

Looking at the code again it does indeed loop until it has them all, plus
there is even a loop to kill all the children if one failed. I'll see what
happens when I drop the 2nd waitchildren, and double-check my tests do
tests what I want to test.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the igt-dev mailing list