[igt-dev] [PATCH i-g-t 3/9] lib: Don't leak children in igt_waitchildren_timeout

Chris Wilson chris at chris-wilson.co.uk
Mon Feb 11 23:01:16 UTC 2019


Quoting Daniel Vetter (2019-02-11 22:38:25)
> On Mon, Feb 11, 2019 at 09:03:12PM +0000, Chris Wilson wrote:
> > Quoting Daniel Vetter (2019-02-11 18:02:02)
> > > Instead of cleaning up the mess in igt_exit make sure we don't even
> > > let it out of the container. See also
> > > 
> > > commit 754876378d6c9b2775e8c07b4d16f9878c55949f
> > > Author: Chris Wilson <chris at chris-wilson.co.uk>
> > > Date:   Fri Feb 26 22:11:10 2016 +0000
> > > 
> > >     igt/gem_sync: Enforce a timeout of 20s
> > > 
> > > which added this helper.
> > > 
> > > To make sure that everyone follows the rules, add an assert.
> > > 
> > > We're keeping the cleanup code as a failsafe, and because it speeds
> > > up the testcase I'm following up with.
> > > 
> > > v2: Chris pointed out that my original patch did nothing. Which I
> > > didn't catch because my testcase was also broken. Unfortunately this
> > > means we need to open code part of the waiting.
> > > 
> > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > > Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> > > ---
> > >  lib/igt_core.c | 39 ++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 34 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/lib/igt_core.c b/lib/igt_core.c
> > > index cbbe79f88070..3053697da58c 100644
> > > --- a/lib/igt_core.c
> > > +++ b/lib/igt_core.c
> > > @@ -1525,6 +1525,7 @@ void igt_exit(void)
> > >  
> > >         for (int c = 0; c < num_test_children; c++)
> > >                 kill(test_children[c], SIGKILL);
> > > +       assert(!num_test_children);
> > >  
> > >         if (!test_with_subtests) {
> > >                 struct timespec now;
> > > @@ -1832,20 +1833,48 @@ void igt_waitchildren(void)
> > >                 igt_fail(err);
> > >  }
> > >  
> > > +static bool igt_killchidren_timed_out;
> > > +
> > > +static void igt_alarm_killchildren(int signal)
> > > +{
> > > +       igt_info("Timed out waiting for children\n");
> > 
> > We used to print the caller supplied reason. Although that appears to
> > have never been used, so might as well drop it from the API.
> > 
> > > +
> > > +       igt_killchidren_timed_out = true;
> > > +
> > > +       for (int c = 0; c < num_test_children; c++)
> > > +               kill(test_children[c], SIGKILL);
> > 
> > Ok, kill() is signal-safe.
> > 
> > > +}
> > > +
> > >  /**
> > >   * igt_waitchildren_timeout:
> > >   * @seconds: timeout in seconds to wait
> > >   * @reason: debug string explaining what timedout
> > >   *
> > > - * Wait for all children forked with igt_fork, for a maximum of @seconds.
> > > - *
> > > - * Wraps igt_waitchildren() and igt_set_timeout()
> > > + * Wait for all children forked with igt_fork, for a maximum of @seconds. If the
> > > + * timeout expires, kills all children and cleans them up.
> > >   */
> > >  void igt_waitchildren_timeout(int seconds, const char *reason)
> > >  {
> > > -       igt_set_timeout(seconds, reason);
> > > -       igt_waitchildren();
> > > +       struct sigaction sa;
> > > +       int ret;
> > > +
> > > +       sa.sa_handler = igt_alarm_killchildren;
> > > +       sigemptyset(&sa.sa_mask);
> > > +       sa.sa_flags = 0;
> > > +
> > > +       igt_killchidren_timed_out = false;
> > > +
> > > +       sigaction(SIGALRM, &sa, NULL);
> > > +
> > > +       alarm(seconds);
> > > +
> > > +       ret = __igt_waitchildren();
> > > +       if (!igt_killchidren_timed_out && ret)
> > > +               igt_fail(ret);
> > >         igt_reset_timeout();
> > > +       __igt_waitchildren();
> > > +       if (igt_killchidren_timed_out)
> > > +               igt_fail(IGT_EXIT_FAILURE);
> > 
> > Double __igt_waitchildren()? My reading of __igt_waitchildren() is that
> > it will continue on after the alarm() wakes wait() up with SIGINT and
> > repeat the wait() until all children die. And now those children will
> > die, rather than the parent as before.
> 
> igt_waitchildren_timeout before this patch wouldn't work if it did that.

Before this patch, the alarm handler did igt_fail -> exit(1) in the
parent.

> The pid == wait(); if (pid == -1) continue; bails out to the next child in
> case of interrupts. Hence the wait; kill; wait sequence here. I discovered
> this through testcases :-)

Now alarm -> wait() returning -1 + errno=EINTR, right? Then the
sighandler does killall -9, so on the next wait it sees that all the
children are dead.

__igt_waitchildren() even sets num_test_children = 0 on return, so the
second __igt_waitchildren() must do nothing. Or I am very confused.
-Chris


More information about the igt-dev mailing list