[Intel-gfx] [PATCH] igt_core: zero exit_handler_count before forking
Daniel Vetter
daniel at ffwll.ch
Thu Sep 4 10:39:31 CEST 2014
On Thu, Sep 04, 2014 at 10:34:13AM +0200, Daniel Vetter wrote:
> On Wed, Sep 03, 2014 at 02:47:21PM -0300, Paulo Zanoni wrote:
> > From: Paulo Zanoni <paulo.r.zanoni at intel.com>
> >
> > If we don't reset exit_handler_count before forking, we may have a
> > case where the forked process is killed before it even does
> > "exit_handler_count = 0": in that case, it is still finishing forking.
> > When that happens, we may end up calling our exit handlers. On the
> > specific bug I'm investigating, we call igt_reset_connnectors(), which
> > ends up in a deadlock inside malloc_atfork. If we attach gdb to the
> > forked process and get a backtrace, we have:
> >
> > (gdb) bt
> > 0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
> > 1 0x00007f15634d36bf in _L_lock_10524 () from /lib/x86_64-linux-gnu/libc.so.6
> > 2 0x00007f15634d12ef in malloc_atfork (sz=139729840351352, caller=<optimized out>) at arena.c:181
> > 3 0x00007f15640466a1 in drmMalloc () from /usr/lib/x86_64-linux-gnu/libdrm.so.2
> > 4 0x00007f1564049ad7 in drmModeGetResources () from /usr/lib/x86_64-linux-gnu/libdrm.so.2
> > 5 0x0000000000408f84 in igt_reset_connectors () at igt_kms.c:1656
> > 6 0x00000000004092dc in call_exit_handlers (sig=15) at igt_core.c:1130
> > 7 fatal_sig_handler (sig=15) at igt_core.c:1154
> > 8 <signal handler called>
> > 9 0x00007f15634cce60 in ptmalloc_unlock_all2 () at arena.c:298
> > 10 0x00007f156350ca3f in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:188
> > 11 0x000000000040a029 in __igt_fork_helper (proc=proc at entry=0x610fc4 <signal_helper>) at igt_core.c:910
> > 12 0x000000000040459d in igt_fork_signal_helper () at igt_aux.c:110
> > 13 0x0000000000402ab7 in __real_main63 () at bug.c:76
> > 14 0x000000000040296e in main (argc=<optimized out>, argv=<optimized out>) at bug.c:63
> >
> > After doing some searches for "stuck at malloc_atfork", it seems to me
> > we probably shouldn't be doing any malloc calls at this point of the
> > code, so the best way to do that is to make sure we can't really run
> > the exit handlers.
> >
> > So on this patch, instead of resetting the exit handlers after
> > forking, we reset them before forking, and then restore the original
> > value on the parent process.
> >
> > I can reproduce this problem by running "./kms_flip --run-subtest
> > 2x-flip-vs-modeset" under an infinite loop. Usually after a few
> > hundred calls, we end up stuck on the deadlock mentioned above. QA
> > says this problem happens every time, but I'm not sure what is the
> > difference between our environments that makes the race condition so
> > much easier for them.
> >
> > The kms_flip.c problem can be considered a regression introduced by:
> > commit eef768f283466b6d7cb3f08381f72ccf3951dc99
> > Author: Thomas Wood <thomas.wood at intel.com>
> > Date: Wed Jun 18 14:28:43 2014 +0100
> > tests: enable extra connectors in kms_flip and kms_pipe_crc_basic
> >
> > even though this commit is not the one that introduced the real
> > problem.
> >
> > It is also possible to reproduce this problem with a few modifications
> > to template.c:
> > - Add a call to igt_enable_connectors() inside the first fixture.
> > - Add igt_fork_signal_helper() and igt_stop_signal_helper() calls
> > around subtest B.
> >
> > Cc: Thomas Wood <thomas.wood at intel.com>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81367
> > Signed-off-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
>
> Awesome piece of debugging!
>
> But imo the commit message is missing the explanation why the forked child
> exists so quickly - the reason is that for skipped tests the parent kills
> it right away, before libc has completed the setup code in the child.
> We've had fun with this earlier, see
>
> commit 139c72f38a07c545f5a9ab5fa3750779987b9275
> Author: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Date: Tue Dec 3 16:44:55 2013 +0000
>
> drmtest: Avoid wrong PID/TID after clone races
>
> and the earlier attempt at
>
> commit 9298dfabd9658315df34616b1e9a10b3579a92bd
> Author: Daniel Vetter <daniel.vetter at ffwll.ch>
> Date: Fri Sep 13 16:38:59 2013 +0200
>
> lib: add test for igt_fork_signal_helper
I've meant
commit a031a1bf93b828585e7147f06145fc5030814547
Author: Daniel Vetter <daniel.vetter at ffwll.ch>
Date: Fri Sep 13 16:43:22 2013 +0200
lib/drmtest: ducttape over fork race
Aside: For bugs in the test library itself I've started to create
test-testcases in tests/igt_*.c. You can run them with a simple
$ make check
If you have an idea for a testcase then please add it, I myself didn't
come up with anything. Note that the testcase can't rely at all on an i915
device being enabled, since this is meant to be run while building as a
normal user on any kind of machine. So it's really just tests for the core
library code.
Thanks, Daniel
>
> I've added a note and commment and merged the patch, thanks.
> -Daniel
>
>
> > ---
> > lib/igt_core.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/igt_core.c b/lib/igt_core.c
> > index 7f14b17..e9a27e3 100644
> > --- a/lib/igt_core.c
> > +++ b/lib/igt_core.c
> > @@ -895,6 +895,7 @@ bool __igt_fork_helper(struct igt_helper_process *proc)
> > {
> > pid_t pid;
> > int id;
> > + int tmp_count;
> >
> > assert(!proc->running);
> > assert(helper_process_count < ARRAY_SIZE(helper_process_pids));
> > @@ -904,16 +905,19 @@ bool __igt_fork_helper(struct igt_helper_process *proc)
> >
> > igt_install_exit_handler(fork_helper_exit_handler);
> >
> > + tmp_count = exit_handler_count;
> > + exit_handler_count = 0;
> > switch (pid = fork()) {
> > case -1:
> > + exit_handler_count = tmp_count;
> > igt_assert(0);
> > case 0:
> > - exit_handler_count = 0;
> > reset_helper_process_list();
> > oom_adjust_for_doom();
> >
> > return true;
> > default:
> > + exit_handler_count = tmp_count;
> > proc->running = true;
> > proc->pid = pid;
> > proc->id = id;
> > --
> > 2.1.0
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the Intel-gfx
mailing list