[Intel-gfx] [PATCH] igt_core: zero exit_handler_count before forking

Daniel Vetter daniel at ffwll.ch
Thu Sep 4 10:39:31 CEST 2014


On Thu, Sep 04, 2014 at 10:34:13AM +0200, Daniel Vetter wrote:
> On Wed, Sep 03, 2014 at 02:47:21PM -0300, Paulo Zanoni wrote:
> > From: Paulo Zanoni <paulo.r.zanoni at intel.com>
> > 
> > If we don't reset exit_handler_count before forking, we may have a
> > case where the forked process is killed before it even does
> > "exit_handler_count = 0": in that case, it is still finishing forking.
> > When that happens, we may end up calling our exit handlers. On the
> > specific bug I'm investigating, we call igt_reset_connnectors(), which
> > ends up in a deadlock inside malloc_atfork. If we attach gdb to the
> > forked process and get a backtrace, we have:
> > 
> > (gdb) bt
> > 0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
> > 1  0x00007f15634d36bf in _L_lock_10524 () from /lib/x86_64-linux-gnu/libc.so.6
> > 2  0x00007f15634d12ef in malloc_atfork (sz=139729840351352, caller=<optimized out>) at arena.c:181
> > 3  0x00007f15640466a1 in drmMalloc () from /usr/lib/x86_64-linux-gnu/libdrm.so.2
> > 4  0x00007f1564049ad7 in drmModeGetResources () from /usr/lib/x86_64-linux-gnu/libdrm.so.2
> > 5  0x0000000000408f84 in igt_reset_connectors () at igt_kms.c:1656
> > 6  0x00000000004092dc in call_exit_handlers (sig=15) at igt_core.c:1130
> > 7  fatal_sig_handler (sig=15) at igt_core.c:1154
> > 8  <signal handler called>
> > 9  0x00007f15634cce60 in ptmalloc_unlock_all2 () at arena.c:298
> > 10 0x00007f156350ca3f in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:188
> > 11 0x000000000040a029 in __igt_fork_helper (proc=proc at entry=0x610fc4 <signal_helper>) at igt_core.c:910
> > 12 0x000000000040459d in igt_fork_signal_helper () at igt_aux.c:110
> > 13 0x0000000000402ab7 in __real_main63 () at bug.c:76
> > 14 0x000000000040296e in main (argc=<optimized out>, argv=<optimized out>) at bug.c:63
> > 
> > After doing some searches for "stuck at malloc_atfork", it seems to me
> > we probably shouldn't be doing any malloc calls at this point of the
> > code, so the best way to do that is to make sure we can't really run
> > the exit handlers.
> > 
> > So on this patch, instead of resetting the exit handlers after
> > forking, we reset them before forking, and then restore the original
> > value on the parent process.
> > 
> > I can reproduce this problem by running "./kms_flip --run-subtest
> > 2x-flip-vs-modeset" under an infinite loop. Usually after a few
> > hundred calls, we end up stuck on the deadlock mentioned above. QA
> > says this problem happens every time, but I'm not sure what is the
> > difference between our environments that makes the race condition so
> > much easier for them.
> > 
> > The kms_flip.c problem can be considered a regression introduced by:
> > commit eef768f283466b6d7cb3f08381f72ccf3951dc99
> > Author: Thomas Wood <thomas.wood at intel.com>
> > Date:   Wed Jun 18 14:28:43 2014 +0100
> >     tests: enable extra connectors in kms_flip and kms_pipe_crc_basic
> > 
> > even though this commit is not the one that introduced the real
> > problem.
> > 
> > It is also possible to reproduce this problem with a few modifications
> > to template.c:
> >   - Add a call to igt_enable_connectors() inside the first fixture.
> >   - Add igt_fork_signal_helper() and igt_stop_signal_helper() calls
> >     around subtest B.
> > 
> > Cc: Thomas Wood <thomas.wood at intel.com>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81367
> > Signed-off-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
> 
> Awesome piece of debugging!
> 
> But imo the commit message is missing the explanation why the forked child
> exists so quickly - the reason is that for skipped tests the parent kills
> it right away, before libc has completed the setup code in the child.
> We've had fun with this earlier, see
> 
> commit 139c72f38a07c545f5a9ab5fa3750779987b9275
> Author: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Date:   Tue Dec 3 16:44:55 2013 +0000
> 
>     drmtest: Avoid wrong PID/TID after clone races
> 
> and the earlier attempt at
> 
> commit 9298dfabd9658315df34616b1e9a10b3579a92bd
> Author: Daniel Vetter <daniel.vetter at ffwll.ch>
> Date:   Fri Sep 13 16:38:59 2013 +0200
> 
>     lib: add test for igt_fork_signal_helper

I've meant

commit a031a1bf93b828585e7147f06145fc5030814547
Author: Daniel Vetter <daniel.vetter at ffwll.ch>
Date:   Fri Sep 13 16:43:22 2013 +0200

    lib/drmtest: ducttape over fork race

Aside: For bugs in the test library itself I've started to create
test-testcases in tests/igt_*.c. You can run them with a simple

$ make check

If you have an idea for a testcase then please add it, I myself didn't
come up with anything. Note that the testcase can't rely at all on an i915
device being enabled, since this is meant to be run while building as a
normal user on any kind of machine. So it's really just tests for the core
library code.

Thanks, Daniel

> 
> I've added a note and commment and merged the patch, thanks.
> -Daniel
> 
> 
> > ---
> >  lib/igt_core.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/lib/igt_core.c b/lib/igt_core.c
> > index 7f14b17..e9a27e3 100644
> > --- a/lib/igt_core.c
> > +++ b/lib/igt_core.c
> > @@ -895,6 +895,7 @@ bool __igt_fork_helper(struct igt_helper_process *proc)
> >  {
> >  	pid_t pid;
> >  	int id;
> > +	int tmp_count;
> >  
> >  	assert(!proc->running);
> >  	assert(helper_process_count < ARRAY_SIZE(helper_process_pids));
> > @@ -904,16 +905,19 @@ bool __igt_fork_helper(struct igt_helper_process *proc)
> >  
> >  	igt_install_exit_handler(fork_helper_exit_handler);
> >  
> > +	tmp_count = exit_handler_count;
> > +	exit_handler_count = 0;
> >  	switch (pid = fork()) {
> >  	case -1:
> > +		exit_handler_count = tmp_count;
> >  		igt_assert(0);
> >  	case 0:
> > -		exit_handler_count = 0;
> >  		reset_helper_process_list();
> >  		oom_adjust_for_doom();
> >  
> >  		return true;
> >  	default:
> > +		exit_handler_count = tmp_count;
> >  		proc->running = true;
> >  		proc->pid = pid;
> >  		proc->id = id;
> > -- 
> > 2.1.0
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch



More information about the Intel-gfx mailing list