[Intel-gfx] [PATCH igt] core/sighelper: Interrupt everyone in the process group

Mon Jan 11 05:29:12 PST 2016

On 11/01/16 12:34, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 12:25:07PM +0000, Dave Gordon wrote:
>> On 11/01/16 09:06, Daniel Vetter wrote:
>>> On Mon, Jan 11, 2016 at 08:54:59AM +0000, Chris Wilson wrote:
>>>> On Mon, Jan 11, 2016 at 08:57:33AM +0100, Daniel Vetter wrote:
>>>>> On Fri, Jan 08, 2016 at 08:44:29AM +0000, Chris Wilson wrote:
>>>>>> Some stress tests create both the signal helper and a lot of competing
>>>>>> processes. In these tests, the parent is just waiting upon the children,
>>>>>> and the intention is not to keep waking up the waiting parent, but to
>>>>>> keep interrupting the children (as we hope to trigger races in our
>>>>>> kernel code). kill(-pid) sends the signal to all members of the process
>>>>>> group, not just the target pid.
>>>>>
>>>>> I don't really have any clue about unix pgroups, but the -pid disappeared
>>>>> compared to the previous version.
>>>>
>>>> -getppid().
>>>>
>>>> I felt it was clearer to pass along the "negative pid = process group"
>>>> after setting up the process group.
>>>
>>> Oh, I was blind ... Yeah looks better, but please add a bigger comment
>>> around that code explaining why we need a group and why we use SIG_CONT.
>>> With that acked-by: me.
>>>
>>> Cheers, Daniel
>>>
>>>>>> We also switch from using SIGUSR1 to SIGCONT to paper over a race
>>>>>> condition when forking children that saw the default signal action being
>>>>>> run (and thus killing the child).
>>>>>
>>>>> I thought I fixed that race by first installing the new signal handler,
>>>>> then forking. Ok, rechecked and it's the SYS_getpid stuff, so another
>>>>> race. Still I thought signal handlers would survive a fork?
>>>>
>>>> So did irc. They didn't appear to as the children would sporadically
>>>> die with SIGUSR1.
>>>
>>> Could be that libc is doing something funny, iirc they have piles of fork
>>> helpers to make fork more reliable (breaking locks and stuff like that),
>>> but then in turn break the abstraction.
>>> -Daniel
>>
>> You could use killpg(pgrp, sig) rather than kill(), just to make it
>> clearer that the target is a process group, rather than people
>> having to know about the "negative pid" semantics.
>>
>> I don't think SIGCHLD is a good idea; it has kernel-defined
>> semantics beyond just sending a signal. And it may not be delivered
>> at all, if the disposition is not "caught". SIGUSR1 was the right
>> thing, really; so it would be better to work out how to make that
>> work properly, rather than change to a different one.
>
> SIGCONT not SIGCHLD. And the deposition is supposed to be fully under
> our control any way.

Oops, yes, I meant SIGCONT has kernel-defined semantics, etc ...

Catching SIGCONT is ... unusual. Because you can't catch SIGSTOP, you 
don't normally have any reason to catch SIGCONT.

Actually, SIGCONT is even more bizarre than SIGCHLD as sending a SIGCONT 
to a process can result in a SIGCHLD being sent to its parent.

>> Signal handlers are (supposed to be) inherited across fork(); signal
>> disposition is also inherited, and the set of pending signals of a
>> new process is (supposed to be) empty. OTOH a signal can be
>> delivered to the child before it returns from the fork(), which may
>> be a bit surprising.
>>
>> I think the safest way to avoid unexpected signals around a fork() is:
>>
>> parent calls sigprocmask() to block all interesting signals
>> parent calls fork() --> child inherits mask
>> parent calls sigprocmask() to restore the previous mask
>
> I tried that.
> -Chris

Are we using signal(2) to install the handlers? 'Cos that's archaic and 
has known unfixable race conditions. The Linux kernel supplies SysV 
signal semantics by default, which means the disposition gets reset 
before the handler is called, so a double signal kills the program. The 
glibc signal(3) wrapper provides BSD semantics which are slightly less 
problematic; but libc5 signal(3) implements SysV.

The proper answer is usually to use sigaction(2) instead. Then the race 
conditions don't (shouldn't) occur, at least if the user code gets all 
the options right.

.Dave.