[Intel-gfx] [PATCH 6/9] drm/i915: driver based PASID handling

Wed Oct 7 09:28:17 PDT 2015

On 10/07/2015 09:14 AM, Daniel Vetter wrote:
> On Wed, Oct 07, 2015 at 08:16:42AM -0700, Jesse Barnes wrote:
>> On 10/07/2015 06:00 AM, David Woodhouse wrote:
>>> On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
>>>> +
>>>> +       ret = handle_mm_fault(mm, vma, address,
>>>> +                             desc.wr_req ? FAULT_FLAG_WRITE : 0);
>>>> +       if (ret & VM_FAULT_ERROR) {
>>>> +               gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
>>>> +               goto out_unlock;
>>>> +       }
>>>> +
>>>
>>> Hm, do you need to force the SEGV there, in what ought to be generic
>>> IOMMU code?
>>>
>>> Can you instead just let the fault handler return an appropriate
>>> failure code to the IOMMU request queue and then deal with the
>>> resulting error on the i915 device side?
>>
>> I'm not sure if we get enough info on the i915 side to handle it
>> reasonably, we'll have to test that out.
> 
> We do know precisely which context blew up, but without the TDR work we
> can't yet just kill the offender selectively without affecting the other
> active gpu contexts.

How?  The notification from the IOMMU queue is asynchronous...
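David's alternative, completing the page request with a failure code and letting the device side deal with it later, can be modelled in a few lines. This is a userspace sketch under stated assumptions: `prq_resp`, `fault_ring`, `complete_page_request()` and `driver_next_fault()` are hypothetical names, not kernel or i915 APIs. The response values are chosen to mirror the PCIe Page Request Group Response codes (Success, Invalid Request, Response Failure), and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical response codes, modelled on the PCIe PRG Response codes. */
enum prq_resp {
    PRQ_RESP_SUCCESS = 0x0,
    PRQ_RESP_INVALID = 0x1,
    PRQ_RESP_FAILURE = 0xf,
};

/* Hypothetical record the IOMMU side leaves for the device driver. */
struct gpu_fault_record {
    uint32_t pasid;
    uint64_t address;
    bool write;
};

#define FAULT_RING_SIZE 16

/* Hypothetical single-producer/single-consumer ring; locking omitted. */
struct fault_ring {
    struct gpu_fault_record rec[FAULT_RING_SIZE];
    size_t head, tail;
};

/* IOMMU side: instead of signalling the task, record the failure for the
 * driver and complete the page request with a failure response.
 * fault_ok stands in for "the mm fault was resolved". If the ring is
 * full the record is dropped, but the request is still failed. */
static enum prq_resp complete_page_request(struct fault_ring *ring,
                                           uint32_t pasid, uint64_t address,
                                           bool write, bool fault_ok)
{
    if (fault_ok)
        return PRQ_RESP_SUCCESS;
    if (ring->head - ring->tail < FAULT_RING_SIZE) {
        ring->rec[ring->head % FAULT_RING_SIZE] =
            (struct gpu_fault_record){ pasid, address, write };
        ring->head++;
    }
    return PRQ_RESP_INVALID;
}

/* Driver side, run later and asynchronously: pop the oldest record.
 * Returns false if no fault is pending. */
static bool driver_next_fault(struct fault_ring *ring,
                              struct gpu_fault_record *out)
{
    assert(out != NULL);
    if (ring->tail == ring->head)
        return false;
    *out = ring->rec[ring->tail % FAULT_RING_SIZE];
    ring->tail++;
    return true;
}
```

The record carries only a PASID and an address, and the driver consumes it some time after the GPU work faulted, so mapping it back to the offending context is left entirely to the driver; that is the asynchrony raised above. The sketch answers an unresolvable fault with Invalid Request rather than Response Failure, on the assumption that the latter is reserved for catastrophic errors that stop the function from issuing further page requests.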

> But besides that I really don't see a reason why we need to kill the
> process if the gpu faults. After all, if a thread segfaults then the signal
> goes to that thread and not some random one (or the one thread that forked
> the thread that blew up). And we do have interfaces to tell userspace that
> something bad happened with the gpu work it submitted.

We will send a signal, just as in the thread case.  That generally kills
the process, but of course the process is free to install a handler and
try to recover.  The trouble is that a fault like this indicates a
bug, just as it would in the multithreaded case (for example, processors
manipulating the address space without locking, a use-after-free, or a
simple bad pointer reference).
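The point that a process can install its own handler is easy to demonstrate in plain userspace C. The sketch below is an illustration only, assuming Linux/glibc: it does not involve i915 or the IOMMU, and `map_page()` and `probe_write()` are hypothetical helpers. It installs an `SA_SIGINFO` handler for SIGSEGV, reads `si_code` to see why the fault happened, and uses `siglongjmp` to resume after a deliberate write to a `PROT_NONE` page.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_si_code;

/* SA_SIGINFO handler: record why the fault happened, then jump back. */
static void segv_handler(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    probe_si_code = info->si_code;
    siglongjmp(probe_env, 1);
}

/* Map one anonymous page with the given protection; NULL on failure. */
static char *map_page(int prot)
{
    void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), prot,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Write one byte at addr with a temporary SIGSEGV handler installed.
 * Returns 0 if the write succeeded, -1 if the handler could not be
 * installed, otherwise the si_code of the SIGSEGV (e.g. SEGV_ACCERR for
 * a mapped page without write permission). */
static int probe_write(char *addr)
{
    struct sigaction sa, old;
    int code = 0;

    assert(addr != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    probe_si_code = 0;
    if (sigsetjmp(probe_env, 1) == 0)
        *(volatile char *)addr = 1;   /* may raise SIGSEGV */
    else
        code = probe_si_code;         /* resumed here from the handler */

    sigaction(SIGSEGV, &old, NULL);   /* restore the previous handler */
    return code;
}
```

The handler is only correct because it assumes every SIGSEGV it sees comes from the probe. A SIGSEGV raised on the GPU's behalf while the probe was active, for instance one carrying `SEGV_ACCERR` as the quoted `gpu_mm_segv(tsk, address, SEGV_ACCERR)` call appears to request, would look identical to it and jump back to the wrong place, which is the failure mode described in the reply below.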

> Chris made a similar patch for userptr and I didn't like that one either.
> Worst case, userspace has a special SEGV handler, and then things go really
> badly when that handler is triggered at an unexpected place.

Not sure what you're suggesting as an alternative; just let things keep
running somehow?

Jesse