[Intel-gfx] [PATCH 6/9] drm/i915: driver based PASID handling

Fri Oct 9 09:20:23 PDT 2015

On 10/09/2015 02:07 AM, David Woodhouse wrote:
> On Fri, 2015-10-09 at 10:47 +0200, Daniel Vetter wrote:
>> On Fri, Oct 09, 2015 at 08:56:24AM +0100, David Woodhouse wrote:
>>> On Fri, 2015-10-09 at 09:28 +0200, Daniel Vetter wrote:
>>>>
>>>> Hm if this still works the same way as on older platforms then pagefaults
>>>> just read all 0 and writes go nowhere from the gpu. That generally also
>>>> explains ever-increasing numbers of the CS execution pointer since it's
>>>> busy churning through 48b worth of address space filled with MI_NOP. I'd
>>>> have hoped our hw would do better than that with svm ...
>>>
>>> I'm looking at simple cases like Jesse's 'gem_svm_fault' test. If the
>>> access to process address space (a single dword write) does nothing,
>>> I'm not sure why it would then churn through MI_NOOPs; why would the
>>> batch still not complete?
>>
>> Yeah that testcase doesn't fit, the one I had in mind is where the batch
>> itself faults and the CS just reads MI_NOP forever. No idea why the gpu
>> just keeps walking through the address space here. Puzzling.
> 
> Does it just keep walking through the address space?
> 
> When I hacked my page request handler to *not* service the fault and
> just say it failed, the batch did seem to complete as normal. Just
> without doing the write, as you described.

My understanding is that this behavior will depend on how we submit the
work.  We have to faulting modes: halt and stream.  In either case, the
context that faults will be switched out, and the hardware will either
wait for a resubmit (the halt case) to restart the context, or switch to
the next context in the execlist queue.

If the fault is then serviced by the IOMMU layer, potentially as an
error, I'd expect the faulting context to simply fault again.  I don't
think we'd see a GPU hang in the same way we do today, where we get an
indication in the GPU private fault regs and such; they go through the
IOMMU in advanced context mode.

So I think we'll need a callback in the fatal case; we can just kick off
a private i915 worker for that, just like we do for the recoverable case
that's now hidden in the IOMMU layer.

Jesse