[Intel-gfx] [PATCH 6/9] drm/i915: driver based PASID handling
Jesse Barnes
jbarnes at virtuousgeek.org
Wed Oct 7 10:26:20 PDT 2015
On 10/07/2015 10:17 AM, David Woodhouse wrote:
> On Wed, 2015-10-07 at 09:28 -0700, Jesse Barnes wrote:
>> On 10/07/2015 09:14 AM, Daniel Vetter wrote:
>>> On Wed, Oct 07, 2015 at 08:16:42AM -0700, Jesse Barnes wrote:
>>>> On 10/07/2015 06:00 AM, David Woodhouse wrote:
>>>>> On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
>>>>>> +
>>>>>> + ret = handle_mm_fault(mm, vma, address,
>>>>>> + desc.wr_req ? FAULT_FLAG_WRITE : 0);
>>>>>> + if (ret & VM_FAULT_ERROR) {
>>>>>> + gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
>>>>>> + goto out_unlock;
>>>>>> + }
>>>>>> +
>>>>>
>>>>> Hm, do you need to force the SEGV there, in what ought to be generic
>>>>> IOMMU code?
>>>>>
>>>>> Can you instead just let the fault handler return an appropriate
>>>>> failure code to the IOMMU request queue and then deal with the
>>>>> resulting error on the i915 device side?
>>>>
>>>> I'm not sure if we get enough info on the i915 side to handle it
>>>> reasonably, we'll have to test that out.
>>>
>>> We do know precisely which context blew up, but without the TDR work we
>>> can't yet just kill the offender selective without affecting the other
>>> active gpu contexts.
>>
>> How? The notification from the IOMMU queue is asynchronous...
>
> The page request, and the response, include 'private data' which an
> endpoint can use to carry that kind of information.
>
> In $7.5.1.1 of the VT-d specification it tells us:
>
> "Private Data: The Private Data field can be used by
> Root-Complex integrated endpoints to uniquely identify
> device-specific private information associated with an
> individual page request.
>
> "For Intel ® Processor Graphics device, the Private Data field
> specifies the identity of the GPU advanced-context (see
> Section 3.10) sending the page request."
Ah right so we could put our private context ID in there if the PASID doesn't end up being per-context. That would work fine (though as Daniel said we still need fancier reset to handle things more gracefully).
>>> But besides that I really don't see a reason why we need to kill the
>>> process if the gpu faults. After all if a thread sigfaults then signal
>>> goes to that thread and not some random one (or the one thread that forked
>>> the thread that blew up). And we do have interfaces to tell userspace that
>>> something bad happened with the gpu work it submitted.
>
> I certainly don't want the core IOMMU code killing things. I really
> want to just complete the page request with an appropriate failure
> code, and let the endpoint device deal with it from there.
I think that will work, but I want to test and make sure. In the driver mode version I took advantage of the fact that I got an unambiguous page request failure from the IOMMU along with a unique PASID to send the signal. Getting it on the GPU side means looking at some of our existing error state bits, which is something I've been avoiding...
Jesse
More information about the Intel-gfx
mailing list