[Intel-gfx] [PATCH 6/9] drm/i915: driver based PASID handling

Wed Oct 7 10:17:18 PDT 2015

On Wed, 2015-10-07 at 09:28 -0700, Jesse Barnes wrote:
> On 10/07/2015 09:14 AM, Daniel Vetter wrote:
> > On Wed, Oct 07, 2015 at 08:16:42AM -0700, Jesse Barnes wrote:
> > > On 10/07/2015 06:00 AM, David Woodhouse wrote:
> > > > On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
> > > > > +
> > > > > +       ret = handle_mm_fault(mm, vma, address,
> > > > > +                             desc.wr_req ? FAULT_FLAG_WRITE : 0);
> > > > > +       if (ret & VM_FAULT_ERROR) {
> > > > > +               gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
> > > > > +               goto out_unlock;
> > > > > +       }
> > > > > +
> > > > 
> > > > Hm, do you need to force the SEGV there, in what ought to be generic
> > > > IOMMU code?
> > > > 
> > > > Can you instead just let the fault handler return an appropriate
> > > > failure code to the IOMMU request queue and then deal with the
> > > > resulting error on the i915 device side?
> > > 
> > > I'm not sure if we get enough info on the i915 side to handle it
> > > reasonably, we'll have to test that out.
> > 
> > We do know precisely which context blew up, but without the TDR work we
> > can't yet just kill the offender selectively without affecting the other
> > active gpu contexts.
> 
> How?  The notification from the IOMMU queue is asynchronous...

The page request, and the response, include 'private data' which an
endpoint can use to carry that kind of information.

§7.5.1.1 of the VT-d specification tells us:

	"Private Data: The Private Data field can be used by 
	 Root-Complex integrated endpoints to uniquely identify
	 device-specific private information associated with an 
	 individual page request.

	"For Intel® Processor Graphics device, the Private Data field
	 specifies the identity of the GPU advanced-context (see 
	 Section 3.10) sending the page request."
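
To make that concrete, here is a rough userspace sketch of how an endpoint driver might map the Private Data field back to the context that faulted. Everything in it is hypothetical: struct gfx_page_req is not the real VT-d descriptor layout, and the context table and gfx_context_from_page_req() are invented for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified page request; NOT the VT-d descriptor layout. */
struct gfx_page_req {
	uint32_t pasid;
	uint64_t address;
	uint32_t private_data;	/* opaque to the IOMMU, set by the endpoint */
	int wr_req;
};

/* Hypothetical per-context state kept by the endpoint driver. */
struct gfx_context {
	uint32_t id;
	int faulted;
};

#define GFX_NUM_CONTEXTS 4

static struct gfx_context gfx_contexts[GFX_NUM_CONTEXTS] = {
	{ .id = 10 }, { .id = 11 }, { .id = 12 }, { .id = 13 },
};

/*
 * Look up the context that issued the page request, using the private
 * data as the context identity. Returns NULL if no context matches.
 */
static struct gfx_context *
gfx_context_from_page_req(const struct gfx_page_req *req)
{
	assert(req != NULL);

	for (size_t i = 0; i < GFX_NUM_CONTEXTS; i++) {
		if (gfx_contexts[i].id == req->private_data)
			return &gfx_contexts[i];
	}
	return NULL;
}
```

The point is only that the lookup needs nothing beyond what the request itself carries, so it still works when the notification arrives asynchronously.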

> > But besides that I really don't see a reason why we need to kill the
> > process if the gpu faults. After all if a thread segfaults then the signal
> > goes to that thread and not some random one (or the one thread that forked
> > the thread that blew up). And we do have interfaces to tell userspace that
> > something bad happened with the gpu work it submitted.

I certainly don't want the core IOMMU code killing things. I really
want to just complete the page request with an appropriate failure
code, and let the endpoint device deal with it from there.
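
Roughly the split I have in mind, as a standalone sketch rather than kernel code. All the names here are made up for illustration: the response codes are not the real VT-d encodings, and the resolve callback merely stands in for the handle_mm_fault() call in the patch. The generic side only fills in a response and echoes the private data back; the endpoint decides what to do about a failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical response codes; the real VT-d encodings differ. */
enum page_resp_code {
	PAGE_RESP_SUCCESS,
	PAGE_RESP_INVALID,
};

/* Hypothetical, simplified request and response records. */
struct page_request {
	uint64_t address;
	uint32_t private_data;
	int wr_req;
};

struct page_response {
	enum page_resp_code code;
	uint32_t private_data;
};

/* Stand-in for the fault resolution step; returns 0 on success. */
typedef int (*resolve_fault_fn)(uint64_t address, int write);

/*
 * Generic side: try to resolve the fault and report the outcome.
 * It sends no signals and kills nothing; it only completes the request.
 */
static struct page_response
service_page_request(const struct page_request *req, resolve_fault_fn resolve)
{
	struct page_response resp;

	assert(req != NULL && resolve != NULL);

	resp.private_data = req->private_data;	/* echoed to the endpoint */
	resp.code = resolve(req->address, req->wr_req) == 0 ?
		    PAGE_RESP_SUCCESS : PAGE_RESP_INVALID;
	return resp;
}

/*
 * Endpoint side: returns the identity of the context to act on
 * (taken from the private data), or -1 if the request succeeded.
 */
static int64_t endpoint_handle_response(const struct page_response *resp)
{
	assert(resp != NULL);

	if (resp->code == PAGE_RESP_SUCCESS)
		return -1;
	return (int64_t)resp->private_data;
}

/* Stub resolver for the sketch: pretend address 0 is unmapped. */
static int stub_resolve(uint64_t address, int write)
{
	(void)write;
	return address == 0 ? -1 : 0;
}
```

What the endpoint does with the returned context identity (ban it, report it to userspace, reset) is policy that stays entirely on the device side.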

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse at intel.com                              Intel Corporation
