[Intel-gfx] [PATCH RFC 2/4] drm/i915: IOMMU based SVM implementation v13

Mon Aug 15 12:53:29 UTC 2016

On Mon, Aug 15, 2016 at 01:30:11PM +0100, David Woodhouse wrote:
> On Mon, 2016-08-15 at 13:23 +0100, Chris Wilson wrote:
> > On Mon, Aug 15, 2016 at 01:13:25PM +0100, David Woodhouse wrote:
> > > On Mon, 2016-08-15 at 13:05 +0100, Chris Wilson wrote:
> > > > On Mon, Aug 15, 2016 at 02:48:05PM +0300, Mika Kuoppala wrote:
> > > > > 
> > > > > + struct task_struct *task;
> > > > 
> > > > We don't need the task, we need the mm.
> > > > 
> > > > Holding the task is not sufficient.
> > > 
> > > From the pure DMA point of view, you don't need the MM at all. I handle
> > > all that from the IOMMU side so it's none of your business, darling.
> > 
> > But you don't keep the mm alive for the duration of device activity,
> > right? And you don't wait for the device to finish before releasing the
> > mmu? (iiuc intel-svm.c)
> 
> We don't "keep it alive" (i.e. bump mm->mm_users), no.
> We *did*, but it caused problems. See commit e57e58bd390a68 for the
> gory details.
> 
> Now we only bump mm->mm_count so if the process exits, the MM can still
> be torn down.
> 
> Since exit_mmap() happens before exit_files(), what happens on an
> unclean shutdown is that the GPU may start to take faults on the PASID
> which is in the process of exiting, before the corresponding file
> descriptor gets closed.
> 
> So no, we don't wait for the device to finish before releasing the MM.
> That would involve calling back into device-driver code from the
> mmu_notifier callback, with "interesting" locking constraints. We don't
> trust device drivers that much :)

With the device allocating the memory, we can keep the object alive for
as long as required for it to complete the commands and for other users.

Other users get access to the svm pages via shared memory (mmap, memfd)
and so another process copying from the buffer should be unaffected by
termination of the original process.

So it is really just what happens to commands for this client when it
dies/exits.  The kneejerk reaction is to say the pages should be kept
alive as they are now for !svm. We could be faced with a situation where
the client copies onto a shared buffer (obtaining a fence), passes that
fence over to the server scheduling an update, and dies abruptly. Given
that the fence and request arrive on the server safely (the fence will
be completed even if the command is skipped or its faults filled with
zero), the server will itself proceed to present the incomplete result
from the dead client. (Presently for !svm the output will be intact.)
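
A toy userspace sketch of that scenario, not driver code. Every toy_* name
is hypothetical, and it assumes the "faults filled with zero" variant
rather than the command being skipped:

```c
#include <stdbool.h>
#include <string.h>

#define TOY_BUF_SIZE 8

/* Hypothetical stand-in for a fence: only records completion. */
struct toy_fence {
	bool signaled;
};

/* Hypothetical client: 'svm' selects faulting vs. pinned-object behaviour. */
struct toy_client {
	bool svm;       /* true: source pages are reached through the client's mm */
	bool mm_alive;  /* false once the client's address space is torn down */
	unsigned char src[TOY_BUF_SIZE];
};

/*
 * Model of the GPU copy into the shared buffer.
 * !svm: the object keeps the pages alive, so the source is always readable.
 * svm:  once the mm is gone, reads fault and are assumed to be zero-filled.
 * Either way the fence completes.
 */
static void toy_gpu_copy(const struct toy_client *c, unsigned char *shared,
			 struct toy_fence *f)
{
	bool readable = !c->svm || c->mm_alive;

	if (readable)
		memcpy(shared, c->src, TOY_BUF_SIZE);
	else
		memset(shared, 0, TOY_BUF_SIZE);

	f->signaled = true;
}

/* The server only has the fence to go on; it cannot tell the cases apart. */
static bool toy_server_would_present(const struct toy_fence *f)
{
	return f->signaled;
}
```

With svm set and mm_alive cleared, the fence still signals and the server
presents a zeroed buffer; with svm cleared, the same sequence leaves the
client's data intact.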

The question is do we accept the change in behaviour? Or am I
completely misunderstanding how the svm faulting/mmu-notifiers will
work?
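
For reference, a toy userspace model of the mm_users/mm_count split
described in the quoted text. It is a sketch only: the toy_* names are
hypothetical and it ignores locking and atomics entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two counters; not the kernel's mm_struct. */
struct toy_mm {
	int users;    /* like mm_users: holders that keep the address space mapped */
	int count;    /* like mm_count: holders that keep the struct itself allocated */
	bool mapped;  /* address space still populated */
	bool freed;   /* struct has been released */
};

static void toy_mm_init(struct toy_mm *mm)
{
	mm->users = 1;   /* the owning process */
	mm->count = 1;   /* all users together hold a single count reference */
	mm->mapped = true;
	mm->freed = false;
}

/* Take a struct-only reference (what bumping mm_count amounts to). */
static void toy_mm_grab(struct toy_mm *mm)
{
	mm->count++;
}

/* Drop a struct-only reference; the last one frees the struct. */
static void toy_mm_drop(struct toy_mm *mm)
{
	assert(mm->count > 0);
	if (--mm->count == 0)
		mm->freed = true;
}

/* Drop a user reference; the last one tears the mappings down. */
static void toy_mm_put(struct toy_mm *mm)
{
	assert(mm->users > 0);
	if (--mm->users == 0) {
		mm->mapped = false;   /* stands in for exit_mmap() */
		toy_mm_drop(mm);      /* release the count held on behalf of users */
	}
}

/* A device access can only succeed while the mappings still exist. */
static bool toy_device_can_access(const struct toy_mm *mm)
{
	return !mm->freed && mm->mapped;
}
```

A holder that only called toy_mm_grab() keeps the struct valid after the
process exits, but toy_device_can_access() already returns false, which
is the window in which the GPU would take faults on the exiting PASID.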
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre