[Intel-gfx] [PATCH] drm/i915: Convert WARNs during userptr revoke to SIGBUS

Mon Sep 28 07:14:57 PDT 2015

On Mon, Sep 28, 2015 at 02:52:30PM +0100, Chris Wilson wrote:
> On Mon, Sep 28, 2015 at 03:42:22PM +0200, Daniel Vetter wrote:
> > On Wed, Sep 23, 2015 at 09:07:24PM +0100, Chris Wilson wrote:
> > > If the client revokes the virtual address it asked to be mapped into GPU
> > > space via userptr (by using anything along the lines of mmap, mprotect,
> > > madvise, munmap, ftruncate etc) the mmu notifier sends a range
> > > invalidate command to userptr. Upon receiving the invalidation signal
> > > for the revoked range, we try to release the struct pages we pinned into
> > > the GTT. However, this can fail if any of the GPU's VMA are pinned for
> > > use by the hardware (i.e. despite the user's intention we cannot
> > > relinquish the client's address range and keep uptodate with whatever is
> > > placed in there). Currently we emit a few WARN so that we would notice
> > > if this every occurred in the wild; it has. Sadly this means we need to
> > > replace those WARNs with the proper SIGBUS to the offending clients
> > > instead.
> > > 
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > Cc: Michał Winiarski <michal.winiarski at intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem_userptr.c | 41 +++++++++++++++++++++++++++++----
> > >  1 file changed, 37 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > index f75d90118888..efb404b9fe69 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > @@ -81,11 +81,44 @@ static void __cancel_userptr__worker(struct work_struct *work)
> > >  		was_interruptible = dev_priv->mm.interruptible;
> > >  		dev_priv->mm.interruptible = false;
> > >  
> > > -		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link) {
> > > -			int ret = i915_vma_unbind(vma);
> > > -			WARN_ON(ret && ret != -EIO);
> > > +		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link)
> > > +			i915_vma_unbind(vma);
> > > +		if (i915_gem_object_put_pages(obj)) {
> > > +			struct task_struct *p;
> > > +
> > > +			DRM_ERROR("Unable to revoke ownership by userptr of"
> > > +				  " invalidated address range, sending SIGBUS"
> > > +				  " to attached clients.\n");
> > > +
> > > +			rcu_read_lock();
> > > +			for_each_process(p) {
> > > +				siginfo_t info;
> > > +
> > > +				/* This doesn't capture everyone who has
> > > +				 * the pages pinned behind a VMA as we
> > > +				 * do not have that tracking information
> > > +				 * available, it does however kill the
> > > +				 * original process (and siblings) who
> > > +				 * created the userptr and presumably tried
> > > +				 * to reuse the address space despite having
> > > +				 * pinned it (possibly indirectly) to the hw.
> > > +				 * Arguably, we don't even want to kill the
> > > +				 * other processes as they are not at fault,
> > > +				 * likely to be a display server, and hopefully
> > > +				 * will release the pages in due course once
> > > +				 * the client is dead.
> > > +				 */
> > > +				if (p->mm != obj->userptr.mm->mm)
> > > +					continue;
> > > +
> > > +				info.si_signo = SIGBUS;
> > > +				info.si_errno = 0;
> > > +				info.si_code = BUS_ADRERR;
> > > +				info.si_addr = (void __user *)obj->userptr.ptr;
> > > +				force_sig_info(SIGBUS, &info, p);
> > > +			}
> > > +			rcu_read_unlock();
> > 
> > Why do we need to send a SIGBUS? It won't tear down the offending gem bo,
> > any new users will hopefully get it, and abusing SIGBUS without the thread
> > actually doing a memory access is a bit surprising. DRM_DEBUG seems to be
> > the most we can do here I think - I think userspace being able to do this
> > is just a fundamental property of userptr.
> 
> It is not the bo that is at fault but the *client's* *address* *space*
> that is being changed. It is equivalent to mmap on a truncated file i.e.
> if the client tries to use its mmapping after it has truncated the file
> it is scolded via SIGBUS.

But existing SIGBUS is thread-bound and comes with the fault address
attached. This is just the gpu being a bit unhappy, so the SIGBUS comes
out of complete nowhere to smack the userspace thread. Any kind of SIGBUS
catcher userspace has for other reasons might be supremely surprised by
this and do stupid things. Hence I don't think throwing SIGBUS here is
correct behaviour. And there doesn't seem to be anything else suitable
really.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch