[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/userptr: Wrap mmu_notifier inside its own rw_semaphore

Tue Mar 27 07:21:50 UTC 2018

Quoting Daniel Vetter (2018-03-27 08:01:17)
> On Mon, Mar 26, 2018 at 09:08:33PM +0100, Chris Wilson wrote:
> > Quoting Patchwork (2018-03-26 17:53:44)
> > > Test gem_userptr_blits:
> > >         Subgroup coherency-unsync:
> > >                 pass       -> INCOMPLETE (shard-hsw)
> > 
> > Forgot that obj->userptr.mn may not exist.
> > 
> > >         Subgroup dmabuf-sync:
> > >                 pass       -> DMESG-WARN (shard-hsw)
> > 
> > But this is the tricky lockdep one, warning of the recursion from gup
> > into mmu_invalidate_range, i.e.
> > 
> > down_read(&i915_mmu_notifier->sem);
> > down_read(&mm_struct->mmap_sem);
> >       gup();
> >               down_write(&i915_mmu_notifier->sem);
> > 
> > That seems a genuine deadlock... So I wonder how we managed to get a
> > lockdep splat and not a dead machine. Maybe gup never triggers the
> > recursion for our set of flags? Hmm.
> 
> Coffee starting to kick in. If we gup a range it's likely the mm won't
> kick out the same range, but something else. I guess we'd need a really
> huge userptr bo which can't fit into core completely to actually have a
> reliable chance at triggering this. Would probably deadlock the box :-/
> 
> I think Jerome's recommendation is the sequence counter stuff from kvm,
> plus retrying forever on the gup side. That would convert the same
> deadlock into a livelock, but well can't have it all :-)

Pre-coffee state also thinks it would trigger the second fs_reclaim
lockdep if it was sufficiently annotated.
-Chris
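
For reference, the "sequence counter plus retry on the gup side" idea
mentioned above can be sketched roughly as below. This is a userspace
model under stated assumptions, not the i915 or kvm code: the names
notifier_state, invalidate_start/end, must_retry, pin_with_retry and
racy_pin are all hypothetical, and the pin callback merely stands in for
gup(). The point is that the gup side holds no notifier lock while
pinning and instead checks afterwards whether an invalidation raced
with it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state shared by the invalidate and gup sides. */
struct notifier_state {
	atomic_ulong seq;   /* bumped each time an invalidation completes */
	atomic_int active;  /* number of invalidations currently in flight */
};

/* Invalidate side: bracket the invalidation instead of taking a write lock. */
static void invalidate_start(struct notifier_state *s)
{
	atomic_fetch_add(&s->active, 1);
}

static void invalidate_end(struct notifier_state *s)
{
	atomic_fetch_add(&s->seq, 1);
	atomic_fetch_sub(&s->active, 1);
}

/* True if an invalidation is running or has completed since the snapshot. */
static bool must_retry(struct notifier_state *s, unsigned long snap)
{
	return atomic_load(&s->active) != 0 || atomic_load(&s->seq) != snap;
}

typedef void (*pin_fn)(struct notifier_state *s, void *ctx);

/*
 * gup side: snapshot the counter, pin with no notifier lock held, then
 * check for a race and start over if one happened. Returns the number of
 * attempts. If invalidations never stop racing, this loops forever,
 * which is the livelock traded for the deadlock.
 */
static int pin_with_retry(struct notifier_state *s, pin_fn pin, void *ctx)
{
	int attempts = 0;

	assert(s && pin);
	for (;;) {
		unsigned long snap = atomic_load(&s->seq);

		attempts++;
		pin(s, ctx); /* stand-in for gup() */
		if (!must_retry(s, snap))
			return attempts;
		/* a real implementation would drop the stale pages here */
	}
}

/* Demo stand-in for gup(): the first *ctx calls race with an invalidation. */
static void racy_pin(struct notifier_state *s, void *ctx)
{
	int *races_left = ctx;

	if (*races_left > 0) {
		(*races_left)--;
		invalidate_start(s);
		invalidate_end(s);
	}
}
```

With racy_pin set up to race twice, pin_with_retry() gives up on the
first two attempts and succeeds on the third. Nothing here bounds the
number of retries, so a constant stream of invalidations would spin
forever.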