[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/userptr: Wrap mmu_notifier inside its own rw_semaphore
Daniel Vetter
daniel at ffwll.ch
Tue Mar 27 07:01:17 UTC 2018
On Mon, Mar 26, 2018 at 09:08:33PM +0100, Chris Wilson wrote:
> Quoting Patchwork (2018-03-26 17:53:44)
> > Test gem_userptr_blits:
> > Subgroup coherency-unsync:
> > pass -> INCOMPLETE (shard-hsw)
>
> Forgot that obj->userptr.mn may not exist.
>
> > Subgroup dmabuf-sync:
> > pass -> DMESG-WARN (shard-hsw)
>
> But this is the tricky lockdep one, warning of the recursion from gup
> into mmu_invalidate_range, i.e.
>
> down_read(&i915_mmu_notifier->sem);
> down_read(&mm_struct->mmap_sem);
> gup();
> down_write(&i915_mmu_notifier->sem);
>
> That seems a genuine deadlock... So I wonder how we managed to get a
> lockdep splat and not a dead machine. Maybe gup never triggers the
> recursion for our set of flags? Hmm.
Coffee starting to kick in. If we gup a range, it's likely the mm won't
kick out that same range, but something else. I guess we'd need a really
huge userptr bo that can't fit into core completely to have a reliable
chance at actually triggering this. It would probably deadlock the box :-/
I think Jerome's recommendation is the sequence counter scheme from kvm,
plus retrying forever on the gup side. That would convert the same
deadlock into a livelock, but, well, we can't have it all :-) And I think
once you've killed the task, the gup worker hopefully realizes it's
wasting time and gives up.
For the kvm stuff: look at the #intel-gfx scrollback; we discussed all the
necessary bits. Plus, Jerome showed some new helpers that would avoid the
hand-rolling.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch