[Intel-gfx] [PATCH 3/7] drm/i915: Convert fences to use a GGTT lock rather than struct_mutex

Daniel Vetter daniel at ffwll.ch
Thu Jul 12 07:12:43 UTC 2018


On Wed, Jul 11, 2018 at 12:12:39PM +0100, Chris Wilson wrote:
> Quoting Chris Wilson (2018-07-11 11:57:38)
> > Quoting Daniel Vetter (2018-07-11 10:08:46)
> > > I think the above change isn't correct, at least not yet at this stage:
> > > All users of the userfault_list still use dev->struct_mutex, not vm.mutex.
> > > I guess we could move that over to the ggtt.vm.mutex eventually, but this
> > > patch doesn't do that.
> > 
> > It does, all those misplaced hunks are not so misplaced.
> 
> Since we have differing opinions on whether this is or is not
> sufficiently guarding GGTT vs rpm, what test are we missing in CI to
> conclusively indicate whether or not this is broken. As it stands, CI is
> happy, and I don't have many machines where rpm works (since it requires
> fw, sound drivers, etc).

So the "misplaced hunk" comment I typed before I fully read through the
patch and spotted the rework of the userfault list. My understanding is
still that the userfault_list is protected by rpm (see some of the
assert_rpm_wakelock_held right next to them), or maybe a combination of
rpm + dev->struct_mutex.

Afaict it's definitely not protected by the new vm.mutex, even after this
patch. That's why I complained about the comment change, and also why I
didn't see why this patch here needs the userfault_count changes.

Now could very well be that I'm missing something around userfault - I
don't remember the details, except that every time I try to reconstruct a
mental model for this I get wrong for a few days. But if that's the case
then I think you're patch isn't sufficient.

- Just for code organization reasons I think we should then move
  mm.userfault_list to the ggtt, like you've done with the fence stuff.
  Separate patch probably, like the fence prep.

- All the other places that touch userfault_* need to be audited/fixed
  too.

This is probably something we need/want to do, but I don't see the
relationship with fences. And assuming the current locking scheme is sound
(with it's funky combination of dev->struct_mutex + rpm) I don't see why
we need to do anything in a patch that only moves the fence stuff over
(and not anything of the other ggtt mmap trickery) to vm.mutex.

Wrt CI coverage: Given that I think the current code without your changes
would be fine, even with fence reg tracking protected by vm.mutex, there's
nothing for CI to prove. I'm just saying you could drop all the
userfault_* related changes from this patch, and it should still all work.

tldr; I think userfault_* is safe as is and will stay safe even with
fences moved to other locks. You can keep the various hunks I complained
about for future patches which will move userfault_* over to
ggtt.vm.mutex.

And oh dear do I not look forward to revieweing userfault_* locking
changes :-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the Intel-gfx mailing list