[PATCH] drm/amdgpu: annotate a false positive recursive locking

Daniel Vetter daniel at ffwll.ch
Fri Aug 7 09:44:34 UTC 2020


On Fri, Aug 7, 2020 at 11:34 AM Christian König
<christian.koenig at amd.com> wrote:
>
> Am 07.08.20 um 11:23 schrieb Daniel Vetter:
> > On Fri, Aug 7, 2020 at 11:20 AM Daniel Vetter <daniel at ffwll.ch> wrote:
> >> On Fri, Aug 7, 2020 at 11:08 AM Christian König
> >> <christian.koenig at amd.com> wrote:
> >>> [SNIP]
> >>>>>>> What we should do instead is to make sure we have only a single lock for the complete hive instead.
> >>>>>>> [Dennis Li] If we use a single lock, users must wait for all devices to resume successfully, but in fact their tasks may only be running on device A. It is not friendly to users.
> >>>>>> Well that is intentional I would say. We can only restart submissions after all devices are resumed successfully, cause otherwise it can happen that a job on device A depends on memory on device B which isn't accessible.
> >>>>>> [Dennis Li] Yes, you are right. The driver must make sure that the shared resources (such as the shared memory) are ready before unlocking the locks of the adevs one by one. But each device still has private hardware resources such as video and display.
> >>>>> Yeah, but those are negligible and we have a team working on display support for XGMI. So this will sooner or later become a problem as well.
> >>>>>
> >>>>> I still think that a single rwsem for the whole hive is still the best option here.
> >>>>>
> >>>>> [Dennis Li] For your above concern, we can open a new thread to discuss it. If we decide to use a single lock after discussing it, we can create another patch for it.
> >>>> That's a really good argument, but I still hesitate to merge this patch.
> >>>> How severe is the lockdep splat?
> >>>>
> >>>> At bare minimum we need a "/* TODO: ...." comment why we do this and how to remove the workaround again.
> >>>> [Dennis Li] It is not a workaround. According to the design of lockdep, each rw_semaphore should have a separate lock_class_key. I have explained that init_rwsem has the above limitation, so we must correct it.
> >>> Yeah, but this explanation is not really sufficient. Again this is not
> >>> a limitation of init_rwsem, but intentional behavior.
> >>>
> >>> In other words in most cases lockdep should splat if you use rw
> >>> semaphores like this.
> >>>
> >>> That we avoid this by always locking multiple amdgpu devices in the
> >>> same order is rather questionable driver design.
> >> Yeah don't give multiple locking classes like that, that's fairly
> >> terrible.
>
> Ok, that is exactly the answer I feared we would get.
>
> >> The right fix is mutex_lock_nest_lock, which we're using in
> >> ww_mutex and which is also used in mm_take_all_locks.
>
> Ah! Enlightenment! So in this case we should use down_write_nested() in
> this special case and all is well?

Nope. There are two kinds of nesting that lockdep supports:
1. _nested() variants, where you supply an integer parameter
specifying the lockdep subclass. That's like setting a locking
subclass at lock creation time, except much worse, because you change
the lockdep class at runtime. This can lead to stuff like:

struct mutex A;
mutex_init(&A);
mutex_lock(&A);
mutex_lock_nested(&A, SINGLE_DEPTH_NESTING);

which is a very clear deadlock, but lockdep will think it's all fine.
Don't use that.

2. _nest_lock() variants, where you specify a super-lock, which makes
sure that all sub-locks cannot be acquired concurrently (or you
promise there's some other trick to avoid deadlocks). Lockdep checks
that you've acquired the super-lock, and then allows arbitrary nesting.
This is what ww_mutex uses, and what mm_take_all_locks uses. If the
super-lock is an actual mutex, then this is actually deadlock free.
With ww_mutex it's a fake lockdep_map in the ww_acquire_ctx, but we
guarantee deadlock-freeness through the ww algorithm. The fake super
lock for ww_mutex is acquired when you get the ticket.

No. 2 is the thing you want, not no. 1.

> >>
> >> If you solve the locking ordering by sorting all the locks you need
> >> (ugh), then you can also just use a fake lockdep_map for your special
> >> function only.
> >>
> >> Also in general the idea of an rwsem in conjunction with xgmi cross
> >> device memory handling just plain freaks me out :-) Please make sure
> >> you prime this lock against dma_resv and dma_fence (and maybe also
> >> drm_modeset_lock if this lock is outside of dma_resv), and ideally
> >> this should only ever be used for setup stuff when loading drivers. I
> >> guess that's why you went with a per-device rwsem. If it's really only
> >> for setup then just do the simple thing of having an
> >> xgmi_hive_reconfigure lock, which all the rwsems nest within, and no
> >> fancy lock ordering or other tricks.
>
> Well this is for the group reset of all devices in a hive and you need
> to reboot to change the order the devices are in the list.

Ok, that sounds safe enough. I'd go with a xgmi_hive_lock then, and
down_write_nest_lock(&adev->xgmi_rwsem, &xgmi_hive_lock); Taking the
global lock for device reset shouldn't be a problem, since we're not
going to reset multiple devices in parallel anyway. Or at least I hope
that use-case isn't a perf-critical benchmark for you guys :-)
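Put together, the suggestion could look like the following kernel-side sketch. This is a hedged illustration, not the actual patch: the xgmi_hive_lock mutex, the adev->xgmi_rwsem field, and the list iteration details are assumed names for the sake of the example.

```c
/* Global super-lock for all XGMI hive reset/reconfigure paths. */
static DEFINE_MUTEX(xgmi_hive_lock);

static void amdgpu_xgmi_lock_hive(struct amdgpu_hive_info *hive)
{
	struct amdgpu_device *adev;

	mutex_lock(&xgmi_hive_lock);
	/* With the super-lock held, lockdep accepts nesting all the
	 * per-device rwsems without per-device lock classes, because
	 * no other thread can be taking them concurrently. */
	list_for_each_entry(adev, &hive->device_list, gmc.xgmi.head)
		down_write_nest_lock(&adev->xgmi_rwsem, &xgmi_hive_lock);
}
```

Unlocking then releases the rwsems in any order and drops xgmi_hive_lock last, mirroring the acquisition.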

Cheers, Daniel

> Regards,
> Christian.
>
> >>
> >>>>    The core network driver (net/core/dev.c) has a similar use case to ours.
> >>> Just took a look at that, but this is a completely different use case. The
> >>> lockdep classes are grouped by driver type to note the difference in the
> >>> network stack.
> >>>
> >>> A quick search showed that no other device driver is doing this in the
> >>> kernel, so I'm not sure if such a behavior wouldn't be rejected. Adding
> >>> Daniel to comment on this as well.
> >>>
> >>> Felix had some really good arguments to make that an urgent fix, so
> >>> either we come up with some real rework of this or at least add a big
> >>> "/* TODO....*/".
> >> Aside from "I have questions" I don't think there's any reason not to
> >> fix this correctly with mutex_lock_nest_lock. Really shouldn't be a
> >> semantic change from what I understand here.
> > Also one more with my drm maintainer hat on, as a heads-up: Dave&me
> > looked at i915 gem code a bit too much, and we're appalled at the
> > massive over-use of lockdep class hacks and dubious nesting
> > annotations. Expect crack down on anyone who tries to play clever
> > tricks here, we have a few too many already :-)
> >
> > So no mutex_lock_nested (totally different beast from
> > mutex_lock_nest_lock, and really unsafe since it's a runtime change of
> > the lockdep class) and also not any lockdep_set_class trickery or
> > anything like that. We need locking hierarchies which mere humans can
> > comprehend and review.
> >
> > Cheers, Daniel
> >
> >> Cheers, Daniel
> >>
> >>> Regards,
> >>> Christian.
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
> >
> >
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the amd-gfx mailing list