deadlock possiblity introduced by "drm/nouveau: use drm_mm in preference to custom code doing the same thing"

Mon Jul 26 10:15:47 PDT 2010

Marcin Slusarz <marcin.slusarz at gmail.com> writes:

> On Sun, Jul 11, 2010 at 11:02:12AM +1000, Ben Skeggs wrote:
>> On Sun, 2010-07-11 at 01:24 +0200, Marcin Slusarz wrote:
>> > Hi
>> > 
>> > Patch "drm/nouveau: use drm_mm in preference to custom code doing the same thing"
>> > in nouveau tree introduced new deadlock possibility, for which lockdep complains loudly:
>> > 
>> > (...)
>> > 
>> Hey,
>> 
>> Thanks for the report, I'll look at this more during the week.
>> 
>> > Deadlock scenario looks like this:
>> > CPU1                                        CPU2
>> > nouveau code calls some drm_mm.c
>> > function which takes mm->unused_lock
>> > 
>> >                                             nouveau_channel_free disables irqs and takes dev_priv->context_switch_lock
>> >                                                             calls nv50_graph_destroy_context which
>> >                                                             (... backtrace above)
>> >                                                             calls drm_mm_put_block which tries to take mm->unused_lock (spins)
>> > nouveau interrupt raises
>> > 
>> > nouveau_irq_handler tries to take
>> > dev_priv->context_switch_lock (spins)
>> > 
>> > deadlock
>> It's important to note that the drm_mm referenced eventually by
>> nv50_graph_destroy_context is per-channel on the card, so for the
>> deadlock to happen it'd have to be multiple threads from a single
>> process, one thread creating/destroying and object on the channel while
>> another was trying to destroy the channel.
>> 

Yeah, and that situation is impossible ATM because those functions are
called with the BKL held.

>> > 
>> > Possible solutions:
>> > - reverting "drm/nouveau: use drm_mm in preference to custom code doing the same thing"
>> > - disabling interrupts before calling drm_mm functions - unmaintainable and still
>> >   deadlockable in multicard setups (nouveau and eg radeon)
>> Agreed it's unmaintainable, but, as mentioned above, the relevant locks
>> can't be touched by radeon.
>> 
>> > - making mm->unused_lock HARDIRQ-safe (patch below) - simple but with slight overhead
>> I'll look more during the week, there's other solutions to be explored.
>
> So, did you find other solution?

Some random ideas:

 - Make context_switch_lock HARDIRQ-unsafe. To avoid racing with the IRQ
   handler we'd have to disable interrupt dispatch on the card before
   taking context_switch_lock (i.e. at channel creation and destruction
   time), and the interrupt control registers would have to be protected
   with a IRQ safe spinlock.

 - Split the current destroy_context() hooks in two halves, the first
   one would be in charge of the PFIFO/PGRAPH-poking tasks (e.g.
   disable_context()), and the second one would take care of releasing
   the allocated resources (and it wouldn't need locking).

>
> Marcin
>
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 229 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20100726/4e4d6385/attachment.pgp>