[RFC] dma-buf/fence-array: signal from wq
Rob Clark
robdclark at gmail.com
Wed Oct 26 15:54:23 UTC 2016
On Sun, Oct 16, 2016 at 2:54 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Sun, Oct 16, 2016 at 02:29:51PM -0400, Rob Clark wrote:
>> On Sun, Oct 16, 2016 at 1:49 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
>> > On Sun, Oct 16, 2016 at 12:39:35PM -0400, Rob Clark wrote:
>> >> This is the alternative approach for solving a deadlock situation with
>> >> array-fences.
>> >>
>> >> Currently with fence-array, we have a potential deadlock situation. If we
>> >> fence_add_callback() on an array-fence, the array-fence's lock is acquired
>> >> first, and in its ->enable_signaling() callback, it will install cbs on
>> >> its array-member fences, so the array-member's lock is acquired second.
>> >>
>> >> But in the signal path, the array-member's lock is acquired first, and the
>> >> array-fence's lock acquired second.
>> >
>> > I'm feeling this is the better approach; it puts the complexity inside
>> > the trickster. However, I think it is better to move the wq to
>> > enable-signaling rather than the fence_signal() cb. (Latency at the
>> > start of the wait will hopefully be absorbed by the wait, but we cannot
>> > hide latency at the end).
>>
>> my spidey sense is telling me that would race with the fence getting signalled..
>
> Which we don't care about, since we haven't yet added ourselves to the
> cb_list of the fence so haven't yet adjusted the fence-array's pending
> count and so haven't yet signaled the fence array. i.e. we can tolerate
> enabling signaling on the fence array after all of its children are
> signaled. Imo, better to have that latency upfront.
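Something roughly like this, then?  (Just a hand-wavy sketch, not an
actual patch: the enable_work field and the worker fn are made-up
names, and the per-fence refcounting of the real code is elided.)

---------
#include <linux/fence-array.h>
#include <linux/workqueue.h>

/* same layout as the existing code: the cb array is allocated
 * immediately after struct fence_array */
struct fence_array_cb {
	struct fence_cb cb;
	struct fence_array *array;
};

static void fence_array_cb_func(struct fence *f, struct fence_cb *cb);

/* assumes INIT_WORK(&array->enable_work, fence_array_enable_worker)
 * was done in fence_array_create() */
static void fence_array_enable_worker(struct work_struct *work)
{
	struct fence_array *array =
		container_of(work, struct fence_array, enable_work);
	struct fence_array_cb *cb = (void *)(&array[1]);
	int i;

	for (i = 0; i < array->num_fences; i++) {
		cb[i].array = array;
		/* this takes each member fence's lock, but now from
		 * process context rather than nested under the
		 * array-fence's lock: */
		if (fence_add_callback(array->fences[i], &cb[i].cb,
				       fence_array_cb_func)) {
			/* member already signaled: */
			if (atomic_dec_and_test(&array->num_pending))
				fence_signal(&array->base);
		}
	}
}

static bool fence_array_enable_signaling(struct fence *fence)
{
	struct fence_array *array = to_fence_array(fence);

	/* called with array->lock held, so don't touch the member
	 * fences' locks here, just punt to the wq: */
	schedule_work(&array->enable_work);

	return true;
}
---------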
btw, another fun one, and I think another argument for doing the wq on
the callback side so we can avoid the final fence_put() from the
callback:
---------
=============================================
[ INFO: possible recursive locking detected ]
4.7.0-rc7+ #548 Not tainted
---------------------------------------------
surfaceflinger/2038 is trying to acquire lock:
(&(&obj->child_list_lock)->rlock){......}, at: [<ffff0000085908a8>]
timeline_fence_release+0x38/0xa8
but task is already holding lock:
(&(&obj->child_list_lock)->rlock){......}, at: [<ffff000008590b40>]
sw_sync_ioctl+0x228/0x3b0
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0
       ----
  lock(&(&obj->child_list_lock)->rlock);
  lock(&(&obj->child_list_lock)->rlock);

  *** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by surfaceflinger/2038:
#0: (&(&obj->child_list_lock)->rlock){......}, at:
[<ffff000008590b40>] sw_sync_ioctl+0x228/0x3b0
stack backtrace:
CPU: 3 PID: 2038 Comm: surfaceflinger Not tainted 4.7.0-rc7+ #548
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
Call trace:
[<ffff000008088be8>] dump_backtrace+0x0/0x1a8
[<ffff000008088da4>] show_stack+0x14/0x20
[<ffff0000083b8234>] dump_stack+0xb4/0xf0
[<ffff0000081080f4>] __lock_acquire+0xf0c/0x18d8
[<ffff000008108e0c>] lock_acquire+0x4c/0x68
[<ffff000008ac72b4>] _raw_spin_lock_irqsave+0x54/0x70
[<ffff0000085908a8>] timeline_fence_release+0x38/0xa8
[<ffff00000858d758>] fence_release+0x28/0x50
[<ffff00000858f878>] fence_array_release+0x88/0xd0
[<ffff00000858d758>] fence_release+0x28/0x50
[<ffff00000858fa94>] fence_array_cb_func+0x64/0x88
[<ffff00000858d31c>] fence_signal_locked+0x8c/0x100
[<ffff000008590c10>] sw_sync_ioctl+0x2f8/0x3b0
[<ffff0000081faf6c>] do_vfs_ioctl+0xa4/0x790
[<ffff0000081fb6e4>] SyS_ioctl+0x8c/0xa0
[<ffff000008084f30>] el0_svc_naked+0x24/0x28
---------
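Roughly, what I have in mind on the callback side is something like
the below (again just a sketch: the put_work field is a made-up
addition to struct fence_array_cb), so that the final fence_put()
runs from the wq rather than under the member fence's lock:

---------
#include <linux/fence-array.h>
#include <linux/workqueue.h>

struct fence_array_cb {
	struct fence_cb cb;
	struct fence_array *array;
	struct work_struct put_work;	/* made-up addition */
};

static void fence_array_put_worker(struct work_struct *work)
{
	struct fence_array_cb *array_cb =
		container_of(work, struct fence_array_cb, put_work);

	/* if this drops the final ref, ->release() (e.g.
	 * timeline_fence_release()) now runs in process context
	 * with no fence locks held: */
	fence_put(&array_cb->array->base);
}

static void fence_array_cb_func(struct fence *f, struct fence_cb *cb)
{
	struct fence_array_cb *array_cb =
		container_of(cb, struct fence_array_cb, cb);
	struct fence_array *array = array_cb->array;

	if (atomic_dec_and_test(&array->num_pending))
		fence_signal(&array->base);

	/* we're called from fence_signal_locked() with the member
	 * fence's lock held, so instead of fence_put(&array->base)
	 * here, punt the put to the wq: */
	INIT_WORK(&array_cb->put_work, fence_array_put_worker);
	schedule_work(&array_cb->put_work);
}
---------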