Lockdep spalt on killing a processes

Mon Oct 25 19:10:36 UTC 2021

Adding back Daniel (somehow he got off the addresses list) and Chris who 
worked a lot in this area.

On 2021-10-21 2:34 a.m., Christian König wrote:
>
>
> Am 20.10.21 um 21:32 schrieb Andrey Grodzovsky:
>> On 2021-10-04 4:14 a.m., Christian König wrote:
>>
>>> The problem is a bit different.
>>>
>>> The callback is on the dependent fence, while we need to signal the 
>>> scheduler fence.
>>>
>>> Daniel is right that this needs an irq_work struct to handle this 
>>> properly.
>>>
>>> Christian.
>>
>>
>> So we had some discussions with Christian regarding irq_work and 
>> agreed I should look into doing it but stepping back for a sec -
>>
>> Why we insist on calling the dma_fence_cb  with fence->lock locked ? 
>> Is it because of dma_fence_add_callback ?
>> Because we first test for DMA_FENCE_FLAG_SIGNALED_BIT and only after 
>> that lock the fence->lock ? If so, can't we
>> move DMA_FENCE_FLAG_SIGNALED_BIT  check inside the locked section ? 
>> Because if in theory
>> we could call the cb with unlocked fence->lock (i.e. this kind of 
>> iteration 
>> https://elixir.bootlin.com/linux/v5.15-rc6/source/drivers/gpu/drm/ttm/ttm_resource.c#L117)
>> we wouldn't have the lockdep splat. And in general, is it really
>> the correct approach to call a third party code from a call back with 
>> locked spinlock ? We don't know what the cb does inside
>> and I don't see any explicit restrictions in documentation of 
>> dma_fence_func_t what can and cannot be done there.
>
> Yeah, that's exactly what I meant with using the irq_work directly in 
> the fence code.

My idea is not to use irq work at all but instead to implement unlocked 
dma_fence cb execution using iteration
which drops the spinlock each time next cb is executed and acquiring it 
again after (until cb_list is empy).

>
>
> The problem is dma_fence_signal_locked() which is used by quite a 
> number of drivers to signal the fence while holding the lock.

For this I think we should not reuse dma_fence_signal_locked inside 
dma_fence_signal and instead implement it using the
unlocked iteration I mentioned above. I looked a bit in the code and the 
history and I see that until some time ago
(this commit by Chris 0fc89b6802ba1fcc561b0c906e0cefd384e3b2e5), indeed 
dma_fence_signal was doing it's own, locked iteration
and wasn't reusing dma_fence_signal_locked. This way whoever relies on 
the dma_fence_signal_locked won't be impacted
an who is not (like us in 
drm_sched_fence_scheduled/drm_sched_fence_finished) should also not be 
impacted by more narrow
scope of the lock. I also looked at dma_fence_default_wait and how it 
locks the fence->lock and check if fence is signaled
before wait start and I don't see a problem there either.

I attached quick draft of this proposal to clarify.

Andrey

>
> Otherwise we could indeed simplify the fence handling a lot.
>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>> Am 01.10.21 um 17:10 schrieb Andrey Grodzovsky:
>>>> From what I see here you supposed to have actual deadlock and not 
>>>> only warning, sched_fence->finished is  first signaled from within
>>>> hw fence done callback (drm_sched_job_done_cb) but then again from 
>>>> within it's own callback (drm_sched_entity_kill_jobs_cb) and so
>>>> looks like same fence  object is recursively signaled twice. This 
>>>> leads to attempt to lock fence->lock second time while it's already
>>>> locked. I don't see a need to call drm_sched_fence_finished from 
>>>> within drm_sched_entity_kill_jobs_cb as this callback already 
>>>> registered
>>>> on sched_fence->finished fence (entity->last_scheduled == 
>>>> s_fence->finished) and hence the signaling already took place.
>>>>
>>>> Andrey
>>>>
>>>> On 2021-10-01 6:50 a.m., Christian König wrote:
>>>>> Hey, Andrey.
>>>>>
>>>>> while investigating some memory management problems I've got the 
>>>>> logdep splat below.
>>>>>
>>>>> Looks like something is wrong with 
>>>>> drm_sched_entity_kill_jobs_cb(), can you investigate?
>>>>>
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>> [11176.741052] ============================================
>>>>> [11176.741056] WARNING: possible recursive locking detected
>>>>> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
>>>>> [11176.741066] --------------------------------------------
>>>>> [11176.741070] swapper/12/0 is trying to acquire lock:
>>>>> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>>> dma_fence_signal+0x28/0x80
>>>>> [11176.741088]
>>>>>                but task is already holding lock:
>>>>> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>>> dma_fence_signal+0x28/0x80
>>>>> [11176.741100]
>>>>>                other info that might help us debug this:
>>>>> [11176.741104]  Possible unsafe locking scenario:
>>>>>
>>>>> [11176.741108]        CPU0
>>>>> [11176.741110]        ----
>>>>> [11176.741113]   lock(&fence->lock);
>>>>> [11176.741118]   lock(&fence->lock);
>>>>> [11176.741122]
>>>>>                 *** DEADLOCK ***
>>>>>
>>>>> [11176.741125]  May be due to missing lock nesting notation
>>>>>
>>>>> [11176.741128] 2 locks held by swapper/12/0:
>>>>> [11176.741133]  #0: ffff9c339c30f768 
>>>>> (&ring->fence_drv.lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
>>>>> [11176.741142]  #1: ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, 
>>>>> at: dma_fence_signal+0x28/0x80
>>>>> [11176.741151]
>>>>>                stack backtrace:
>>>>> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
>>>>> 5.15.0-rc1-00031-g9d546d600800 #171
>>>>> [11176.741160] Hardware name: System manufacturer System Product 
>>>>> Name/PRIME X399-A, BIOS 0808 10/12/2018
>>>>> [11176.741165] Call Trace:
>>>>> [11176.741169]  <IRQ>
>>>>> [11176.741173]  dump_stack_lvl+0x5b/0x74
>>>>> [11176.741181]  dump_stack+0x10/0x12
>>>>> [11176.741186]  __lock_acquire.cold+0x208/0x2df
>>>>> [11176.741197]  lock_acquire+0xc6/0x2d0
>>>>> [11176.741204]  ? dma_fence_signal+0x28/0x80
>>>>> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
>>>>> [11176.741219]  ? dma_fence_signal+0x28/0x80
>>>>> [11176.741225]  dma_fence_signal+0x28/0x80
>>>>> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>> [11176.741240]  drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
>>>>> [11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>> [11176.741254]  dma_fence_signal+0x3b/0x80
>>>>> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>> [11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
>>>>> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
>>>>> [11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>> [11176.741290]  dma_fence_signal+0x3b/0x80
>>>>> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
>>>>> [11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
>>>>> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
>>>>> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
>>>>> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
>>>>> [11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
>>>>> [11176.742402]  handle_irq_event_percpu+0x33/0x80
>>>>> [11176.742408]  handle_irq_event+0x39/0x60
>>>>> [11176.742414]  handle_edge_irq+0x93/0x1d0
>>>>> [11176.742419]  __common_interrupt+0x50/0xe0
>>>>> [11176.742426]  common_interrupt+0x80/0x90
>>>>> [11176.742431]  </IRQ>
>>>>> [11176.742436]  asm_common_interrupt+0x1e/0x40
>>>>> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
>>>>> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 ff 
>>>>> e8 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff fb 66 
>>>>> 0f 1f 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 2b 75 c8 
>>>>> 48 8d 14 40 48 8d
>>>>> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
>>>>> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX: 
>>>>> 0000000000000000
>>>>> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>>>>> ffffffff9efeed78
>>>>> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09: 
>>>>> 0000000000000001
>>>>> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12: 
>>>>> ffff9c3350b0e800
>>>>> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15: 
>>>>> 0000000000000002
>>>>> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
>>>>> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
>>>>> [11176.742495]  cpuidle_enter+0x2e/0x40
>>>>> [11176.742500]  call_cpuidle+0x23/0x40
>>>>> [11176.742506]  do_idle+0x201/0x280
>>>>> [11176.742512]  cpu_startup_entry+0x20/0x30
>>>>> [11176.742517]  start_secondary+0x11f/0x160
>>>>> [11176.742523]  secondary_startup_64_no_verify+0xb0/0xbb
>>>>>
>>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: draft.path
Type: text/x-systemd-unit
Size: 901 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20211025/ae8674d2/attachment.bin>