Lockdep spalt on killing a processes
Andrey Grodzovsky
andrey.grodzovsky at amd.com
Wed Oct 27 14:27:37 UTC 2021
On 2021-10-26 6:54 a.m., Christian König wrote:
> Am 26.10.21 um 04:33 schrieb Andrey Grodzovsky:
>> On 2021-10-25 3:56 p.m., Christian König wrote:
>>
>>> In general I'm all there to get this fixed, but there is one major
>>> problem: Drivers don't expect the lock to be dropped.
>>
>>
>> I am probably missing something but in my approach we only modify the
>> code for those clients that call dma_fence_signal,
>> not dma_fence_signal_locked. In those cases the drivers are agnostic
>> to lock behavior (or should be at least) since the lock
>> is acquired within the dma fence code. Note that if you are worried
>> about calling the callback without lock then same exact
>> concern is relevant to using the irq_work directly in the fence code
>> since the irq_work will execute at a later time without locked
>> fence->lock (which is the point of using irq_work).
>
> Yeah, I've seen that it just doesn't make much sense to me.
Not clear what doesn't make sense ?
>
>>>
>>> What we could do is to change all drivers so they call always call
>>> the dma_fence_signal functions and drop the _locked variants. This
>>> way we could move calling the callback out of the spinlock.
>>>
>>> But that requires audit of all drivers, so quite a lot of work to do.
>>
>>
>> As i said earlier - if we only modify dma_fence_signal and don't
>> touch dma_fence_signal_locked then our only concern should the users
>> of dma_fence_signal.
>
> Yes, but what do you do with the drivers who call the _locked variant?
IMHO we don't touch them at all, they stay as is, we only re-implement
dma_fence_signal because drivers that use the locked variant take the
lock explicitly and so they intend for their callbacks to run
under the lock and if they don't , it's their own problem within their
code and they should fix it locally there. Driver that call
dma_fence_signal are our only problem because they didn't take the lock
explicitly but were forced to run the callback under lock by the
dma-fence framework and that something we can change.
>
>> Let me please know if I am still missing some point of yours.
>
> Well, I mean we need to be able to handle this for all drivers.
For sure, but as i said above in my opinion we need to change only for
those drivers that don't use the _locked version.
Andrey
>
> Regards,
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 25.10.21 um 21:10 schrieb Andrey Grodzovsky:
>>>> Adding back Daniel (somehow he got off the addresses list) and
>>>> Chris who worked a lot in this area.
>>>>
>>>> On 2021-10-21 2:34 a.m., Christian König wrote:
>>>>>
>>>>>
>>>>> Am 20.10.21 um 21:32 schrieb Andrey Grodzovsky:
>>>>>> On 2021-10-04 4:14 a.m., Christian König wrote:
>>>>>>
>>>>>>> The problem is a bit different.
>>>>>>>
>>>>>>> The callback is on the dependent fence, while we need to signal
>>>>>>> the scheduler fence.
>>>>>>>
>>>>>>> Daniel is right that this needs an irq_work struct to handle
>>>>>>> this properly.
>>>>>>>
>>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>> So we had some discussions with Christian regarding irq_work and
>>>>>> agreed I should look into doing it but stepping back for a sec -
>>>>>>
>>>>>> Why we insist on calling the dma_fence_cb with fence->lock
>>>>>> locked ? Is it because of dma_fence_add_callback ?
>>>>>> Because we first test for DMA_FENCE_FLAG_SIGNALED_BIT and only
>>>>>> after that lock the fence->lock ? If so, can't we
>>>>>> move DMA_FENCE_FLAG_SIGNALED_BIT check inside the locked section
>>>>>> ? Because if in theory
>>>>>> we could call the cb with unlocked fence->lock (i.e. this kind of
>>>>>> iteration
>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv5.15-rc6%2Fsource%2Fdrivers%2Fgpu%2Fdrm%2Fttm%2Fttm_resource.c%23L117&data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc8a4525f94c244bebbd208d997f19242%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637707886075917091%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=YBq%2BNkDuYKERc8XJDWTfeM%2FSknpuCBHQYgN8Uo5PFv0%3D&reserved=0)
>>>>>> we wouldn't have the lockdep splat. And in general, is it really
>>>>>> the correct approach to call a third party code from a call back
>>>>>> with locked spinlock ? We don't know what the cb does inside
>>>>>> and I don't see any explicit restrictions in documentation of
>>>>>> dma_fence_func_t what can and cannot be done there.
>>>>>
>>>>> Yeah, that's exactly what I meant with using the irq_work directly
>>>>> in the fence code.
>>>>
>>>>
>>>> My idea is not to use irq work at all but instead to implement
>>>> unlocked dma_fence cb execution using iteration
>>>> which drops the spinlock each time next cb is executed and
>>>> acquiring it again after (until cb_list is empy).
>>>>
>>>>
>>>>>
>>>>>
>>>>> The problem is dma_fence_signal_locked() which is used by quite a
>>>>> number of drivers to signal the fence while holding the lock.
>>>>
>>>>
>>>> For this I think we should not reuse dma_fence_signal_locked inside
>>>> dma_fence_signal and instead implement it using the
>>>> unlocked iteration I mentioned above. I looked a bit in the code
>>>> and the history and I see that until some time ago
>>>> (this commit by Chris 0fc89b6802ba1fcc561b0c906e0cefd384e3b2e5),
>>>> indeed dma_fence_signal was doing it's own, locked iteration
>>>> and wasn't reusing dma_fence_signal_locked. This way whoever relies
>>>> on the dma_fence_signal_locked won't be impacted
>>>> an who is not (like us in
>>>> drm_sched_fence_scheduled/drm_sched_fence_finished) should also not
>>>> be impacted by more narrow
>>>> scope of the lock. I also looked at dma_fence_default_wait and how
>>>> it locks the fence->lock and check if fence is signaled
>>>> before wait start and I don't see a problem there either.
>>>>
>>>> I attached quick draft of this proposal to clarify.
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>> Otherwise we could indeed simplify the fence handling a lot.
>>>>>
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Am 01.10.21 um 17:10 schrieb Andrey Grodzovsky:
>>>>>>>> From what I see here you supposed to have actual deadlock and
>>>>>>>> not only warning, sched_fence->finished is first signaled from
>>>>>>>> within
>>>>>>>> hw fence done callback (drm_sched_job_done_cb) but then again
>>>>>>>> from within it's own callback (drm_sched_entity_kill_jobs_cb)
>>>>>>>> and so
>>>>>>>> looks like same fence object is recursively signaled twice.
>>>>>>>> This leads to attempt to lock fence->lock second time while
>>>>>>>> it's already
>>>>>>>> locked. I don't see a need to call drm_sched_fence_finished
>>>>>>>> from within drm_sched_entity_kill_jobs_cb as this callback
>>>>>>>> already registered
>>>>>>>> on sched_fence->finished fence (entity->last_scheduled ==
>>>>>>>> s_fence->finished) and hence the signaling already took place.
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>> On 2021-10-01 6:50 a.m., Christian König wrote:
>>>>>>>>> Hey, Andrey.
>>>>>>>>>
>>>>>>>>> while investigating some memory management problems I've got
>>>>>>>>> the logdep splat below.
>>>>>>>>>
>>>>>>>>> Looks like something is wrong with
>>>>>>>>> drm_sched_entity_kill_jobs_cb(), can you investigate?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> [11176.741052] ============================================
>>>>>>>>> [11176.741056] WARNING: possible recursive locking detected
>>>>>>>>> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
>>>>>>>>> [11176.741066] --------------------------------------------
>>>>>>>>> [11176.741070] swapper/12/0 is trying to acquire lock:
>>>>>>>>> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3},
>>>>>>>>> at: dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741088]
>>>>>>>>> but task is already holding lock:
>>>>>>>>> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3},
>>>>>>>>> at: dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741100]
>>>>>>>>> other info that might help us debug this:
>>>>>>>>> [11176.741104] Possible unsafe locking scenario:
>>>>>>>>>
>>>>>>>>> [11176.741108] CPU0
>>>>>>>>> [11176.741110] ----
>>>>>>>>> [11176.741113] lock(&fence->lock);
>>>>>>>>> [11176.741118] lock(&fence->lock);
>>>>>>>>> [11176.741122]
>>>>>>>>> *** DEADLOCK ***
>>>>>>>>>
>>>>>>>>> [11176.741125] May be due to missing lock nesting notation
>>>>>>>>>
>>>>>>>>> [11176.741128] 2 locks held by swapper/12/0:
>>>>>>>>> [11176.741133] #0: ffff9c339c30f768
>>>>>>>>> (&ring->fence_drv.lock){-.-.}-{3:3}, at:
>>>>>>>>> dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741142] #1: ffff9c337ed172a8
>>>>>>>>> (&fence->lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741151]
>>>>>>>>> stack backtrace:
>>>>>>>>> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted
>>>>>>>>> 5.15.0-rc1-00031-g9d546d600800 #171
>>>>>>>>> [11176.741160] Hardware name: System manufacturer System
>>>>>>>>> Product Name/PRIME X399-A, BIOS 0808 10/12/2018
>>>>>>>>> [11176.741165] Call Trace:
>>>>>>>>> [11176.741169] <IRQ>
>>>>>>>>> [11176.741173] dump_stack_lvl+0x5b/0x74
>>>>>>>>> [11176.741181] dump_stack+0x10/0x12
>>>>>>>>> [11176.741186] __lock_acquire.cold+0x208/0x2df
>>>>>>>>> [11176.741197] lock_acquire+0xc6/0x2d0
>>>>>>>>> [11176.741204] ? dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741212] _raw_spin_lock_irqsave+0x4d/0x70
>>>>>>>>> [11176.741219] ? dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741225] dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741230] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>>>>>> [11176.741240] drm_sched_entity_kill_jobs_cb+0x1c/0x50
>>>>>>>>> [gpu_sched]
>>>>>>>>> [11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>>>>>> [11176.741254] dma_fence_signal+0x3b/0x80
>>>>>>>>> [11176.741260] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>>>>>> [11176.741268] drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
>>>>>>>>> [11176.741277] drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
>>>>>>>>> [11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>>>>>> [11176.741290] dma_fence_signal+0x3b/0x80
>>>>>>>>> [11176.741296] amdgpu_fence_process+0xd1/0x140 [amdgpu]
>>>>>>>>> [11176.741504] sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
>>>>>>>>> [11176.741731] amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
>>>>>>>>> [11176.741954] amdgpu_ih_process+0x81/0x100 [amdgpu]
>>>>>>>>> [11176.742174] amdgpu_irq_handler+0x26/0xa0 [amdgpu]
>>>>>>>>> [11176.742393] __handle_irq_event_percpu+0x4f/0x2c0
>>>>>>>>> [11176.742402] handle_irq_event_percpu+0x33/0x80
>>>>>>>>> [11176.742408] handle_irq_event+0x39/0x60
>>>>>>>>> [11176.742414] handle_edge_irq+0x93/0x1d0
>>>>>>>>> [11176.742419] __common_interrupt+0x50/0xe0
>>>>>>>>> [11176.742426] common_interrupt+0x80/0x90
>>>>>>>>> [11176.742431] </IRQ>
>>>>>>>>> [11176.742436] asm_common_interrupt+0x1e/0x40
>>>>>>>>> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
>>>>>>>>> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31
>>>>>>>>> ff e8 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff
>>>>>>>>> fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c
>>>>>>>>> 2b 75 c8 48 8d 14 40 48 8d
>>>>>>>>> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
>>>>>>>>> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002
>>>>>>>>> RCX: 0000000000000000
>>>>>>>>> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000
>>>>>>>>> RDI: ffffffff9efeed78
>>>>>>>>> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001
>>>>>>>>> R09: 0000000000000001
>>>>>>>>> [11176.742473] R10: 0000000000000001 R11: 0000000000000001
>>>>>>>>> R12: ffff9c3350b0e800
>>>>>>>>> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060
>>>>>>>>> R15: 0000000000000002
>>>>>>>>> [11176.742483] ? cpuidle_enter_state+0xf8/0x470
>>>>>>>>> [11176.742489] ? cpuidle_enter_state+0xf8/0x470
>>>>>>>>> [11176.742495] cpuidle_enter+0x2e/0x40
>>>>>>>>> [11176.742500] call_cpuidle+0x23/0x40
>>>>>>>>> [11176.742506] do_idle+0x201/0x280
>>>>>>>>> [11176.742512] cpu_startup_entry+0x20/0x30
>>>>>>>>> [11176.742517] start_secondary+0x11f/0x160
>>>>>>>>> [11176.742523] secondary_startup_64_no_verify+0xb0/0xbb
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
More information about the dri-devel
mailing list