[BUG] gpu: drm: radeon: two possible deadlocks involving locking and waiting

Jia-Ju Bai baijiaju1990 at gmail.com
Sat Feb 5 01:55:36 UTC 2022


Hi Christian,

Thanks for the reply :)

On 2022/2/1 15:56, Christian König wrote:
> Hi Jia-Ju,
>
> interesting that you have found those issues with an automated tool.
>
> And yes that is a well known design flaw within the radeon driver which
> can happen on hardware faults, e.g. when radeon_ring_backup() needs to be
> called.

In fact, my tool finds dozens of similar possible deadlocks caused by 
wait_event_timeout() in radeon_fence_wait_seq_timeout().
There are three other examples in Linux 5.16:

#BUG 1
radeon_dpm_change_power_state_locked()
   mutex_lock(&rdev->ring_lock); --> Line 1133 (Lock A)
   radeon_fence_wait_empty()
     radeon_fence_wait_seq_timeout()
       wait_event_timeout(rdev->fence_queue, ...) --> Line 504 (Wait X)

radeon_fence_driver_fini()
   mutex_lock(&rdev->ring_lock); --> Line 917 (Lock A)
   wake_up_all(&rdev->fence_queue); --> Line 927 (Wake X)

#BUG 2
radeon_set_pm_profile()
   mutex_lock(&rdev->pm.mutex); --> Line 382 (Lock A)
   radeon_pm_set_clocks()
     radeon_fence_wait_empty()
       radeon_fence_wait_seq_timeout()
         wait_event_timeout(rdev->fence_queue, ...) --> Line 504 (Wait X)

radeon_dynpm_idle_work_handler()
   mutex_lock(&rdev->pm.mutex); --> Line 1861 (Lock A)
   radeon_fence_count_emitted()
     radeon_fence_process()
       wake_up_all(&rdev->fence_queue); --> Line 323 (Wake X)

#BUG 3
radeon_pm_fini_old()
   mutex_lock(&rdev->pm.mutex); --> Line 1642 (Lock A)
   radeon_pm_set_clocks()
     radeon_fence_wait_empty()
       radeon_fence_wait_seq_timeout()
         wait_event_timeout(rdev->fence_queue, ...) --> Line 504 (Wait X)

radeon_dynpm_idle_work_handler()
   mutex_lock(&rdev->pm.mutex); --> Line 1861 (Lock A)
   radeon_fence_count_emitted()
     radeon_fence_process()
       wake_up_all(&rdev->fence_queue); --> Line 323 (Wake X)

Thus, to fix these possible deadlocks, we could modify the code 
related to radeon_fence_wait_seq_timeout().
But I am not very familiar with the radeon driver, so I am not sure how 
to modify the code properly.
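
The "Lock A / Wait X / Wake X" pattern in these reports can be sketched in userspace with POSIX threads. This is not radeon code and not a proposed patch: lock_a, queue_lock, fence_queue, signaled, waker() and wait_for_fence() are hypothetical stand-ins, and the condition variable only approximates a kernel wait queue. It shows that a waiter holding Lock A can only leave the wait via the timeout, while a waiter that releases Lock A first is woken normally. Whether releasing the lock before waiting would be safe in the driver itself is not established here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins; none of these names exist in the radeon driver. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;     /* plays "Lock A" */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER; /* guards 'signaled' */
static pthread_cond_t fence_queue = PTHREAD_COND_INITIALIZER;  /* plays the wait queue */
static bool signaled;

/* Waker path: it must acquire Lock A before it can perform "Wake X". */
static void *waker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&queue_lock);
    signaled = true;
    pthread_cond_broadcast(&fence_queue); /* "Wake X" */
    pthread_mutex_unlock(&queue_lock);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

/*
 * Waiter path: takes Lock A, starts the waker, then performs "Wait X".
 * Returns true if the waiter was woken, false if the timeout expired first.
 */
static bool wait_for_fence(bool drop_lock_a_first, long timeout_ms)
{
    pthread_t t;
    struct timespec deadline;
    bool woken;
    int rc;

    signaled = false;
    pthread_mutex_lock(&lock_a);
    rc = pthread_create(&t, NULL, waker, NULL);
    assert(rc == 0);

    if (drop_lock_a_first)
        pthread_mutex_unlock(&lock_a); /* let the waker make progress */

    /* Absolute deadline = now + timeout_ms. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    deadline.tv_sec += timeout_ms / 1000 + deadline.tv_nsec / 1000000000L;
    deadline.tv_nsec %= 1000000000L;

    pthread_mutex_lock(&queue_lock);
    while (!signaled && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&fence_queue, &queue_lock, &deadline);
    woken = signaled;
    pthread_mutex_unlock(&queue_lock);

    if (!drop_lock_a_first)
        pthread_mutex_unlock(&lock_a); /* only now can the waker run */
    pthread_join(t, NULL);
    return woken;
}
```

Calling wait_for_fence(false, 200) keeps Lock A held across the wait, so the waker stays blocked and the call returns false once the 200 ms timeout expires. Calling wait_for_fence(true, 2000) releases Lock A before waiting, so the waker can run and the call is expected to return true.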

>
> But that happens so rarely, and the driver is not developed further, 
> so we decided not to address this any more.

Ah, okay.

>
> Regards,
> Christian.
>
> Am 01.02.22 um 08:40 schrieb Jia-Ju Bai:
>> Hello,
>>
>> My static analysis tool reports two possible deadlocks in the radeon 
>> driver in Linux 5.16:
>>
>> #BUG 1
>> radeon_dpm_change_power_state_locked()
>>   mutex_lock(&rdev->ring_lock); --> Line 1133 (Lock A)
>>   radeon_fence_wait_empty()
>>     radeon_fence_wait_seq_timeout()
>>       wait_event_timeout(rdev->fence_queue, ...) --> Line 504 (Wait X)
>>
>> radeon_ring_backup()
>>   mutex_lock(&rdev->ring_lock); --> Line 289 (Lock A)
>>   radeon_fence_count_emitted()
>>     radeon_fence_process()
>>       wake_up_all(&rdev->fence_queue); --> Line 323 (Wake X)
>>
>> When radeon_dpm_change_power_state_locked() is executed, "Wait X" is 
>> performed while holding "Lock A". If radeon_ring_backup() is executed 
>> at this time, "Wake X" cannot be performed to wake up "Wait X" in 
>> radeon_dpm_change_power_state_locked(), because "Lock A" is already 
>> held by radeon_dpm_change_power_state_locked(), causing a possible 
>> deadlock.
>> I find that "Wait X" is performed with a timeout of 
>> MAX_SCHEDULE_TIMEOUT, to relieve the possible deadlock; but I think 
>> this timeout can cause inefficient execution.
>>
>> #BUG 2
>> radeon_ring_lock()
>>   mutex_lock(&rdev->ring_lock); --> Line 147 (Lock A)
>>   radeon_ring_alloc()
>>     radeon_fence_wait_next()
>>       radeon_fence_wait_seq_timeout()
>>         wait_event_timeout(rdev->fence_queue, ...) --> Line 504 (Wait X)
>>
>> radeon_ring_backup()
>>   mutex_lock(&rdev->ring_lock); --> Line 289 (Lock A)
>>   radeon_fence_count_emitted()
>>     radeon_fence_process()
>>       wake_up_all(&rdev->fence_queue); --> Line 323 (Wake X)
>>
>> When radeon_ring_lock() is executed, "Wait X" is performed while 
>> holding "Lock A". If radeon_ring_backup() is executed at this time, 
>> "Wake X" cannot be performed to wake up "Wait X" in 
>> radeon_ring_lock(), because "Lock A" is already held by 
>> radeon_ring_lock(), causing a possible deadlock.
>> I find that "Wait X" is performed with a timeout of 
>> MAX_SCHEDULE_TIMEOUT, to relieve the possible deadlock; but I think 
>> this timeout can cause inefficient execution.
>>
>> I am not quite sure whether these possible problems are real and how 
>> to fix them if they are real.
>> Any feedback would be appreciated, thanks :)
>>
>>
>> Best wishes,
>> Jia-Ju Bai
>>
>


