[PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

Grodzovsky, Andrey Andrey.Grodzovsky at amd.com
Thu Feb 14 15:42:43 UTC 2019


On 2/14/19 4:05 AM, Christian König wrote:
> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>> the BO reserved and then stack waiting for fences to signal in
>> reservation_object_wait_timeout_rcu (which won't signal because there
>> was a hnag). Then when we try to shutdown display block during reset
>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>> from dm_plane_helper_cleanup_fb ending in deadlock.
>> Also remove useless WARN_ON
>
> Well it is good that you pointed this out, but there are more problems 
> than just waiting wile the BO is reserved here.
>
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> ---
>>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 
>> +++++++++++++------
>>   1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index acc4ff8..f8dec36 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct 
>> drm_atomic_state *state,
>>                */
>>               abo = gem_to_amdgpu_bo(fb->obj[0]);
>>               r = amdgpu_bo_reserve(abo, true);
>
> Well why do we reserve the BO in the first place? As the name 
> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait 
> for the BO to be idle, no need to actually reserve it at all.

That a good point, I honestly can't remember any more why it's here... 
It does look unnecessary given that we are not validating the BO here.


>
>> -            if (unlikely(r != 0)) {
>> +            if (unlikely(r != 0))
>>                   DRM_ERROR("failed to reserve buffer before flip\n");
>> -                WARN_ON(1);
>> -            }
>>   -            /* Wait for all fences on this FB */
>> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, 
>> false,
>> - MAX_SCHEDULE_TIMEOUT) < 0);
>> +            /*
>> +             * Wait for all fences on this FB. Do limited wait to avoid
>> +             * deadlock during GPU reset when this fence will not 
>> signal
>> +             * but we hold reservation lock for the BO.
>> +             */
>> +            r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
>> +                                true, false,
>
> Does this waiting happen in a work item or process context? If it's 
> process context we should actually try to wait interruptible here.


Work item only.

Andrey


>
> Regards,
> Christian.
>
>> + msecs_to_jiffies(5000));
>> +            if (unlikely(r == 0))
>> +                DRM_ERROR("Waiting for fences timed out.");
>> +
>> +
>>                 amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
>


More information about the amd-gfx mailing list