[PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

Kazlauskas, Nicholas Nicholas.Kazlauskas at amd.com
Thu Feb 14 15:54:43 UTC 2019


On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:
> 
> On 2/14/19 4:05 AM, Christian König wrote:
>> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
>>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>>> the BO reserved and then stack waiting for fences to signal in
>>> reservation_object_wait_timeout_rcu (which won't signal because there
>>> was a hnag). Then when we try to shutdown display block during reset
>>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>>> from dm_plane_helper_cleanup_fb ending in deadlock.
>>> Also remove useless WARN_ON
>>
>> Well it is good that you pointed this out, but there are more problems
>> than just waiting wile the BO is reserved here.
>>
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
>>> +++++++++++++------
>>>    1 file changed, 13 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> index acc4ff8..f8dec36 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
>>> drm_atomic_state *state,
>>>                 */
>>>                abo = gem_to_amdgpu_bo(fb->obj[0]);
>>>                r = amdgpu_bo_reserve(abo, true);
>>
>> Well why do we reserve the BO in the first place? As the name
>> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
>> for the BO to be idle, no need to actually reserve it at all.
> 
> That a good point, I honestly can't remember any more why it's here...
> It does look unnecessary given that we are not validating the BO here.

I think we reserve it to grab the tiling flags to update the plane 
address (and some other bits of the plane state with a patch I'm 
currently working on). I'm guessing that we still need the reservation 
so nothing else tries to grab it after it becomes idle, but I'm not 
overly familiar with that bit.

Nicholas Kazlauskas

> 
> 
>>
>>> -            if (unlikely(r != 0)) {
>>> +            if (unlikely(r != 0))
>>>                    DRM_ERROR("failed to reserve buffer before flip\n");
>>> -                WARN_ON(1);
>>> -            }
>>>    -            /* Wait for all fences on this FB */
>>> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true,
>>> false,
>>> - MAX_SCHEDULE_TIMEOUT) < 0);
>>> +            /*
>>> +             * Wait for all fences on this FB. Do limited wait to avoid
>>> +             * deadlock during GPU reset when this fence will not
>>> signal
>>> +             * but we hold reservation lock for the BO.
>>> +             */
>>> +            r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
>>> +                                true, false,
>>
>> Does this waiting happen in a work item or process context? If it's
>> process context we should actually try to wait interruptible here.
> 
> 
> Work item only.
> 
> Andrey
> 
> 
>>
>> Regards,
>> Christian.
>>
>>> + msecs_to_jiffies(5000));
>>> +            if (unlikely(r == 0))
>>> +                DRM_ERROR("Waiting for fences timed out.");
>>> +
>>> +
>>>                  amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
>>



More information about the amd-gfx mailing list