[PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

Grodzovsky, Andrey Andrey.Grodzovsky at amd.com
Thu Feb 14 16:01:40 UTC 2019


On 2/14/19 10:54 AM, Kazlauskas, Nicholas wrote:
> On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:
>> On 2/14/19 4:05 AM, Christian König wrote:
>>> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
>>>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>>>> the BO reserved and then stack waiting for fences to signal in
>>>> reservation_object_wait_timeout_rcu (which won't signal because there
>>>> was a hnag). Then when we try to shutdown display block during reset
>>>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>>>> from dm_plane_helper_cleanup_fb ending in deadlock.
>>>> Also remove useless WARN_ON
>>> Well it is good that you pointed this out, but there are more problems
>>> than just waiting wile the BO is reserved here.
>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>> ---
>>>>     drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
>>>> +++++++++++++------
>>>>     1 file changed, 13 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>>> index acc4ff8..f8dec36 100644
>>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
>>>> drm_atomic_state *state,
>>>>                  */
>>>>                 abo = gem_to_amdgpu_bo(fb->obj[0]);
>>>>                 r = amdgpu_bo_reserve(abo, true);
>>> Well why do we reserve the BO in the first place? As the name
>>> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
>>> for the BO to be idle, no need to actually reserve it at all.
>> That a good point, I honestly can't remember any more why it's here...
>> It does look unnecessary given that we are not validating the BO here.
> I think we reserve it to grab the tiling flags to update the plane
> address (and some other bits of the plane state with a patch I'm
> currently working on). I'm guessing that we still need the reservation
> so nothing else tries to grab it after it becomes idle, but I'm not
> overly familiar with that bit.
>
> Nicholas Kazlauskas

I don't think so, first of all we only hold the reservation during the 
wait for fences and then release it, secondly, BO reservation is a thin 
wrapper around ttm_bo_reserve which just protects the ttm BO internal 
structures and removes the BO from eviction list.

I think you can try to remove it and run some stress testing to see 
there is no visible corruption.

Andrey


>
>>
>>>> -            if (unlikely(r != 0)) {
>>>> +            if (unlikely(r != 0))
>>>>                     DRM_ERROR("failed to reserve buffer before flip\n");
>>>> -                WARN_ON(1);
>>>> -            }
>>>>     -            /* Wait for all fences on this FB */
>>>> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true,
>>>> false,
>>>> - MAX_SCHEDULE_TIMEOUT) < 0);
>>>> +            /*
>>>> +             * Wait for all fences on this FB. Do limited wait to avoid
>>>> +             * deadlock during GPU reset when this fence will not
>>>> signal
>>>> +             * but we hold reservation lock for the BO.
>>>> +             */
>>>> +            r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
>>>> +                                true, false,
>>> Does this waiting happen in a work item or process context? If it's
>>> process context we should actually try to wait interruptible here.
>>
>> Work item only.
>>
>> Andrey
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>> + msecs_to_jiffies(5000));
>>>> +            if (unlikely(r == 0))
>>>> +                DRM_ERROR("Waiting for fences timed out.");
>>>> +
>>>> +
>>>>                   amdgpu_bo_get_tiling_flags(abo, &tiling_flags);


More information about the amd-gfx mailing list