[PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.
Grodzovsky, Andrey
Andrey.Grodzovsky at amd.com
Wed Feb 13 19:10:16 UTC 2019
On 2/13/19 2:00 PM, Kazlauskas, Nicholas wrote:
> On 2/13/19 1:58 PM, Andrey Grodzovsky wrote:
>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>> the BO reserved and then stack waiting for fences to signal in
>> reservation_object_wait_timeout_rcu (which won't signal because there
>> was a hnag). Then when we try to shutdown display block during reset
>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>> from dm_plane_helper_cleanup_fb ending in deadlock.
>> Also remove useless WARN_ON
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> ---
>> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 +++++++++++++------
>> 1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index acc4ff8..f8dec36 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>> */
>> abo = gem_to_amdgpu_bo(fb->obj[0]);
>> r = amdgpu_bo_reserve(abo, true);
>> - if (unlikely(r != 0)) {
>> + if (unlikely(r != 0))
>> DRM_ERROR("failed to reserve buffer before flip\n");
>> - WARN_ON(1);
>> - }
>>
>> - /* Wait for all fences on this FB */
>> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
>> - MAX_SCHEDULE_TIMEOUT) < 0);
>> + /*
>> + * Wait for all fences on this FB. Do limited wait to avoid
>> + * deadlock during GPU reset when this fence will not signal
>> + * but we hold reservation lock for the BO.
>> + */
>> + r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
>> + true, false,
>> + msecs_to_jiffies(5000));
>> + if (unlikely(r == 0))
>> + DRM_ERROR("Waiting for fences timed out.");
>> +
>> +
>>
>> amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
>>
>>
> Is it safe that we're just continuing like this? It's probably better to
> just unreserve the buffer and continue to the next plane, no?
>
> Nicholas Kazlauskas
As far as I see it should be safe as you are simply flipping to a buffer
for which rendering hasn't finished (or stack actually in this case) so
you might see visual corruption but that the least of your problems if
after 5s the BO still not finalized for presentation, the system is
already probably in very bad shape. Also, in case we do want to do
error handling we should also take care of amdgpu_bo_reserve failure
just before that.
Andrey
More information about the amd-gfx
mailing list