[PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

Christian König ckoenig.leichtzumerken at gmail.com
Thu Feb 14 09:05:36 UTC 2019


On 13.02.19 at 19:58, Andrey Grodzovsky wrote:
> When a ring hang happens, amdgpu_dm_commit_planes is holding the BO
> reserved during a flip and is then stuck waiting for fences to signal in
> reservation_object_wait_timeout_rcu (which won't signal because there
> was a hang). Then, when we try to shut down the display block during
> reset recovery from drm_atomic_helper_suspend, we also try to reserve
> the BO from dm_plane_helper_cleanup_fb, ending in a deadlock.
> Also remove the useless WARN_ON.

Well, it is good that you pointed this out, but there are more problems 
here than just waiting while the BO is reserved.

>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> ---
>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 +++++++++++++------
>   1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index acc4ff8..f8dec36 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>   			 */
>   			abo = gem_to_amdgpu_bo(fb->obj[0]);
>   			r = amdgpu_bo_reserve(abo, true);

Well, why do we reserve the BO in the first place? As the name indicates, 
reservation_object_wait_timeout_rcu() just uses RCU to wait for the BO 
to be idle, so there is no need to actually reserve it at all.
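
To make the point above concrete, here is a minimal userspace sketch, not
kernel code. The names model_bo and model_wait_idle are hypothetical and
only model the idea: the timed wait reads the fence state and never needs
the reservation lock. The return convention (positive on success, 0 on
timeout) follows what the patch checks with r == 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a buffer object; not the amdgpu struct. */
struct model_bo {
	bool reserved;    /* models the reservation lock; deliberately not consulted by the wait */
	int  fence_ticks; /* ticks until the fence signals; negative means never (hung ring) */
};

/*
 * Model of a timed wait that only inspects the fence state.
 * Returns the remaining ticks (> 0) if the fence signaled in time,
 * or 0 if the timeout expired first.
 */
static long model_wait_idle(const struct model_bo *bo, long timeout_ticks)
{
	long t;

	assert(timeout_ticks > 0);

	for (t = 0; t < timeout_ticks; t++) {
		if (bo->fence_ticks >= 0 && t >= bo->fence_ticks)
			return timeout_ticks - t;
	}
	return 0; /* timed out, e.g. because the ring is hung */
}
```

In this model a hung ring (fence_ticks < 0) makes the wait return 0 after
the timeout regardless of whether anyone holds the reservation, so a
second path that wants to reserve the BO is never blocked by the waiter.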

> -			if (unlikely(r != 0)) {
> +			if (unlikely(r != 0))
>   				DRM_ERROR("failed to reserve buffer before flip\n");
> -				WARN_ON(1);
> -			}
>   
> -			/* Wait for all fences on this FB */
> -			WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
> -										    MAX_SCHEDULE_TIMEOUT) < 0);
> +			/*
> +			 * Wait for all fences on this FB. Do limited wait to avoid
> +			 * deadlock during GPU reset when this fence will not signal
> +			 * but we hold reservation lock for the BO.
> +			 */
> +			r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
> +								true, false,

Does this wait happen in a work item or in process context? If it is 
process context, we should actually try to wait interruptibly here.

Regards,
Christian.
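
As an illustration of the question above about process context versus a
work item, the following is a small userspace sketch under stated
assumptions. The enum wait_ctx and both helpers are hypothetical names,
and mapping a timeout to -ETIMEDOUT is an assumption made for this
example; the patch itself only logs an error on timeout.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of where the wait runs. */
enum wait_ctx {
	CTX_WORK_ITEM, /* no user process to deliver a signal to */
	CTX_PROCESS    /* called on behalf of a user process */
};

/* Choose the "intr" argument of a timed wait: interruptible only in process context. */
static bool want_interruptible(enum wait_ctx ctx)
{
	return ctx == CTX_PROCESS;
}

/*
 * Translate the three-way result of a timed wait into an error code:
 *   r < 0  -> the wait was interrupted or failed, pass the error through
 *   r == 0 -> the timeout expired (assumed here to map to -ETIMEDOUT)
 *   r > 0  -> all fences signaled
 */
static int wait_result_to_errno(long r)
{
	if (r < 0)
		return (int)r;
	if (r == 0)
		return -ETIMEDOUT;
	return 0;
}
```

The design point is that an interruptible wait adds a third outcome
(negative return) that the caller has to propagate instead of treating
it as a timeout.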

> +								msecs_to_jiffies(5000));
> +			if (unlikely(r == 0))
> +				DRM_ERROR("Waiting for fences timed out.");
> +
> +
>   
>   			amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
>   
