[PATCH v2 1/2] drm/amdgpu: fix a potential deadlock in amdgpu_bo_create_restricted()

Nicolai Hähnle nhaehnle at gmail.com
Mon Feb 13 16:25:08 UTC 2017


On 09.02.2017 11:33, Samuel Pitoiset wrote:
> When ttm_bo_init() fails, the reservation mutex should be unlocked.
>
> In debug build, the kernel reported "possible recursive locking
> detected" in this codepath. For debugging purposes, I also added
> a "WARN_ON(ww_mutex_is_locked())" when ttm_bo_init() fails and the
> mutex was locked as expected.
>
> This should fix (random) GPU hangs. The easy way to reproduce the
> issue is to change the "Super Sampling" option from 1.0 to 2.0 in
> Hitman. It will create a huge buffer, evict a bunch of buffers
> (around ~5k) and deadlock.
>
> This regression has been introduced pretty recently.
>
> v2: only release the mutex if resv is NULL
>
> Fixes: 12a852219583 ("drm/amdgpu: improve AMDGPU_GEM_CREATE_VRAM_CLEARED handling (v2)")
> Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index d1ef1d064de4..556236a112c1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -403,8 +403,11 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
>  			&bo->placement, page_align, !kernel, NULL,
>  			acc_size, sg, resv ? resv : &bo->tbo.ttm_resv,
>  			&amdgpu_ttm_bo_destroy);
> -	if (unlikely(r != 0))
> +	if (unlikely(r != 0)) {
> +		if (!resv)
> +			ww_mutex_unlock(&bo->tbo.resv->lock);
>  		return r;
> +	}

I was looking at this myself a couple of weeks back, and I'm pretty sure 
I had this exact same patch just to realize that it's actually incorrect.

The problem is that ttm_bo_init will actually call the destroy function 
(in our case, amdgpu_ttm_bo_destroy), so at this point, bo has been freed.

This code is a huge mess. I'm surprised though: have you verified that 
this patch actually fixes a hang?

Cheers,
Nicolai


>
>  	bo->tbo.priority = ilog2(bo->tbo.num_pages);
>  	if (kernel)
>



More information about the amd-gfx mailing list