[PATCH] drm/amdgpu: Fix a BUG_ON due to resv trylock fails
Felix Kuehling
felix.kuehling at amd.com
Sat May 22 02:57:03 UTC 2021
When the BO gets individualized, there is an assumption that nobody is
accessing it any more. See this comment in ttm_bo_individualize_resv:
/* This works because the BO is about to be destroyed and nobody
* reference it any more. The only tricky case is the trylock on
* the resv object while holding the lru_lock.
*/
That assumption is violated when the BO is still being swapped out at this
stage. You can kind of paper over that by taking the LRU lock, but there are
probably other race conditions when the reservation gets
swapped by "individualize" during an eviction.
I think that, to avoid all of that, TTM needs to make sure that the BO is no
longer on the LRU list when it gets individualized.
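A minimal userspace sketch of that ordering, assuming a toy singly linked LRU. Every name here (model_bo, model_lru, lru_scan_trylock, release_individualize) is hypothetical and is not a TTM function; the point is only that a BO unlinked before its reservation pointer is switched cannot be reached by an LRU walker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; they only mimic the shape of the problem. */
struct model_resv { bool locked; };

struct model_bo {
    struct model_resv *resv;     /* reservation currently in use */
    struct model_resv own_resv;  /* private reservation, playing the role of _resv */
    struct model_bo *lru_next;   /* next BO on the toy LRU */
    bool on_lru;
};

struct model_lru { struct model_bo *head; };

static void lru_add(struct model_lru *lru, struct model_bo *bo)
{
    bo->lru_next = lru->head;
    lru->head = bo;
    bo->on_lru = true;
}

static void lru_del(struct model_lru *lru, struct model_bo *bo)
{
    struct model_bo **pp = &lru->head;

    while (*pp && *pp != bo)
        pp = &(*pp)->lru_next;
    if (*pp) {
        *pp = bo->lru_next;
        bo->lru_next = NULL;
        bo->on_lru = false;
    }
}

/* Eviction-side walk: locks every free reservation it can reach and
 * returns how many it locked (they are left locked for inspection). */
static int lru_scan_trylock(struct model_lru *lru)
{
    int locked = 0;

    for (struct model_bo *bo = lru->head; bo; bo = bo->lru_next) {
        if (!bo->resv->locked) {
            bo->resv->locked = true;
            locked++;
        }
    }
    return locked;
}

/* Release path in the suggested order: unlink first, then individualize. */
static void release_individualize(struct model_lru *lru, struct model_bo *bo)
{
    lru_del(lru, bo);
    bo->resv = &bo->own_resv;
}
```

After release_individualize() runs on the only BO of a list, lru_scan_trylock() finds nothing to lock, so the private reservation stays free for the release path. The model is single-threaded and ignores the locking needed around the list itself.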
Regards,
Felix
On 2021-05-21 at 10:11 p.m., xinhui pan wrote:
> The reservation object might be locked again by evict/swap after it has
> been individualized. The race looks like this:
> cpu 0                                         cpu 1
> BO release                                    BO evict or swap
>                                               lock lru_lock
> ttm_bo_individualize_resv {resv = &_resv}
>                                               ttm_bo_evict_swapout_allowable
>                                                 dma_resv_trylock(resv)
> ->release_notify() {BUG_ON(!trylock(resv))}
>                                               if (!ttm_bo_get_unless_zero))
>                                                 dma_resv_unlock(resv)
>                                               unlock lru_lock
> To fix it simply, let's acquire lru_lock before resv trylock to avoid
> the race above.
>
> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 928e8d57cd08..8f6da0034db9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -318,7 +318,9 @@ int amdgpu_amdkfd_remove_fence_on_pt_pd_bos(struct amdgpu_bo *bo)
>  	ef = container_of(dma_fence_get(&info->eviction_fence->base),
>  			struct amdgpu_amdkfd_fence, base);
>  
> +	spin_lock(&bo->tbo.bdev->lru_lock);
>  	BUG_ON(!dma_resv_trylock(bo->tbo.base.resv));
> +	spin_unlock(&bo->tbo.bdev->lru_lock);
>  	ret = amdgpu_amdkfd_remove_eviction_fence(bo, ef);
>  	dma_resv_unlock(bo->tbo.base.resv);
>
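The lock ordering in the quoted patch can be modeled in userspace with pthreads. This is a rough sketch, not kernel code: two pthread mutexes stand in for lru_lock and the reservation lock, the function names are hypothetical, and the swap of the reservation pointer by "individualize" is deliberately left out. It shows only why the trylock cannot fail when the other side touches the reservation exclusively under lru_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-ins for the kernel locks; both are plain userspace mutexes. */
static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t resv = PTHREAD_MUTEX_INITIALIZER;

/* Eviction side: holds resv only while it also holds lru_lock. */
static void *evict_loop(void *arg)
{
    long iters = *(long *)arg;

    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&lru_lock);
        if (pthread_mutex_trylock(&resv) == 0)
            pthread_mutex_unlock(&resv);
        pthread_mutex_unlock(&lru_lock);
    }
    return NULL;
}

/* Release side in the patched order: take lru_lock, trylock resv,
 * drop lru_lock. Returns how many trylock attempts failed. */
static long release_loop(long iters)
{
    long failures = 0;

    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&lru_lock);
        int rc = pthread_mutex_trylock(&resv);
        pthread_mutex_unlock(&lru_lock);
        if (rc != 0) {
            failures++;   /* the real code would hit BUG_ON() here */
            continue;
        }
        /* the eviction fence would be removed here */
        pthread_mutex_unlock(&resv);
    }
    return failures;
}

/* Runs both sides concurrently; returns the failure count, or -1 if the
 * eviction thread could not be started. */
static long run_model(long iters)
{
    pthread_t evictor;

    if (pthread_create(&evictor, NULL, evict_loop, &iters) != 0)
        return -1;
    long failures = release_loop(iters);
    pthread_join(evictor, NULL);
    return failures;
}
```

Calling run_model() with any iteration count should report zero failed trylocks, because whenever the release side holds lru_lock the evictor cannot be holding resv. As noted in the reply above, this covers only the trylock race and says nothing about other races around the swapped reservation.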