[PATCH] drm/amdgpu: Fix a BUG_ON due to resv trylock fails
xinhui pan
xinhui.pan at amd.com
Sat May 22 02:11:14 UTC 2021
The reservation object might be locked again by evict/swap after it has
been individualized. The race looks like this:
cpu 0                                        cpu 1
BO release                                   BO evict or swap
                                             lock lru_lock
ttm_bo_individualize_resv {resv = &_resv}
                                             ttm_bo_evict_swapout_allowable
                                               dma_resv_trylock(resv)
->release_notify() {BUG_ON(!trylock(resv))}
                                             if (!ttm_bo_get_unless_zero())
                                               dma_resv_unlock(resv)
                                             unlock lru_lock
As a simple fix, hold lru_lock across the resv trylock so that it cannot
race with the evict/swap path above.
Signed-off-by: xinhui pan <xinhui.pan at amd.com>
---
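Note (illustration only, not part of the patch): the locking rule can be
sketched in userspace with pthreads. This is a minimal model under stated
assumptions, not the driver code: lru_lock and resv here are hypothetical
mutexes standing in for bdev->lru_lock and the BO's dma_resv, the names
evict_worker(), release_worker() and run_race_demo() are made up for the
sketch, and the evict side is assumed to always drop resv before it drops
lru_lock, as in the failed ttm_bo_get_unless_zero() path of the race above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for bdev->lru_lock and the BO's dma_resv. */
static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t resv = PTHREAD_MUTEX_INITIALIZER;

/*
 * Evict/swap side: probes resv only while holding lru_lock and releases
 * it again before dropping lru_lock (assumed, see the note above).
 */
static void *evict_worker(void *arg)
{
	long iterations = *(long *)arg;

	for (long i = 0; i < iterations; i++) {
		pthread_mutex_lock(&lru_lock);
		if (pthread_mutex_trylock(&resv) == 0)
			pthread_mutex_unlock(&resv);
		pthread_mutex_unlock(&lru_lock);
	}
	return NULL;
}

/*
 * Release side: takes lru_lock around the trylock, as the patch does.
 * Returns how many times the trylock failed.
 */
static long release_worker(long iterations)
{
	long failures = 0;

	for (long i = 0; i < iterations; i++) {
		pthread_mutex_lock(&lru_lock);
		bool locked = pthread_mutex_trylock(&resv) == 0;
		pthread_mutex_unlock(&lru_lock);

		if (!locked) {
			failures++;
			continue;
		}
		/* The eviction fence would be removed here, under resv. */
		pthread_mutex_unlock(&resv);
	}
	return failures;
}

/* Runs both sides concurrently; returns failed trylocks, or -1 on error. */
long run_race_demo(long iterations)
{
	pthread_t evictor;
	long failures;

	if (pthread_create(&evictor, NULL, evict_worker, &iterations) != 0)
		return -1;
	failures = release_worker(iterations);
	pthread_join(evictor, NULL);
	return failures;
}
```

In this model the evict side only ever holds resv while it also holds
lru_lock, so a trylock made under lru_lock cannot observe resv held by it,
and run_race_demo() should report no failed trylocks. Dropping the lru_lock
pair from release_worker() reintroduces the window shown in the race above.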
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 928e8d57cd08..8f6da0034db9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -318,7 +318,9 @@ int amdgpu_amdkfd_remove_fence_on_pt_pd_bos(struct amdgpu_bo *bo)
 	ef = container_of(dma_fence_get(&info->eviction_fence->base),
 			struct amdgpu_amdkfd_fence, base);
 
+	spin_lock(&bo->tbo.bdev->lru_lock);
 	BUG_ON(!dma_resv_trylock(bo->tbo.base.resv));
+	spin_unlock(&bo->tbo.bdev->lru_lock);
 	ret = amdgpu_amdkfd_remove_eviction_fence(bo, ef);
 	dma_resv_unlock(bo->tbo.base.resv);
 
--
2.25.1