[PATCH] drm/amdgpu: SRIOV flr_work should use down_write
Victor Skvortsov
victor.skvortsov at amd.com
Thu Dec 9 17:02:11 UTC 2021
Host initiated VF FLR may fail if someone else
is already holding a read_lock. Change from
down_write_trylock to down_write to guarantee
the reset goes through.
Signed-off-by: Victor Skvortsov <victor.skvortsov at amd.com>
---
drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 5 +++--
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index cd2719bc0139..e4365c97adaa 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -252,11 +252,12 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
* otherwise the mailbox msg will be ruined/reseted by
* the VF FLR.
*/
- if (!down_write_trylock(&adev->reset_sem))
+ if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0)
return;
+ down_write(&adev->reset_sem);
+
amdgpu_virt_fini_vf2pf_work_item(adev);
- atomic_set(&adev->in_gpu_reset, 1);
xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index 2bc93808469a..1cde70c72e54 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -281,11 +281,12 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
* otherwise the mailbox msg will be ruined/reseted by
* the VF FLR.
*/
- if (!down_write_trylock(&adev->reset_sem))
+ if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0)
return;
+ down_write(&adev->reset_sem);
+
amdgpu_virt_fini_vf2pf_work_item(adev);
- atomic_set(&adev->in_gpu_reset, 1);
xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
--
2.25.1
More information about the amd-gfx
mailing list