[PATCH 2/2] drm/amdgpu: Queue KFD reset workitem in VF FED
Victor Skvortsov
victor.skvortsov at amd.com
Sun May 19 14:52:21 UTC 2024
The guest recovery sequence is buggy in Fatal Error when both
FLR & KFD reset workitems are queued at the same time. In addition,
FLR guest recovery sequence is out of order when PF/VF communication
breaks due to a GPU fatal error
As a temporary work around, perform a KFD style reset (Initiate reset
request from the guest) inside the pf2vf thread on FED.
Signed-off-by: Victor Skvortsov <victor.skvortsov at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index d98d619fba97..3d5f58e76f2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -602,7 +602,7 @@ static void amdgpu_virt_update_vf2pf_work_item(struct work_struct *work)
amdgpu_sriov_runtime(adev)) {
amdgpu_ras_set_fed(adev, true);
if (amdgpu_reset_domain_schedule(adev->reset_domain,
- &adev->virt.flr_work))
+ &adev->kfd.reset_work))
return;
else
dev_err(adev->dev, "Failed to queue work! at %s", __func__);
--
2.34.1
More information about the amd-gfx
mailing list