[PATCH] drm/amdgpu: add delay before poison_handler

Ellen Pan yunpa at amd.com
Mon Jul 21 16:37:30 UTC 2025


If the guest consumes a poison before it receives the
host's poison creation msg, poison_handler may, in a
corner case, complete processing earlier than it should.
The guest then hangs waiting for the req_bad_pages reply
during a VF FLR, and the VM becomes inaccessible in
stress tests.

To work around this issue, introduce a delay of 3s
before the poison_handler msg is sent out. This ensures
that the poison_handler and req_bad_pages events are
processed in the correct order.

Signed-off-by: Ellen Pan <yunpa at amd.com>
---
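Note (illustrative only, not part of the patch): the ordering argument
in the commit message can be sketched as a toy timeline model. The
struct and function names below are hypothetical and the event times
are assumptions; the only value taken from the patch is the 3000 ms
delay. The sketch does not model the mailbox or the FLR path, only the
relative order of the two events.

```c
#include <stdbool.h>

/* Hypothetical toy model, not driver code. All times are in milliseconds. */
struct poison_timeline {
	unsigned int consume_ms;  /* guest consumes the poison */
	unsigned int host_msg_ms; /* guest receives the host's poison creation msg */
	unsigned int delay_ms;    /* delay before the poison_handler msg is sent */
};

/* Time at which the guest sends the poison_handler msg. */
static unsigned int handler_send_ms(const struct poison_timeline *t)
{
	return t->consume_ms + t->delay_ms;
}

/* True when the poison_handler msg goes out only after the host msg arrived. */
static bool order_is_correct(const struct poison_timeline *t)
{
	return handler_send_ms(t) > t->host_msg_ms;
}
```

In this model, a poison consumed at 0 ms with the host msg arriving at
500 ms is out of order with no delay and in order with delay_ms = 3000.
A fixed delay only preserves the order while the host msg lags the
consumption by less than the delay.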
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index f6d8597452ed..64e631c996e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -499,6 +499,9 @@ void xgpu_nv_mailbox_put_irq(struct amdgpu_device *adev)
 static void xgpu_nv_ras_poison_handler(struct amdgpu_device *adev,
 		enum amdgpu_ras_block block)
 {
+	/* Delay 3s so that any other interrupt is properly handled first. */
+	msleep(3000);
+
 	if (amdgpu_ip_version(adev, UMC_HWIP, 0) < IP_VERSION(12, 0, 0)) {
 		xgpu_nv_send_access_requests(adev, IDH_RAS_POISON);
 	} else {
-- 
2.25.1


