[PATCH] drm/amdgpu: add delay before poison_handler
Ellen Pan
yunpa at amd.com
Mon Jul 21 16:37:30 UTC 2025
If the guest consumes a poison before it receives the host's
poison creation message, poison_handler can, in a corner case,
finish processing earlier than it should. The guest then hangs
waiting for the req_bad_pages reply during a VF FLR, and the VM
becomes inaccessible in stress tests.

To work around this issue, introduce a delay of 3s before the
poison_handler message is sent out. This ensures the correct
processing order for the poison_handler and req_bad_pages events.
Signed-off-by: Ellen Pan <yunpa at amd.com>
---
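Note on the ordering the delay is meant to enforce. The following is a
minimal userspace sketch, not driver code. The function name
order_is_safe and the millisecond timestamps are assumptions made up for
illustration. It only models the rule described in the commit message:
the req_bad_pages exchange should not be overtaken by the poison_handler
message.

```c
#include <stdbool.h>

/*
 * Hypothetical model, not part of amdgpu: all names and the timing
 * model are assumptions for illustration only.
 *
 * poison_consumed_ms: time the guest consumes the poison
 * req_bad_pages_ms:   time the req_bad_pages exchange is handled
 * delay_ms:           delay inserted before poison_handler is sent
 *
 * Returns true when req_bad_pages is handled no later than the
 * (delayed) poison_handler message goes out.
 */
static bool order_is_safe(unsigned int poison_consumed_ms,
			  unsigned int req_bad_pages_ms,
			  unsigned int delay_ms)
{
	unsigned int poison_handler_ms = poison_consumed_ms + delay_ms;

	return req_bad_pages_ms <= poison_handler_ms;
}
```

Under this model, order_is_safe(0, 500, 0) is false and
order_is_safe(0, 500, 3000) is true, which is the case the workaround
targets. order_is_safe(0, 3500, 3000) is false again: a fixed delay
narrows the window but does not close it if the other event takes
longer than the delay.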
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index f6d8597452ed..64e631c996e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -499,6 +499,9 @@ void xgpu_nv_mailbox_put_irq(struct amdgpu_device *adev)
 static void xgpu_nv_ras_poison_handler(struct amdgpu_device *adev,
 		enum amdgpu_ras_block block)
 {
+	// delay 3s to make sure any other intr is properly handled first
+	msleep(3000);
+
 	if (amdgpu_ip_version(adev, UMC_HWIP, 0) < IP_VERSION(12, 0, 0)) {
 		xgpu_nv_send_access_requests(adev, IDH_RAS_POISON);
 	} else {
--
2.25.1