[PATCH 1/4] drm/amdgpu: Add Runtime Bad Page message definitions for VFs

Gande, Shravan kumar Shravankumar.Gande at amd.com
Mon May 5 19:51:27 UTC 2025



Looks good.

Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande at amd.com>

Thanks,
Shravan

-----Original Message-----
From: Pan, Ellen <Yunru.Pan at amd.com>
Sent: Tuesday, April 29, 2025 5:40 PM
To: amd-gfx at lists.freedesktop.org
Cc: Skvortsov, Victor <Victor.Skvortsov at amd.com>; Rehman, Ahmad <Ahmad.Rehman at amd.com>; Chan, Hing Pong <Jeffrey.Chan at amd.com>; Gande, Shravan kumar <Shravankumar.Gande at amd.com>; Luo, Zhigang <Zhigang.Luo at amd.com>; Pan, Ellen <Yunru.Pan at amd.com>; Skvortsov, Victor <Victor.Skvortsov at amd.com>
Subject: [PATCH 1/4] drm/amdgpu: Add Runtime Bad Page message definitions for VFs

Currently, VFs rely on the poison consumption interrupt from the HW to kick off the bad page retirement process. Part of this process is a VF reset.

This patch adds the following:

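1) A host bad pages notification message (MB_RES_MSG_RAS_BAD_PAGES_NOTIFICATION).
2) A guest request bad pages message (MB_REQ_RAS_BAD_PAGES), answered by MB_RES_MSG_RAS_BAD_PAGES_READY.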

Combined, these messages let VFs reserve bad pages early and potentially avoid future poison consumption, which would otherwise disrupt user services through the resulting FLR.

Signed-off-by: Victor Skvortsov <victor.skvortsov at amd.com>
Signed-off-by: Ellen Pan <yunru.pan at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
index bea724981309..3b0c55f67fe4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
@@ -331,6 +331,7 @@ enum amd_sriov_mailbox_request_message {
        MB_REQ_MSG_RAS_POISON = 202,
        MB_REQ_RAS_ERROR_COUNT = 203,
        MB_REQ_RAS_CPER_DUMP = 204,
+       MB_REQ_RAS_BAD_PAGES = 205,
 };

 /* mailbox message send from host to guest  */
@@ -348,6 +349,8 @@ enum amd_sriov_mailbox_response_message {
        MB_RES_MSG_GPU_RMA                      = 10,
        MB_RES_MSG_RAS_ERROR_COUNT_READY        = 11,
        MB_REQ_RAS_CPER_DUMP_READY              = 14,
+       MB_RES_MSG_RAS_BAD_PAGES_READY          = 15,
+       MB_RES_MSG_RAS_BAD_PAGES_NOTIFICATION   = 16,
        MB_RES_MSG_TEXT_MESSAGE                 = 255
 };

--
2.34.1


