[PATCH v2] drm/amdgpu: Disallow partition query during reset

Zhang, Hawking Hawking.Zhang at amd.com
Wed Apr 16 08:45:45 UTC 2025


[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>

Regards,
Hawking
-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar at amd.com>
Sent: Wednesday, April 16, 2025 16:12
To: amd-gfx at lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Kamal, Asad <Asad.Kamal at amd.com>
Subject: [PATCH v2] drm/amdgpu: Disallow partition query during reset

Reject queries for the current compute and memory partition modes while the device is in reset. Also reject sysfs requests to switch the compute partition mode during reset. Both paths return -EPERM.

Signed-off-by: Lijo Lazar <lijo.lazar at amd.com>
---
v2: Keep consistent error code, return EPERM
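
For anyone consuming these attributes from userspace, the following is a minimal sketch of how a reader might tell the new -EPERM rejection apart from other failures. It is not part of the patch. The helper names read_attr() and attr_busy_in_reset() are hypothetical, and the caller supplies the attribute path (for example, wherever current_compute_partition lives under the device's sysfs directory on a given system).

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a sysfs-style attribute into buf and
 * NUL-terminate it. Returns the number of bytes read, or a negative
 * errno value on failure.
 */
static ssize_t read_attr(const char *path, char *buf, size_t len)
{
	ssize_t n;
	int fd, err;

	if (len == 0)
		return -EINVAL;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	n = read(fd, buf, len - 1);
	err = errno;	/* save before close() can overwrite it */
	close(fd);
	if (n < 0)
		return -err;

	buf[n] = '\0';
	return n;
}

/*
 * Hypothetical helper: nonzero when the failure matches the error code
 * this patch returns while the device is in reset. EPERM can have other
 * causes, so treat this as a hint to retry, not as proof of a reset.
 */
static int attr_busy_in_reset(ssize_t ret)
{
	return ret == -EPERM;
}
```

A caller could retry after a short delay when attr_busy_in_reset() is true and report any other negative return as a hard error.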

 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 10 ++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c |  4 ++++
 2 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 2c933d436e56..67ebeed77d71 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1353,6 +1353,10 @@ static ssize_t amdgpu_gfx_get_current_compute_partition(struct device *dev,
        struct amdgpu_device *adev = drm_to_adev(ddev);
        int mode;

+       /* Only minimal precaution taken to reject requests while in reset */
+       if (amdgpu_in_reset(adev))
+               return -EPERM;
+
        mode = amdgpu_xcp_query_partition_mode(adev->xcp_mgr,
                                               AMDGPU_XCP_FL_NONE);

@@ -1396,8 +1400,14 @@ static ssize_t amdgpu_gfx_set_compute_partition(struct device *dev,
                return -EINVAL;
        }

+       /* Don't allow a switch while under reset */
+       if (!down_read_trylock(&adev->reset_domain->sem))
+               return -EPERM;
+
        ret = amdgpu_xcp_switch_partition_mode(adev->xcp_mgr, mode);

+       up_read(&adev->reset_domain->sem);
+
        if (ret)
                return ret;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ecb74ccf1d90..6b0fbbb91e57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -1230,6 +1230,10 @@ static ssize_t current_memory_partition_show(
        struct amdgpu_device *adev = drm_to_adev(ddev);
        enum amdgpu_memory_partition mode;

+       /* Only minimal precaution taken to reject requests while in reset */
+       if (amdgpu_in_reset(adev))
+               return -EPERM;
+
        mode = adev->gmc.gmc_funcs->query_mem_partition_mode(adev);
        if ((mode >= ARRAY_SIZE(nps_desc)) ||
            (BIT(mode) & AMDGPU_ALL_NPS_MASK) != BIT(mode))
--
2.25.1


