[PATCH 1/2] drm/amdgpu: Implement instance ID remapping for harvested SDMA engines
Zhang, Jesse(Jie)
Jesse.Zhang at amd.com
Wed Jun 11 07:11:30 UTC 2025
[AMD Official Use Only - AMD Internal Distribution Only]
Thanks Lijo
As we discussed offline, we will remove the harvest_config check.
Regards
Jesse
-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar at amd.com>
Sent: Wednesday, June 11, 2025 2:15 PM
To: Zhang, Jesse(Jie) <Jesse.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Kim, Jonathan <Jonathan.Kim at amd.com>; Zhu, Jiadong <Jiadong.Zhu at amd.com>
Subject: Re: [PATCH 1/2] drm/amdgpu: Implement instance ID remapping for harvested SDMA engines
On 6/11/2025 11:26 AM, Jesse Zhang wrote:
> Adds logic to handle instance ID conversion during SDMA engine reset
> when harvest_config is active. This ensures correct physical engine
> addressing when some SDMA instances are harvested.
>
> Changes include:
> 1. Added instance ID remapping using GET_INST macro when harvest_config
> is non-zero
> 2. Conversion happens before engine reset procedure begins
> 3. Maintains existing reset flow for non-harvested configurations
>
> This fixes hardware initialization issues on devices with harvested
> SDMA instances where the logical instance IDs don't match physical
> hardware mapping.
>
This shouldn't be required. Without harvest-awareness, the driver won't load properly on MI308.
Thanks,
Lijo
> Suggested-by: Jonathan Kim <jonathan.kim at amd.com>
> Signed-off-by: Jesse Zhang <Jesse.Zhang at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 3 +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 1 +
> 3 files changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index a0e9bf9b2710..4282f60a0cef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -759,6 +759,7 @@ static void amdgpu_discovery_read_from_harvest_table(struct amdgpu_device *adev,
> ~(1U << harvest_info->list[i].number_instance);
> break;
> case SDMA0_HWID:
> + adev->sdma.harvest_config |= (1U << harvest_info->list[i].number_instance);
> adev->sdma.sdma_mask &=
> ~(1U << harvest_info->list[i].number_instance);
> break;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> index 6716ac281c49..0bfd2c138d24 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> @@ -581,6 +581,9 @@ int amdgpu_sdma_reset_engine(struct amdgpu_device *adev, uint32_t instance_id)
> bool gfx_sched_stopped = false, page_sched_stopped = false;
>
> mutex_lock(&sdma_instance->engine_reset_mutex);
> +
> + if (adev->sdma.harvest_config)
> + instance_id = GET_INST(SDMA0, instance_id);
> /* Stop the scheduler's work queue for the GFX and page rings if they are running.
> * This ensures that no new tasks are submitted to the queues while
> * the reset is in progress.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> index e5f8951bbb6f..fed00854a1a2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> @@ -123,6 +123,7 @@ struct amdgpu_sdma {
>
> int num_instances;
> uint32_t sdma_mask;
> + uint32_t harvest_config;
> int num_inst_per_aid;
> uint32_t srbm_soft_reset;
> bool has_page_queue;
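
For readers following the thread, here is a minimal standalone sketch of why logical and physical SDMA instance IDs can diverge once an instance is harvested. It is not the kernel's GET_INST implementation. The struct sdma_masks type and the helpers mark_harvested() and logical_to_physical() are hypothetical names made up for illustration, and the 32-instance limit is an assumption taken from the uint32_t mask width. Only the two bit operations in mark_harvested() mirror the SDMA0_HWID case in the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two SDMA masks touched by the patch. */
struct sdma_masks {
	uint32_t sdma_mask;      /* bit n set: physical instance n is usable */
	uint32_t harvest_config; /* bit n set: physical instance n is harvested */
};

/* Record one harvested physical instance, mirroring the SDMA0_HWID case. */
static void mark_harvested(struct sdma_masks *m, unsigned int physical_inst)
{
	assert(physical_inst < 32);
	m->harvest_config |= 1U << physical_inst;
	m->sdma_mask &= ~(1U << physical_inst);
}

/*
 * Return the physical index of the logical_inst-th usable instance,
 * or -1 if there are not that many usable instances.
 * Assumption: logical IDs are assigned densely in ascending physical order.
 */
static int logical_to_physical(const struct sdma_masks *m,
			       unsigned int logical_inst)
{
	unsigned int seen = 0;

	for (unsigned int phys = 0; phys < 32; phys++) {
		if (!(m->sdma_mask & (1U << phys)))
			continue; /* harvested or absent: skip it */
		if (seen == logical_inst)
			return (int)phys;
		seen++;
	}
	return -1;
}
```

Under these assumptions, starting from four usable instances (sdma_mask 0xF) and harvesting physical instance 1, logical instance 1 resolves to physical instance 2, so a reset that addresses hardware by the logical ID would target the wrong engine.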