[PATCH 1/2] drm/amdgpu: Implement instance ID remapping for harvested SDMA engines

Zhang, Jesse(Jie) Jesse.Zhang at amd.com
Wed Jun 11 07:11:30 UTC 2025


Thanks, Lijo.
As we discussed offline, we will remove the harvest_config check.

Regards
Jesse

-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar at amd.com>
Sent: Wednesday, June 11, 2025 2:15 PM
To: Zhang, Jesse(Jie) <Jesse.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Kim, Jonathan <Jonathan.Kim at amd.com>; Zhu, Jiadong <Jiadong.Zhu at amd.com>
Subject: Re: [PATCH 1/2] drm/amdgpu: Implement instance ID remapping for harvested SDMA engines



On 6/11/2025 11:26 AM, Jesse Zhang wrote:
> Adds logic to handle instance ID conversion during SDMA engine reset
> when harvest_config is active. This ensures correct physical engine
> addressing when some SDMA instances are harvested.
>
> Changes include:
> 1. Added instance ID remapping using GET_INST macro when harvest_config
>    is non-zero
> 2. Conversion happens before engine reset procedure begins
> 3. Maintains existing reset flow for non-harvested configurations
>
> This fixes hardware initialization issues on devices with harvested
> SDMA instances where the logical instance IDs don't match physical
> hardware mapping.
>

This shouldn't be required. Without harvest awareness, the driver wouldn't load properly on MI308.

Thanks,
Lijo
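
For readers following the thread, the logical-to-physical remapping described in the commit message can be sketched in standalone C. This is only an illustration under stated assumptions: it does not reproduce the driver's GET_INST macro, and the names MAX_SDMA_INST and build_logical_to_physical are hypothetical. It assumes a bitmask in which bit N is set when physical instance N is usable, and that logical IDs are assigned to the usable instances in ascending order.

```c
#include <stdint.h>

#define MAX_SDMA_INST 16 /* hypothetical upper bound for this sketch */

/*
 * Fill map[] so that map[logical] holds the physical instance number.
 * avail_mask has bit N set when physical instance N is usable
 * (i.e. not harvested). Returns the number of logical instances.
 */
int build_logical_to_physical(uint32_t avail_mask, int map[MAX_SDMA_INST])
{
	int logical = 0;

	for (int phys = 0; phys < MAX_SDMA_INST; phys++) {
		/* Skip harvested instances; usable ones get the next logical ID. */
		if (avail_mask & (1U << phys))
			map[logical++] = phys;
	}

	return logical;
}
```

With physical instances 2 and 5 harvested out of eight (avail_mask 0xDB), logical instance 2 would resolve to physical instance 3 in this sketch, which is the kind of mismatch the commit message refers to.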

> Suggested-by: Jonathan Kim <jonathan.kim at amd.com>
> Signed-off-by: Jesse Zhang <Jesse.Zhang at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      | 3 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h      | 1 +
>  3 files changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index a0e9bf9b2710..4282f60a0cef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -759,6 +759,7 @@ static void amdgpu_discovery_read_from_harvest_table(struct amdgpu_device *adev,
>                               ~(1U << harvest_info->list[i].number_instance);
>                       break;
>               case SDMA0_HWID:
> +                     adev->sdma.harvest_config |= (1U << harvest_info->list[i].number_instance);
>                       adev->sdma.sdma_mask &=
>                               ~(1U << harvest_info->list[i].number_instance);
>                       break;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> index 6716ac281c49..0bfd2c138d24 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> @@ -581,6 +581,9 @@ int amdgpu_sdma_reset_engine(struct amdgpu_device *adev, uint32_t instance_id)
>       bool gfx_sched_stopped = false, page_sched_stopped = false;
>
>       mutex_lock(&sdma_instance->engine_reset_mutex);
> +
> +     if (adev->sdma.harvest_config)
> +             instance_id = GET_INST(SDMA0, instance_id);
>       /* Stop the scheduler's work queue for the GFX and page rings if they are running.
>       * This ensures that no new tasks are submitted to the queues while
>       * the reset is in progress.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> index e5f8951bbb6f..fed00854a1a2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> @@ -123,6 +123,7 @@ struct amdgpu_sdma {
>
>       int                     num_instances;
>       uint32_t                sdma_mask;
> +     uint32_t                harvest_config;
>       int                     num_inst_per_aid;
>       uint32_t                    srbm_soft_reset;
>       bool                    has_page_queue;
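
The bookkeeping in the amdgpu_discovery.c hunk of the quoted patch (set a bit in harvest_config, clear the same bit in sdma_mask) can be sketched outside the kernel as follows. This is a minimal illustration, not the driver code: struct sdma_masks and apply_harvest are hypothetical names, and the harvested instance numbers are passed in as a plain array rather than read from the harvest table.

```c
#include <stdint.h>

/* Hypothetical container for the two masks touched by the patch. */
struct sdma_masks {
	uint32_t harvest_config; /* bit N set: instance N is harvested */
	uint32_t sdma_mask;      /* bit N set: instance N is usable    */
};

/*
 * Start from initial_mask and, for each harvested instance number,
 * record it in harvest_config and remove it from sdma_mask.
 */
struct sdma_masks apply_harvest(uint32_t initial_mask,
				const uint8_t *harvested, int count)
{
	struct sdma_masks m = { 0, initial_mask };

	for (int i = 0; i < count; i++) {
		/* Ignore out-of-range entries so the shift stays defined. */
		if (harvested[i] >= 32)
			continue;

		uint32_t bit = 1U << harvested[i];

		m.harvest_config |= bit;
		m.sdma_mask &= ~bit;
	}

	return m;
}
```

In this sketch the two masks are always complementary within initial_mask, so a non-zero harvest_config carries no information that sdma_mask does not already hold.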
