[PATCH 1/2] drm/amdgpu: Add SDMA queue start/stop callbacks to amdgpu_ring_funcs
Christian König
christian.koenig at amd.com
Tue Mar 11 12:16:47 UTC 2025
On 11.03.25 at 09:32, Jesse.zhang at amd.com wrote:
> From: "Jesse.zhang at amd.com" <Jesse.zhang at amd.com>
>
> This patch introduces two new callbacks, `stop_queue` and `start_queue`, to the
> `amdgpu_ring_funcs` structure. These callbacks are designed to handle the stopping
> and starting of SDMA queues during engine reset operations. The changes include:
>
> 1. **Addition of Callbacks**:
> - Added `stop_queue` and `start_queue` function pointers to `amdgpu_ring_funcs`.
> - These callbacks allow for modular and flexible management of SDMA queues during
> reset operations.
Why do these need to be per-ring callbacks?
Flexibility that isn't needed is usually a bad thing.
Regards,
Christian.
>
> 2. **Integration with SDMA v4.4.2**:
> - Implemented `sdma_v4_4_2_stop_queue` and `sdma_v4_4_2_restore_queue` as the
> respective callback functions for SDMA v4.4.2.
> - These functions handle the stopping and starting of SDMA queues, ensuring that
> the scheduler's work queue is properly managed during resets.
>
> 3. **Purpose**:
> - The new callbacks provide a standardized way to stop and start SDMA queues,
> which is essential for handling engine resets gracefully.
> - This change simplifies the reset logic and improves maintainability by
> centralizing queue management in the `amdgpu_ring_funcs` structure.
>
> 4. **Impact**:
> - The addition of these callbacks ensures that SDMA queues are properly stopped
> and started during reset operations, reducing the risk of race conditions and
> improving the reliability of the reset process.
> - This change is a prerequisite for future improvements to the SDMA reset logic,
> including better coordination between the KGD and KFD during resets.
>
> Suggested-by: Jonathan Kim <jonathan.kim at amd.com>
> Signed-off-by: Jesse Zhang <Jesse.Zhang at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++
> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index b4fd1e17205e..1c52ff92ea26 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -237,6 +237,8 @@ struct amdgpu_ring_funcs {
> void (*patch_ce)(struct amdgpu_ring *ring, unsigned offset);
> void (*patch_de)(struct amdgpu_ring *ring, unsigned offset);
> int (*reset)(struct amdgpu_ring *ring, unsigned int vmid);
> + int (*stop_queue)(struct amdgpu_device *adev, uint32_t instance_id);
> + int (*start_queue)(struct amdgpu_device *adev, uint32_t instance_id);
> void (*emit_cleaner_shader)(struct amdgpu_ring *ring);
> bool (*is_guilty)(struct amdgpu_ring *ring);
> };
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> index fd34dc138081..c1f7ccff9c4e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> @@ -2132,6 +2132,8 @@ static const struct amdgpu_ring_funcs sdma_v4_4_2_ring_funcs = {
> .emit_reg_wait = sdma_v4_4_2_ring_emit_reg_wait,
> .emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper,
> .reset = sdma_v4_4_2_reset_queue,
> + .stop_queue = sdma_v4_4_2_stop_queue,
> + .start_queue = sdma_v4_4_2_restore_queue,
> .is_guilty = sdma_v4_4_2_ring_is_guilty,
> };
>
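For readers following the thread, here is a minimal userspace sketch of the stop/reset/start sequence the commit message describes. It is an illustration under stated assumptions, not the driver code: every `mock_*` name, the instance count, and the bookkeeping fields are hypothetical stand-ins. Only the two callback signatures mirror the hunk quoted above, with `struct amdgpu_device` swapped for a mock type.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MOCK_NUM_INSTANCES 4	/* assumption: arbitrary instance count for the sketch */

/* Hypothetical stand-in for struct amdgpu_device; not the real definition. */
struct mock_device {
	int queue_running[MOCK_NUM_INSTANCES];		/* 1 = running, 0 = stopped */
	int reset_count[MOCK_NUM_INSTANCES];		/* resets performed per instance */
	int reset_saw_stopped[MOCK_NUM_INSTANCES];	/* was the queue stopped at reset time? */
};

/* Same shape as the two callbacks added in the quoted patch. */
struct mock_ring_funcs {
	int (*stop_queue)(struct mock_device *adev, uint32_t instance_id);
	int (*start_queue)(struct mock_device *adev, uint32_t instance_id);
};

static int mock_stop_queue(struct mock_device *adev, uint32_t instance_id)
{
	if (instance_id >= MOCK_NUM_INSTANCES)
		return -EINVAL;
	adev->queue_running[instance_id] = 0;
	return 0;
}

static int mock_start_queue(struct mock_device *adev, uint32_t instance_id)
{
	if (instance_id >= MOCK_NUM_INSTANCES)
		return -EINVAL;
	adev->queue_running[instance_id] = 1;
	return 0;
}

static const struct mock_ring_funcs mock_sdma_funcs = {
	.stop_queue = mock_stop_queue,
	.start_queue = mock_start_queue,
};

/* Hypothetical reset helper: stop the queue, reset the engine, restart the queue. */
static int mock_reset_instance(struct mock_device *adev,
			       const struct mock_ring_funcs *funcs,
			       uint32_t instance_id)
{
	int r;

	assert(adev && funcs);
	if (instance_id >= MOCK_NUM_INSTANCES)
		return -EINVAL;

	/* Callbacks are treated as optional in this sketch. */
	if (funcs->stop_queue) {
		r = funcs->stop_queue(adev, instance_id);
		if (r)
			return r;
	}

	/* Placeholder for the actual engine reset; only records what happened. */
	adev->reset_saw_stopped[instance_id] = !adev->queue_running[instance_id];
	adev->reset_count[instance_id]++;

	if (funcs->start_queue) {
		r = funcs->start_queue(adev, instance_id);
		if (r)
			return r;
	}
	return 0;
}
```

Note that both callbacks take a device and an instance id rather than a ring, so in this sketch nothing about the sequence depends on going through a per-ring function table; the same helper would work with direct calls to the stop and start functions.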