[PATCH v2 1/2] drm/amdkfd: Restore SDMA queues with engine_id
Felix Kuehling
felix.kuehling at amd.com
Mon Jul 28 15:58:01 UTC 2025
On 2025-07-22 13:47, David Yat Sin wrote:
> Add support for checkpoint/restore for SDMA queues of type
> KFD_QUEUE_TYPE_SDMA_BY_ENG_ID.
>
> Signed-off-by: David Yat Sin <David.YatSin at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 +
> drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 9 +++++++++
> 2 files changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 67694bcd9464..837da09b5bec 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1261,6 +1261,7 @@ struct kfd_criu_queue_priv_data {
> uint32_t doorbell_id;
> uint32_t gws;
> uint32_t sdma_id;
> + uint32_t sdma_engine_id;
> uint32_t eop_ring_buffer_size;
> uint32_t ctx_save_restore_area_size;
> uint32_t ctl_stack_size;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index c643e0ccec52..fe4c48930aad 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -846,6 +846,14 @@ static int criu_checkpoint_queue(struct kfd_process_device *pdd,
>
> q_data->sdma_id = q->sdma_id;
>
> + if ((q->properties.type == KFD_QUEUE_TYPE_SDMA ||
> + q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
> + && q->properties.sdma_engine_id) {
> + q_data->type = KFD_QUEUE_TYPE_SDMA_BY_ENG_ID;
> + }
> +
> + q_data->sdma_engine_id = q->properties.sdma_engine_id;

Is this really needed? Isn't the SDMA engine ID implied by the sdma_id?
That should be sufficient to ensure that SDMA queues are restored on the
same engine.

I think we should never see KFD_QUEUE_TYPE_SDMA_BY_ENG_ID when we take a
CRIU checkpoint, because that type gets replaced by either
KFD_QUEUE_TYPE_SDMA or KFD_QUEUE_TYPE_SDMA_XGMI in allocate_sdma_queue.

Regards,
  Felix

> +
> q_data->eop_ring_buffer_address =
> q->properties.eop_ring_buffer_address;
>
> @@ -972,6 +980,7 @@ static void set_queue_properties_from_criu(struct queue_properties *qp,
> qp->queue_size = q_data->q_size;
> qp->read_ptr = (uint32_t *) q_data->read_ptr_addr;
> qp->write_ptr = (uint32_t *) q_data->write_ptr_addr;
> + qp->sdma_engine_id = q_data->sdma_engine_id;
> qp->eop_ring_buffer_address = q_data->eop_ring_buffer_address;
> qp->eop_ring_buffer_size = q_data->eop_ring_buffer_size;
> qp->ctx_save_restore_area_address = q_data->ctx_save_restore_area_address;
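
The review comment above argues that the engine ID can be recovered from sdma_id alone. A minimal, self-contained sketch of that idea follows. It assumes sdma_id is a flat index that interleaves queues across engines, and that XGMI engines are numbered after the regular ones. The struct, enum, and function names here are hypothetical and are not taken from the patch or from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device SDMA engine counts. */
struct sdma_topology {
	uint32_t num_sdma_engines;      /* regular SDMA engines */
	uint32_t num_xgmi_sdma_engines; /* XGMI SDMA engines */
};

/* Hypothetical queue kinds, mirroring the two types a checkpoint would see. */
enum sdma_kind { SDMA_REGULAR, SDMA_XGMI };

/*
 * Assumed mapping: queues are striped across engines, so the engine is the
 * remainder of sdma_id by the engine count. XGMI engines are assumed to be
 * numbered after the regular engines.
 */
static uint32_t engine_from_sdma_id(const struct sdma_topology *t,
				    enum sdma_kind kind, uint32_t sdma_id)
{
	if (kind == SDMA_XGMI) {
		assert(t->num_xgmi_sdma_engines > 0);
		return t->num_sdma_engines +
		       sdma_id % t->num_xgmi_sdma_engines;
	}
	assert(t->num_sdma_engines > 0);
	return sdma_id % t->num_sdma_engines;
}

/* Under the same assumption, the per-engine queue slot is the quotient. */
static uint32_t queue_from_sdma_id(const struct sdma_topology *t,
				   enum sdma_kind kind, uint32_t sdma_id)
{
	uint32_t n = (kind == SDMA_XGMI) ? t->num_xgmi_sdma_engines
					 : t->num_sdma_engines;

	assert(n > 0);
	return sdma_id / n;
}
```

If the mapping holds, saving sdma_id and the queue type in the checkpoint is enough to land on the same engine at restore time, provided the restoring device has the same engine counts. Whether that is guaranteed across checkpoint and restore is outside what this sketch shows.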