[PATCH v2 1/2] drm/amdkfd: Restore SDMA queues with engine_id

Felix Kuehling felix.kuehling at amd.com
Mon Jul 28 15:58:01 UTC 2025


On 2025-07-22 13:47, David Yat Sin wrote:
> Add support for checkpoint/restore for SDMA queues of type
> KFD_QUEUE_TYPE_SDMA_BY_ENG_ID.
>
> Signed-off-by: David Yat Sin <David.YatSin at amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h                  | 1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 9 +++++++++
>   2 files changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 67694bcd9464..837da09b5bec 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1261,6 +1261,7 @@ struct kfd_criu_queue_priv_data {
>   	uint32_t doorbell_id;
>   	uint32_t gws;
>   	uint32_t sdma_id;
> +	uint32_t sdma_engine_id;
>   	uint32_t eop_ring_buffer_size;
>   	uint32_t ctx_save_restore_area_size;
>   	uint32_t ctl_stack_size;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index c643e0ccec52..fe4c48930aad 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -846,6 +846,14 @@ static int criu_checkpoint_queue(struct kfd_process_device *pdd,
>   
>   	q_data->sdma_id = q->sdma_id;
>   
> +	if ((q->properties.type == KFD_QUEUE_TYPE_SDMA ||
> +		 q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
> +		&& q->properties.sdma_engine_id) {
> +		q_data->type = KFD_QUEUE_TYPE_SDMA_BY_ENG_ID;
> +	}
> +
> +	q_data->sdma_engine_id = q->properties.sdma_engine_id;

Is this really needed? Isn't the sdma engine ID implied by the sdma_id? 
That should be sufficient to ensure that SDMA queues are restored on the 
same engine.

I think we should never see KFD_QUEUE_TYPE_SDMA_BY_ENG_ID when we take a 
CRIU checkpoint because that gets replaced by either KFD_QUEUE_TYPE_SDMA 
or KFD_QUEUE_TYPE_SDMA_XGMI in allocate_sdma_queue.

Regards,
   Felix


> +
>   	q_data->eop_ring_buffer_address =
>   		q->properties.eop_ring_buffer_address;
>   
> @@ -972,6 +980,7 @@ static void set_queue_properties_from_criu(struct queue_properties *qp,
>   	qp->queue_size = q_data->q_size;
>   	qp->read_ptr = (uint32_t *) q_data->read_ptr_addr;
>   	qp->write_ptr = (uint32_t *) q_data->write_ptr_addr;
> +	qp->sdma_engine_id = q_data->sdma_engine_id;
>   	qp->eop_ring_buffer_address = q_data->eop_ring_buffer_address;
>   	qp->eop_ring_buffer_size = q_data->eop_ring_buffer_size;
>   	qp->ctx_save_restore_area_address = q_data->ctx_save_restore_area_address;


More information about the amd-gfx mailing list