[PATCH v2 1/2] drm/amdkfd: Restore SDMA queues with engine_id

Yat Sin, David David.YatSin at amd.com
Mon Jul 28 17:55:44 UTC 2025


[AMD Official Use Only - AMD Internal Distribution Only]

Yes, you are correct. I wanted to re-trigger the scanning of available engine_id's on restore here:
https://codebrowser.dev/linux/linux/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c.html#1652

But that is incorrect. On restore, we should only allow the new SDMA queue with the same sdma_id as before. I will drop this patch.

Regards,
~David

> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling at amd.com>
> Sent: Monday, July 28, 2025 11:58 AM
> To: Yat Sin, David <David.YatSin at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Bhardwaj, Rajneesh <Rajneesh.Bhardwaj at amd.com>
> Subject: Re: [PATCH v2 1/2] drm/amdkfd: Restore SDMA queues with engine_id
>
> On 2025-07-22 13:47, David Yat Sin wrote:
> > Add support for checkpoint/restore for SDMA queues of type
> > KFD_QUEUE_TYPE_SDMA_BY_ENG_ID.
> >
> > Signed-off-by: David Yat Sin <David.YatSin at amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h                  | 1 +
> >   drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 9 +++++++++
> >   2 files changed, 10 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > index 67694bcd9464..837da09b5bec 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > @@ -1261,6 +1261,7 @@ struct kfd_criu_queue_priv_data {
> >     uint32_t doorbell_id;
> >     uint32_t gws;
> >     uint32_t sdma_id;
> > +   uint32_t sdma_engine_id;
> >     uint32_t eop_ring_buffer_size;
> >     uint32_t ctx_save_restore_area_size;
> >     uint32_t ctl_stack_size;
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > index c643e0ccec52..fe4c48930aad 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> > @@ -846,6 +846,14 @@ static int criu_checkpoint_queue(struct
> > kfd_process_device *pdd,
> >
> >     q_data->sdma_id = q->sdma_id;
> >
> > +   if ((q->properties.type == KFD_QUEUE_TYPE_SDMA ||
> > +            q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
> > +           && q->properties.sdma_engine_id) {
> > +           q_data->type = KFD_QUEUE_TYPE_SDMA_BY_ENG_ID;
> > +   }
> > +
> > +   q_data->sdma_engine_id = q->properties.sdma_engine_id;
>
> Is this really needed? Isn't the sdma engine ID implied by the sdma_id?
> That should be sufficient to ensure that SDMA queues are restored on the same
> engine.
>
> I think we should never see KFD_QUEUE_TYPE_SDMA_BY_ENG_ID when we
> take a CRIU checkpoint because that gets replaced by either
> KFD_QUEUE_TYPE_SDMA or KFD_QUEUE_TYPE_SDMA_XGMI in
> allocate_sdma_queue.
>
> Regards,
>    Felix

>
> > +
> >     q_data->eop_ring_buffer_address =
> >             q->properties.eop_ring_buffer_address;
> >
> > @@ -972,6 +980,7 @@ static void set_queue_properties_from_criu(struct
> queue_properties *qp,
> >     qp->queue_size = q_data->q_size;
> >     qp->read_ptr = (uint32_t *) q_data->read_ptr_addr;
> >     qp->write_ptr = (uint32_t *) q_data->write_ptr_addr;
> > +   qp->sdma_engine_id = q_data->sdma_engine_id;
> >     qp->eop_ring_buffer_address = q_data->eop_ring_buffer_address;
> >     qp->eop_ring_buffer_size = q_data->eop_ring_buffer_size;
> >     qp->ctx_save_restore_area_address =
> > q_data->ctx_save_restore_area_address;


More information about the amd-gfx mailing list