[PATCH v2 2/2] drm/amdkfd: CRIU add support for GWS queues

Yat Sin, David David.YatSin at amd.com
Tue Apr 19 12:24:14 UTC 2022



> -----Original Message-----
> From: Paul Menzel <pmenzel at molgen.mpg.de>
> Sent: Tuesday, April 19, 2022 2:54 AM
> To: Yat Sin, David <David.YatSin at amd.com>
> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; amd-
> gfx at lists.freedesktop.org
> Subject: Re: [PATCH v2 2/2] drm/amdkfd: CRIU add support for GWS queues
> 
> 
> Dear David,
> 
> 
> Thank you for sending out v3 of these patches.
> 
> Am 19.04.22 um 02:04 schrieb Yat Sin, David:
> >
> >
> >> -----Original Message-----
> >> From: Paul Menzel <pmenzel at molgen.mpg.de>
> >> Sent: Monday, April 18, 2022 4:23 PM
> 
> […]
> >> Am 18.04.22 um 18:44 schrieb David Yat Sin:
> >>
> >> In the commit message summary, you could reorder some words:
> >>
> >> Add CRIU support for GWS queues
> >>
> >>> Adding support to checkpoint/restore GWS(Global Wave Sync) queues.
> >>
> >> s/Adding/Add/
> 
> Did you miss the two comments above?
ACK
> 
> >> Please add a space before the (.
> > ACK
> >>
> >> How can this be tested?
> > We have some internal tests that can we be used to specifically test this
> feature.
> 
> Nice. Are you going to publish these in the future?
I think some of these tests depend on other frameworks, so it might not be straight forward to do this.
> 
> >>> Signed-off-by: David Yat Sin <david.yatsin at amd.com>
> >>> ---
> >>>    drivers/gpu/drm/amd/amdkfd/kfd_priv.h                  |  2 +-
> >>>    drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10
> +++++++---
> >>>    2 files changed, 8 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> index f36062be9ca8..192dbef04c43 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> >>> @@ -1102,7 +1102,7 @@ struct kfd_criu_queue_priv_data {
> >>>    	uint32_t priority;
> >>>    	uint32_t q_percent;
> >>>    	uint32_t doorbell_id;
> >>> -	uint32_t is_gws;
> >>> +	uint32_t gws;
> >>
> >> Why is the new name better?
> > The old variable (is_gws) was obtained from the queue_properties
> > structure during checkpoint and is only used temporarily during queue
> > creation, so this variable cannot be used to determine whether a queue
> > as gws enabled. The new variable (gws) is obtained from the queue
> > structure. The name is changed to better reflect this.
> 
> Further down you seem to use it like a boolean though. So a name reflecting
> that would be nice.
To me this is ok. I would rather have the variable name match its source.
> 
> >>>    	uint32_t sdma_id;
> >>>    	uint32_t eop_ring_buffer_size;
> >>>    	uint32_t ctx_save_restore_area_size; diff --git
> >>> a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> >>> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> >>> index 6eca9509f2e3..4f58e671d39b 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> >>> @@ -636,6 +636,8 @@ static int criu_checkpoint_queue(struct
> >> kfd_process_device *pdd,
> >>>    	q_data->ctx_save_restore_area_size =
> >>>    		q->properties.ctx_save_restore_area_size;
> >>>
> >>> +	q_data->gws = !!q->gws;
> >>> +
> >>>    	ret = pqm_checkpoint_mqd(&pdd->process->pqm, q->
> properties.queue_id, mqd, ctl_stack);
> >>>    	if (ret) {
> >>>    		pr_err("Failed checkpoint queue_mqd (%d)\n", ret); @@ -
> 743,7
> >>> +745,6 @@ static void set_queue_properties_from_criu(struct
> >>> +queue_properties *qp,
> >>>    					  struct kfd_criu_queue_priv_data
> *q_data)
> >>>    {
> >>>    	qp->is_interop = false;
> >>> -	qp->is_gws = q_data->is_gws;
> >>>    	qp->queue_percent = q_data->q_percent;
> >>>    	qp->priority = q_data->priority;
> >>>    	qp->queue_address = q_data->q_address; @@ -826,12 +827,15 @@
> >> int kfd_criu_restore_queue(struct kfd_process *p,
> >>>    				NULL);
> >>>    	if (ret) {
> >>>    		pr_err("Failed to create new queue err:%d\n", ret);
> >>> -		ret = -EINVAL;
> >>> +		goto exit;
> >>>    	}
> >>>
> >>> +	if (q_data->gws)
> >>> +		ret = pqm_set_gws(&p->pqm, q_data->q_id, pdd->dev->gws);
> >>> +
> >>>    exit:
> >>>    	if (ret)
> >>> -		pr_err("Failed to create queue (%d)\n", ret);
> >>> +		pr_err("Failed to restore queue (%d)\n", ret);
> >>
> >> Maybe separate this out, so it can be applied to stable series.
> 
> Did you miss this comment?
What do you mean by stable series?

> 
> >>>    	else
> >>>    		pr_debug("Queue id %d was restored successfully\n",
> queue_id);
> >>>
> 
> 
> Kind regards,
> 
> Paul


More information about the amd-gfx mailing list