[PATCH v2 2/2] drm/amdkfd: CRIU add support for GWS queues

Paul Menzel pmenzel at molgen.mpg.de
Tue Apr 19 06:54:14 UTC 2022


Dear David,


Thank you for sending out v3 of these patches.

Am 19.04.22 um 02:04 schrieb Yat Sin, David:
> 
> 
>> -----Original Message-----
>> From: Paul Menzel <pmenzel at molgen.mpg.de>
>> Sent: Monday, April 18, 2022 4:23 PM

[…]
>> Am 18.04.22 um 18:44 schrieb David Yat Sin:
>>
>> In the commit message summary, you could reorder some words:
>>
>> Add CRIU support for GWS queues
>>
>>> Adding support to checkpoint/restore GWS(Global Wave Sync) queues.
>>
>> s/Adding/Add/

Did you miss the two comments above?

>> Please add a space before the (.
> ACK
>>
>> How can this be tested?
> We have some internal tests that can we be used to specifically test this feature.

Nice. Are you going to publish these in the future?

>>> Signed-off-by: David Yat Sin <david.yatsin at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdkfd/kfd_priv.h                  |  2 +-
>>>    drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 10 +++++++---
>>>    2 files changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> index f36062be9ca8..192dbef04c43 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -1102,7 +1102,7 @@ struct kfd_criu_queue_priv_data {
>>>    	uint32_t priority;
>>>    	uint32_t q_percent;
>>>    	uint32_t doorbell_id;
>>> -	uint32_t is_gws;
>>> +	uint32_t gws;
>>
>> Why is the new name better?
> The old variable (is_gws) was obtained from the queue_properties
> structure during checkpoint and is only used temporarily during queue
> creation, so this variable cannot be used to determine whether a
> queue as gws enabled. The new variable (gws) is obtained from the
> queue structure. The name is changed to better reflect this.

Further down you seem to use it like a boolean though. So a name
reflecting that would be nice.

>>>    	uint32_t sdma_id;
>>>    	uint32_t eop_ring_buffer_size;
>>>    	uint32_t ctx_save_restore_area_size; diff --git
>>> a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>>> index 6eca9509f2e3..4f58e671d39b 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>>> @@ -636,6 +636,8 @@ static int criu_checkpoint_queue(struct
>> kfd_process_device *pdd,
>>>    	q_data->ctx_save_restore_area_size =
>>>    		q->properties.ctx_save_restore_area_size;
>>>
>>> +	q_data->gws = !!q->gws;
>>> +
>>>    	ret = pqm_checkpoint_mqd(&pdd->process->pqm, q-> properties.queue_id, mqd, ctl_stack);
>>>    	if (ret) {
>>>    		pr_err("Failed checkpoint queue_mqd (%d)\n", ret); @@ -743,7
>>> +745,6 @@ static void set_queue_properties_from_criu(struct queue_properties *qp,
>>>    					  struct kfd_criu_queue_priv_data *q_data)
>>>    {
>>>    	qp->is_interop = false;
>>> -	qp->is_gws = q_data->is_gws;
>>>    	qp->queue_percent = q_data->q_percent;
>>>    	qp->priority = q_data->priority;
>>>    	qp->queue_address = q_data->q_address; @@ -826,12 +827,15 @@
>> int kfd_criu_restore_queue(struct kfd_process *p,
>>>    				NULL);
>>>    	if (ret) {
>>>    		pr_err("Failed to create new queue err:%d\n", ret);
>>> -		ret = -EINVAL;
>>> +		goto exit;
>>>    	}
>>>
>>> +	if (q_data->gws)
>>> +		ret = pqm_set_gws(&p->pqm, q_data->q_id, pdd->dev->gws);
>>> +
>>>    exit:
>>>    	if (ret)
>>> -		pr_err("Failed to create queue (%d)\n", ret);
>>> +		pr_err("Failed to restore queue (%d)\n", ret);
>>
>> Maybe separate this out, so it can be applied to stable series.

Did you miss this comment?

>>>    	else
>>>    		pr_debug("Queue id %d was restored successfully\n", queue_id);
>>>


Kind regards,

Paul


More information about the amd-gfx mailing list