[PATCH] drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode

Felix Kuehling felix.kuehling at amd.com
Thu Jan 30 22:51:08 UTC 2020


On 2020-01-30 17:29, Yong Zhao wrote:
> The sdma_queue_count increment should be done before
> execute_queues_cpsch(), which calls pm_calc_rlib_size() where
> sdma_queue_count is used to calculate whether over_subscription is
> triggered.
>
> With the previous code, when a SDMA queue is created,
> compute_queue_count in pm_calc_rlib_size() is one more than the
> actual compute queue number, because the queue_count has been
> incremented while sdma_queue_count has not. This patch fixes that.
>
> Change-Id: I20353e657efd505353d0dd9f7eb2fab5085e7202
> Signed-off-by: Yong Zhao <Yong.Zhao at amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>

But I took a look at pm_calc_rlib_size. I don't think subtracting
dqm->sdma_queue_count from dqm->queue_count is quite correct,
because sdma_queue_count counts all SDMA queues, while queue_count only
counts active queues. So an application that creates inactive SDMA
queues will also cause errors here. We probably need to count active
compute and active SDMA queues separately in DQM to fix this properly.
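
To illustrate the mismatch, here is a rough user-space sketch, not driver
code. Every sketch_* name is hypothetical, the two active_* counters are
only an assumption about what a proper fix might track, and I am assuming
the subtraction also removes the XGMI SDMA count. It shows two active
compute queues plus one inactive SDMA queue being reported as a single
compute queue by the subtraction, while per-type active counters stay
correct.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real amdkfd definitions. */
enum sketch_queue_type {
	SKETCH_QUEUE_COMPUTE,
	SKETCH_QUEUE_SDMA,
	SKETCH_QUEUE_SDMA_XGMI,
};

struct sketch_dqm {
	unsigned int queue_count;           /* active queues of any type */
	unsigned int sdma_queue_count;      /* all SDMA queues, active or not */
	unsigned int xgmi_sdma_queue_count; /* all XGMI SDMA queues, active or not */
	/* Assumed new fields: counts of active queues only. */
	unsigned int active_cp_queue_count;
	unsigned int active_sdma_queue_count; /* SDMA and XGMI SDMA together */
};

/* Counter bookkeeping only; the real create path also maps the queue. */
void sketch_create_queue(struct sketch_dqm *dqm,
			 enum sketch_queue_type type, bool is_active)
{
	if (type == SKETCH_QUEUE_SDMA)
		dqm->sdma_queue_count++;
	else if (type == SKETCH_QUEUE_SDMA_XGMI)
		dqm->xgmi_sdma_queue_count++;

	if (!is_active)
		return;

	dqm->queue_count++;
	if (type == SKETCH_QUEUE_COMPUTE)
		dqm->active_cp_queue_count++;
	else
		dqm->active_sdma_queue_count++;
}

/* Subtraction style: mixes an active-only total with all-queue SDMA counts. */
unsigned int sketch_compute_count_by_subtraction(const struct sketch_dqm *dqm)
{
	return dqm->queue_count - dqm->sdma_queue_count -
	       dqm->xgmi_sdma_queue_count;
}

void sketch_demo(void)
{
	struct sketch_dqm dqm = {0};

	sketch_create_queue(&dqm, SKETCH_QUEUE_COMPUTE, true);
	sketch_create_queue(&dqm, SKETCH_QUEUE_COMPUTE, true);
	sketch_create_queue(&dqm, SKETCH_QUEUE_SDMA, false); /* inactive */

	/* Two active compute queues exist, but the subtraction reports one. */
	assert(sketch_compute_count_by_subtraction(&dqm) == 1);
	/* The dedicated counters ignore the inactive SDMA queue. */
	assert(dqm.active_cp_queue_count == 2);
	assert(dqm.active_sdma_queue_count == 0);
}
```

With counters like these, the over-subscription check could read the
active compute count directly instead of deriving it by subtraction.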

Regards,
   Felix


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 2870553a2ce0..80d22bf702e8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1237,16 +1237,18 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
>   
>   	list_add(&q->list, &qpd->queues_list);
>   	qpd->queue_count++;
> +
> +	if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
> +		dqm->sdma_queue_count++;
> +	else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
> +		dqm->xgmi_sdma_queue_count++;
> +
>   	if (q->properties.is_active) {
>   		dqm->queue_count++;
>   		retval = execute_queues_cpsch(dqm,
>   				KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
>   	}
>   
> -	if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
> -		dqm->sdma_queue_count++;
> -	else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
> -		dqm->xgmi_sdma_queue_count++;
>   	/*
>   	 * Unconditionally increment this counter, regardless of the queue's
>   	 * type or whether the queue is active.

