[PATCH] drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode

Yong Zhao yong.zhao at amd.com
Thu Jan 30 23:07:10 UTC 2020


True. It is a bug too. I am looking into it.

Yong

On 2020-01-30 5:51 p.m., Felix Kuehling wrote:
> On 2020-01-30 17:29, Yong Zhao wrote:
>> The sdma_queue_count increment should be done before
>> execute_queues_cpsch(), which calls pm_calc_rlib_size() where
>> sdma_queue_count is used to calculate whether over_subscription is
>> triggered.
>>
>> With the previous code, when an SDMA queue is created,
>> compute_queue_count in pm_calc_rlib_size() is one more than the
>> actual compute queue number, because the queue_count has been
>> incremented while sdma_queue_count has not. This patch fixes that.
>>
>> Change-Id: I20353e657efd505353d0dd9f7eb2fab5085e7202
>> Signed-off-by: Yong Zhao <Yong.Zhao at amd.com>
>
> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>
> But I took a look at pm_calc_rlib_size. I don't think subtracting
> dqm->sdma_queue_count from dqm->queue_count is quite correct,
> because sdma_queue_count counts all SDMA queues, while queue_count
> only counts active queues. So an application that creates inactive
> SDMA queues will also create errors here. We probably need to count
> active compute and active SDMA queues separately in DQM to fix this
> properly.
>
> Regards,
>   Felix
>
>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++++++----
>>   1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> index 2870553a2ce0..80d22bf702e8 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> @@ -1237,16 +1237,18 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
>>       list_add(&q->list, &qpd->queues_list);
>>       qpd->queue_count++;
>> +
>> +    if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
>> +        dqm->sdma_queue_count++;
>> +    else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
>> +        dqm->xgmi_sdma_queue_count++;
>> +
>>       if (q->properties.is_active) {
>>           dqm->queue_count++;
>>           retval = execute_queues_cpsch(dqm,
>>                   KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
>>       }
>> -    if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
>> -        dqm->sdma_queue_count++;
>> -    else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
>> -        dqm->xgmi_sdma_queue_count++;
>>       /*
>>        * Unconditionally increment this counter, regardless of the queue's
>>        * type or whether the queue is active.
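
The counting problem discussed above can be reproduced with a small
user-space model. This is only a sketch, not the driver code: struct
dqm_model, compute_queue_count() and create_queue_model() are hypothetical
names, the XGMI counter is left out for brevity, and the only behaviour
taken from the thread is that the compute queue count is derived as
queue_count minus sdma_queue_count at the point where
execute_queues_cpsch() would run.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the DQM counters in the thread. */
struct dqm_model {
    int queue_count;      /* active queues of any type */
    int sdma_queue_count; /* all SDMA queues, active or not */
};

/* The subtraction described for pm_calc_rlib_size() in the thread. */
static int compute_queue_count(const struct dqm_model *dqm)
{
    return dqm->queue_count - dqm->sdma_queue_count;
}

/*
 * Model of queue creation. Returns the compute queue count that the
 * runlist sizing step would see, or -1 if the queue is inactive and
 * nothing is executed.
 *
 * count_sdma_first == false models the order before the patch,
 * count_sdma_first == true models the order after the patch.
 */
static int create_queue_model(struct dqm_model *dqm, bool is_sdma,
                              bool is_active, bool count_sdma_first)
{
    int seen = -1;

    if (count_sdma_first && is_sdma)
        dqm->sdma_queue_count++;

    if (is_active) {
        dqm->queue_count++;
        /* execute_queues_cpsch() would size the runlist here. */
        seen = compute_queue_count(dqm);
    }

    if (!count_sdma_first && is_sdma)
        dqm->sdma_queue_count++;

    return seen;
}
```

Starting from two active compute queues and no SDMA queues, creating one
active SDMA queue yields 3 with the old order (one too many) and 2 with the
new order. Creating an inactive SDMA queue with the new order runs nothing
at creation time, but a later compute_queue_count() returns 1 instead of 2,
which is the remaining problem raised in the review.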

