[PATCH] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

Andres Rodriguez andresx7 at gmail.com
Thu Jul 13 18:48:30 UTC 2017



On 2017-07-13 02:36 PM, Andres Rodriguez wrote:
> On 2017-07-12 02:26 PM, Jay Cornwall wrote:
>> The number of compute queues available to the KFD was erroneously
>> calculated as 64. Only the first MEC can execute compute queues and
>> it has 32 queue slots.
>>
>> This caused the oversubscription limit to be calculated incorrectly,
>> leading to a missing chained runlist command at the end of an
>> oversubscribed runlist.
>>
>> Change-Id: Ic4a139c04b8a6d025fbb831a0a67e98728bfe461
>> Signed-off-by: Jay Cornwall <Jay.Cornwall at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> index 7060daf..aa4006a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> @@ -140,7 +140,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
>>           /* According to linux/bitmap.h we shouldn't use bitmap_clear if
>>            * nbits is not compile time constant
>>            */
>> -        last_valid_bit = adev->gfx.mec.num_mec
>> +        last_valid_bit = 1 /* only first MEC can have compute queues */
> 
> Hey Jay,
> 
> Minor nitpick. We already have some similar resource patching in 
> kgd2kfd_device_init(), and I think it would be good to keep all of these 
> together.
> 
> Otherwise, looks good to me.

Just re-read my reply and wanted to clarify. I don't really have a 
strong opinion on which side does the resource availability patching. 
Whether it happens here or on the KFD side, either is fine.

I just don't think it is good to keep it in two different places.

Regards,
Andres

> 
> Regards,
> Andres
> 
>>                   * adev->gfx.mec.num_pipe_per_mec
>>                   * adev->gfx.mec.num_queue_per_pipe;
>>           for (i = last_valid_bit; i < KGD_MAX_QUEUES; ++i)
>>
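
For anyone following the arithmetic in the commit message, here is a rough
standalone sketch of the two calculations. It is not the driver code: the
struct and function names are hypothetical stand-ins for the adev->gfx.mec
fields, and the 2 MEC x 4 pipes x 8 queues topology is only an assumed
example chosen to match the 64 and 32 figures quoted above.

```c
/* Hypothetical stand-in for the adev->gfx.mec fields used in the patch. */
struct mec_config {
	unsigned int num_mec;
	unsigned int num_pipe_per_mec;
	unsigned int num_queue_per_pipe;
};

/* Pre-patch calculation: counts the queue slots of every MEC. */
unsigned int queue_slots_all_mecs(const struct mec_config *cfg)
{
	return cfg->num_mec
		* cfg->num_pipe_per_mec
		* cfg->num_queue_per_pipe;
}

/* Post-patch calculation: only the first MEC can have compute queues. */
unsigned int queue_slots_first_mec(const struct mec_config *cfg)
{
	return 1 /* only first MEC can have compute queues */
		* cfg->num_pipe_per_mec
		* cfg->num_queue_per_pipe;
}
```

With the assumed configuration { 2, 4, 8 }, the first function yields 64 and
the second yields 32, so the patched value is the one that matches the number
of slots the first MEC actually has.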

