[PATCH] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

Liu, Shaoyun Shaoyun.Liu at amd.com
Thu Jul 13 20:12:21 UTC 2017


There is a function, get_mec_num, that uses the field, but it seems no one calls it. Maybe remove it as well.

Regards
Shaoyun.liu

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf Of Andres Rodriguez
Sent: Thursday, July 13, 2017 3:54 PM
To: Kuehling, Felix; Jay Cornwall; amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly



On 2017-07-13 03:35 PM, Felix Kuehling wrote:
> On 17-07-13 03:15 PM, Jay Cornwall wrote:
>> On Thu, Jul 13, 2017, at 13:36, Andres Rodriguez wrote:
>>> On 2017-07-12 02:26 PM, Jay Cornwall wrote:
>>>> The number of compute queues available to the KFD was erroneously 
>>>> calculated as 64. Only the first MEC can execute compute queues and 
>>>> it has 32 queue slots.
>>>>
>>>> This caused the oversubscription limit to be calculated 
>>>> incorrectly, leading to a missing chained runlist command at the 
>>>> end of an oversubscribed runlist.
>>>>
>>>> Change-Id: Ic4a139c04b8a6d025fbb831a0a67e98728bfe461
>>>> Signed-off-by: Jay Cornwall <Jay.Cornwall at amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> index 7060daf..aa4006a 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> @@ -140,7 +140,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
>>>>    		/* According to linux/bitmap.h we shouldn't use bitmap_clear if
>>>>    		 * nbits is not compile time constant
>>>>    		 */
>>>> -		last_valid_bit = adev->gfx.mec.num_mec
>>>> +		last_valid_bit = 1 /* only first MEC can have compute queues */
>>> Hey Jay,
>>>
>>> Minor nitpick. We already have some similar resource patching in 
>>> kgd2kfd_device_init(), and I think it would be good to keep all of 
>>> these together.
>> OK. I see shared_resources.num_mec is set to 1 in kgd2kfd_device_init.
>> That's not very clear (the number of MECs doesn't change) and num_mec 
>> doesn't appear to be used anywhere except in dead code in kfd_device.c.
>> That code also runs after the queue bitmap setup.
>>
>> How about I remove that field entirely?
> Yeah, that's fine with me.
> 

Good with me as well.
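
For reference, the arithmetic in the quoted commit message can be sketched roughly as below. This is only an illustration, not the driver code: the struct, the function names, and the simplified oversubscription check are hypothetical, and the 4 pipes per MEC and 8 queues per pipe are assumed values chosen to match the 32 queue slots per MEC mentioned in the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MEC topology; not the amdgpu struct. */
struct mec_topology {
	unsigned int num_mec;            /* MECs present on the device */
	unsigned int num_pipe_per_mec;   /* assumed value for illustration */
	unsigned int num_queue_per_pipe; /* assumed value for illustration */
};

/* Queue slots when every MEC is counted (the erroneous result). */
static unsigned int queue_slots_all_mecs(const struct mec_topology *t)
{
	assert(t != NULL);
	return t->num_mec * t->num_pipe_per_mec * t->num_queue_per_pipe;
}

/* Queue slots when only the first MEC is counted (the corrected result). */
static unsigned int queue_slots_first_mec(const struct mec_topology *t)
{
	assert(t != NULL);
	return 1 * t->num_pipe_per_mec * t->num_queue_per_pipe;
}

/*
 * Simplified check: a runlist is treated as oversubscribed when it holds
 * more compute queues than there are slots. The real KFD logic may take
 * other limits into account; this only models the queue count.
 */
static bool runlist_oversubscribed(unsigned int active_queues,
				   unsigned int slots)
{
	return active_queues > slots;
}
```

With 2 MECs, 4 pipes per MEC and 8 queues per pipe, the first helper gives 64 and the second gives 32. A runlist with, say, 40 queues is then only detected as oversubscribed with the corrected count, which is the case where the chained runlist command would otherwise be missing.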
_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

