[PATCH 3/3] drm/amdgpu: enable only one compute queue for raven

Nirmoy nirmodas at amd.com
Tue Nov 10 15:14:33 UTC 2020


On 11/9/20 7:57 PM, Alex Deucher wrote:
> On Mon, Nov 9, 2020 at 1:12 PM Nirmoy Das <nirmoy.das at amd.com> wrote:
>> Because of a firmware bug, Raven ASICs can't handle jobs
>> scheduled to multiple compute queues. So enable only one
>> compute queue till we have a firmware fix.
>>
>> Signed-off-by: Nirmoy Das <nirmoy.das at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index 97a8f786cf85..9352fcb77fe9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -812,6 +812,13 @@ void amdgpu_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
>>   int amdgpu_gfx_get_num_kcq(struct amdgpu_device *adev)
>>   {
>>          if (amdgpu_num_kcq == -1) {
>> +               /* raven firmware currently can not load balance jobs
>> +                * among multiple compute queues. Enable only one
>> +                * compute queue till we have a firmware fix.
>> +                */
>> +               if (adev->asic_type == CHIP_RAVEN)
>> +                       return 1;
>> +


Hi Alex,


> I think this is fine as a workaround for now, but it would be worth
> checking whether the issues are only between queues on the same pipe or
> pipes on an MEC.  E.g., can we safely enable one queue per MEC?  What
> about one queue per pipe?


Guchun/Aaron's test machine, which has a recent VBIOS (113-PICASSO-117),
seems to pass amdgpu_test with one compute queue.

On my test machine, which runs an older VBIOS (113-PICASSO-115), I can
reproduce the compute queue hang even with one queue. With all queues
enabled, the issue just seems to appear much faster. So I don't think
the cases above (one queue per MEC, or one queue per pipe) would change
anything on my machine.

I will try to find a test machine with the latest VBIOS to test your
suggestions.
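
For reference, the selection logic in the quoted hunk can be modelled
outside the kernel roughly as below. This is only a sketch: the
model_* names are hypothetical stand-ins (not the real amdgpu types),
the parameter num_kcq_param stands in for the amdgpu_num_kcq module
parameter, and the part of the function after the dev_warn() is not
visible in the hunk, so the two trailing returns are assumptions based
on the warning text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum model_asic_type { MODEL_CHIP_RAVEN, MODEL_CHIP_OTHER };

struct model_device {
	enum model_asic_type asic_type;
};

/* Upper bound on kernel compute queues, taken from the "8" in the hunk. */
#define MODEL_MAX_KCQ 8

/*
 * num_kcq_param models the amdgpu_num_kcq module parameter,
 * where -1 means "let the driver pick the default".
 */
int model_get_num_kcq(const struct model_device *dev, int num_kcq_param)
{
	assert(dev != NULL);

	if (num_kcq_param == -1) {
		/* Workaround from the patch: Raven defaults to one queue. */
		if (dev->asic_type == MODEL_CHIP_RAVEN)
			return 1;
		return MODEL_MAX_KCQ;
	} else if (num_kcq_param > MODEL_MAX_KCQ || num_kcq_param < 0) {
		fprintf(stderr,
			"set kernel compute queue number to %d due to invalid parameter provided by user\n",
			MODEL_MAX_KCQ);
		/* Assumption: fall back to the maximum, as the warning says. */
		return MODEL_MAX_KCQ;
	}

	/* Assumption: a valid explicit value is used unchanged. */
	return num_kcq_param;
}
```

Note that in this model, as in the hunk, the Raven check sits only in
the default (-1) branch, so an explicit in-range value is still used
unchanged on Raven.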


Regards,

Nirmoy

>
> Alex
>
>
>>                  return 8;
>>          } else if (amdgpu_num_kcq > 8 || amdgpu_num_kcq < 0) {
>>                  dev_warn(adev->dev, "set kernel compute queue number to 8 due to invalid parameter provided by user\n");
>> --
>> 2.29.0
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

