[Patch v2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards
Felix Kuehling
felix.kuehling at amd.com
Thu Feb 8 20:43:10 UTC 2024
On 2024-02-08 15:01, Bhardwaj, Rajneesh wrote:
>
> On 2/8/2024 2:41 PM, Felix Kuehling wrote:
>>
>> On 2024-02-07 23:14, Rajneesh Bhardwaj wrote:
>>> In certain cooperative group dispatch scenarios the default SPI
>>> resource
>>> allocation may cause reduced per-CU workgroup occupancy. Set
>>> COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
>>> scenarios.
>>>
>>> Suggested-by: Joseph Greathouse <Joseph.Greathouse at amd.com>
>>> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj at amd.com>
>>> ---
>>> * Found a bug in the previously reviewed version
>>> https://lists.freedesktop.org/archives/amd-gfx/2024-February/104101.html
>>>
>>> since q->is_gws is left unset there for keeping the count.
>>> * Updated pqm_set_gws to pass minfo holding the GWS state for the
>>> active queues and use that to apply FORCE_SIMD_DIST_MASK.
>>>
>>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c        | 4 ++++
>>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h                  | 1 +
>>>  drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 +++-
>>>  3 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>> index 42d881809dc7..0b71db4c96b5 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>> @@ -303,6 +303,10 @@ static void update_mqd(struct mqd_manager *mm, void *mqd,
>>>  	update_cu_mask(mm, mqd, minfo, 0);
>>>  	set_priority(m, q);
>>> +	if (minfo && KFD_GC_VERSION(mm->dev) >= IP_VERSION(9, 4, 2))
>>> +		m->compute_resource_limits = minfo->gws ?
>>> +				COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK : 0;
>>> +
>>
>> This looks OK because we don't set anything else in
>> m->compute_resource_limits. If that ever changes, we have to be more
>> careful here to not wipe out other fields in that register.
>
>
> Yes. Should I change it to the below and send a v3?
>
>     m->compute_resource_limits |= minfo->gws ?
>             COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK : 0;
I think you need to do

	if (minfo->gws)
		m->compute_resource_limits |= COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
	else
		m->compute_resource_limits &= ~COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;

That way you can clear the resource limit when GWS is disabled for the queue.
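Folding that into the v2 hunk with its existing guards, the update_mqd()
change might look something like this (an untested sketch):

	if (minfo && KFD_GC_VERSION(mm->dev) >= IP_VERSION(9, 4, 2)) {
		if (minfo->gws)
			m->compute_resource_limits |=
				COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
		else
			m->compute_resource_limits &=
				~COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
	}

The |=/&= pair only touches the FORCE_SIMD_DIST bit, so any other fields
set in compute_resource_limits in the future would be preserved.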
>
>
>>
>>
>>>  	q->is_active = QUEUE_IS_ACTIVE(*q);
>>>  }
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> index 677281c0793e..f4b327a2d4a8 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -542,6 +542,7 @@ struct mqd_update_info {
>>>  		} cu_mask;
>>>  	};
>>>  	enum mqd_update_flag update_flag;
>>> +	bool gws;
>>
>> Instead of adding a new bool, can we add a flag to mqd_update_flag?
>
> Maybe. I initially thought about it, but then I chose the bool approach
> since those debug flags are generic, non-per-ASIC KFD flags while this
> bool is a per-ASIC request, so I felt they didn't fit together. On the
> other hand, those flags and this bool are both quirks anyway, so maybe
> they can go together. Please let me know your preference.
I'd prefer to use the flags. They are currently used for a GFX11 quirk;
now we can add another flag for a GFX9 quirk.
The GFX11 code currently has an implicit assumption that no other flags
exist. That would need to be fixed:
 	if (has_wa_flag) {
-		uint32_t wa_mask = minfo->update_flag == UPDATE_FLAG_DBG_WA_ENABLE ?
+		uint32_t wa_mask = (minfo->update_flag & UPDATE_FLAG_DBG_WA_ENABLE) ?
 				0xffff : 0xffffffff;
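For example, the new flag could look something like this (the name
UPDATE_FLAG_IS_GWS is just a suggestion, and the values have to stay
distinct powers of two so the flags combine as a bitmask):

	enum mqd_update_flag {
		UPDATE_FLAG_DBG_WA_ENABLE = 1,
		UPDATE_FLAG_DBG_WA_DISABLE = 2,
		UPDATE_FLAG_IS_GWS = 4,	/* new flag for the GFX9 SIMD-dist quirk */
	};

Then pqm_set_gws() would set minfo.update_flag = UPDATE_FLAG_IS_GWS
instead of a bool, and update_mqd() in the v9 MQD manager would test
(minfo->update_flag & UPDATE_FLAG_IS_GWS).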
Regards,
Felix
>
>
>>
>> Looks good to me otherwise.
>>
>> Regards,
>> Felix
>>
>>
>>> };
>>> /**
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>>> index 43eff221eae5..5416a110ced9 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>>> @@ -95,6 +95,7 @@ void kfd_process_dequeue_from_device(struct kfd_process_device *pdd)
>>>  int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
>>>  			void *gws)
>>>  {
>>> +	struct mqd_update_info minfo = {0};
>>>  	struct kfd_node *dev = NULL;
>>>  	struct process_queue_node *pqn;
>>>  	struct kfd_process_device *pdd;
>>> @@ -146,9 +147,10 @@ int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
>>>  	}
>>>  	pdd->qpd.num_gws = gws ? dev->adev->gds.gws_size : 0;
>>> +	minfo.gws = !!gws;
>>>  	return pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
>>> -							pqn->q, NULL);
>>> +							pqn->q, &minfo);
>>>  }
>>> void kfd_process_dequeue_from_all_devices(struct kfd_process *p)