[Patch v2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards
Felix Kuehling
felix.kuehling at amd.com
Thu Feb 8 20:43:10 UTC 2024
On 2024-02-08 15:01, Bhardwaj, Rajneesh wrote:
>
> On 2/8/2024 2:41 PM, Felix Kuehling wrote:
>>
>> On 2024-02-07 23:14, Rajneesh Bhardwaj wrote:
>>> In certain cooperative group dispatch scenarios the default SPI
>>> resource
>>> allocation may cause reduced per-CU workgroup occupancy. Set
>>> COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
>>> scenarios.
>>>
>>> Suggested-by: Joseph Greathouse <Joseph.Greathouse at amd.com>
>>> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj at amd.com>
>>> ---
>>> * Found a bug in the previously reviewed version
>>> https://lists.freedesktop.org/archives/amd-gfx/2024-February/104101.html
>>>
>>> since q->is_gws is left unset there for keeping the count.
>>> * Updated pqm_set_gws to pass minfo holding the GWS state for the
>>> active queues and use that to apply FORCE_SIMD_DIST_MASK.
>>>
>>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c        | 4 ++++
>>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h                  | 1 +
>>>  drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 +++-
>>>  3 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>> index 42d881809dc7..0b71db4c96b5 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>> @@ -303,6 +303,10 @@ static void update_mqd(struct mqd_manager *mm, void *mqd,
>>>  	update_cu_mask(mm, mqd, minfo, 0);
>>>  	set_priority(m, q);
>>> +	if (minfo && KFD_GC_VERSION(mm->dev) >= IP_VERSION(9, 4, 2))
>>> +		m->compute_resource_limits = minfo->gws ?
>>> +				COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK : 0;
>>> +
>>
>> This looks OK because we don't set anything else in
>> m->compute_resource_limits. If that ever changes, we have to be more
>> careful here to not wipe out other fields in that register.
>
>
> Yes. Should I change it to the below and send a v3?
>
>     m->compute_resource_limits |= minfo->gws ?
>             COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK : 0;
I think you need to do

	if (minfo->gws)
		m->compute_resource_limits |= COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
	else
		m->compute_resource_limits &= ~COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;

That way you can clear the resource limit when GWS is disabled for the queue.
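Folding that into the v2 hunk with its existing guards, the update_mqd()
change might look something like this (an untested sketch):

	if (minfo && KFD_GC_VERSION(mm->dev) >= IP_VERSION(9, 4, 2)) {
		if (minfo->gws)
			m->compute_resource_limits |=
				COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
		else
			m->compute_resource_limits &=
				~COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
	}

The |=/&= pair only touches the FORCE_SIMD_DIST bit, so any other fields
set in compute_resource_limits in the future would be preserved.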
>
>
>>
>>
>>>  	q->is_active = QUEUE_IS_ACTIVE(*q);
>>>  }
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> index 677281c0793e..f4b327a2d4a8 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -542,6 +542,7 @@ struct mqd_update_info {
>>>  		} cu_mask;
>>>  	};
>>>  	enum mqd_update_flag update_flag;
>>> +	bool gws;
>>
>> Instead of adding a new bool, can we add a flag to mqd_update_flag?
>
> Maybe. I initially thought about it, but then I chose the bool approach
> since those debug flags are generic, non-per-ASIC KFD flags while this
> bool is a per-ASIC request, so I felt they didn't fit together. On the
> other hand, those flags and this bool are both quirks anyway, so maybe
> they can go together. Please let me know your preference.
I'd prefer to use the flags. They are currently used for a GFX11 quirk;
now we can add another flag for a GFX9 quirk.
The GFX11 code currently has an implicit assumption that no other flags
exist. That would need to be fixed:
 	if (has_wa_flag) {
-		uint32_t wa_mask = minfo->update_flag == UPDATE_FLAG_DBG_WA_ENABLE ?
+		uint32_t wa_mask = (minfo->update_flag & UPDATE_FLAG_DBG_WA_ENABLE) ?
 				0xffff : 0xffffffff;
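For example, the new flag could look something like this (the name
UPDATE_FLAG_IS_GWS is just a suggestion, and the values have to stay
distinct powers of two so the flags combine as a bitmask):

	enum mqd_update_flag {
		UPDATE_FLAG_DBG_WA_ENABLE = 1,
		UPDATE_FLAG_DBG_WA_DISABLE = 2,
		UPDATE_FLAG_IS_GWS = 4,	/* new flag for the GFX9 SIMD-dist quirk */
	};

Then pqm_set_gws() would set minfo.update_flag = UPDATE_FLAG_IS_GWS
instead of a bool, and update_mqd() in the v9 MQD manager would test
(minfo->update_flag & UPDATE_FLAG_IS_GWS).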
Regards,
Felix
>
>
>>
>> Looks good to me otherwise.
>>
>> Regards,
>> Felix
>>
>>
>>> };
>>> /**
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>>> index 43eff221eae5..5416a110ced9 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>>> @@ -95,6 +95,7 @@ void kfd_process_dequeue_from_device(struct kfd_process_device *pdd)
>>>  int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
>>>  			void *gws)
>>>  {
>>> +	struct mqd_update_info minfo = {0};
>>>  	struct kfd_node *dev = NULL;
>>>  	struct process_queue_node *pqn;
>>>  	struct kfd_process_device *pdd;
>>> @@ -146,9 +147,10 @@ int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
>>>  	}
>>>  	pdd->qpd.num_gws = gws ? dev->adev->gds.gws_size : 0;
>>> +	minfo.gws = !!gws;
>>>  	return pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
>>> -							pqn->q, NULL);
>>> +							pqn->q, &minfo);
>>>  }
>>> void kfd_process_dequeue_from_all_devices(struct kfd_process *p)