[Patch v2] drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

Thu Feb 8 20:01:55 UTC 2024

On 2/8/2024 2:41 PM, Felix Kuehling wrote:
>
> On 2024-02-07 23:14, Rajneesh Bhardwaj wrote:
>> In certain cooperative group dispatch scenarios the default SPI resource
>> allocation may cause reduced per-CU workgroup occupancy. Set
>> COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
>> scenarions.
>>
>> Suggested-by: Joseph Greathouse <Joseph.Greathouse at amd.com>
>> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj at amd.com>
>> ---
>> * Found a bug in the previous reviewed version
>> https://lists.freedesktop.org/archives/amd-gfx/2024-February/104101.html
>>    since the q->is_gws is unset for keeping the count.
>> * updated pqm_set_gws to pass minfo holding gws state for the active
>>    queues and use that to apply the FORCE_SIMD_DIST_MASK.
>>
>>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c        | 4 ++++
>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h                  | 1 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 4 +++-
>>   3 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> index 42d881809dc7..0b71db4c96b5 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> @@ -303,6 +303,10 @@ static void update_mqd(struct mqd_manager *mm, 
>> void *mqd,
>>           update_cu_mask(mm, mqd, minfo, 0);
>>       set_priority(m, q);
>>   +    if (minfo && KFD_GC_VERSION(mm->dev) >= IP_VERSION(9, 4, 2))
>> +        m->compute_resource_limits = minfo->gws ?
>> +            COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK : 0;
>> +
>
> This looks OK because we don't set anything else in 
> m->compute_resource_limits. If that ever changes, we have to be more 
> careful here to not wipe out other fields in that register.

Yes, Should I change it to below and send a v3?

  m->compute_resource_limits |= minfo->gws ?
             COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK : 0;

>
>
>>       q->is_active = QUEUE_IS_ACTIVE(*q);
>>   }
>>   diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> index 677281c0793e..f4b327a2d4a8 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> @@ -542,6 +542,7 @@ struct mqd_update_info {
>>           } cu_mask;
>>       };
>>       enum mqd_update_flag update_flag;
>> +    bool gws;
>
> Instead of adding a new bool, can we add a flag to mqd_update_flag?

Maybe, I initially thought about it but then I chose the bool approach 
since  those debug flags are generic KFD non per-Asic flags while this 
bool is per-Asic request so I felt they didn't fit together. On the 
other hand, those flags and this bool are both quirks anyways so maybe 
they can be together.   Please let me know your preference.

>
> Looks good to me otherwise.
>
> Regards,
>   Felix
>
>
>>   };
>>     /**
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> index 43eff221eae5..5416a110ced9 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> @@ -95,6 +95,7 @@ void kfd_process_dequeue_from_device(struct 
>> kfd_process_device *pdd)
>>   int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
>>               void *gws)
>>   {
>> +    struct mqd_update_info minfo = {0};
>>       struct kfd_node *dev = NULL;
>>       struct process_queue_node *pqn;
>>       struct kfd_process_device *pdd;
>> @@ -146,9 +147,10 @@ int pqm_set_gws(struct process_queue_manager 
>> *pqm, unsigned int qid,
>>       }
>>         pdd->qpd.num_gws = gws ? dev->adev->gds.gws_size : 0;
>> +    minfo.gws = !!gws;
>>         return pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
>> -                            pqn->q, NULL);
>> +                            pqn->q, &minfo);
>>   }
>>     void kfd_process_dequeue_from_all_devices(struct kfd_process *p)