[PATCH 1/2] drm/amdkfd: introduce dummy cache info for property asic

Liang, Prike Prike.Liang at amd.com
Mon Oct 24 13:11:20 UTC 2022


[Public]

-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling at amd.com>
Sent: Saturday, October 22, 2022 4:53 AM
To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Yifan <Yifan1.Zhang at amd.com>; Huang, Ray <Ray.Huang at amd.com>; Liu, Aaron <Aaron.Liu at amd.com>
Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for property asic

On 2022-10-21 09:05, Liang, Prike wrote:
> [Public]
>
> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling at amd.com>
> Sent: Friday, October 21, 2022 1:11 PM
> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Yifan
> <Yifan1.Zhang at amd.com>; Huang, Ray <Ray.Huang at amd.com>; Liu, Aaron
> <Aaron.Liu at amd.com>
> Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for
> property asic
>
> Am 2022-10-20 um 21:50 schrieb Liang, Prike:
>> [Public]
>>
>> -----Original Message-----
>> From: Kuehling, Felix <Felix.Kuehling at amd.com>
>> Sent: Friday, October 21, 2022 12:03 AM
>> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Yifan
>> <Yifan1.Zhang at amd.com>; Huang, Ray <Ray.Huang at amd.com>; Liu, Aaron
>> <Aaron.Liu at amd.com>
>> Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for
>> property asic
>>
>>
>> Am 2022-10-20 um 05:15 schrieb Prike Liang:
>>> This dummy cache info will enable kfd base function support.
>>>
>>> Signed-off-by: Prike Liang <Prike.Liang at amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 55 +++++++++++++++++++++++++--
>>>     1 file changed, 52 insertions(+), 3 deletions(-)
>>>
[snip]
>>> @@ -1528,7 +1574,10 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>                                 kfd_fill_gpu_cache_info_from_gfx_config(kdev, pcache_info);
>>>                         break;
>>>                 default:
>>> -                     return -EINVAL;
>>> +                     pcache_info = dummy_cache_info;
>>> +                     num_of_cache_types = ARRAY_SIZE(dummy_cache_info);
>>> +                     pr_warn("dummy cache info is used temporarily and real cache info need update later.\n");
>>> +                     break;
>> Could we make this return an error if the amdgpu.exp_hw_support module parameter is not set?
>>
>> Regards,
>>      Felix
>>
>> [Prike] It's fine to protect this dummy info by checking the parameter amdgpu_exp_hw_support, but it may not friendly to end user by adding the parameter and some guys will still report KFD not enabled for this parameter setting problem. The original idea is the end user will not aware the dummy cache info and only alert the warning message to developer.
> I thought the intention was to simplify bring-up but make sure that valid cache info is available by the time a chip goes into production.
> Therefore, normal end users should never need to set the amdgpu_exp_hw_support option. I think you're saying that we would go to production with dummy info. That seems like a bad idea to me.
>
> Regards,
>     Felix
>
> [Prike]  Sorry for the confusion. In fact, this dummy cache info only used before production and meanwhile needn't require any parameter setting for CQE do KFD test. Anyway if you still have concern on this solution I will add the condition of checking amdgpu_exp_hw_support.

The idea to control this with a module parameter was to cause a more obvious failure if we don't have correct cache info before going to production. Just a warning in the log file is too easy to miss or ignore. Of course, if QA gets into the habit of testing with amdgpu_exp_hw_support, then this may not solve the problem. You need to have at least one test pass without amdgpu_exp_hw_support to catch missing cache info.

Regards,
   Felix

Get your point. As to the KFD support on a NPI will be tracked by a ticket which make sure the real cache info update later after this dummy cache info assigned in the early BU phase.

Thanks,
Prike


More information about the amd-gfx mailing list