[PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

Eric Huang jinhuieric.huang at amd.com
Tue Jan 18 20:13:50 UTC 2022


I understand Alex's concern. I think usually we only check the version 
when a feature is only available in a specific version, and other 
version or newer version doesn't have.

In case of FW fix, we assume the driver and FWs have to be synchronous. 
If we have driver backward compatibility for FWs, there must be a lot of 
redundant codes for FW version check. So this patch and SDMA fix will be 
pushed into ROCm 5.1 release branch at the same time.

Regards,
Eric

On 2022-01-18 14:35, Alex Deucher wrote:
> On Tue, Jan 18, 2022 at 2:27 PM Russell, Kent <Kent.Russell at amd.com> wrote:
>> [AMD Official Use Only]
>>
>> I think what he means is that if we are using SDMA v17, this will cause issues, won't it? Should we check that SDMA version is >=18 before enabling it? Or am I misunderstanding the fix?
> Yes, that was my concern.
>
> Alex
>
>>   Kent
>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Eric Huang
>>> Sent: Tuesday, January 18, 2022 2:09 PM
>>> To: Alex Deucher <alexdeucher at gmail.com>
>>> Cc: amd-gfx list <amd-gfx at lists.freedesktop.org>
>>> Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus
>>>
>>> The SDMA fix is generic and not in a specific version of FW, so we don't
>>> have to check.
>>>
>>> Thanks,
>>> Eric
>>>
>>> On 2022-01-18 11:35, Alex Deucher wrote:
>>>> On Tue, Jan 18, 2022 at 11:16 AM Eric Huang <jinhuieric.huang at amd.com> wrote:
>>>>> SDMA FW fixes the hang issue for adding heavy-weight TLB
>>>>> flush on Arcturus, so we can enable it.
>>>> Do we need to check for a specific fw version?
>>>>
>>>> Alex
>>>>
>>>>> Signed-off-by: Eric Huang <jinhuieric.huang at amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +++++---
>>>>>    drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 3 ++-
>>>>>    2 files changed, 7 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> index a64cbbd943ba..7b24a920c12e 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> @@ -1892,10 +1892,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>>>>>                                   true);
>>>>>           ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>>
>>>>> -       /* Only apply no TLB flush on Aldebaran to
>>>>> -        * workaround regressions on other Asics.
>>>>> +       /* Only apply no TLB flush on Aldebaran and Arcturus
>>>>> +        * to workaround regressions on other Asics.
>>>>>            */
>>>>> -       if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
>>>>> +       if (table_freed &&
>>>>> +           (adev->asic_type != CHIP_ALDEBARAN) &&
>>>>> +           (adev->asic_type != CHIP_ARCTURUS))
>>>>>                   *table_freed = true;
>>>>>
>>>>>           goto out;
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>>> index b570c0454ce9..ef4d676cc71c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>>> @@ -1806,7 +1806,8 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file
>>> *filep,
>>>>>           }
>>>>>           mutex_unlock(&p->mutex);
>>>>>
>>>>> -       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
>>>>> +       if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
>>>>> +           KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1)) {
>>>>>                   err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
>>>>>                                   (struct kgd_mem *) mem, true);
>>>>>                   if (err) {
>>>>> --
>>>>> 2.25.1
>>>>>



More information about the amd-gfx mailing list