[PATCH] drm/amdkfd: move flushing TLBs from map to unmap
Christian König
ckoenig.leichtzumerken at gmail.com
Fri May 28 15:23:05 UTC 2021
On 27.05.21 at 16:05, philip yang wrote:
>
>
> On 2021-05-26 5:25 p.m., Felix Kuehling wrote:
>> On 2021-05-26 at 3:21 p.m., Eric Huang wrote:
>>> On 2021-05-25 3:16 p.m., Felix Kuehling wrote:
>>>> Similar to a recent fix by Philip Yang 76e08b37d0aa ("drm/amdgpu: flush
>>>> TLB if valid PDE turns into PTE"), there needs to be a conditional TLB
>>>> flush after map, if any PDEs were unmapped and turned into PTEs in the
>>>> process. This is currently returned by amdgpu_vm_bo_update_mapping in
>>>> the "table_freed" parameter. This also needs to be returned by
>>>> amdgpu_vm_bo_update and reported back to KFD, so KFD can do the TLB
>>>> flush after map, if needed.
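The conditional-flush idea can be sketched in plain C. This is a toy model, not the amdgpu/KFD code: `toy_vm`, `toy_vm_bo_update` and `toy_map_memory_to_gpu` are invented stand-ins for the real structures and functions, chosen only to show the control flow of flushing iff "table_freed" comes back true.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a per-process GPU VM; tlb_dirty models stale
 * translations cached by the GPU. */
struct toy_vm {
	bool tlb_dirty;
};

/* Stand-in for amdgpu_vm_bo_update: reports through *table_freed
 * whether the update tore down a page-table page (e.g. PDEs that
 * turned into PTEs in the process). */
static int toy_vm_bo_update(struct toy_vm *vm, bool freed_pt,
			    bool *table_freed)
{
	(void)vm;
	*table_freed = freed_pt;
	return 0;
}

static void toy_flush_tlb(struct toy_vm *vm)
{
	vm->tlb_dirty = false;
}

/* Map path: flush the TLB only when the update actually freed a
 * page table, instead of unconditionally after every map. */
static int toy_map_memory_to_gpu(struct toy_vm *vm, bool freed_pt)
{
	bool table_freed = false;
	int r = toy_vm_bo_update(vm, freed_pt, &table_freed);

	if (r)
		return r;
	if (table_freed)
		toy_flush_tlb(vm);
	return 0;
}
```

The point of the sketch is only the plumbing: the innermost update reports the fact upward, and the ioctl-level caller decides whether a flush is needed.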
>>> I followed up on your suggestion to create another patch (attached) and
>>> tested it. It seems it doesn't improve the latency when the memory size
>>> is bigger than the huge page size (2M), because the table_freed
>>> parameter will always be true when the mapping is huge-page sized. I
>>> think Philip's patch is meant to fix the case of remapping memory from
>>> small pages to huge pages in HMM, but it doesn't consider whether the
>>> memory has actually been remapped and arbitrarily flushes TLBs when
>>> mapping a huge page.
>> That's unexpected. Turning an invalid PDE into a valid (huge) PTE should
>> not trigger a TLB flush.
>
> table_freed will be true if the PDE has been used by a previous mapping:
> unmapping the previous mapping clears the PTEs but leaves the PDE
> unchanged as P=0, V=1 (in memory and in the TLB); then the huge-page
> mapping turns the PDE into a PTE (P=1, V=1) in memory and frees the PTE
> page.
>
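Philip's sequence can be modeled in a few lines of C. This is purely illustrative; the enum, struct and function names are invented for the sketch (they are not amdgpu structures), and the P/V bits are collapsed into a single "what does this slot hold" state:

```c
#include <assert.h>
#include <stdbool.h>

/* What a page-directory slot can hold in this toy model. */
enum toy_entry { TOY_INVALID, TOY_PDE, TOY_HUGE_PTE };

struct toy_slot {
	enum toy_entry mem;  /* page table contents in memory */
	enum toy_entry tlb;  /* what the GPU TLB has cached */
	bool pt_page;        /* is a PTE page allocated under this slot? */
};

/* 4K map: the slot becomes a PDE over a freshly allocated PTE page,
 * and the walker caches that PDE in the TLB. */
static void toy_map_4k(struct toy_slot *s)
{
	s->mem = TOY_PDE;
	s->tlb = TOY_PDE;
	s->pt_page = true;
}

/* Unmap clears the PTEs but leaves the PDE in memory and in the TLB. */
static void toy_unmap_4k(struct toy_slot *s)
{
	(void)s;
}

/* 2M map: the PDE turns into a huge PTE and the PTE page is freed.
 * Returns table_freed; note the TLB still caches the stale PDE. */
static bool toy_map_2m(struct toy_slot *s)
{
	bool table_freed = s->pt_page;

	s->mem = TOY_HUGE_PTE;
	s->pt_page = false;
	return table_freed;
}

static void toy_flush(struct toy_slot *s)
{
	s->tlb = s->mem;
}
```

Running map-4K, unmap, map-2M through this model leaves the TLB holding a stale PDE while memory holds a huge PTE, which is exactly why table_freed has to trigger a flush.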
I think there might be a little bug in your patch. See, we set
params.table_freed to true when we call amdgpu_vm_free_pts(), but
amdgpu_vm_free_pts() doesn't necessarily free anything.
It can be that all subsequent page tables were never allocated before.
Christian.
> For example, the test maps 0x7ffe37401000, unmaps it, and then maps a
> 2MB huge page at 0x7ffe3740000; table_freed will be true, meaning that a
> TLB flush is needed after mapping the huge page.
>
> You can change the test: don't unmap the previous mapping, so the 2MB
> huge page gets a new GPU virtual address; or close KFD and open KFD
> again to create a new GPU VM.
>
> Regards,
>
> Philip
>
>> Regards,
>> Felix
>>
>>
>>>> kfd_flush_tlb probably needs a new parameter to determine the flush
>>>> type. The flush after map can be a "legacy" flush (type 0). The flush
>>>> after unmap must be a "heavy-weight" flush (type 2) to make sure we
>>>> don't evict cache lines into pages that we no longer own.
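The suggested flush-type parameter could look roughly like this. The numeric values follow the types Felix mentions (legacy = 0, heavy-weight = 2), but the enum and function names here are hypothetical, not the actual kfd_flush_tlb signature:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flush types mirroring the discussion: a light
 * invalidation after map, and a heavy-weight flush after unmap that
 * also makes sure no cache line can be evicted into pages the
 * process no longer owns. */
enum toy_flush_type {
	TOY_TLB_FLUSH_LEGACY = 0,
	TOY_TLB_FLUSH_HEAVYWEIGHT = 2,
};

/* Pick the flush type depending on whether we just mapped or
 * unmapped the memory. */
static enum toy_flush_type toy_flush_type_for(bool is_unmap)
{
	return is_unmap ? TOY_TLB_FLUSH_HEAVYWEIGHT
			: TOY_TLB_FLUSH_LEGACY;
}
```

With a parameter like this, the map path and the unmap path can share one flush helper while still getting the different invalidation semantics each side needs.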
>>>>
>>>> Finally, in the ticket I thought about possible optimizations using a
>>>> worker to minimize the impact of TLB flushes on unmap latency. That
>>>> could be a follow up commit.
>>> It is a good idea to use a worker, but how do we guarantee it is done
>>> before the memory is remapped? If remapping depends on it, then more
>>> latency will be introduced in the map path.
>>>
>>> Regards,
>>> Eric
>>>> Regards,
>>>> Felix
>>>>
>>>>
>>>> On 2021-05-25 at 1:53 p.m., Eric Huang wrote:
>>>>> It is to optimize memory allocation latency.
>>>>>
>>>>> Signed-off-by: Eric Huang <jinhuieric.huang at amd.com>
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>>> index 960913a35ee4..ab73741edb97 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>>>> @@ -1657,20 +1657,6 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>>>>>  		goto sync_memory_failed;
>>>>>  	}
>>>>>  
>>>>> -	/* Flush TLBs after waiting for the page table updates to complete */
>>>>> -	for (i = 0; i < args->n_devices; i++) {
>>>>> -		peer = kfd_device_by_id(devices_arr[i]);
>>>>> -		if (WARN_ON_ONCE(!peer))
>>>>> -			continue;
>>>>> -		peer_pdd = kfd_get_process_device_data(peer, p);
>>>>> -		if (WARN_ON_ONCE(!peer_pdd))
>>>>> -			continue;
>>>>> -		if (!amdgpu_read_lock(peer->ddev, true)) {
>>>>> -			kfd_flush_tlb(peer_pdd);
>>>>> -			amdgpu_read_unlock(peer->ddev);
>>>>> -		}
>>>>> -	}
>>>>> -
>>>>>  	kfree(devices_arr);
>>>>>  
>>>>>  	trace_kfd_map_memory_to_gpu_end(p,
>>>>> @@ -1766,6 +1752,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>>>>>  			amdgpu_read_unlock(peer->ddev);
>>>>>  			goto unmap_memory_from_gpu_failed;
>>>>>  		}
>>>>> +		kfd_flush_tlb(peer_pdd);
>>>>>  		amdgpu_read_unlock(peer->ddev);
>>>>>  		args->n_success = i+1;
>>>>>  	}
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx