[PATCH] drm/amdkfd: move flushing TLBs from map to unmap
Felix Kuehling
felix.kuehling at amd.com
Wed May 26 21:25:14 UTC 2021
On 2021-05-26 at 3:21 p.m., Eric Huang wrote:
>
> On 2021-05-25 3:16 p.m., Felix Kuehling wrote:
>> Similar to a recent fix by Philip Yang 76e08b37d0aa ("drm/amdgpu: flush
>> TLB if valid PDE turns into PTE"), there needs to be a conditional TLB
>> flush after map, if any PDEs were unmapped and turned into PTEs in the
>> process. This is currently returned by amdgpu_vm_bo_update_mapping in
>> the "table_freed" parameter. This needs to be also returned by
>> amdgpu_vm_bo_update and reported back to KFD, so KFD can do the TLB
>> flush after map, if needed.
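[To make the proposal concrete, here is a rough, self-contained sketch of the idea in plain C. The stub names (`vm_bo_update_stub`, `kfd_flush_tlb_stub`, etc.) are illustrative, not the real amdgpu/KFD API; only the `table_freed` out-parameter and the "flush after map only if tables were freed" logic come from the discussion above.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the amdgpu/KFD calls discussed above. */
static int tlb_flush_count;

static void kfd_flush_tlb_stub(void)
{
	tlb_flush_count++;
}

/*
 * Sketch of amdgpu_vm_bo_update reporting whether any PDEs were freed
 * (turned into PTEs) via an out-parameter, as proposed for the real
 * amdgpu_vm_bo_update_mapping's "table_freed" parameter.
 */
static int vm_bo_update_stub(bool pdes_freed, bool *table_freed)
{
	*table_freed = pdes_freed;	/* bubbled up from the page-table walk */
	return 0;
}

/* KFD map path: flush only when the update actually freed page tables. */
static int map_memory_to_gpu_stub(bool pdes_freed)
{
	bool table_freed = false;
	int r = vm_bo_update_stub(pdes_freed, &table_freed);

	if (r)
		return r;
	if (table_freed)
		kfd_flush_tlb_stub();	/* a legacy flush suffices after map */
	return 0;
}
```

The point of the sketch is that the common map case (no PDE collapsed) pays no flush at all.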
> I followed up on your suggestion, created another patch (attached) and
> tested it. It doesn't seem to improve the latency when the memory size
> is larger than a huge page (2M), because the table_freed parameter is
> always true when the mapping uses huge-page size. I think Philip's
> patch is meant to fix the case of remapping memory from small pages to
> a huge page in HMM, but it doesn't check whether the memory is actually
> being remapped, and flushes TLBs unconditionally when mapping huge pages.
That's unexpected. Turning an invalid PDE into a valid (huge) PTE should
not trigger a TLB flush.
Regards,
Felix
>> kfd_flush_tlb probably needs a new parameter to determine the flush
>> type. The flush after map can be a "legacy" flush (type 0). The flush
>> after unmap must be a "heavy-weight" flush (type 2) to make sure we
>> don't evict cache lines into pages that we no longer own.
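[A minimal sketch of what the new parameter could look like. The enum and function names are hypothetical; only the type numbers 0 (legacy) and 2 (heavy-weight) and their intended uses come from the paragraph above.]

```c
#include <assert.h>

/* Hypothetical flush types matching the proposal above. */
enum kfd_flush_type {
	TLB_FLUSH_LEGACY = 0,		/* after map: drop stale translations */
	TLB_FLUSH_HEAVYWEIGHT = 2,	/* after unmap: also ensure no cache
					 * line can be evicted into pages the
					 * process no longer owns */
};

static int last_flush = -1;

/* Sketch of kfd_flush_tlb gaining a flush-type parameter. */
static void kfd_flush_tlb_sketch(enum kfd_flush_type type)
{
	/* Real code would program the VM invalidation request with the
	 * requested flush type; here we just record it. */
	last_flush = type;
}
```

Call sites would then pass TLB_FLUSH_LEGACY on the map path and TLB_FLUSH_HEAVYWEIGHT on the unmap path.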
>>
>> Finally, in the ticket I thought about possible optimizations using a
>> worker to minimize the impact of TLB flushes on unmap latency. That
>> could be a follow up commit.
> It is a good idea to use a worker, but how do we guarantee it is done
> before the memory is remapped? If remapping depends on it, then more
> latency will be introduced in map.
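[One way the ordering question could be answered, sketched below with entirely hypothetical names: unmap only marks a flush as pending (the worker would perform it in the background), and the map path falls back to flushing synchronously if the worker has not run yet. Correctness then never depends on worker latency; the worker only hides the flush cost in the common case.]

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state; real code would hang this off the VM and use a
 * work item (e.g. schedule_work()/flush_work()) instead of a flag. */
static bool flush_pending;
static int flushes_done;

static void do_heavyweight_flush(void)
{
	flushes_done++;
	flush_pending = false;
}

/* Unmap path: defer the heavy-weight flush to a worker. */
static void unmap_stub(void)
{
	flush_pending = true;	/* worker would pick this up later */
}

/* Map path: if the deferred flush has not run yet, do it now, so the
 * new mapping can never be observed through stale translations. */
static void map_stub(void)
{
	if (flush_pending)
		do_heavyweight_flush();
	/* ...install page tables... */
}
```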
>
> Regards,
> Eric
>> Regards,
>> Felix
>>
>>
>> On 2021-05-25 at 1:53 p.m., Eric Huang wrote:
>>> It is to optimize memory allocation latency.
>>>
>>> Signed-off-by: Eric Huang <jinhuieric.huang at amd.com>
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> index 960913a35ee4..ab73741edb97 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> @@ -1657,20 +1657,6 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>>>  			goto sync_memory_failed;
>>>  	}
>>>  
>>> -	/* Flush TLBs after waiting for the page table updates to complete */
>>> -	for (i = 0; i < args->n_devices; i++) {
>>> -		peer = kfd_device_by_id(devices_arr[i]);
>>> -		if (WARN_ON_ONCE(!peer))
>>> -			continue;
>>> -		peer_pdd = kfd_get_process_device_data(peer, p);
>>> -		if (WARN_ON_ONCE(!peer_pdd))
>>> -			continue;
>>> -		if (!amdgpu_read_lock(peer->ddev, true)) {
>>> -			kfd_flush_tlb(peer_pdd);
>>> -			amdgpu_read_unlock(peer->ddev);
>>> -		}
>>> -	}
>>> -
>>>  	kfree(devices_arr);
>>>  
>>>  	trace_kfd_map_memory_to_gpu_end(p,
>>> @@ -1766,6 +1752,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>>>  			amdgpu_read_unlock(peer->ddev);
>>>  			goto unmap_memory_from_gpu_failed;
>>>  		}
>>> +		kfd_flush_tlb(peer_pdd);
>>>  		amdgpu_read_unlock(peer->ddev);
>>>  		args->n_success = i+1;
>>>  	}
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>