[PATCH 2/4] drm/amdkfd: Signal page table fence after KFD flush tlb

Mon Jun 5 15:47:28 UTC 2023

Am 05.06.23 um 17:40 schrieb Shashank Sharma:
>
> On 05/06/2023 17:18, Christian König wrote:
>> Am 05.06.23 um 17:13 schrieb Shashank Sharma:
>>>
>>> On 02/06/2023 16:54, Felix Kuehling wrote:
>>>> Am 2023-06-02 um 07:57 schrieb Christian König:
>>>>> Am 01.06.23 um 21:31 schrieb Philip Yang:
>>>>>> To free page table BOs which are freed when updating page table, for
>>>>>> example PTE BOs when PDE0 used as PTE.
>>>>>>
>>>>>> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
>>>>>>   1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> index af0a4b5257cc..0ff007a74d03 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> @@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct 
>>>>>> kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
>>>>>>               amdgpu_amdkfd_flush_gpu_tlb_pasid(
>>>>>>                   dev->adev, pdd->process->pasid, type, xcc);
>>>>>>       }
>>>>>> +
>>>>>> +    /* Signal page table fence to free page table BOs */
>>>>>> +    dma_fence_signal(vm->pt_fence);
>>>>>
>>>>> That's not something you can do here.
>>>>>
>>>>> Signaling a fence can never depend on anything except for hardware 
>>>>> work. In other words dma_fence_signal() is supposed to be called 
>>>>> only from interrupt context!
>>>>
>>>> We are signaling eviction fences from normal user context, too. 
>>>> There is no practical way to put this into an interrupt handler 
>>>> when the TLB flush is being done synchronously on a user thread. We 
>>>> have to do this in such a context for user mode queues.
>>>
>>>
>>> We are currently working on adding a provide a high level kernel API 
>>> which can be called directly to perform a TLB flush. Internally this 
>>> API will add a deferred work to use the SDMA engine to perform a GPU 
>>> TLB flush work (to compensate for a HW bug in Navi Chips). If my 
>>> understanding is right, by interrupt context Christian means to 
>>> perform this flush and signal from that differed work, is that so 
>>> @Christian ?
>>
>> Well more or less. Ideally you put the TLB flush in a work item (or 
>> use the SDMA for the hw bug workaround on Navi 1x).
>>
>> The point is that you shouldn't have it in the same execution thread 
>> as the VM page table updates, because any memory allocation or 
>> grabbing a lock could potentially depend on the TLB flush as soon as 
>> you have published the dma_fence (by adding it to the VM page table 
>> BOs for example).
>>
> Would it work for everyone if we add this generic API (say 
> amdgpu_flush_tlb_async()) to add a TLB flush work and will also send 
> this dma_fence_signal from within ? KFD can simply consume it from 
> wherever they want, do you see a race condition if we do like this ?

Yes, that's pretty much the whole idea. amdgpu_flush_tlb() should just 
return a dma_fence object.

This dma_fence object should either be the SDMA workaround or signaled 
from a work item.

We can then fence the BOs or just wait for the dma_fence object to signal.

Regards,
Christian.

>
> - Shashank
>
>> Christian.
>>
>>>
>>> - Shashank
>>>
>>>>
>>>> Regards,
>>>>   Felix
>>>>
>>>>
>>>>>
>>>>> What we can to is to put the TLB flushing into an irq worker or 
>>>>> work item and let the signaling happen from there.
>>>>>
>>>>> Amar and Shashank are already working on this, I strongly suggest 
>>>>> to sync up with them.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> + dma_fence_put(vm->pt_fence);
>>>>>> +    vm->pt_fence = amdgpu_pt_fence_create();
>>>>>>   }
>>>>>>     struct kfd_process_device 
>>>>>> *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t 
>>>>>> gpu_id)
>>>>>
>>