[PATCH] drm/amdgpu: Raise dma resv usage for created TLB fence

Christian König christian.koenig at amd.com
Fri Sep 6 14:21:45 UTC 2024


Well that's the whole reason I'm asking :)

Why do you think it should be added as dependency in 
amdgpu_vm_sdma_update? As far as I can see that is complete nonsense.

Page table updates never depend on TLB flushes, it's the TLB flush which 
depends on the page table update.

Regards,
Christian.

Am 06.09.24 um 16:10 schrieb Andjelkovic, Dejan:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>
> I might have worded that poorly, I meant that it seems like TLB flush 
> is out of sync with the SDMA update, which leads to a page fault 
> reliably. I don't feel it has anything to do with the implicit sync in 
> itself. When TLB fence is created it's added to the dma_resv of the 
> vm's root buffer object with BOOKKEEP usage specified, in order to 
> make sure no PD/PT is freed before the flush. But I don't think it's 
> being added as a job dependency within the amdgpu_vm_sdma_update, 
> we're adding all the fences found within the dma_resv object with 
> KERNEL usage specified. I may be missing something so I'd love to hear 
> what you think.
>
> Best regards,
> Dejan
> ------------------------------------------------------------------------
> *From:* Koenig, Christian <Christian.Koenig at amd.com>
> *Sent:* Thursday, September 5, 2024 2:40 PM
> *To:* Andjelkovic, Dejan <Dejan.Andjelkovic at amd.com>; 
> amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
> *Cc:* Prica, Nikola <Nikola.Prica at amd.com>; Kuehling, Felix 
> <Felix.Kuehling at amd.com>; Deng, Emily <Emily.Deng at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: Raise dma resv usage for created 
> TLB fence
> Well that explanation doesn't seem to make much sense either.
>
> What do you mean with TLB flush is occurring prematurely?
>
> Regards,
> Christian.
>
> Am 05.09.24 um 14:38 schrieb Andjelkovic, Dejan:
>>
>> [AMD Official Use Only - AMD Internal Distribution Only]
>>
>>
>> Hi there. We're running into a page fault issue that's very easily 
>> reproducible on a SRIOV environment when using SDMA for page table 
>> updates. Going through mapping logs and trace files, it seems TLB 
>> flush is occurring prematurely. Changing the usage to KERNEL 
>> completely stops the page fault from occurring with no performance 
>> impact, which was confirmed with extensive testing. Looking through 
>> amdgpu_vm_sdma.c, namely within amdgpu_vm_sdma_update when waiting 
>> for PD/PT moves to be completed, fences are iterated with KERNEL 
>> usage specified which are then added as a job dependency.
>>
>> Best regards,
>> Dejan
>> ------------------------------------------------------------------------
>> *From:* Koenig, Christian <Christian.Koenig at amd.com> 
>> <mailto:Christian.Koenig at amd.com>
>> *Sent:* Thursday, September 5, 2024 1:17 PM
>> *To:* Andjelkovic, Dejan <Dejan.Andjelkovic at amd.com> 
>> <mailto:Dejan.Andjelkovic at amd.com>; amd-gfx at lists.freedesktop.org 
>> <mailto:amd-gfx at lists.freedesktop.org> 
>> <amd-gfx at lists.freedesktop.org> <mailto:amd-gfx at lists.freedesktop.org>
>> *Cc:* Prica, Nikola <Nikola.Prica at amd.com> 
>> <mailto:Nikola.Prica at amd.com>; Kuehling, Felix 
>> <Felix.Kuehling at amd.com> <mailto:Felix.Kuehling at amd.com>; Deng, Emily 
>> <Emily.Deng at amd.com> <mailto:Emily.Deng at amd.com>
>> *Subject:* Re: [PATCH] drm/amdgpu: Raise dma resv usage for created 
>> TLB fence
>> Am 05.09.24 um 10:58 schrieb Dejan Andjelkovic:
>> > When using SDMA for PT updates, a TLB fence hooked to a buffer
>> > objects dma resv object with usage declared as BOOKKEEP leaves a
>> > chance for TLB flush to occur prematurely. This will lead to a page
>> > fault. Raising the usage from BOOKKEEP to KERNEL removes this
>> > possibility.
>>
>> Well that's complete nonsense. The usage model is for implicit syncing
>> and not even remotely related to TLB flushing.
>>
>> What exactly is the problem you run into?
>>
>> Regards,
>> Christian.
>>
>> >
>> > Signed-off-by: Dejan Andjelkovic <Dejan.Andjelkovic at amd.com> 
>> <mailto:Dejan.Andjelkovic at amd.com>
>> > ---
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>> >   1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> > index f93804902fd3..8efc2cf9bbb0 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> > @@ -928,7 +928,7 @@ amdgpu_vm_tlb_flush(struct 
>> amdgpu_vm_update_params *params,
>> >
>> >                /* Makes sure no PD/PT is freed before the flush */
>> > dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
>> > - DMA_RESV_USAGE_BOOKKEEP);
>> > + DMA_RESV_USAGE_KERNEL);
>> >        }
>> >   }
>> >
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20240906/f35d87f6/attachment-0001.htm>


More information about the amd-gfx mailing list