[PATCH] Revert "drm/amdgpu: promote the implicit sync to the dependent read fences"
Liang, Prike
Prike.Liang at amd.com
Wed May 28 02:09:27 UTC 2025
[Public]
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Christian
> König
> Sent: Tuesday, May 27, 2025 8:24 PM
> To: Khatri, Sunil <Sunil.Khatri at amd.com>; amd-gfx at lists.freedesktop.org; Deucher,
> Alexander <Alexander.Deucher at amd.com>
> Subject: Re: [PATCH] Revert "drm/amdgpu: promote the implicit sync to the
> dependent read fences"
>
> On 5/27/25 13:27, Khatri, Sunil wrote:
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> > -----Original Message-----
> > From: Koenig, Christian <Christian.Koenig at amd.com>
> > Sent: Tuesday, May 27, 2025 2:32 PM
> > To: Khatri, Sunil <Sunil.Khatri at amd.com>;
> > amd-gfx at lists.freedesktop.org; Deucher, Alexander
> > <Alexander.Deucher at amd.com>
> > Subject: Re: [PATCH] Revert "drm/amdgpu: promote the implicit sync to the
> dependent read fences"
> >
> > On 5/27/25 10:58, Sunil Khatri wrote:
> >> This reverts commit 714bbbf20a7266e48632fab466563e695af9acb5.
> >> bisected to this change which is causing the flikering issue in the
> >> UI for various apps like glxgears and unigen heaven.
> >
> > Is that flickering also there when using kernel queues?
> >
> > If not then without an explanation where that flickering is coming from for user
> queues I have to reject that.
> >
> > There is more to just flickering here. We are planning to use
> amdgpu.user_queue=1 in our CI testing. That means both kernel and user
> submission are enabled. When I ran the testing amdgpu_basic tests which runs both
> kernel and user submissions, all the user queue tests are consistently failing. With
> this revert change in place all those issues are fixed, Both user and kernel queue
> tests pass consistently without any artifacts.
>
> That sounds like we are missing some dependency and that works only by
> coincident.
>
> > The reason of why this is helping is I am not sure of but it’s the observation which
> was observed earlier by Arvind too. We could investigate the probable reason of it
> infact Arvind is checking that but to enable CI we could revert this if you agree.
>
> I think figuring out what is missing here is more important than enabling CI.
>
> We most likely incorrectly sync to TLB flush fences on the kernel queues now and
> that could not only have negative performance side effects but also hide real bugs
> we need to fix ASAP.
Maybe there're some bookkeeping fences that miss explicit sync when the driver doesn't sync them implicitly. I will further investigate it as well.
If we need to revert the patch as a workaround, we need the following https://www.spinics.net/lists/amd-gfx/msg123346.html to handle the eviction fence leakage issue.
Thanks,
Prike
> Regards,
> Christian.
>
> >
> > Regards
> > Sunil Khatri
> >
> > It seems to be the right thing TODO and we are still in the process of hammering
> out all the bugs for user queues.
> >
> > Regards,
> > Christian.
> >
> >> Also when we set user_queue=1, which enable both user and kernel
> >> submissions the userqueue tests in IGT also fail.
> >>
> >> Signed-off-by: Sunil Khatri <sunil.khatri at amd.com>
> >> ---
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 5 +++--
> >> 1 file changed, 3 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >> index d6ae9974c952..5576ed0b508f 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> >> @@ -249,8 +249,9 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
> >> struct amdgpu_sync *sync,
> >>
> >> if (resv == NULL)
> >> return -EINVAL;
> >> - /* Implicitly sync only to KERNEL, WRITE and READ */
> >> - dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_READ, f) {
> >> +
> >> + /* TODO: Use DMA_RESV_USAGE_READ here */
> >> + dma_resv_for_each_fence(&cursor, resv,
> DMA_RESV_USAGE_BOOKKEEP,
> >> + f) {
> >> dma_fence_chain_for_each(f, f) {
> >> struct dma_fence *tmp =
> >> dma_fence_chain_contained(f);
> >>
> >
More information about the amd-gfx
mailing list