[PATCH] Revert "drm/amdgpu: promote the implicit sync to the dependent read fences"

Christian König christian.koenig at amd.com
Tue May 27 12:23:33 UTC 2025


On 5/27/25 13:27, Khatri, Sunil wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig at amd.com>
> Sent: Tuesday, May 27, 2025 2:32 PM
> To: Khatri, Sunil <Sunil.Khatri at amd.com>; amd-gfx at lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher at amd.com>
> Subject: Re: [PATCH] Revert "drm/amdgpu: promote the implicit sync to the dependent read fences"
> 
> On 5/27/25 10:58, Sunil Khatri wrote:
>> This reverts commit 714bbbf20a7266e48632fab466563e695af9acb5.
>> bisected to this change which is causing the flikering issue in the UI
>> for various apps like glxgears and unigen heaven.
> 
> Is that flickering also there when using kernel queues?
> 
> If not then without an explanation where that flickering is coming from for user queues I have to reject that.
> 
> There is more to just flickering here. We are planning to use amdgpu.user_queue=1 in our CI testing.  That means both kernel and user submission are enabled. When I ran the testing amdgpu_basic tests which runs both kernel and user submissions, all the user queue tests are consistently failing. With this revert change in place all those issues are fixed, Both user and kernel queue tests pass consistently without any artifacts.

That sounds like we are missing some dependency and that works only by coincident.

> The reason of why this is helping is I am not sure of but it’s the observation which was observed earlier by Arvind too. We could investigate the probable reason of it infact Arvind is checking that but to enable CI we could revert this if you agree.

I think figuring out what is missing here is more important than enabling CI.

We most likely incorrectly sync to TLB flush fences on the kernel queues now and that could not only have negative performance side effects but also hide real bugs we need to fix ASAP.

Regards,
Christian.

> 
> Regards
> Sunil Khatri
> 
> It seems to be the right thing TODO and we are still in the process of hammering out all the bugs for user queues.
> 
> Regards,
> Christian.
> 
>> Also when we set user_queue=1, which enable both user and kernel
>> submissions the userqueue tests in IGT also fail.
>>
>> Signed-off-by: Sunil Khatri <sunil.khatri at amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> index d6ae9974c952..5576ed0b508f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> @@ -249,8 +249,9 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
>> struct amdgpu_sync *sync,
>>
>>       if (resv == NULL)
>>               return -EINVAL;
>> -     /* Implicitly sync only to KERNEL, WRITE and READ */
>> -     dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_READ, f) {
>> +
>> +     /* TODO: Use DMA_RESV_USAGE_READ here */
>> +     dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_BOOKKEEP, f) {
>>               dma_fence_chain_for_each(f, f) {
>>                       struct dma_fence *tmp = dma_fence_chain_contained(f);
>>
> 



More information about the amd-gfx mailing list