[PATCH] drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs

Wang, YuBiao YuBiao.Wang at amd.com
Wed Mar 8 03:08:34 UTC 2023


Hi Luben,

I was comparing the bad jobs of failed ib test and the ones that causes the TDR, and I think the main difference is whether it is submitted via drm_sched or not. In simple test cases it doesn't seem to incorrectly signal the fences that shouldn't be signaled. We indeed may need more heavier tests but so far based on static analyze I think I didn't notice the case you mentioned. There's another case using direct job submission during resete, but it happens in recover_vram which happens after the pre_asic reset so I think it won’t be affected.

I'll move this lines into a new function as you suggested and resent a v2 patch.

Regards,
Yubiao Wang

-----Original Message-----
From: Tuikov, Luben <Luben.Tuikov at amd.com> 
Sent: Wednesday, March 8, 2023 7:22 AM
To: Koenig, Christian <Christian.Koenig at amd.com>; Wang, YuBiao <YuBiao.Wang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Chen, Horace <Horace.Chen at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Wang, Yang(Kevin) <KevinYang.Wang at amd.com>
Subject: Re: [PATCH] drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs

On 2023-03-07 15:36, Luben Tuikov wrote:
> +			job = container_of(old, struct amdgpu_job, hw_fence);
> +			if (!job->base.s_fence && !dma_fence_is_signaled(old))
> +				dma_fence_signal(old);

Thinking about this more, is !job->base.s_fence condition here enough to mean "non-sched jobs like ib_test"?

I feel that it is a bit overloaded here--could we have this condition satisfied,yet we can't willy-nilly signal the fence here?
--
Regards,
Luben



More information about the amd-gfx mailing list