[PATCH v2] drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
Luben Tuikov
luben.tuikov at amd.com
Thu Mar 16 03:41:09 UTC 2023
On 2023-03-08 21:27, YuBiao Wang wrote:
> v2: Add comments to clarify in the code.
>
> [Why]
> For engines not supporting soft reset, i.e. VCN, there will be a failed
> ib test before mode 1 reset during asic reset. The fences in this case
> are never signaled and next time when we try to free the sa_bo, kernel
> will hang.
>
> [How]
> During pre_asic_reset, driver will clear job fences and afterwards the
> fences' refcount will be reduced to 1. For drm_sched_jobs it will be
> released in job_free_cb, and for non-sched jobs like ib_test, it's meant
> to be released in sa_bo_free but only when the fences are signaled. So
> we have to force signal the non_sched bad job's fence during
> pre_asic_reset or the clear is not complete.
>
> Signed-off-by: YuBiao Wang <YuBiao.Wang at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index faff4a3f96e6..ad7c5b70c35a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -673,6 +673,7 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
> {
> int i;
> struct dma_fence *old, **ptr;
> + struct amdgpu_job *job;
>
> for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
> ptr = &ring->fence_drv.fences[i];
> @@ -680,6 +681,13 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
> if (old && old->ops == &amdgpu_job_fence_ops) {
> RCU_INIT_POINTER(*ptr, NULL);
> dma_fence_put(old);
> + /* For non-sched bad job, i.e. failed ib test, we need to force
> + * signal it right here or we won't be able to track them in fence drv
> + * and they will remain unsignaled during sa_bo free.
> + */
> + job = container_of(old, struct amdgpu_job, hw_fence);
> + if (!job->base.s_fence && !dma_fence_is_signaled(old))
> + dma_fence_signal(old);
Hi YuBiao,
Thanks for adding the clarifying comments and sending a v2 of this patch.
Perhaps move the chunk you're adding, to sit before, rather than after,
the statements of the if-conditional. Also move the "job" variable
declaration to be inside the if-conditional, since it is not used
by anything outside it. Something like this, (note a few small fixes to the comment),
if (old && old->ops == &amdgpu_job_fence_ops) {
struct amdgpu_job *job;
/* For non-scheduler bad job, i.e. failed IB test, we need to signal
* it right here or we won't be able to track them in fence_drv
* and they will remain unsignaled during sa_bo free.
*/
job = container_of(old, struct amdgpu_job, hw_fence);
if (!job->base.s_fence && !dma_fence_is_signaled(old))
dma_fence_signal(old);
RCU_INIT_POINTER(*ptr, NULL);
dma_fence_put(old);
}
Then, give it a test.
With that change, and upon successful test results, this patch is,
Acked-by: Luben Tuikov <luben.tuikov at amd.com>
--
Regards,
Luben
More information about the amd-gfx
mailing list