[PATCH] drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs

Christian König christian.koenig at amd.com
Tue Mar 7 07:07:57 UTC 2023


On 07.03.23 at 08:02, YuBiao Wang wrote:
> [Why]
> For engines that do not support soft reset, such as VCN, an IB test
> fails before the mode 1 reset during ASIC reset. The fences in this
> case are never signaled, so the next time we try to free the sa_bo the
> kernel hangs.
>
> [How]
> During pre_asic_reset the driver clears the job fences, after which
> each fence's refcount drops to 1. For drm_sched_jobs the fence is
> released in job_free_cb; for non-sched jobs such as the IB test it is
> meant to be released in sa_bo_free, but only once the fence has
> signaled. So we have to force-signal the fence of the bad non-sched
> job during pre_asic_reset, or the clear is incomplete.

Well, NAK for now. This once again looks like a change that has not
been thought through very well.

Luben, can you please take a look at this and double-check it?

Thanks,
Christian.

>
> Signed-off-by: YuBiao Wang <YuBiao.Wang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index faff4a3f96e6..2e549bd50990 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -673,6 +673,7 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
>   {
>   	int i;
>   	struct dma_fence *old, **ptr;
> +	struct amdgpu_job *job;
>   
>   	for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
>   		ptr = &ring->fence_drv.fences[i];
> @@ -680,6 +681,9 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
>   		if (old && old->ops == &amdgpu_job_fence_ops) {
>   			RCU_INIT_POINTER(*ptr, NULL);
>   			dma_fence_put(old);
> +			job = container_of(old, struct amdgpu_job, hw_fence);
> +			if (!job->base.s_fence && !dma_fence_is_signaled(old))
> +				dma_fence_signal(old);
>   		}
>   	}
>   }
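
For readers following the refcount argument in the commit message, here is a
minimal userspace sketch of the behaviour the patch aims for. It is not kernel
code: every toy_* name is hypothetical, the fence is reduced to a refcount and
a signaled flag, and has_sched_fence stands in for job->base.s_fence. Unlike
the patch, the sketch signals before dropping the ring's reference; the patch
signals afterwards and relies on the remaining reference keeping the fence
alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct dma_fence: just a refcount and a signaled flag. */
struct toy_fence {
	int refcount;
	bool signaled;
};

/* Toy stand-in for a job with an embedded hardware fence. */
struct toy_job {
	bool has_sched_fence;		/* models job->base.s_fence != NULL */
	struct toy_fence hw_fence;
};

#define TOY_NUM_SLOTS 4

/* Toy stand-in for the ring's array of in-flight fences. */
struct toy_ring {
	struct toy_fence *slots[TOY_NUM_SLOTS];
};

/* Drop one reference; a real fence would be freed when this hits zero. */
static void toy_fence_put(struct toy_fence *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/* Recover the job from its embedded fence, in the spirit of container_of(). */
static struct toy_job *toy_job_from_fence(struct toy_fence *f)
{
	return (struct toy_job *)((char *)f - offsetof(struct toy_job, hw_fence));
}

/*
 * Clear every slot, force-signal unsignaled fences of non-sched jobs, and
 * drop the ring's reference. Returns the number of fences force-signaled.
 */
static int toy_clear_job_fences(struct toy_ring *ring)
{
	int forced = 0;

	for (size_t i = 0; i < TOY_NUM_SLOTS; i++) {
		struct toy_fence *old = ring->slots[i];
		struct toy_job *job;

		if (!old)
			continue;

		ring->slots[i] = NULL;
		job = toy_job_from_fence(old);
		if (!job->has_sched_fence && !old->signaled) {
			old->signaled = true;
			forced++;
		}
		toy_fence_put(old);
	}
	return forced;
}

/* Models the sa_bo free path: the last reference goes only once signaled. */
static bool toy_sa_bo_try_free(struct toy_fence *f)
{
	if (!f->signaled)
		return false;
	toy_fence_put(f);
	return true;
}
```

With one non-sched job and one sched job on the ring, each starting at a
refcount of 2, toy_clear_job_fences() force-signals only the non-sched fence
and leaves both at a refcount of 1. toy_sa_bo_try_free() can then release the
non-sched fence, while the sched job's fence is left for its own free path.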


