[PATCH] drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
Luben Tuikov
luben.tuikov at amd.com
Tue Mar 7 20:36:27 UTC 2023
Hi,
Thanks for your patch!
On 2023-03-07 02:07, Christian König wrote:
> Am 07.03.23 um 08:02 schrieb YuBiao Wang:
>> [Why]
>> For engines not supporting soft reset, i.e. VCN, there will be a failed
>> ib test before mode 1 reset during asic reset. The fences in this case
>> are never signaled and next time when we try to free the sa_bo, kernel
>> will hang.
>>
>> [How]
>> During pre_asic_reset, driver will clear job fences and afterwards the
>> fences' refcount will be reduced to 1. For drm_sched_jobs it will be
>> released in job_free_cb, and for non-sched jobs like ib_test, it's meant
>> to be released in sa_bo_free but only when the fences are signaled. So
So, you're missing a signal for the non-scheduler job fences?
>> we have to force signal the non_sched bad job's fence during
>> pre_asic_reset or the clear is not complete.
Do you want to add a function which does just this (signals
non-scheduler job fences) in amdgpu_device_pre_asic_reset(),
and resubmit your patch? (There will be some code redundancy, but it
may make the point clearer.)
Are we failing to signal non-scheduler job fences on reset altogether?
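To make the suggestion concrete, here is a rough userspace sketch of the
kind of helper I mean. It is only a model, not driver code: every mock_*
name below is hypothetical, the structs are simplified stand-ins for
dma_fence, amdgpu_job and amdgpu_ring, and RCU, locking and refcounting
are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct dma_fence; NOT the kernel definition. */
struct mock_fence {
	bool is_job_fence;	/* models old->ops == &amdgpu_job_fence_ops */
	bool signaled;		/* models dma_fence_is_signaled() */
};

/* Simplified stand-in for struct amdgpu_job. */
struct mock_job {
	void *s_fence;			/* NULL models a non-scheduler job, e.g. an IB test */
	struct mock_fence hw_fence;	/* fence embedded in the job, as in the patch */
};

/* Userspace equivalent of the kernel's container_of(). */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define MOCK_NUM_FENCES 4

/* Simplified stand-in for the ring's fence array. */
struct mock_ring {
	struct mock_fence *fences[MOCK_NUM_FENCES];
};

/*
 * Hypothetical helper: force-signal every unsignaled fence that is
 * embedded in a non-scheduler job. Returns how many fences it signaled.
 */
static int mock_signal_non_sched_job_fences(struct mock_ring *ring)
{
	int i, count = 0;

	for (i = 0; i < MOCK_NUM_FENCES; i++) {
		struct mock_fence *f = ring->fences[i];
		struct mock_job *job;

		/* Only job fences are embedded in a job; skip everything else. */
		if (!f || !f->is_job_fence)
			continue;

		job = mock_container_of(f, struct mock_job, hw_fence);
		if (!job->s_fence && !f->signaled) {
			f->signaled = true;	/* models dma_fence_signal() */
			count++;
		}
	}
	return count;
}
```

In the real driver the equivalent would presumably be called once per
ring from amdgpu_device_pre_asic_reset(), separately from clearing the
fence pointers, so that the force-signal step is visible on its own.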
--
Regards,
Luben
>
> Well NAK for now. It looks once more like one of those not very well
> thought through changes.
>
> Luben can you please take a look at this and double check it?
>
> Thanks,
> Christian.
>
>>
>> Signed-off-by: YuBiao Wang <YuBiao.Wang at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index faff4a3f96e6..2e549bd50990 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -673,6 +673,7 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
>>  {
>>  	int i;
>>  	struct dma_fence *old, **ptr;
>> +	struct amdgpu_job *job;
>>
>>  	for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
>>  		ptr = &ring->fence_drv.fences[i];
>> @@ -680,6 +681,9 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
>>  		if (old && old->ops == &amdgpu_job_fence_ops) {
>>  			RCU_INIT_POINTER(*ptr, NULL);
>>  			dma_fence_put(old);
>> +			job = container_of(old, struct amdgpu_job, hw_fence);
>> +			if (!job->base.s_fence && !dma_fence_is_signaled(old))
>> +				dma_fence_signal(old);
>>  		}
>>  	}
>>  }
>