[PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

Christian König christian.koenig at amd.com
Wed May 29 13:55:59 UTC 2024


On 29.05.24 at 15:44, Li, Yunxiang (Teddy) wrote:
>> I don't think trying to add some reset handling here makes sense in the first place.
>> Part of the reset/recovery procedure is to signal all fences, and that includes the one we are waiting for here.
>> So this wait should return immediately in a reset anyway.
> As far as I can tell, the fence_ptr values that get polled here are not wrapped in a fence object, and in practice I see waits of tens of seconds before they time out and the reset can begin. Also, after the reset there is often a long wait, up to 2 minutes, for all the tlb_fence_work to time out (not addressed by this patch; I still haven't figured out what's going on there).

The problem is that we don't force-complete the non-scheduler rings,
e.g. MES, KIQ, etc.

Try removing this check from the loop in
amdgpu_device_pre_asic_reset():

                 if (!amdgpu_ring_sched_ready(ring))
                         continue;
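
To illustrate the effect of that check, here is a minimal, self-contained
sketch. It is a toy model and not the driver code: every mock_* name, the
struct layout and the skip_unscheduled switch are assumptions made up for
this example. It only shows why a ring that is skipped by the check keeps
its poller waiting, and why dropping the check lets the force-completion
step reach rings such as KIQ or MES as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ring model; NOT the real struct amdgpu_ring. */
struct mock_ring {
	const char *name;
	bool sched_ready;       /* true only for rings driven by the scheduler */
	uint32_t last_emitted;  /* sequence number of the last fence emitted */
	uint32_t last_signaled; /* sequence number written back so far */
};

/* A poller on this ring would keep waiting while this returns true. */
static bool mock_ring_has_pending(const struct mock_ring *ring)
{
	return ring->last_signaled != ring->last_emitted;
}

/* Stand-in for force completion: mark every emitted fence as signaled. */
static void mock_force_completion(struct mock_ring *ring)
{
	ring->last_signaled = ring->last_emitted;
}

/*
 * Toy version of the pre-reset loop. With skip_unscheduled == true it
 * mimics the check quoted above; with false the check is dropped and
 * every ring gets force-completed.
 */
static void mock_pre_reset(struct mock_ring *rings, size_t count,
			   bool skip_unscheduled)
{
	for (size_t i = 0; i < count; i++) {
		struct mock_ring *ring = &rings[i];

		if (skip_unscheduled && !ring->sched_ready)
			continue;

		mock_force_completion(ring);
	}
}

/* Count the rings that still have unsignaled fences. */
static size_t mock_count_pending(const struct mock_ring *rings, size_t count)
{
	size_t pending = 0;

	for (size_t i = 0; i < count; i++) {
		if (mock_ring_has_pending(&rings[i]))
			pending++;
	}
	return pending;
}
```

In this model, a non-scheduler ring with an outstanding fence stays pending
after mock_pre_reset(rings, count, true) and is cleared only when the
function is called with skip_unscheduled set to false. Whether the real
rings can safely be force-completed this way is something to verify on
hardware.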

Regards,
Christian.


>
> Teddy


