[PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started

Christian König ckoenig.leichtzumerken at gmail.com
Wed May 29 15:19:20 UTC 2024


On 29.05.24 at 16:48, Li, Yunxiang (Teddy) wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>> Yeah, I know. That's one of the reasons I pointed out, on the patch adding
>> that, that this behavior is actually completely broken.
>>
>> If you run into issues with the MES because of this then please suggest a
>> revert of that patch.
> I think it just needs to be improved to allow this force-signal behavior. The current behavior is slow/inconvenient, but the old behavior is wrong, since MES will continue to process submissions even when one submission has failed. So with just one fence location there's no way to tell whether a command failed or not.

No, the MES behavior is broken. When a submission fails, it should stop
processing or signal that the operation didn't complete through some
other mechanism.

Just not writing the fence and continuing results in tons of problems, 
from the TLB fence all the way to the ring buffer and reset handling.

This is a hard requirement and really can't be changed.
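
To illustrate the point, here is a minimal sketch in plain C. It is a toy
model, not amdgpu or MES code: struct fence_loc, fw_process(),
fence_signaled() and failure_is_masked() are all hypothetical names made up
for this example, and it assumes the fence is a monotonically increasing
sequence number written to one shared location. It only shows why a failure
becomes invisible when the fence write is skipped and processing continues.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one fence location shared by all submissions. */
struct fence_loc {
	uint64_t seq;	/* last sequence number written by the "firmware" */
};

/*
 * Hypothetical firmware model. ok[i] says whether submission i + 1 succeeds.
 * A successful submission writes its sequence number to the fence location.
 * With stop_on_failure == false a failed submission is skipped and
 * processing continues, which is the behavior criticized above.
 */
static void fw_process(struct fence_loc *f, const bool *ok, size_t n,
		       bool stop_on_failure)
{
	for (size_t i = 0; i < n; i++) {
		if (ok[i])
			f->seq = i + 1;
		else if (stop_on_failure)
			return;
	}
}

/* Driver-side check: a fence counts as signaled once seq has been reached. */
static bool fence_signaled(const struct fence_loc *f, uint64_t seq)
{
	return f->seq >= seq;
}

/* Returns true if failed submission 2 of 3 looks completed to the driver. */
static bool failure_is_masked(bool stop_on_failure)
{
	const bool ok[] = { true, false, true };
	struct fence_loc f = { 0 };

	fw_process(&f, ok, 3, stop_on_failure);
	return fence_signaled(&f, 2);
}
```

In this model failure_is_masked(false) returns true: submission 3 writes
sequence number 3, so the failed submission 2 looks signaled as well. With
failure_is_masked(true) the fence stays at 1, so a waiter on 2 or 3 never
sees a false completion and can fall back to timeout and reset handling.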

Regards,
Christian.

