[PATCH] drm/amdgpu: Fix two reset triggered in a row
Li, Yunxiang (Teddy)
Yunxiang.Li at amd.com
Tue Apr 23 03:13:26 UTC 2024
[Public]
> We can't do this technically, as there are cases where we skip full device reset (even then amdgpu_in_reset will return true). The better thing to do is to move amdgpu_device_stop_pending_resets() later in gpu_recover(): if a device has undergone full reset, then cancel all pending resets. Presently it's happening earlier, which could be why this issue is seen.
This sounds like a design issue then: different reset workers expect different resets to be triggered, yet they all use the same flag. I wonder whether the other places that check this flag are correct. FWIW, I was testing with SRIOV, where it always does a full reset, and ran into this issue.
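
To make the ordering question concrete, here is a minimal sketch in plain C. It is a toy model under stated assumptions, not amdgpu code: toy_dev, toy_recover() and toy_count_resets() are hypothetical names, and the "arrivals" parameter is only my way of modelling another worker queuing a reset while one is already running. It compares dropping pending requests before the reset with dropping them after a full reset, as suggested in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name below is hypothetical, none is amdgpu API. */
struct toy_dev {
    int pending;     /* reset requests queued and not yet handled */
    int resets_done; /* resets actually performed */
};

/*
 * One recovery pass. `full` says whether this pass does a full device reset.
 * `cancel_late` selects where pending requests are dropped: before the reset
 * (early), or after it and then only if the reset was a full one (late).
 * `arrivals` models requests that other workers queue while the reset runs.
 */
static void toy_recover(struct toy_dev *dev, bool full, bool cancel_late,
                        int arrivals)
{
    if (!cancel_late)
        dev->pending = 0;         /* early cancel: misses later arrivals */

    dev->resets_done++;           /* the reset itself */
    dev->pending += arrivals;     /* requests raised during the reset */

    if (cancel_late && full)
        dev->pending = 0;         /* a full reset makes them redundant */
}

/* Handle one initial request, then drain whatever is still queued. */
static int toy_count_resets(bool cancel_late)
{
    struct toy_dev dev = { .pending = 0, .resets_done = 0 };

    toy_recover(&dev, true, cancel_late, 1); /* one request arrives mid-reset */
    while (dev.pending > 0) {
        dev.pending--;
        toy_recover(&dev, true, cancel_late, 0);
    }
    return dev.resets_done;
}
```

In this model, toy_count_resets(false) performs a second reset for the request that arrived mid-reset, while toy_count_resets(true) performs only one. Note that the late cancel is conditional on `full`, so a skipped full reset leaves the queued requests in place, which is the case the quoted text is worried about.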