[PATCH 4/4] drm/amdgpu: prevent gpu access during reset recovery

Li, Yunxiang (Teddy) Yunxiang.Li at amd.com
Thu May 23 15:35:07 UTC 2024


[Public]

> > Here is taking a different lock than the reset_domain->sem. It is a seperate reset_domain->gpu_sem that is only locked when we will actuall do reset, it is not taken in the skip_hw_reset path.
>
> Exactly that is what you should *not* do. Please don't add any new lock to the code. This is already complicated enough.
>
> If you think that adding wrappers for reset lock makes sense then we can probably do that, bot don't add any lock for hw access.

The two lock protects very different things though. The first case is we need to block two resets running in parallel, this does not only cover GPU reset itself but also any teardown that happens before GPU reset. The second case is we need to ensure exclusive access to the GPU between GPU reset and GPU init, concurrent access is fine before GPU is reset.

Theoretically, the second case happens within the first case, so locking the first case would protect against both. But with the current implementation this is infeasible, all the generic functions called between amdgpu_device_lock/unlock_reset_domain would need to be swapped out for special versions so the reset thread does not dead lock itself. It is much simpler to have a second, much narrower lock that only covers GPU reset<->GPU init because all the accesses there are very low level anyway.

Teddy


More information about the amd-gfx mailing list