[PATCH v2] drm/amdgpu: fix system hang issue during GPU reset
Paul Menzel
pmenzel+amd-gfx at molgen.mpg.de
Mon Jul 13 13:10:00 UTC 2020
Dear Dennis,
Am 10.07.20 um 10:39 schrieb Li, Dennis:
> I used our internal tool to make GPU hang and do stress test.
Interesting. I want to have such a tool. ;-)
So you noticed it during testing with that tool, and not by somebody
experiencing this in production?
> In kernel, when GPU hang, driver has multi-paths to enter
> amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset is used
> to avoid re-entering GPU recovery. During GPU reset and resume, it
> is unsafe that other threads access GPU, which maybe cause GPU reset
> failed. Therefore the new rw_semaphore adev->reset_sem is
> introduced, which protect GPU from being accessed by external
> threads when doing recovery.
Thank you for the explanation. It’d be great if you added this to the
commit message.
Kind regards,
Paul
More information about the amd-gfx
mailing list