[PATCH v2] drm/amdgpu: fix system hang issue during GPU reset

Paul Menzel pmenzel+amd-gfx at molgen.mpg.de
Mon Jul 13 13:10:00 UTC 2020


Dear Dennis,


Am 10.07.20 um 10:39 schrieb Li, Dennis:

> I used our internal tool to make GPU hang and do stress test.

Interesting. I want to have such a tool. ;-)

So you noticed it during testing with that tool, and not by somebody 
experiencing this in production?

> In kernel, when GPU hang, driver has multi-paths to enter 
> amdgpu_device_gpu_recover, the  atomic  adev->in_gpu_reset is used
> to avoid re-entering GPU recovery. During GPU reset and resume, it
> is unsafe that other threads access GPU, which maybe cause GPU reset 
> failed. Therefore the new rw_semaphore  adev->reset_sem is 
> introduced, which protect GPU from being accessed by external
> threads when doing recovery.

Thank you for the explanation. It’d be great if you added this to the
commit message.


Kind regards,

Paul


More information about the amd-gfx mailing list