[PATCH] drm/amdgpu: Fix desktop freezed after gpu-reset

Liu, HaoPing (Alan) HaoPing.liu at amd.com
Mon Apr 17 02:18:26 UTC 2023


Hi André

Thanks for your comments. Please see my replies inline.

On 2023/4/15 05:11 AM, André Almeida wrote:
> Hi Alan,
>
> On 14/04/2023 13:22, Alan Liu wrote:
>> [Why]
>> After gpu-reset, the driver sometimes fails to enable the vblank irq,
>> causing flip_done to time out and the desktop to freeze.
>>
>> During gpu-reset, we will disable and enable vblank irq in dm_suspend()
>> and dm_resume(). Later on in amdgpu_irq_gpu_reset_resume_helper(), we
>> will check irqs' refcount and decide to enable or disable the irqs
>> again.
>>
>> However, we have 2 sets of APIs for controlling the vblank irq: one is
>> dm_vblank_get/put() and the other is amdgpu_irq_get/put(). Each API has
>> its own refcount and flag to store the state of the vblank irq, and they
>> are not synchronized.
>>
>> In drm we use the first API to control the vblank irq, but in
>> amdgpu_irq_gpu_reset_resume_helper() we use the second API.
>>
>> The failure happens when the vblank irq was enabled by dm_vblank_get()
>> before gpu-reset, so vblank->enabled is true. However, during
>> gpu-reset, in amdgpu_irq_gpu_reset_resume_helper(), the vblank irq's
>> state as checked by amdgpu_irq_update() is DISABLED, so the helper
>> disables the vblank irq again. After gpu-reset, if there is a cursor
>> plane commit, the driver tries to enable the vblank irq by calling
>> drm_vblank_enable(), but vblank->enabled is still true, so it fails to
>> turn on the vblank irq. As a result, flip_done cannot be completed in
>> the vblank irq handler and the desktop freezes.
>>
>> [How]
>> Combine the 2 vblank control APIs by letting drm's API ultimately call
>> amdgpu_irq's API, so that the irq refcount and state of both APIs stay
>> synchronized. Also add a check to prevent the refcount from dropping
>> below 0 in amdgpu_irq_put().
>>
>
> How have you tested this patch?
>

I triggered gpu-reset with this command:

sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover

When the display lights up after gpu-reset, the desktop freezes once I
move the cursor (sometimes you need to retrigger gpu-reset 1 or 2 more
times to reproduce it).

I made this patch after finding the root cause, and did not see the
display freeze again in about 20 test runs.

>> v2:
>> - Add warning in amdgpu_irq_enable() if the irq is already disabled.
>> - Call dc_interrupt_set() in dm_set_vblank() to avoid refcount change
>>    if it is in gpu-reset.
>>
>
> If this is a v2, please use [PATCH v2] in the subject.


Thanks for the reminder. I will keep it in mind.


Best Regards,

Alan


>
> Thanks,
>     André


More information about the amd-gfx mailing list