[PATCH] drm/amdgpu: Fix desktop freezed after gpu-reset

Christian König ckoenig.leichtzumerken at gmail.com
Fri Mar 31 06:39:20 UTC 2023


Hi Luben,

sorry, looks like I haven't seen your reply in my inbox at the time I've 
wrote my reply.

That seems to happen quite often with freedesktop.org list recently and 
I'm wondering if there's something wrong with my setup or the list server.

Christian.

Am 30.03.23 um 14:27 schrieb Luben Tuikov:
> Hi Alan,
>
> I'll comment in the other thread, as it seems Christian commented directly to your
> patch the day after my comment, rather than following up with my email sent the previous
> day and we now have two divergent threads where you post two identical comments, and it shouldn't
> be like that--we should have one thread only.
>
> Regards,
> Luben
>
> On 2023-03-30 04:59, Liu, HaoPing (Alan) wrote:
>> [AMD Official Use Only - General]
>>
>>
>> Hi, Luben
>>
>>   
>>
>> Thanks for the review. Please see inline.
>>
>>   
>>
>> Best Regards,
>>
>> Alan
>>
>>   
>>
>> -----Original Message-----
>> From: Tuikov, Luben <Luben.Tuikov at amd.com>
>> Sent: Tuesday, March 28, 2023 3:00 AM
>> To: Liu, HaoPing (Alan) <HaoPing.Liu at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Lakha, Bhawanpreet <Bhawanpreet.Lakha at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: Fix desktop freezed after gpu-reset
>>
>>   
>>
>> Hi,
>>
>>   
>>
>> That's a good fix. Some questions and comments below:
>>
>>   
>>
>> On 2023-03-27 11:20, Alan Liu wrote:
>>
>>> [Why]
>>> After gpu-reset, sometimes the driver would fail to enable vblank irq,
>>> causing flip_done timed out and the desktop freezed.
>>> During gpu-reset, we will disable and enable vblank irq in
>>> dm_suspend() and dm_resume(). Later on in
>>> amdgpu_irq_gpu_reset_resume_helper(), we will check irqs' refcount and decide to enable or disable the irqs again.
>>> However, we have 2 sets of API for controling vblank irq, one is
>>> dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has
>>> its own refcount and flag to store the state of vblank irq, and they
>>> are not synchronized.
>>   
>>
>> Is it possible to reconcile controlling VBlank IRQ to a single refcount?
>>
>>   
>>
>> In struct drm_vblank_crtc, we have “enabled” and “refcount” to store vblank irq state, and in struct amdgpu_irq_src we have “enabled_types” as the refcount for each irq in dm layer.
>>
>> To reconcile vblank irq to a single refcount, my idea is to remove enabled and refcount from struct drm_vblank_crtc, and add a callback function like vblank_irq_enabled() to drm_crtc_funcs.
>>
>> Drm layer can use this function to check the state or refcount of vblank irq from dm layer. But it may be dangerous because it is a change to drm layer. Do you have any comments?
>>
>>   
>>
>>> In drm we use the first API to control vblank irq but in
>>> amdgpu_irq_gpu_reset_resume_helper() we use the second set of API.
>>> The failure happens when vblank irq was enabled by dm_vblank_get()
>>> before gpu-reset, we have vblank->enabled true. However, during
>>> gpu-reset, in amdgpu_irq_gpu_reset_resume_helper(), vblank irq's state
>>> checked from
>>> amdgpu_irq_update() is DISABLED. So finally it will disable vblank irq
>>> again. After gpu-reset, if there is a cursor plane commit, the driver
>>> will try to enable vblank irq by calling drm_vblank_enable(), but the
>>> vblank->enabled is still true, so it fails to turn on vblank irq and
>>> causes flip_done can't be completed in vblank irq handler and desktop
>>> become freezed.
>>> [How]
>>> Combining the 2 vblank control APIs by letting drm's API finally calls
>>> amdgpu_irq's API, so the irq's refcount and state of both APIs can be
>>> synchronized. Also add a check to prevent refcount from being less
>>> then
>>> 0 in amdgpu_irq_put().
>>> Signed-off-by: Alan Liu <HaoPing.Liu at amd.com <mailto:HaoPing.Liu at amd.com>>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c            |  3 +++
>>>    .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c | 14
>>> ++++++++++----
>>>    2 files changed, 13 insertions(+), 4 deletions(-)
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>> index a6aef488a822..1b66003657e2 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>> @@ -597,6 +597,9 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
>>>              if (!src->enabled_types || !src->funcs->set)
>>>                             return -EINVAL;
>>>   
>>> +         if (!amdgpu_irq_enabled(adev, src, type))
>>> +                       return 0;
>>> +
>>>              if (atomic_dec_and_test(&src->enabled_types[type]))
>>>                             return amdgpu_irq_update(adev, src, type);
>>>   
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
>>> index dc4f37240beb..e04f846b0b19 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
>>> @@ -146,7 +146,7 @@ static void vblank_control_worker(struct
>>> work_struct *work)
>>>   
>>>   static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)
>>> {
>>> -          enum dc_irq_source irq_source;
>>> +         int irq_type;
>>>              struct amdgpu_crtc *acrtc = to_amdgpu_crtc(crtc);
>>>              struct amdgpu_device *adev = drm_to_adev(crtc->dev);
>>>              struct dm_crtc_state *acrtc_state = to_dm_crtc_state(crtc->state);
>>> @@ -169,10 +169,16 @@ static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)
>>>              if (rc)
>>>                             return rc;
>>>   
>>> -          irq_source = IRQ_TYPE_VBLANK + acrtc->otg_inst;
>>> +         irq_type = amdgpu_display_crtc_idx_to_irq_type(adev,
>>> +acrtc->crtc_id);
>>> +
>>> +         if (enable)
>>> +                       rc = amdgpu_irq_get(adev, &adev->crtc_irq, irq_type);
>>> +
>>> +         else
>>   
>>
>> There's an unnecessary empty line before the "else". It's a good idea to run patches through scripts/checkpatch.pl.
>>
>>   
>>
>> Thanks, will use the tool next time.
>>
>>   
>>
>>> +                       rc = amdgpu_irq_put(adev, &adev->crtc_irq, irq_type);
>>>   
>>> -          if (!dc_interrupt_set(adev->dm.dc, irq_source, enable))
>>> -                        return -EBUSY;
>>> +         if (rc)
>>> +                       return rc;
>>>   
>>>    skip:
>>>              if (amdgpu_in_reset(adev))
>>   
>>
>> --
>>
>> Regards,
>>
>> Luben
>>
>>   
>>



More information about the amd-gfx mailing list