<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:PMingLiU;
panose-1:2 1 6 1 0 1 1 1 1 1;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@PMingLiU";
panose-1:2 1 6 1 0 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Calibri",sans-serif;}
p.msipheaderdf3d92d6, li.msipheaderdf3d92d6, div.msipheaderdf3d92d6
{mso-style-name:msipheaderdf3d92d6;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle21
{mso-style-type:personal-compose;
font-family:"Arial",sans-serif;
color:blue;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="msipheaderdf3d92d6" style="margin:0cm"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:blue">[AMD Official Use Only - General]</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoPlainText">Hi Christian,<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Thanks for the review. Please see <span style="color:#00B050">
inline</span>.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Best Regards,<o:p></o:p></p>
<p class="MsoPlainText">Alan<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">-----Original Message-----<br>
From: Christian König <ckoenig.leichtzumerken@gmail.com> <br>
Sent: Tuesday, March 28, 2023 7:16 PM<br>
To: Liu, HaoPing (Alan) <HaoPing.Liu@amd.com>; amd-gfx@lists.freedesktop.org<br>
Cc: Lakha, Bhawanpreet <Bhawanpreet.Lakha@amd.com><br>
Subject: Re: [PATCH] drm/amdgpu: Fix desktop freezed after gpu-reset<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Am 27.03.23 um 17:20 schrieb Alan Liu:<o:p></o:p></p>
<p class="MsoPlainText">> [Why]<o:p></o:p></p>
<p class="MsoPlainText">> After gpu-reset, sometimes the driver would fail to enable vblank irq,
<o:p></o:p></p>
<p class="MsoPlainText">> causing flip_done timed out and the desktop freezed.<o:p></o:p></p>
<p class="MsoPlainText">><o:p> </o:p></p>
<p class="MsoPlainText">> During gpu-reset, we will disable and enable vblank irq in
<o:p></o:p></p>
<p class="MsoPlainText">> dm_suspend() and dm_resume(). Later on in <o:p></o:p></p>
<p class="MsoPlainText">> amdgpu_irq_gpu_reset_resume_helper(), we will check irqs' refcount and decide to enable or disable the irqs again.<o:p></o:p></p>
<p class="MsoPlainText">><o:p> </o:p></p>
<p class="MsoPlainText">> However, we have 2 sets of API for controling vblank irq, one is<o:p></o:p></p>
<p class="MsoPlainText">> dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has
<o:p></o:p></p>
<p class="MsoPlainText">> its own refcount and flag to store the state of vblank irq, and they
<o:p></o:p></p>
<p class="MsoPlainText">> are not synchronized.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">This is the source of the problem and you should address this instead.
<o:p></o:p></p>
<p class="MsoPlainText">The change you suggested below would break in some use cases.<o:p></o:p></p>
<p class="MsoPlainText"><span style="color:black"><o:p> </o:p></span></p>
<p class="MsoPlainText"><span style="color:#00B050">In struct drm_vblank_crtc we have a vblank irq refcount controlled by drm layer, and in struct amdgpu_irq_src we have enabled_types as refcount for each irq controlled by the dm.<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:#00B050">I think the best solution will be to get rid of the refcount in drm and only maintain the dm one, and add a crtc function for the drm layer to get the refcount/state of vblank.<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:#00B050">But this may be dangerous since it’s a change in drm layer. Do you have any comments?
<o:p></o:p></span></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">><o:p> </o:p></p>
<p class="MsoPlainText">> In drm we use the first API to control vblank irq but in<o:p></o:p></p>
<p class="MsoPlainText">> amdgpu_irq_gpu_reset_resume_helper() we use the second set of API.<o:p></o:p></p>
<p class="MsoPlainText">><o:p> </o:p></p>
<p class="MsoPlainText">> The failure happens when vblank irq was enabled by dm_vblank_get()
<o:p></o:p></p>
<p class="MsoPlainText">> before gpu-reset, we have vblank->enabled true. However, during
<o:p></o:p></p>
<p class="MsoPlainText">> gpu-reset, in amdgpu_irq_gpu_reset_resume_helper(), vblank irq's state
<o:p></o:p></p>
<p class="MsoPlainText">> checked from<o:p></o:p></p>
<p class="MsoPlainText">> amdgpu_irq_update() is DISABLED. So finally it will disable vblank irq
<o:p></o:p></p>
<p class="MsoPlainText">> again. After gpu-reset, if there is a cursor plane commit, the driver
<o:p></o:p></p>
<p class="MsoPlainText">> will try to enable vblank irq by calling drm_vblank_enable(), but the<o:p></o:p></p>
<p class="MsoPlainText">> vblank->enabled is still true, so it fails to turn on vblank irq and<o:p></o:p></p>
<p class="MsoPlainText">> causes flip_done can't be completed in vblank irq handler and desktop
<o:p></o:p></p>
<p class="MsoPlainText">> become freezed.<o:p></o:p></p>
<p class="MsoPlainText">><o:p> </o:p></p>
<p class="MsoPlainText">> [How]<o:p></o:p></p>
<p class="MsoPlainText">> Combining the 2 vblank control APIs by letting drm's API finally calls
<o:p></o:p></p>
<p class="MsoPlainText">> amdgpu_irq's API, so the irq's refcount and state of both APIs can be
<o:p></o:p></p>
<p class="MsoPlainText">> synchronized. Also add a check to prevent refcount from being less
<o:p></o:p></p>
<p class="MsoPlainText">> then<o:p></o:p></p>
<p class="MsoPlainText">> 0 in amdgpu_irq_put().<o:p></o:p></p>
<p class="MsoPlainText">><o:p> </o:p></p>
<p class="MsoPlainText">> Signed-off-by: Alan Liu <<a href="mailto:HaoPing.Liu@amd.com"><span style="color:windowtext;text-decoration:none">HaoPing.Liu@amd.com</span></a>><o:p></o:p></p>
<p class="MsoPlainText">> ---<o:p></o:p></p>
<p class="MsoPlainText">> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 3 +++<o:p></o:p></p>
<p class="MsoPlainText">> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c | 14 ++++++++++----<o:p></o:p></p>
<p class="MsoPlainText">> 2 files changed, 13 insertions(+), 4 deletions(-)<o:p></o:p></p>
<p class="MsoPlainText">><o:p> </o:p></p>
<p class="MsoPlainText">> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c <o:p>
</o:p></p>
<p class="MsoPlainText">> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c<o:p></o:p></p>
<p class="MsoPlainText">> index a6aef488a822..1b66003657e2 100644<o:p></o:p></p>
<p class="MsoPlainText">> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c<o:p></o:p></p>
<p class="MsoPlainText">> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c<o:p></o:p></p>
<p class="MsoPlainText">> @@ -597,6 +597,9 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct amdgpu_irq_src *src,<o:p></o:p></p>
<p class="MsoPlainText">> if (!src->enabled_types || !src->funcs->set)<o:p></o:p></p>
<p class="MsoPlainText">> return -EINVAL;<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> + if (!amdgpu_irq_enabled(adev, src, type))<o:p></o:p></p>
<p class="MsoPlainText">> + return 0;<o:p></o:p></p>
<p class="MsoPlainText">> +<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">That is racy and won't work. The intention of amdgpu_irq_update() is to always update the irq state, no matter what the status is.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText"><span style="color:#00B050">This is a change to amdgpu_irq_put() to prevent the refcount from being cut to less than 0. Does it break the intention of amdgpu_irq_update()?<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"><o:p> </o:p></span></p>
<p class="MsoPlainText">Regards,<o:p></o:p></p>
<p class="MsoPlainText">Christian.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">> if (atomic_dec_and_test(&src->enabled_types[type]))<o:p></o:p></p>
<p class="MsoPlainText">> return amdgpu_irq_update(adev, src, type);<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
<o:p></o:p></p>
<p class="MsoPlainText">> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c<o:p></o:p></p>
<p class="MsoPlainText">> index dc4f37240beb..e04f846b0b19 100644<o:p></o:p></p>
<p class="MsoPlainText">> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c<o:p></o:p></p>
<p class="MsoPlainText">> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c<o:p></o:p></p>
<p class="MsoPlainText">> @@ -146,7 +146,7 @@ static void vblank_control_worker(struct
<o:p></o:p></p>
<p class="MsoPlainText">> work_struct *work)<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)<o:p></o:p></p>
<p class="MsoPlainText">> {<o:p></o:p></p>
<p class="MsoPlainText">> - enum dc_irq_source irq_source;<o:p></o:p></p>
<p class="MsoPlainText">> + int irq_type;<o:p></o:p></p>
<p class="MsoPlainText">> struct amdgpu_crtc *acrtc = to_amdgpu_crtc(crtc);<o:p></o:p></p>
<p class="MsoPlainText">> struct amdgpu_device *adev = drm_to_adev(crtc->dev);<o:p></o:p></p>
<p class="MsoPlainText">> struct dm_crtc_state *acrtc_state = to_dm_crtc_state(crtc->state);
<o:p></o:p></p>
<p class="MsoPlainText">> @@ -169,10 +169,16 @@ static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)<o:p></o:p></p>
<p class="MsoPlainText">> if (rc)<o:p></o:p></p>
<p class="MsoPlainText">> return rc;<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> - irq_source = IRQ_TYPE_VBLANK + acrtc->otg_inst;<o:p></o:p></p>
<p class="MsoPlainText">> + irq_type = amdgpu_display_crtc_idx_to_irq_type(adev,
<o:p></o:p></p>
<p class="MsoPlainText">> +acrtc->crtc_id);<o:p></o:p></p>
<p class="MsoPlainText">> +<o:p></o:p></p>
<p class="MsoPlainText">> + if (enable)<o:p></o:p></p>
<p class="MsoPlainText">> + rc = amdgpu_irq_get(adev, &adev->crtc_irq, irq_type);<o:p></o:p></p>
<p class="MsoPlainText">> +<o:p></o:p></p>
<p class="MsoPlainText">> + else<o:p></o:p></p>
<p class="MsoPlainText">> + rc = amdgpu_irq_put(adev, &adev->crtc_irq, irq_type);<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> - if (!dc_interrupt_set(adev->dm.dc, irq_source, enable))<o:p></o:p></p>
<p class="MsoPlainText">> - return -EBUSY;<o:p></o:p></p>
<p class="MsoPlainText">> + if (rc)<o:p></o:p></p>
<p class="MsoPlainText">> + return rc;<o:p></o:p></p>
<p class="MsoPlainText">> <o:p></o:p></p>
<p class="MsoPlainText">> skip:<o:p></o:p></p>
<p class="MsoPlainText">> if (amdgpu_in_reset(adev))<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
</div>
</body>
</html>