<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="en-CN" link="blue" vlink="purple" style="word-wrap:break-word">
<p style="font-family:Calibri;font-size:10pt;color:#0000FF;margin:5pt;font-style:normal;font-weight:normal;text-decoration:none;" align="Left">
[AMD Official Use Only - AMD Internal Distribution Only]<br>
</p>
<br>
<div>
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Aptos",sans-serif">Hi
<a id="OWAAMA95337AF4FF1064F84FD15F1EAAF1561" href="mailto:HaiJun.Chang@amd.com">
<span style="font-family:"Aptos",sans-serif;text-decoration:none">@Chang, HaiJun</span></a><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Aptos",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Aptos",sans-serif">Thank you for the info.
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Aptos",sans-serif">You are right: the GFXMSIX_VECT0_ADDR_LO and GFXMSIX_VECT0_CONTROL registers are not restored on resume with a new VF.
<br>
<br>
</span><span lang="EN-US" style="font-family:Consolas">Source VF (normal values):<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas">0x42000: 0xFEE00138 // GFXMSIX_VECT0_ADDR_LO<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas">0x4200C: 0x00000000 // GFXMSIX_VECT0_CONTROL<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas">Destination VF (abnormal values):<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas">0x42000: 0x00000000 // GFXMSIX_VECT0_ADDR_LO<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas">0x4200C: 0x00000001 // GFXMSIX_VECT0_CONTROL<o:p></o:p></span></p>
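<p class="MsoNormal"><span lang="EN-US" style="font-family:Aptos,sans-serif">For reference, the two dumps can be read with a small decoder like the one below. It is a sketch that assumes GFXMSIX_VECT0_ADDR_LO and GFXMSIX_VECT0_CONTROL follow the standard PCI MSI-X table entry layout (Message Address at offset 0x0, Vector Control at offset 0xC, Mask bit in bit 0 of Vector Control) and that the guest is x86, where a valid MSI address lies in the 0xFEExxxxx window. The struct and function names are hypothetical and are not amdgpu code.<o:p></o:p></span></p>
<pre style="font-family:Consolas">
```c
/*
 * Decode MSI-X table entry 0 from a GFXMSIX_VECT0_* register dump.
 * Assumption: the registers use the standard PCI MSI-X entry layout
 * (Message Address at +0x0, Vector Control at +0xC) and the guest is
 * x86, where MSI message addresses are 0xFEExxxxx.
 * All names here are hypothetical, for illustration only.
 */
#define MSIX_ENTRY_CTRL_MASKBIT 0x1u    /* Vector Control bit 0: Mask Bit */
#define X86_MSI_ADDR_BASE       0xFEEul /* bits 31:20 of an x86 MSI address */

struct msix_vec_dump {
	unsigned long addr_lo; /* GFXMSIX_VECT0_ADDR_LO, offset 0x42000 */
	unsigned long control; /* GFXMSIX_VECT0_CONTROL, offset 0x4200C */
};

/* Nonzero if the Mask bit is set, i.e. the VF must not send this message. */
static int msix_vec_masked(const struct msix_vec_dump *v)
{
	return (v->control & MSIX_ENTRY_CTRL_MASKBIT) != 0;
}

/* Nonzero if the address looks like a programmed x86 MSI target. */
static int msix_vec_addr_valid(const struct msix_vec_dump *v)
{
	return ((v->addr_lo >> 20) & 0xFFFul) == X86_MSI_ADDR_BASE;
}

/* Nonzero if vector 0 can actually deliver an interrupt. */
static int msix_vec_usable(const struct msix_vec_dump *v)
{
	return !msix_vec_masked(v) && msix_vec_addr_valid(v);
}
```
</pre>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Aptos,sans-serif">With the values above, the source VF entry decodes as usable, while the destination VF entry is masked and has no address, so vector 0 cannot deliver interrupts on the new VF. That would be consistent with the fence fallback timer expiring.<o:p></o:p></span></p>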
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Aptos",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Aptos",sans-serif">Calling amdgpu_restore_msix() on resume can fix this issue. I will upload a new patch for this.<br>
<br>
</span><span lang="EN-US" style="font-family:Consolas">static int vega20_ih_resume(struct amdgpu_ip_block *ip_block)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas">{<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas">+ struct amdgpu_device *adev = ip_block->adev;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas">+<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas">+ if (amdgpu_xmgi_is_node_changed(adev))<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas">+ amdgpu_restore_msix(adev);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas"> return vega20_ih_hw_init(ip_block);<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Consolas">}<o:p></o:p></span></p>
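<p class="MsoNormal"><span lang="EN-US" style="font-family:Aptos,sans-serif">HaiJun's note in the quoted message says QEMU programs the real MSI-X table when the guest enables or disables MSI-X. The toy model that follows illustrates why re-triggering that path on resume repopulates the destination VF's GFXMSIX_* registers. It assumes that amdgpu_restore_msix() works by clearing and then re-setting the MSI-X Enable bit in config space, and it reduces the QEMU side to a copy of the guest's virtual entry into the physical registers. Every name in it is hypothetical; it is not driver code.<o:p></o:p></span></p>
<pre style="font-family:Consolas">
```c
/*
 * Toy model (not driver code) of the MSI-X restore workaround.
 * All names are hypothetical. Assumptions:
 *  - the hypervisor copies the guest's virtual MSI-X entry into the VF's
 *    physical GFXMSIX_* registers whenever the guest writes the MSI-X
 *    Message Control register with the Enable bit set;
 *  - the restore helper clears and then re-sets that Enable bit.
 */
#define MSIX_FLAGS_ENABLE 0x8000u /* MSI-X Enable: bit 15 of Message Control */

struct msix_entry_regs {
	unsigned long addr_lo; /* Message Address (low 32 bits) */
	unsigned long control; /* Vector Control, bit 0 = Mask */
};

struct vf_model {
	unsigned int msix_ctrl;      /* guest-visible MSI-X Message Control */
	struct msix_entry_regs virt; /* entry the guest OS believes it programmed */
	struct msix_entry_regs phys; /* GFXMSIX_VECT0_* on the VF actually in use */
};

/* Hypothetical trap: what the hypervisor does on a Message Control write. */
static void hv_write_msix_ctrl(struct vf_model *vf, unsigned int val)
{
	vf->msix_ctrl = val;
	if (val & MSIX_FLAGS_ENABLE)
		vf->phys = vf->virt; /* reprogram the real table from the virtual one */
}

/* Model of the restore step: acts only if MSI-X is currently enabled. */
static void restore_msix_model(struct vf_model *vf)
{
	unsigned int ctrl = vf->msix_ctrl;

	if (!(ctrl & MSIX_FLAGS_ENABLE))
		return;
	hv_write_msix_ctrl(vf, ctrl & ~MSIX_FLAGS_ENABLE); /* disable */
	hv_write_msix_ctrl(vf, ctrl | MSIX_FLAGS_ENABLE);  /* re-enable: triggers reprogramming */
}
```
</pre>
<p class="MsoNormal"><span lang="EN-US" style="font-family:Aptos,sans-serif">Starting from the destination VF values (address 0x00000000, Mask bit set) with MSI-X enabled, one call to restore_msix_model() leaves the physical entry equal to the virtual one (0xFEE00138, unmasked). With MSI-X disabled it does nothing, matching the early return in the model.<o:p></o:p></span></p>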
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Aptos",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Aptos",sans-serif">Thanks<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Aptos",sans-serif">Sam<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Aptos",sans-serif"><o:p> </o:p></span></p>
<div id="mail-editor-reference-message-container">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black">From:
</span></b><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black">Chang, HaiJun <HaiJun.Chang@amd.com><br>
<b>Date: </b>Tuesday, April 29, 2025 at 10:43<br>
<b>To: </b>Koenig, Christian <Christian.Koenig@amd.com>, Zhang, GuoQing (Sam) <GuoQing.Zhang@amd.com>, Christian König <ckoenig.leichtzumerken@gmail.com>, amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>, Deucher, Alexander <Alexander.Deucher@amd.com><br>
<b>Cc: </b>Zhao, Victor <Victor.Zhao@amd.com>, Deng, Emily <Emily.Deng@amd.com>, Zhang, Owen(SRDC) <Owen.Zhang2@amd.com><br>
<b>Subject: </b>RE: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><a name="BM_BEGIN"></a><span style="font-family:"Times New Roman",serif">
Hi,<br>
<br>
The interrupt loss issue might be caused by the VF MSI-X table not being restored properly on resume.<br>
<br>
The MSI-X table in the virtual machine is emulated. The real MSI-X table is programmed by QEMU when the guest enables or disables MSI-X interrupts. However, QEMU's accesses to the VF MSI-X table (the GFXMSIX_* registers) can be blocked by nBIF protection if the VF is not in exclusive access mode at that time.<br>
We already have a workaround in the amdgpu driver for the MSI-X table loss case, in the amdgpu_restore_msix() function.<br>
<br>
Can you check the values of these GFXMSIX_* registers? I think we should call amdgpu_restore_msix() on resume to restore the MSI-X table.<br>
<br>
Thanks,<br>
HaiJun<br>
<br>
-----Original Message-----<br>
From: Koenig, Christian <Christian.Koenig@amd.com><br>
Sent: Monday, April 28, 2025 8:24 PM<br>
To: Zhang, GuoQing (Sam) <GuoQing.Zhang@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com><br>
Cc: Zhao, Victor <Victor.Zhao@amd.com>; Chang, HaiJun <HaiJun.Chang@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Zhang, Owen(SRDC) <Owen.Zhang2@amd.com><br>
Subject: Re: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error<br>
<br>
On 4/24/25 05:38, Zhang, GuoQing (Sam) wrote:<br>
> Ping… @Koenig, Christian<br>
><br>
> Thanks<br>
><br>
> Sam<br>
><br>
> *From: *amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Zhang, GuoQing (Sam) <GuoQing.Zhang@amd.com><br>
> *Date: *Wednesday, April 23, 2025 at 14:59<br>
> *To: *Christian König <ckoenig.leichtzumerken@gmail.com>, amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org><br>
> *Cc: *Zhao, Victor <Victor.Zhao@amd.com>, Chang, HaiJun <HaiJun.Chang@amd.com>, Deng, Emily <Emily.Deng@amd.com>, Zhang, Owen(SRDC) <Owen.Zhang2@amd.com><br>
> *Subject: *Re: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error<br>
><br>
> Hi @Christian König,<br>
><br>
> In a QEMU VM environment, when request_irq() is called in the guest KMD,<br>
> QEMU enables the interrupt for the device on the host.<br>
><br>
> When the guest hibernates and then resumes on a new vGPU without calling<br>
> request_irq() for that vGPU, the new vGPU's interrupt is not<br>
> enabled, so the IH handler in the guest KMD is never called.<br>
><br>
> This change ensures that request_irq() is called on resume for the new vGPU.<br>
<br>
That doesn't make sense.<br>
<br>
The MSI state is saved and restored by the core OS on suspend and resume, drivers should never mess with that.<br>
<br>
If this doesn't work with the new vGPU for some reason then that is not something we can work around inside the driver.<br>
<br>
Which state exactly isn't restored here?<br>
<br>
Regards,<br>
Christian.<br>
<br>
<br>
><br>
> Regards<br>
><br>
> Sam<br>
><br>
> *From: *Christian König <ckoenig.leichtzumerken@gmail.com><br>
> *Date: *Wednesday, April 16, 2025 at 21:54<br>
> *To: *Zhang, GuoQing (Sam) <GuoQing.Zhang@amd.com>, amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org><br>
> *Cc: *Zhao, Victor <Victor.Zhao@amd.com>, Chang, HaiJun <HaiJun.Chang@amd.com>, Deng, Emily <Emily.Deng@amd.com><br>
> *Subject: *Re: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error<br>
><br>
> On 14.04.25 at 12:46, Samuel Zhang wrote:<br>
> > IH is not working after switching to a new GPU index for the first time.<br>
> > The IH handler function needs to be re-registered with the kernel after<br>
> > switching to the new GPU index.<br>
><br>
> Why?<br>
><br>
> Christian.<br>
><br>
> ><br>
> > Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com><br>
> > Change-Id: Idece1c8fce24032fd08f5a8b6ac23793c51e56dd<br>
> > ---<br>
> > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 7 +++++--<br>
> > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h | 1 +<br>
> > drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 18 ++++++++++++++++--<br>
> > 3 files changed, 22 insertions(+), 4 deletions(-)<br>
> ><br>
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c<br>
> > index 19ce4da285e8..2292245a0c5d 100644<br>
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c<br>
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c<br>
> > @@ -326,7 +326,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)<br>
> > return r;<br>
> > }<br>
> ><br>
> > -void amdgpu_irq_fini_hw(struct amdgpu_device *adev)<br>
> > +void amdgpu_irq_uninstall(struct amdgpu_device *adev)<br>
> > {<br>
> > if (adev->irq.installed) {<br>
> > free_irq(adev->irq.irq, adev_to_drm(adev));<br>
> > @@ -334,7 +334,10 @@ void amdgpu_irq_fini_hw(struct amdgpu_device *adev)<br>
> > if (adev->irq.msi_enabled)<br>
> > pci_free_irq_vectors(adev->pdev);<br>
> > }<br>
> > -<br>
> > +}<br>
> > +void amdgpu_irq_fini_hw(struct amdgpu_device *adev)<br>
> > +{<br>
> > + amdgpu_irq_uninstall(adev);<br>
> > amdgpu_ih_ring_fini(adev, &adev->irq.ih_soft);<br>
> > amdgpu_ih_ring_fini(adev, &adev->irq.ih);<br>
> > amdgpu_ih_ring_fini(adev, &adev->irq.ih1);<br>
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h<br>
> > index 04c0b4fa17a4..c6e6681b4f71 100644<br>
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h<br>
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h<br>
> > @@ -123,6 +123,7 @@ extern const int node_id_to_phys_map[NODEID_MAX];<br>
> > void amdgpu_irq_disable_all(struct amdgpu_device *adev);<br>
> ><br>
> > int amdgpu_irq_init(struct amdgpu_device *adev);<br>
> > +void amdgpu_irq_uninstall(struct amdgpu_device *adev);<br>
> > void amdgpu_irq_fini_sw(struct amdgpu_device *adev);<br>
> > void amdgpu_irq_fini_hw(struct amdgpu_device *adev);<br>
> > int amdgpu_irq_add_id(struct amdgpu_device *adev,<br>
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c<br>
> > index faa0dd75dd6d..ef996505e4dc 100644<br>
> > --- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c<br>
> > +++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c<br>
> > @@ -643,12 +643,26 @@ static int vega20_ih_hw_fini(struct amdgpu_ip_block *ip_block)<br>
> ><br>
> > static int vega20_ih_suspend(struct amdgpu_ip_block *ip_block)<br>
> > {<br>
> > - return vega20_ih_hw_fini(ip_block);<br>
> > + struct amdgpu_device *adev = ip_block->adev;<br>
> > + int r = 0;<br>
> > +<br>
> > + r = vega20_ih_hw_fini(ip_block);<br>
> > + amdgpu_irq_uninstall(adev);<br>
> > + return r;<br>
> > }<br>
> ><br>
> > static int vega20_ih_resume(struct amdgpu_ip_block *ip_block)<br>
> > {<br>
> > - return vega20_ih_hw_init(ip_block);<br>
> > + struct amdgpu_device *adev = ip_block->adev;<br>
> > + int r = 0;<br>
> > +<br>
> > + r = amdgpu_irq_init(adev);<br>
> > + if (r) {<br>
> > + dev_err(adev->dev, "amdgpu_irq_init failed in %s, %d\n", __func__, r);<br>
> > + return r;<br>
> > + }<br>
> > + r = vega20_ih_hw_init(ip_block);<br>
> > + return r;<br>
> > }<br>
> ><br>
> > static bool vega20_ih_is_idle(struct amdgpu_ip_block *ip_block)<br>
></span><span style="font-size:12.0pt;font-family:"Times New Roman",serif"><o:p></o:p></span></p>
</div>
</div>
</div>
</div>
</div>
</body>
</html>