<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-2022-jp">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"Microsoft YaHei";
panose-1:2 11 5 3 2 2 4 2 2 4;}
@font-face
{font-family:"\@Microsoft YaHei";}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:14.0pt;
font-family:"Calibri",sans-serif;}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Calibri",sans-serif;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:758333370;
mso-list-type:hybrid;
mso-list-template-ids:-1255109936 67698703 67698689 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:39.0pt;
text-indent:-.25in;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:75.0pt;
text-indent:-.25in;
font-family:Symbol;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:111.0pt;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:147.0pt;
text-indent:-.25in;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:183.0pt;
text-indent:-.25in;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:219.0pt;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:255.0pt;
text-indent:-.25in;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:291.0pt;
text-indent:-.25in;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:327.0pt;
text-indent:-9.0pt;}
@list l1
{mso-list-id:1100415748;
mso-list-type:hybrid;
mso-list-template-ids:-1587907358 67698689 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l1:level1
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:75.0pt;
text-indent:-.25in;
font-family:Symbol;}
@list l1:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:111.0pt;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level3
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:147.0pt;
text-indent:-.25in;
font-family:Wingdings;}
@list l1:level4
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:183.0pt;
text-indent:-.25in;
font-family:Symbol;}
@list l1:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:219.0pt;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level6
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:255.0pt;
text-indent:-.25in;
font-family:Wingdings;}
@list l1:level7
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:291.0pt;
text-indent:-.25in;
font-family:Symbol;}
@list l1:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:327.0pt;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level9
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:363.0pt;
text-indent:-.25in;
font-family:Wingdings;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoPlainText">I agree that we should drop the amdgpu_ras_enable=2 condition. My only concern is that, besides fatal_error, baco on xgmi may produce another failure: an atombios_init timeout (I am not sure whether psp mode1 reset causes this
as well). <o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Assuming there is no amdgpu_ras_enable=2 check, my understanding of the use cases when PMFW > 40.52 is:
<o:p></o:p></p>
<ol style="margin-top:0in" start="1" type="1">
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level1 lfo1">sGPU without RAS:<o:p></o:p>
<ul style="margin-top:0in" type="disc">
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level2 lfo1">new: baco<o:p></o:p></li>
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level2 lfo1">old: baco<o:p></o:p></li></ul></li>
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level1 lfo1">sGPU with RAS:<o:p></o:p>
<ul style="margin-top:0in" type="disc">
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level2 lfo1">new: baco<o:p></o:p></li>
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level2 lfo1">old: psp mode1 chain reset and legacy fatal_error handling<o:p></o:p></li></ul></li>
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level1 lfo1">XGMI with RAS: baco<o:p></o:p>
<ul style="margin-top:0in" type="disc">
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level2 lfo1">new: baco<o:p></o:p></li>
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level2 lfo1">old: psp mode1 chain reset and legacy fatal_error handling<o:p></o:p></li></ul></li>
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level1 lfo1">XGMI without RAS: baco<o:p></o:p>
<ul style="margin-top:0in" type="disc">
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level2 lfo1">new: baco<o:p></o:p></li>
<li class="MsoPlainText" style="margin-left:3.0pt;mso-list:l0 level2 lfo1">old: psp mode1 chain reset<o:p></o:p></li></ul></li>
</ol>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">That is to say, all use cases take the baco path when PMFW > 40.52.<o:p></o:p></p>
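<p class="MsoPlainText">The four use cases above can be sketched as a small decision function. This is only an illustration of the list, not driver code: the enum and function names are hypothetical, the "old" column is reduced to the reset method (the accompanying legacy fatal_error handling is not modeled), and the constant 0x283400 is borrowed from the patch as the encoding of 40.52.<o:p></o:p></p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not amdgpu identifiers. */
enum reset_path { RESET_BACO, RESET_PSP_MODE1 };

/* Assumed encoding of PMFW 40.52, taken from the constant in the patch. */
#define PMFW_40_52 0x283400u

/* "old" column of the list: only sGPU without RAS uses baco. */
static enum reset_path old_reset_path(bool xgmi, bool ras)
{
    return (xgmi || ras) ? RESET_PSP_MODE1 : RESET_BACO;
}

/*
 * "new" column: with no amdgpu_ras_enable check, every use case takes
 * the baco path once PMFW > 40.52; otherwise the old behavior applies.
 */
static enum reset_path new_reset_path(bool xgmi, bool ras, uint32_t pmfw)
{
    if (pmfw > PMFW_40_52)
        return RESET_BACO;
    return old_reset_path(xgmi, ras);
}
```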
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Regards,<o:p></o:p></p>
<p class="MsoPlainText">Ma Le<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">-----Original Message-----<br>
From: Zhang, Hawking &lt;Hawking.Zhang@amd.com&gt; <br>
Sent: Wednesday, November 27, 2019 7:28 PM<br>
To: Ma, Le &lt;Le.Ma@amd.com&gt;; amd-gfx@lists.freedesktop.org<br>
Cc: Chen, Guchun &lt;Guchun.Chen@amd.com&gt;; Zhou1, Tao &lt;Tao.Zhou1@amd.com&gt;; Li, Dennis &lt;Dennis.Li@amd.com&gt;; Deucher, Alexander &lt;Alexander.Deucher@amd.com&gt;; Ma, Le &lt;Le.Ma@amd.com&gt;<br>
Subject: RE: [PATCH 06/10] drm/amdgpu: add condition to enable baco for xgmi/ras case</p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">[AMD Public Use]<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">After thinking about it a bit, I think we can rely on the PMFW version alone to decide between RAS recovery and legacy fatal_error handling on the platforms that support RAS. Leveraging amdgpu_ras_enable as a temporary solution does not seem necessary.
Even if baco RAS recovery is not stable, the result is the same as with legacy fatal_error handling: the user has to reboot the node manually.
<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">So the new soc reset use cases are:<o:p></o:p></p>
<p class="MsoPlainText">XGMI (without RAS): use PSP mode1 based chain reset.<br>
RAS enabled (with PMFW 40.52 and onwards): use BACO based RAS recovery.<br>
RAS enabled (with PMFW prior to 40.52): use legacy fatal_error handling.<o:p></o:p></p>
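<p class="MsoPlainText">As a rough sketch, the three cases can be written as a classifier keyed on RAS support and PMFW version. The names below are hypothetical, and 0x283400 is assumed to encode 40.52 (it is the constant the patch compares against). "40.52 and onwards" is read here as >=, whereas the patch as posted disables baco for fw_version <= 0x283400, so the boundary version itself is treated differently there. Configurations the list does not mention (no XGMI, no RAS) are reported as not covered rather than guessed.<o:p></o:p></p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not amdgpu identifiers. */
enum soc_reset_case {
    CASE_MODE1_CHAIN_RESET,   /* XGMI without RAS */
    CASE_BACO_RAS_RECOVERY,   /* RAS enabled, PMFW 40.52 and onwards */
    CASE_LEGACY_FATAL_ERROR,  /* RAS enabled, PMFW prior to 40.52 */
    CASE_NOT_COVERED          /* not listed in the proposal */
};

/* Assumed encoding of PMFW 40.52, taken from the constant in the patch. */
#define PMFW_40_52 0x283400u

static enum soc_reset_case classify_soc_reset(bool xgmi, bool ras_enabled,
                                              uint32_t pmfw)
{
    /* RAS platforms are decided by the PMFW version alone. */
    if (ras_enabled)
        return pmfw >= PMFW_40_52 ? CASE_BACO_RAS_RECOVERY
                                  : CASE_LEGACY_FATAL_ERROR;
    if (xgmi)
        return CASE_MODE1_CHAIN_RESET;
    return CASE_NOT_COVERED;
}
```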
<p class="MsoPlainText"><o:p></o:p></p>
<p class="MsoPlainText">Anything else?<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Regards,<o:p></o:p></p>
<p class="MsoPlainText">Hawking<o:p></o:p></p>
<p class="MsoPlainText">-----Original Message-----<o:p></o:p></p>
<p class="MsoPlainText">From: Le Ma &lt;<a href="mailto:le.ma@amd.com"><span style="color:windowtext;text-decoration:none">le.ma@amd.com</span></a>&gt;<o:p></o:p></p>
<p class="MsoPlainText">Sent: November 27, 2019 17:15<o:p></o:p></p>
<p class="MsoPlainText">To: <a href="mailto:amd-gfx@lists.freedesktop.org"><span style="color:windowtext;text-decoration:none">amd-gfx@lists.freedesktop.org</span></a><o:p></o:p></p>
<p class="MsoPlainText">Cc: Zhang, Hawking &lt;<a href="mailto:Hawking.Zhang@amd.com"><span style="color:windowtext;text-decoration:none">Hawking.Zhang@amd.com</span></a>&gt;; Chen, Guchun &lt;<a href="mailto:Guchun.Chen@amd.com"><span style="color:windowtext;text-decoration:none">Guchun.Chen@amd.com</span></a>&gt;;
Zhou1, Tao &lt;<a href="mailto:Tao.Zhou1@amd.com"><span style="color:windowtext;text-decoration:none">Tao.Zhou1@amd.com</span></a>&gt;; Li, Dennis &lt;<a href="mailto:Dennis.Li@amd.com"><span style="color:windowtext;text-decoration:none">Dennis.Li@amd.com</span></a>&gt;;
Deucher, Alexander &lt;<a href="mailto:Alexander.Deucher@amd.com"><span style="color:windowtext;text-decoration:none">Alexander.Deucher@amd.com</span></a>&gt;; Ma, Le &lt;<a href="mailto:Le.Ma@amd.com"><span style="color:windowtext;text-decoration:none">Le.Ma@amd.com</span></a>&gt;<o:p></o:p></p>
<p class="MsoPlainText">Subject: [PATCH 06/10] drm/amdgpu: add condition to enable baco for xgmi/ras case<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Avoid changing the default reset behavior for production cards by checking that amdgpu_ras_enable equals 2. Also, only sufficiently new smu ucode supports baco for the xgmi/ras case.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Change-Id: I07c3e6862be03e068745c73db8ea71f428ecba6b<o:p></o:p></p>
<p class="MsoPlainText">Signed-off-by: Le Ma &lt;<a href="mailto:le.ma@amd.com"><span style="color:windowtext;text-decoration:none">le.ma@amd.com</span></a>&gt;<o:p></o:p></p>
<p class="MsoPlainText">---<o:p></o:p></p>
<p class="MsoPlainText">drivers/gpu/drm/amd/amdgpu/soc15.c | 4 +++-<o:p></o:p></p>
<p class="MsoPlainText">1 file changed, 3 insertions(+), 1 deletion(-)<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c<o:p></o:p></p>
<p class="MsoPlainText">index 951327f..6202333 100644<o:p></o:p></p>
<p class="MsoPlainText">--- a/drivers/gpu/drm/amd/amdgpu/soc15.c<o:p></o:p></p>
<p class="MsoPlainText">+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c<o:p></o:p></p>
<p class="MsoPlainText">@@ -577,7 +577,9 @@ soc15_asic_reset_method(struct amdgpu_device *adev)<o:p></o:p></p>
<p class="MsoPlainText"> struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev, 0);<o:p></o:p></p>
<p class="MsoPlainText"> struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);<o:p></o:p></p>
<p class="MsoPlainText"><o:p></o:p></p>
<p class="MsoPlainText">- if (hive || (ras && ras->supported))<o:p></o:p></p>
<p class="MsoPlainText">+ if ((hive || (ras && ras->supported)) &&<o:p></o:p></p>
<p class="MsoPlainText">+ (amdgpu_ras_enable != 2 ||<o:p></o:p></p>
<p class="MsoPlainText">+ adev->pm.fw_version <= 0x283400))<o:p></o:p></p>
<p class="MsoPlainText"> baco_reset = false;<o:p></o:p></p>
<p class="MsoPlainText"> }<o:p></o:p></p>
<p class="MsoPlainText"> break;<o:p></o:p></p>
<p class="MsoPlainText">--<o:p></o:p></p>
<p class="MsoPlainText">2.7.4<o:p></o:p></p>
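<p class="MsoPlainText">For readers following the condition in the hunk above, here is a standalone sketch of the same predicate together with a decoder for the version constant. The helper names are hypothetical, baco_reset is assumed to start out true (in the driver it comes from an earlier capability check that is outside this hunk), and the major.minor.patch byte layout is an assumption, chosen because it makes 0x283400 read as 40.52.0, matching the version quoted in the discussion.<o:p></o:p></p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type; the byte layout below is an assumption. */
struct pmfw_version {
    unsigned major;
    unsigned minor;
    unsigned patch;
};

/* Split a packed version word into its three assumed byte fields. */
static struct pmfw_version pmfw_decode(uint32_t v)
{
    struct pmfw_version out = {
        (v >> 16) & 0xff,
        (v >> 8) & 0xff,
        v & 0xff
    };
    return out;
}

/*
 * Mirrors the patched condition: for XGMI hives or RAS-capable devices,
 * baco stays enabled only if amdgpu_ras_enable == 2 AND the firmware is
 * strictly newer than 0x283400.
 */
static bool baco_reset_allowed(bool hive, bool ras_supported,
                               int ras_enable, uint32_t fw_version)
{
    bool baco_reset = true; /* assumed result of the earlier capability check */

    if ((hive || ras_supported) &&
        (ras_enable != 2 || fw_version <= 0x283400))
        baco_reset = false;

    return baco_reset;
}
```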
</div>
</body>
</html>