<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
p.msipheadera92e061b, li.msipheadera92e061b, div.msipheadera92e061b
{mso-style-name:msipheadera92e061b;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
p.msipheader87abd423, li.msipheader87abd423, div.msipheader87abd423
{mso-style-name:msipheader87abd423;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="msipheader87abd423" style="margin:0in;margin-bottom:.0001pt"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#317100">[AMD Public Use]</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We’d better keep original comment “/* skip error address process if -ENOMEM */”, if err_addr is not allocated successfully.<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Regards,<o:p></o:p></p>
<p class="MsoNormal">Guchun<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Zhang, Hawking <Hawking.Zhang@amd.com> <br>
<b>Sent:</b> Thursday, March 5, 2020 7:23 PM<br>
<b>To:</b> Clements, John <John.Clements@amd.com>; amd-gfx@lists.freedesktop.org; Li, Dennis <Dennis.Li@amd.com>; Zhou1, Tao <Tao.Zhou1@amd.com>; Chen, Guchun <Guchun.Chen@amd.com><br>
<b>Subject:</b> RE: [PATCH] drm/amdgpu: update page retirement sequence<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="msipheadera92e061b" style="margin:0in;margin-bottom:.0001pt"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#0078D7">[AMD Official Use Only - Internal Distribution Only]</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I see. So it’s the following programming that is in risk to corrupt data for other instances.
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> /* clear umc status */<o:p></o:p></p>
<p class="MsoNormal"> WREG64_PCIE((mc_umc_status_addr + umc_reg_offset) * 4, 0x0ULL);<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">For error injection, everytime it should just have one instance had the error status record. Therefore, it make sense to me that we only clear the status register once. As discussed, we shall also follow up with umc team on the potential
issue with index mode programming.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Please also add some comments in code for this unexpected behavior that we shall follow up. Other than that, the patch is
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Reviewed-by: Hawking Zhang <<a href="mailto:Hawking.Zhang@amd.com">Hawking.Zhang@amd.com</a>><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Regards,<br>
Hawking<o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Clements, John <<a href="mailto:John.Clements@amd.com">John.Clements@amd.com</a>>
<br>
<b>Sent:</b> Thursday, March 5, 2020 18:19<br>
<b>To:</b> Zhang, Hawking <<a href="mailto:Hawking.Zhang@amd.com">Hawking.Zhang@amd.com</a>>;
<a href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>; Li, Dennis <<a href="mailto:Dennis.Li@amd.com">Dennis.Li@amd.com</a>>; Zhou1, Tao <<a href="mailto:Tao.Zhou1@amd.com">Tao.Zhou1@amd.com</a>>; Chen, Guchun <<a href="mailto:Guchun.Chen@amd.com">Guchun.Chen@amd.com</a>><br>
<b>Subject:</b> RE: [PATCH] drm/amdgpu: update page retirement sequence<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="msipheadera92e061b" style="margin:0in;margin-bottom:.0001pt"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#0078D7">[AMD Official Use Only - Internal Distribution Only]</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">In the original sequence, if the key bits are not set in the mca_status, the page retirement will not happen and the status register will be cleared.<o:p></o:p></p>
<p class="MsoNormal">If there is a UMC UE, that register will be cleared erroneously 31 times.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If MCA Status == 0 already from the beginning there is no reason to press forward with the rest of the checks and clear the register.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Zhang, Hawking <<a href="mailto:Hawking.Zhang@amd.com">Hawking.Zhang@amd.com</a>>
<br>
<b>Sent:</b> Thursday, March 5, 2020 5:56 PM<br>
<b>To:</b> Clements, John <<a href="mailto:John.Clements@amd.com">John.Clements@amd.com</a>>;
<a href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>; Li, Dennis <<a href="mailto:Dennis.Li@amd.com">Dennis.Li@amd.com</a>>; Zhou1, Tao <<a href="mailto:Tao.Zhou1@amd.com">Tao.Zhou1@amd.com</a>>; Chen, Guchun <<a href="mailto:Guchun.Chen@amd.com">Guchun.Chen@amd.com</a>><br>
<b>Subject:</b> RE: [PATCH] drm/amdgpu: update page retirement sequence<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="msipheadera92e061b" style="margin:0in;margin-bottom:.0001pt"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#0078D7">[AMD Official Use Only - Internal Distribution Only]</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hi John,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Can you please explain more on the differences between (a). exit immediately when mca_status is 0 and (b). exit when some of critical field in mca_status is 0?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Regards,<br>
Hawking<o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Clements, John <<a href="mailto:John.Clements@amd.com">John.Clements@amd.com</a>>
<br>
<b>Sent:</b> Thursday, March 5, 2020 17:40<br>
<b>To:</b> <a href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>; Zhang, Hawking <<a href="mailto:Hawking.Zhang@amd.com">Hawking.Zhang@amd.com</a>>; Li, Dennis <<a href="mailto:Dennis.Li@amd.com">Dennis.Li@amd.com</a>>; Zhou1, Tao
<<a href="mailto:Tao.Zhou1@amd.com">Tao.Zhou1@amd.com</a>>; Chen, Guchun <<a href="mailto:Guchun.Chen@amd.com">Guchun.Chen@amd.com</a>><br>
<b>Subject:</b> [PATCH] drm/amdgpu: update page retirement sequence<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="msipheadera92e061b" style="margin:0in;margin-bottom:.0001pt"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#0078D7">[AMD Official Use Only - Internal Distribution Only]</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">check UMC status and exit prior to making and erroneus register access<o:p></o:p></p>
</div>
</body>
</html>