<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Aptos;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
font-size:10.0pt;
font-family:"Courier New";}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;}
span.ui-provider
{mso-style-name:ui-provider;}
span.EmailStyle22
{mso-style-type:personal-reply;
font-family:"Arial",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<p style="font-family:Calibri;font-size:10pt;color:#0000FF;margin:5pt;font-style:normal;font-weight:normal;text-decoration:none;" align="Left">
[AMD Official Use Only - AMD Internal Distribution Only]<br>
</p>
<br>
<div>
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi, Christian.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Here is the progress on the page fault issue in the rocblas benchmark.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">I found that when multiple threads read the register “regIH_VMID_0_LUT” to get the pasid,
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">the register randomly returns a wrong pasid value: sometimes 0, sometimes 32768 (the real value is 32770, i.e. 0x8002).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">When the pasid check then fails, the code takes the “continue” path and does not flush the GPU TLB.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">That’s why the page fault occurs.<o:p></o:p></span></p>
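<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">A minimal sketch of the failure path described above, as a simplified userspace model and not the amdgpu code. The names “lut_model”, “read_lut” and “flush_tlb_for_pasid” are hypothetical, and the wrong read is injected by hand because the real cause is still unknown. It only shows that a mismatching pasid read makes the loop skip the VMID, so no flush is issued.</span></p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_VMIDS 16

/* Hypothetical stand-in for the per-VMID pasid lookup table. */
struct lut_model {
    uint16_t pasid[NUM_VMIDS];      /* value the register should report */
    uint16_t glitch[NUM_VMIDS];     /* injected wrong value */
    bool     use_glitch[NUM_VMIDS]; /* true: the read returns glitch[] */
};

/* Models one register read; the glitch is injected, not derived. */
uint16_t read_lut(const struct lut_model *lut, unsigned vmid)
{
    assert(vmid < NUM_VMIDS);
    return lut->use_glitch[vmid] ? lut->glitch[vmid] : lut->pasid[vmid];
}

/* Returns how many VMIDs would be flushed for the given pasid. */
unsigned flush_tlb_for_pasid(const struct lut_model *lut, uint16_t pasid)
{
    unsigned flushed = 0;

    for (unsigned vmid = 1; vmid < NUM_VMIDS; vmid++) {
        uint16_t queried = read_lut(lut, vmid);

        if (queried != pasid)
            continue;   /* mismatch: this VMID is skipped, nothing is flushed */
        flushed++;      /* a real driver would invalidate the TLB here */
    }
    return flushed;
}
```

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">With the correct value 0x8002 stored for one VMID, the model reports one flush. If the read for that VMID returns 0 or 0x8000 instead, it reports none, which matches the stale TLB entry behind the page fault.</span></p>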
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">After adding the lock, the register no longer returns invalid values, and the rocblas benchmark passes.<o:p></o:p></span></p>
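<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">A minimal sketch of the locking pattern, assuming the lock simply serializes the register reads. This is a userspace model and not the actual patch: “lut_regs”, “lut_lock”, “lut_unlock” and “lut_read_pasid” are hypothetical names, and a C11 atomic_flag spinlock stands in for whatever kernel lock the driver would use.</span></p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_VMIDS 16

/* Hypothetical register block; in the driver this would be MMIO. */
struct lut_regs {
    atomic_flag lock;               /* stand-in for a kernel spinlock or mutex */
    uint16_t    pasid[NUM_VMIDS];
};

void lut_lock(struct lut_regs *r)
{
    /* Spin until the flag was previously clear, i.e. we now own the lock. */
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ;
}

void lut_unlock(struct lut_regs *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Read one entry while holding the lock, so two reads never overlap. */
uint16_t lut_read_pasid(struct lut_regs *r, unsigned vmid)
{
    uint16_t val;

    assert(vmid < NUM_VMIDS);
    lut_lock(r);
    val = r->pasid[vmid];
    lut_unlock(r);
    return val;
}
```

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Every caller goes through “lut_read_pasid”, so a second thread has to wait until the first read has finished. The sketch does not explain why overlapping reads return wrong values on the hardware.</span></p>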
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">You submitted the patch "implement TLB flush fence"; in this patch you create a kernel thread to flush the GPU TLB.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">In the main thread, the function “svm_range_map_to_gpus” calls the function “kfd_flush_tlb”, which flushes the GPU TLB as well.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">This means that both threads call the function “gmc_v11_0_flush_gpu_tlb_pasid”.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">So the page fault issue started to occur after your patch was merged.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">My first patch changed the GPU TLB flush to sync mode,
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">which means a single thread flushes the GPU TLB twice; that is why my first patch passed the rocblas benchmark.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">I have already sent an email to the firmware team to ask why the register returns a wrong value.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">But if the firmware team is not able to solve this issue, or needs a long time to solve it,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">I will submit a patch like the one below as a workaround.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img width="740" height="649" style="width:7.7083in;height:6.7583in" id="_x0000_i1032" src="cid:image008.png@01DB2BBD.60C81420"></span><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Li, Chong(Alan)
<br>
<b>Sent:</b> Friday, October 25, 2024 2:46 PM<br>
<b>To:</b> Koenig, Christian <Christian.Koenig@amd.com>; Andjelkovic, Dejan <Dejan.Andjelkovic@amd.com><br>
<b>Cc:</b> cao, lin <lin.cao@amd.com>; Yin, ZhenGuo (Chris) <ZhenGuo.Yin@amd.com>; Zhang, Tiantian (Celine) <Tiantian.Zhang@amd.com>; amd-gfx@lists.freedesktop.org<br>
<b>Subject:</b> RE: [PATCH] drm/amd/amdgpu: change the flush gpu tlb mode to sync mode.<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi, Christian.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">The log file is too large to paste into the email.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">I copied the log file to the directory “</span><a href="file://ark/incoming/chong/log"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">\\ark\incoming\chong\log</span></a><span style="font-size:11.0pt;font-family:"Arial",sans-serif">”;
the log file name is “kern.log”.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Can you access this directory?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Koenig, Christian <</span><a href="mailto:Christian.Koenig@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Christian.Koenig@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>
<br>
<b>Sent:</b> Thursday, October 24, 2024 7:22 PM<br>
<b>To:</b> Li, Chong(Alan) <</span><a href="mailto:Chong.Li@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Chong.Li@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>; Andjelkovic, Dejan <</span><a href="mailto:Dejan.Andjelkovic@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Dejan.Andjelkovic@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">><br>
<b>Cc:</b> cao, lin <</span><a href="mailto:lin.cao@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">lin.cao@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>; Yin, ZhenGuo (Chris) <</span><a href="mailto:ZhenGuo.Yin@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">ZhenGuo.Yin@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>;
Zhang, Tiantian (Celine) <</span><a href="mailto:Tiantian.Zhang@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Tiantian.Zhang@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>; Raina, Yera <</span><a href="mailto:Yera.Raina@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Yera.Raina@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">><br>
<b>Subject:</b> Re: [PATCH] drm/amd/amdgpu: change the flush gpu tlb mode to sync mode.<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt">Do you have the full log as text file? As image it's pretty much useless.<br>
<br>
Regards,<br>
Christian.<o:p></o:p></p>
<div>
<p class="MsoNormal">Am 24.10.24 um 09:41 schrieb Li, Chong(Alan):<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi,
</span>Christian.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">As the dmesg log shows,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">the page fault still happens after the PTEs for address “7ef90be00” have already been updated.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img border="0" width="714" height="260" style="width:7.4333in;height:2.7083in" id="Picture_x0020_7" src="cid:image001.png@01DB2BAC.E798ED50"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img border="0" width="789" height="207" style="width:8.2166in;height:2.1583in" id="Picture_x0020_6" src="cid:image002.png@01DB2BAC.E798ED50"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Koenig, Christian
</span><a href="mailto:Christian.Koenig@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Christian.Koenig@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
<br>
<b>Sent:</b> Wednesday, October 23, 2024 5:26 PM<br>
<b>To:</b> Li, Chong(Alan) </span><a href="mailto:Chong.Li@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Chong.Li@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">; Andjelkovic, Dejan
</span><a href="mailto:Dejan.Andjelkovic@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Dejan.Andjelkovic@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Cc:</b> cao, lin </span><a href="mailto:lin.cao@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><lin.cao@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">; Yin, ZhenGuo (Chris)
</span><a href="mailto:ZhenGuo.Yin@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><ZhenGuo.Yin@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">; Zhang, Tiantian (Celine)
</span><a href="mailto:Tiantian.Zhang@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Tiantian.Zhang@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">; Raina, Yera
</span><a href="mailto:Yera.Raina@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Yera.Raina@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Subject:</b> Re: [PATCH] drm/amd/amdgpu: change the flush gpu tlb mode to sync mode.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt">Hi Chong,<br>
<br>
oh that could indeed be.<br>
<br>
I suggest to add a trace point for the page fault so that we can guarantee that we use the same time basis for both events.<br>
<br>
That should make it trivial to compare them.<br>
<br>
Regards,<br>
Christian.<o:p></o:p></p>
<div>
<p class="MsoNormal">Am 23.10.24 um 10:17 schrieb Li, Chong(Alan):<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi, Christian.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><b><span style="font-family:"Arial",sans-serif">I added a log in the kernel, and it proves that the timestamps in the tracing log lag behind those in the dmesg log,
</span></b><o:p></o:p></p>
<p class="MsoNormal"><b><span style="font-family:"Arial",sans-serif">so we can’t conclude that the issue is in rocm.</span></b><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">------------------------ the information I synced with Andjelkovic, Dejan ----------------------------------------</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">dmesg shows that the page fault happens at address “0x000072e5f4401000” at time “6587.772178”,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img border="0" width="1318" height="119" style="width:13.7333in;height:1.2416in" id="_x0000_i1029" src="cid:image003.png@01DB2BAC.E798ED50"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">the tracing log shows that the function “amdgpu_vm_update_ptes” is called at time “6587.790869”,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img border="0" width="1167" height="24" style="width:12.1583in;height:.25in" id="Picture_x0020_4" src="cid:image004.png@01DB2BAC.E798ED50"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">------------------------ the information I synced with Andjelkovic, Dejan ----------------------------------------</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">From the log timestamps, you concluded that “</span><span class="ui-provider">The test tries to access memory before it is probably mapped and that is provable by looking
into the tracelogs.</span><span style="font-size:11.0pt;font-family:"Arial",sans-serif">”.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">But after reviewing the code, I see that the function “amdgpu_vm_ptes_update” is called from the function “svm_range_set_attr”.
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">So, once the dmesg log above prints “[ 6587.772136] amdgpu: pasid 0x8002 svms 0x000000008b03ff39 [0x72e5f4400 0x72e5fc3ff] done, r=0”,
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">the function “svm_range_set_attr” is about to return, and by that time “amdgpu_vm_ptes_update” has already been called, so the later timestamp in the tracing log is not plausible.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">I suspected that the timestamps in the tracing log are delayed, so I added a log line in the kernel to verify my guess,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img border="0" width="792" height="98" style="width:8.25in;height:1.0166in" id="Picture_x0020_3" src="cid:image005.png@01DB2BAC.E798ED50"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">The result is below:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">the tracing log shows the address “ffffffc00” at time “227.298607”,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">the dmesg log prints the address “ffffffc00” at time “226.756137”.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">tracing log:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img border="0" width="1336" height="143" style="width:13.9166in;height:1.4916in" id="Picture_x0020_2" src="cid:image006.png@01DB2BAC.E798ED50"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">dmesg log:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img border="0" width="957" height="112" style="width:9.9666in;height:1.1666in" id="Picture_x0020_1" src="cid:image007.png@01DB2BAC.E798ED50"></span><o:p></o:p></p>
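<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">The two comparisons above can be put side by side with a small calculation. This is only a sketch: the helper names “ts_to_us”, “gap_us”, “measured_skew_us” and “apparent_gap_us” are hypothetical, the timestamps are the ones quoted in this mail, and the two pairs come from different runs, so the skew may differ between them.</span></p>

```c
#include <assert.h>
#include <stdint.h>

/* Convert a "seconds.microseconds" log timestamp to microseconds. */
int64_t ts_to_us(int64_t sec, int64_t usec)
{
    assert(usec >= 0 && usec < 1000000);
    return sec * 1000000 + usec;
}

/* Signed gap in microseconds: positive when `later` follows `earlier`. */
int64_t gap_us(int64_t earlier_us, int64_t later_us)
{
    return later_us - earlier_us;
}

/* Same event in both logs: dmesg at 226.756137, tracing at 227.298607. */
int64_t measured_skew_us(void)
{
    return gap_us(ts_to_us(226, 756137), ts_to_us(227, 298607));
}

/* Page fault in dmesg at 6587.772178, PTE update in tracing at 6587.790869. */
int64_t apparent_gap_us(void)
{
    return gap_us(ts_to_us(6587, 772178), ts_to_us(6587, 790869));
}
```

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">The tracing log is about 0.54 s behind dmesg for the same event, while the PTE update appears only about 0.019 s after the page fault. A gap that small is well inside the measured skew, so the ordering of the two events cannot be decided from these timestamps.</span></p>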
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Li, Chong(Alan)
<br>
<b>Sent:</b> Monday, October 21, 2024 6:38 PM<br>
<b>To:</b> Koenig, Christian </span><a href="mailto:Christian.Koenig@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Christian.Koenig@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">; Raina, Yera
</span><a href="mailto:Yera.Raina@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Yera.Raina@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">; Andjelkovic, Dejan
</span><a href="mailto:Dejan.Andjelkovic@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Dejan.Andjelkovic@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Cc:</b> cao, lin </span><a href="mailto:lin.cao@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><lin.cao@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">; Yin, ZhenGuo (Chris)
</span><a href="mailto:ZhenGuo.Yin@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><ZhenGuo.Yin@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">; Zhang, Tiantian (Celine)
</span><a href="mailto:Tiantian.Zhang@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Tiantian.Zhang@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Subject:</b> RE: [PATCH] drm/amd/amdgpu: change the flush gpu tlb mode to sync mode.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi,
</span>Christian.<o:p></o:p></p>
<p class="MsoNormal">Thanks for your reply,<o:p></o:p></p>
<p class="MsoNormal">And do you have any advice <span style="font-family:DengXian">
about</span> this issue?<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi, Raina, Yera.<br>
Shall I assign this ticket </span><a href="https://ontrack-internal.amd.com/browse/SWDEV-459983"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">SWDEV-459983</span></a><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> to the rocm team?</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Koenig, Christian <</span><a href="mailto:Christian.Koenig@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Christian.Koenig@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>
<br>
<b>Sent:</b> Monday, October 21, 2024 6:08 PM<br>
<b>To:</b> Li, Chong(Alan) <</span><a href="mailto:Chong.Li@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Chong.Li@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>; Raina, Yera <</span><a href="mailto:Yera.Raina@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Yera.Raina@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">><br>
<b>Cc:</b> cao, lin <</span><a href="mailto:lin.cao@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">lin.cao@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>;
</span><a href="mailto:amd-gfx@lists.freedesktop.org"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">amd-gfx@lists.freedesktop.org</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Subject:</b> Re: [PATCH] drm/amd/amdgpu: change the flush gpu tlb mode to sync mode.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt">Hi Chong,<br>
<br>
Andjelkovic just shared a bunch of traces from rocm on teams with me which I analyzed.<br>
<br>
When you know what you look for it's actually pretty obvious what's going on. Just look at the timestamp of the fault and compare that with the timestamp of the operation mapping something at the given address.<br>
<br>
When mapping an address happens only after accessing an address then there is clearly something wrong in the code which coordinates this and that is the ROCm stress test tool in this case.<br>
<br>
Regards,<br>
Christian.<o:p></o:p></p>
<div>
<p class="MsoNormal">Am 21.10.24 um 11:02 schrieb Li, Chong(Alan):<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi, Christian,
</span> <span style="font-size:11.0pt;font-family:"Arial",sans-serif">Raina, Yera.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">If this issue is in ROCm, I need to assign my ticket
</span><a href="https://ontrack-internal.amd.com/browse/SWDEV-459983"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">SWDEV-459983</span></a><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> to the ROCm team.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Is there anything to share with the ROCm PM?</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">For example, the email or chat history, or the ticket you discussed with
</span><span class="ui-provider">Andjelkovic?</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Koenig, Christian
</span><a href="mailto:Christian.Koenig@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Christian.Koenig@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
<br>
<b>Sent:</b> Monday, October 21, 2024 4:00 PM<br>
<b>To:</b> Li, Chong(Alan) </span><a href="mailto:Chong.Li@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Chong.Li@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">;
</span><a href="mailto:amd-gfx@lists.freedesktop.org"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">amd-gfx@lists.freedesktop.org</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Cc:</b> cao, lin </span><a href="mailto:lin.cao@amd.com"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><lin.cao@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Subject:</b> Re: [PATCH] drm/amd/amdgpu: change the flush gpu tlb mode to sync mode.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt">Am 21.10.24 um 07:56 schrieb Chong Li:<br>
<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>change the gpu tlb flush mode to sync mode to<o:p></o:p></pre>
<pre>solve the issue in the rocm stress test.<o:p></o:p></pre>
</blockquote>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
And again, a complete NAK to this.<br>
<br>
I've already proven together with <span class="ui-provider">Andjelkovic that the problem is that the ROCm stress test is broken.</span><br>
<br>
<span class="ui-provider">The test tries to access memory before it is properly mapped, and that is provable by looking at the trace logs.</span><br>
<br>
<span class="ui-provider">Regards,</span><br>
<span class="ui-provider">Christian. </span><br>
<br>
<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre> <o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre>Signed-off-by: Chong Li <a href="mailto:chongli2@amd.com"><chongli2@amd.com></a><o:p></o:p></pre>
<pre>---<o:p></o:p></pre>
<pre> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c | 4 ++--<o:p></o:p></pre>
<pre> 1 file changed, 2 insertions(+), 2 deletions(-)<o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c<o:p></o:p></pre>
<pre>index 51cddfa3f1e8..4d9ff7b31618 100644<o:p></o:p></pre>
<pre>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c<o:p></o:p></pre>
<pre>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c<o:p></o:p></pre>
<pre>@@ -98,7 +98,6 @@ void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev, struct amdgpu_vm *vm<o:p></o:p></pre>
<pre> f->adev = adev;<o:p></o:p></pre>
<pre> f->dependency = *fence;<o:p></o:p></pre>
<pre> f->pasid = vm->pasid;<o:p></o:p></pre>
<pre>- INIT_WORK(&f->work, amdgpu_tlb_fence_work);<o:p></o:p></pre>
<pre> spin_lock_init(&f->lock);<o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre> dma_fence_init(&f->base, &amdgpu_tlb_fence_ops, &f->lock,<o:p></o:p></pre>
<pre>@@ -106,7 +105,8 @@ void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev, struct amdgpu_vm *vm<o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre> /* TODO: We probably need a separate wq here */<o:p></o:p></pre>
<pre> dma_fence_get(&f->base);<o:p></o:p></pre>
<pre>- schedule_work(&f->work);<o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre> *fence = &f->base;<o:p></o:p></pre>
<pre>+<o:p></o:p></pre>
<pre>+ amdgpu_tlb_fence_work(&f->work);<o:p></o:p></pre>
<pre> }<o:p></o:p></pre>
</blockquote>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>