<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
Hi Chong,<br>
<br>
It is possible that the mailer mangled the patch, which would explain why
the coding style looks incorrect.<br>
<br>
Please send it to the mailing list and CC me using the "git
send-email" command.<br>
<br>
Regards,<br>
Christian.<br>
<br>
<div class="moz-cite-prefix">On 04.11.24 at 11:54, Li,
Chong(Alan) wrote:<br>
</div>
<blockquote type="cite" cite="mid:DS7PR12MB5768B6EAF2AE1FFC70E4EF3F9B512@DS7PR12MB5768.namprd12.prod.outlook.com">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style>@font-face
{font-family:SimSun;
panose-1:2 1 6 0 3 1 1 1 1 1;}@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
{font-family:Aptos;}@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}@font-face
{font-family:"\@SimSun";
panose-1:2 1 6 0 3 1 1 1 1 1;}@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
font-size:11.0pt;
font-family:"Arial",sans-serif;
mso-ligatures:standardcontextual;}pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
font-size:10.0pt;
font-family:"Courier New";}span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;}span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Arial",sans-serif;
mso-ligatures:standardcontextual;}span.ui-provider
{mso-style-name:ui-provider;}span.EmailStyle25
{mso-style-type:personal-reply;
font-family:"Arial",sans-serif;
color:windowtext;}.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}div.WordSection1
{page:WordSection1;}</style>
<p style="font-family:Calibri;font-size:10pt;color:#0000FF;margin:5pt;font-style:normal;font-weight:normal;text-decoration:none;" align="Left">
[AMD Official Use Only - AMD Internal Distribution Only]<br>
</p>
<br>
<div>
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi,
Christian.<br>
<br>
the maximum number of wb (writeback) slots is 1024, which should be enough.<br>
<br>
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">#define
AMDGPU_MAX_WB 1024<br>
<br>
<br>
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">I
checked the patch with checkpatch.pl again; it reports 0
coding style errors and 0 warnings.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><br>
<img style="width:9.225in;height:2.7416in" id="_x0000_i1033" src="cid:part1.hdeEv0Td.gwoGgUfI@amd.com" class="" width="886" height="263"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
Koenig, Christian <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a>
<br>
<b>Sent:</b> Monday, November 4, 2024 6:22 PM<br>
<b>To:</b> Li, Chong(Alan) <a class="moz-txt-link-rfc2396E" href="mailto:Chong.Li@amd.com"><Chong.Li@amd.com></a><br>
<b>Cc:</b> cao, lin <a class="moz-txt-link-rfc2396E" href="mailto:lin.cao@amd.com"><lin.cao@amd.com></a>; Yin,
ZhenGuo (Chris) <a class="moz-txt-link-rfc2396E" href="mailto:ZhenGuo.Yin@amd.com"><ZhenGuo.Yin@amd.com></a>; Deng,
Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>; Zhang, Tiantian
(Celine) <a class="moz-txt-link-rfc2396E" href="mailto:Tiantian.Zhang@amd.com"><Tiantian.Zhang@amd.com></a>; Andjelkovic,
Dejan <a class="moz-txt-link-rfc2396E" href="mailto:Dejan.Andjelkovic@amd.com"><Dejan.Andjelkovic@amd.com></a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
<b>Subject:</b> Re: [PATCH] drm/amdgpu: fix return
ramdom value when multiple threads read registers via
mes.<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">On 04.11.24 at 07:43, Li,
Chong(Alan) wrote:<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoPlainText">The current code uses the address
"adev->mes.read_val_ptr" to store the value read from a
register via MES.<o:p></o:p></p>
<p class="MsoPlainText">So when multiple threads read
registers,<o:p></o:p></p>
<p class="MsoPlainText">they all share that one address
and overwrite each other's values.<o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"><br>
Good catch.<br>
<br>
<br>
<o:p></o:p></p>
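<p class="MsoNormal">The overwrite described in the commit message can be sketched in user space. This is only an illustration under stated assumptions: the names demo_wb, demo_fw_write and demo_reader_a_result are hypothetical and not part of the driver, and the two readers are interleaved by hand rather than run as real threads, so the outcome is deterministic.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the writeback buffer; not the driver's layout. */
#define DEMO_SLOTS 4
static uint32_t demo_wb[DEMO_SLOTS];

/* Pretend firmware: deposits the "register value" into the given slot. */
static void demo_fw_write(uint32_t slot, uint32_t reg)
{
	demo_wb[slot] = reg + 0x1000u; /* arbitrary, deterministic value */
}

/*
 * Two readers, A and B, interleaved by hand: A issues its read, B issues
 * its read, and only then does A fetch its result.
 * Returns the value reader A ends up with.
 */
static uint32_t demo_reader_a_result(uint32_t slot_a, uint32_t slot_b)
{
	demo_fw_write(slot_a, 0x10u); /* A: read register 0x10 */
	demo_fw_write(slot_b, 0x20u); /* B: read register 0x20 */
	return demo_wb[slot_a];       /* A: fetch its result */
}
```

<p class="MsoNormal">With a shared slot, demo_reader_a_result(0, 0) yields B's value (0x1020); with one slot per reader, demo_reader_a_result(1, 2) yields A's own value (0x1010).</p>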
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">Allocate an address with
"amdgpu_device_wb_get" to store the register value,<o:p></o:p></p>
<p class="MsoPlainText">so that each thread has its own address to
store the register value.<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">Signed-off-by: Chong Li <<a href="mailto:chongli2@amd.com" moz-do-not-send="true" class="moz-txt-link-freetext">chongli2@amd.com</a>><o:p></o:p></p>
<p class="MsoPlainText">---<o:p></o:p></p>
<p class="MsoPlainText">drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 30 +++++++++++--------------<o:p></o:p></p>
<p class="MsoPlainText">drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 3 ---<o:p></o:p></p>
<p class="MsoPlainText">2 files changed, 13 insertions(+),
20 deletions(-)<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c<o:p></o:p></p>
<p class="MsoPlainText">index 83d0f731fb65..d74e3507e155
100644<o:p></o:p></p>
<p class="MsoPlainText">---
a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c<o:p></o:p></p>
<p class="MsoPlainText">+++
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c<o:p></o:p></p>
<p class="MsoPlainText">@@ -189,17 +189,6 @@ int
amdgpu_mes_init(struct amdgpu_device *adev)<o:p></o:p></p>
<p class="MsoPlainText">
(uint64_t
*)&adev->wb.wb[adev->mes.query_status_fence_offs[i]];<o:p></o:p></p>
<p class="MsoPlainText"> }<o:p></o:p></p>
<p class="MsoPlainText">- r =
amdgpu_device_wb_get(adev,
&adev->mes.read_val_offs);<o:p></o:p></p>
<p class="MsoPlainText">- if (r) {<o:p></o:p></p>
<p class="MsoPlainText">-
dev_err(adev->dev,<o:p></o:p></p>
<p class="MsoPlainText">-
"(%d) read_val_offs alloc failed\n", r);<o:p></o:p></p>
<p class="MsoPlainText">- goto
error;<o:p></o:p></p>
<p class="MsoPlainText">- }<o:p></o:p></p>
<p class="MsoPlainText">-
adev->mes.read_val_gpu_addr =<o:p></o:p></p>
<p class="MsoPlainText">-
adev->wb.gpu_addr + (adev->mes.read_val_offs * 4);<o:p></o:p></p>
<p class="MsoPlainText">-
adev->mes.read_val_ptr =<o:p></o:p></p>
<p class="MsoPlainText">- (uint32_t
*)&adev->wb.wb[adev->mes.read_val_offs];<o:p></o:p></p>
<p class="MsoPlainText">-<o:p></o:p></p>
<p class="MsoPlainText"> r =
amdgpu_mes_doorbell_init(adev);<o:p></o:p></p>
<p class="MsoPlainText"> if (r)<o:p></o:p></p>
<p class="MsoPlainText"> goto error;<o:p></o:p></p>
<p class="MsoPlainText">@@ -220,8 +209,6 @@ int
amdgpu_mes_init(struct amdgpu_device *adev)<o:p></o:p></p>
<p class="MsoPlainText">
amdgpu_device_wb_free(adev,<o:p></o:p></p>
<p class="MsoPlainText">
adev->mes.query_status_fence_offs[i]);<o:p></o:p></p>
<p class="MsoPlainText"> }<o:p></o:p></p>
<p class="MsoPlainText">- if
(adev->mes.read_val_ptr)<o:p></o:p></p>
<p class="MsoPlainText">-
amdgpu_device_wb_free(adev, adev->mes.read_val_offs);<o:p></o:p></p>
<p class="MsoPlainText">
idr_destroy(&adev->mes.pasid_idr);<o:p></o:p></p>
<p class="MsoPlainText">
idr_destroy(&adev->mes.gang_id_idr);<o:p></o:p></p>
<p class="MsoPlainText">@@ -246,8 +233,6 @@ void
amdgpu_mes_fini(struct amdgpu_device *adev)<o:p></o:p></p>
<p class="MsoPlainText">
amdgpu_device_wb_free(adev,<o:p></o:p></p>
<p class="MsoPlainText">
adev->mes.query_status_fence_offs[i]);<o:p></o:p></p>
<p class="MsoPlainText"> }<o:p></o:p></p>
<p class="MsoPlainText">- if
(adev->mes.read_val_ptr)<o:p></o:p></p>
<p class="MsoPlainText">-
amdgpu_device_wb_free(adev, adev->mes.read_val_offs);<o:p></o:p></p>
<p class="MsoPlainText">
amdgpu_mes_doorbell_free(adev);<o:p></o:p></p>
<p class="MsoPlainText">@@ -918,10 +903,19 @@ uint32_t
amdgpu_mes_rreg(struct amdgpu_device *adev, uint32_t
reg)<o:p></o:p></p>
<p class="MsoPlainText"> {<o:p></o:p></p>
<p class="MsoPlainText"> struct
mes_misc_op_input op_input;<o:p></o:p></p>
<p class="MsoPlainText"> int r, val = 0;<o:p></o:p></p>
<p class="MsoPlainText">+ uint32_t addr_offset =
0;<o:p></o:p></p>
<p class="MsoPlainText">+ uint64_t
read_val_gpu_addr = 0;<o:p></o:p></p>
<p class="MsoPlainText">+ uint32_t *read_val_ptr
= NULL;<o:p></o:p></p>
<p class="MsoPlainText">+ if
(amdgpu_device_wb_get(adev, &addr_offset)) {<o:p></o:p></p>
<p class="MsoPlainText">+
DRM_ERROR("critical bug! too many mes readers\n");<o:p></o:p></p>
<p class="MsoPlainText">+ goto error;<o:p></o:p></p>
<p class="MsoPlainText">+ }<o:p></o:p></p>
<p class="MsoPlainText">+ read_val_gpu_addr =
adev->wb.gpu_addr + (addr_offset * 4);<o:p></o:p></p>
<p class="MsoPlainText">+ read_val_ptr =
(uint32_t *)&adev->wb.wb[addr_offset];<o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"><br>
Please run checkpatch.pl on the patch since this code here
clearly has style issues.<br>
<br>
Apart from that looks good to me, the only potential concern
I can see is if we have enough writeback memory to cover all
concurrent threads.<br>
<br>
Regards,<br>
Christian.<br>
<br>
<br>
<o:p></o:p></p>
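<p class="MsoNormal">The capacity concern can be illustrated with a small fixed-size slot pool. This is a user-space sketch, not the driver's allocator: demo_wb_get, demo_wb_free and demo_wb_drain are hypothetical names, the pool size simply mirrors the AMDGPU_MAX_WB value of 1024 quoted in this thread, and other users of the real pool are ignored.</p>

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_WB 1024 /* mirrors the AMDGPU_MAX_WB value quoted in the thread */

static uint8_t demo_used[DEMO_MAX_WB]; /* 1 = slot currently handed out */

/* Returns the index of a free slot, or -1 when every slot is taken. */
static int demo_wb_get(void)
{
	for (int i = 0; i < DEMO_MAX_WB; i++) {
		if (!demo_used[i]) {
			demo_used[i] = 1;
			return i;
		}
	}
	return -1;
}

/* Hands a slot back to the pool; out-of-range indices are ignored. */
static void demo_wb_free(int slot)
{
	if (slot >= 0 && slot < DEMO_MAX_WB)
		demo_used[slot] = 0;
}

/* Claims slots until the pool is empty; returns how many were handed out. */
static int demo_wb_drain(void)
{
	int count = 0;

	while (demo_wb_get() >= 0)
		count++;
	return count;
}
```

<p class="MsoNormal">Once all slots are claimed, further requests fail until a slot is released, so a reader that allocates per call has to handle the failure path.</p>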
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoPlainText"> op_input.op =
MES_MISC_OP_READ_REG;<o:p></o:p></p>
<p class="MsoPlainText">
op_input.read_reg.reg_offset = reg;<o:p></o:p></p>
<p class="MsoPlainText">-
op_input.read_reg.buffer_addr =
adev->mes.read_val_gpu_addr;<o:p></o:p></p>
<p class="MsoPlainText">+
op_input.read_reg.buffer_addr = read_val_gpu_addr;<o:p></o:p></p>
<p class="MsoPlainText"> if
(!adev->mes.funcs->misc_op) {<o:p></o:p></p>
<p class="MsoPlainText"> DRM_ERROR("mes rreg is not supported!\n");<o:p></o:p></p>
<p class="MsoPlainText">@@ -932,9 +926,11 @@ uint32_t amdgpu_mes_rreg(struct amdgpu_device
*adev, uint32_t reg)<o:p></o:p></p>
<p class="MsoPlainText"> if (r)<o:p></o:p></p>
<p class="MsoPlainText">
DRM_ERROR("failed to read reg (0x%x)\n", reg);<o:p></o:p></p>
<p class="MsoPlainText"> else<o:p></o:p></p>
<p class="MsoPlainText">- val =
*(adev->mes.read_val_ptr);<o:p></o:p></p>
<p class="MsoPlainText">+ val =
*(read_val_ptr);<o:p></o:p></p>
<p class="MsoPlainText"> error:<o:p></o:p></p>
<p class="MsoPlainText">+ if (addr_offset)<o:p></o:p></p>
<p class="MsoPlainText">+
amdgpu_device_wb_free(adev, addr_offset);<o:p></o:p></p>
<p class="MsoPlainText"> return val;<o:p></o:p></p>
<p class="MsoPlainText">}<o:p></o:p></p>
<p class="MsoPlainText">diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h<o:p></o:p></p>
<p class="MsoPlainText">index 45e3508f0f8e..83f45bb48427
100644<o:p></o:p></p>
<p class="MsoPlainText">---
a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h<o:p></o:p></p>
<p class="MsoPlainText">+++
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h<o:p></o:p></p>
<p class="MsoPlainText">@@ -119,9 +119,6 @@ struct
amdgpu_mes {<o:p></o:p></p>
<p class="MsoPlainText">
uint32_t
query_status_fence_offs[AMDGPU_MAX_MES_PIPES];<o:p></o:p></p>
<p class="MsoPlainText">
uint64_t
query_status_fence_gpu_addr[AMDGPU_MAX_MES_PIPES];<o:p></o:p></p>
<p class="MsoPlainText">
uint64_t
*query_status_fence_ptr[AMDGPU_MAX_MES_PIPES];<o:p></o:p></p>
<p class="MsoPlainText">-
uint32_t read_val_offs;<o:p></o:p></p>
<p class="MsoPlainText">-
uint64_t
read_val_gpu_addr;<o:p></o:p></p>
<p class="MsoPlainText">-
uint32_t
*read_val_ptr;<o:p></o:p></p>
<p class="MsoPlainText">
uint32_t saved_flags;<o:p></o:p></p>
<p class="MsoPlainText">--<o:p></o:p></p>
<p class="MsoPlainText">2.34.1<o:p></o:p></p>
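<p class="MsoNormal">One detail of the error path may be worth a second look. The patch releases the slot only when addr_offset is non-zero. If the allocator can ever hand out offset 0 (an assumption; the thread does not say whether it can), that slot would never be released. A minimal sketch of an alternative that tracks the allocation with an explicit flag follows; all demo_* names are hypothetical and the firmware write is simulated.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SLOTS 4
static bool demo_used[DEMO_SLOTS];
static uint32_t demo_wb[DEMO_SLOTS];

/* Returns 0 and stores a free slot index, or -1 when the pool is empty. */
static int demo_wb_get(uint32_t *slot)
{
	for (uint32_t i = 0; i < DEMO_SLOTS; i++) {
		if (!demo_used[i]) {
			demo_used[i] = true;
			*slot = i;
			return 0;
		}
	}
	return -1;
}

static void demo_wb_free(uint32_t slot)
{
	if (slot < DEMO_SLOTS)
		demo_used[slot] = false;
}

/* Number of slots currently handed out. */
static uint32_t demo_in_use(void)
{
	uint32_t n = 0;

	for (uint32_t i = 0; i < DEMO_SLOTS; i++)
		n += demo_used[i] ? 1u : 0u;
	return n;
}

/* Simulated register read: the "firmware" stores reg + 1 in the slot. */
static uint32_t demo_rreg(uint32_t reg, bool op_supported)
{
	uint32_t slot = 0, val = 0;
	bool have_slot = false;

	if (demo_wb_get(&slot))
		goto out;
	have_slot = true;

	if (!op_supported)
		goto out;

	demo_wb[slot] = reg + 1; /* stands in for the firmware write */
	val = demo_wb[slot];
out:
	if (have_slot) /* slot 0 is released as well */
		demo_wb_free(slot);
	return val;
}
```

<p class="MsoNormal">In this sketch the first call receives slot 0, and the pool is empty again after both the success path and the unsupported-operation path.</p>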
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
Koenig, Christian
<a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true"><Christian.Koenig@amd.com></a>
<br>
<b>Sent:</b> Thursday, October 31, 2024 6:04 PM<br>
<b>To:</b> Li, Chong(Alan) <a href="mailto:Chong.Li@amd.com" moz-do-not-send="true"><Chong.Li@amd.com></a><br>
<b>Cc:</b> cao, lin <a href="mailto:lin.cao@amd.com" moz-do-not-send="true"><lin.cao@amd.com></a>;
Yin, ZhenGuo (Chris)
<a href="mailto:ZhenGuo.Yin@amd.com" moz-do-not-send="true"><ZhenGuo.Yin@amd.com></a>;
Zhang, Tiantian (Celine)
<a href="mailto:Tiantian.Zhang@amd.com" moz-do-not-send="true"><Tiantian.Zhang@amd.com></a>;
Andjelkovic, Dejan
<a href="mailto:Dejan.Andjelkovic@amd.com" moz-do-not-send="true"><Dejan.Andjelkovic@amd.com></a>;
<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true" class="moz-txt-link-freetext">
amd-gfx@lists.freedesktop.org</a><br>
<b>Subject:</b> Re: [PATCH] drm/amd/amdgpu: change
the flush gpu tlb mode to sync mode.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Hi Chong,<br>
<br>
On 31.10.24 at 10:54, Li, Chong(Alan) wrote:<br>
<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi,
Christian.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Here
is an update on the progress of the page fault issue in the
rocblas benchmark.</span><o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"><br>
finally some progress here. Thanks for the update.<br>
<br>
<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">I
found that when multiple threads read the register
“regIH_VMID_0_LUT” to get the pasid,
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">the
register randomly returns a wrong pasid value:
sometimes 0, sometimes 32768 (the real
value is 32770).</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">After
checking the invalid pasid, the code will “continue” and
not flush the gpu tlb.</span><o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"><br>
That is really disturbing; concurrent register access
must work correctly.<br>
<br>
Not only the TLB flush but many other operations depend
on stuff like that as well.<br>
<br>
<br>
<br>
<o:p></o:p></p>
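<p class="MsoNormal">To make the reported failure mode concrete, here is a simplified sketch of how a wrong lookup value leads to a skipped flush. It is not the driver's gmc_v11_0_flush_gpu_tlb_pasid: the demo_* names, the choice of VMID 8 and the plain equality check are assumptions made for illustration, while the values 32770 and 32768 are the ones quoted in the report above.</p>

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_VMID 16

/* Hypothetical lookup callback: returns the PASID mapped to a VMID. */
typedef uint32_t (*demo_read_lut_fn)(uint32_t vmid);

/* Counts how many VMIDs would be flushed for the given PASID. */
static uint32_t demo_flush_pasid(uint32_t pasid, demo_read_lut_fn read_lut)
{
	uint32_t flushed = 0;

	for (uint32_t vmid = 1; vmid < DEMO_NUM_VMID; vmid++) {
		if (read_lut(vmid) != pasid)
			continue; /* mismatch: this VMID is skipped, no flush */
		flushed++;    /* stands in for the actual TLB flush */
	}
	return flushed;
}

/* Correct read: VMID 8 (arbitrary choice) maps to the real value 32770. */
static uint32_t demo_read_ok(uint32_t vmid)
{
	return vmid == 8 ? 32770u : 0u;
}

/* Corrupted read: VMID 8 returns 32768, one of the reported wrong values. */
static uint32_t demo_read_bad(uint32_t vmid)
{
	return vmid == 8 ? 32768u : 0u;
}
```

<p class="MsoNormal">With the correct lookup one VMID is flushed; with the corrupted lookup none is, which matches the description of the stale TLB entry and the resulting page fault.</p>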
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">That’s
why the page fault occurs.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">After
adding the lock, the register no longer returns an invalid
value, and the rocblas benchmark passes.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">You
have submitted a patch "implement TLB flush fence";
in this patch you create a kernel thread to flush
the gpu tlb.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">And in
the main thread the function “svm_range_map_to_gpus”
calls the function “kfd_flush_tlb”, which flushes
the gpu tlb as well.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">This
means that both threads call the function
“gmc_v11_0_flush_gpu_tlb_pasid”.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">So
after your patch was merged, the page fault issue
occurs.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">My
first patch changed the gpu tlb flush to sync mode,
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">which
means a single thread flushes the gpu tlb twice; that is
why my first patch passed the rocblas benchmark.</span><o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"><br>
I will have to reject such patches, you need to find the
underlying problem and not mitigate the symptoms.<br>
<br>
<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">I
have already sent an email to the firmware team to ask
why the register returns a wrong value.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">But if
the firmware team is not able to solve this issue, or
needs a long time to solve it,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">I will
submit a patch like the one below as a workaround.</span><o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"><br>
Well that basically means a complete stop for any
deliverable.<br>
<br>
The driver stack simply won't work correctly when
register reads return random values like that.<br>
<br>
Regards,<br>
Christian.<br>
<br>
<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img style="width:7.7083in;height:6.7583in" id="_x0000_i1032" src="cid:part2.6q0WVcpJ.f0081mE7@amd.com" class="" width="740" height="649" border="0"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Li,
Chong(Alan)
<br>
<b>Sent:</b> Friday, October 25, 2024 2:46 PM<br>
<b>To:</b> Koenig, Christian <a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true"><Christian.Koenig@amd.com></a>;
Andjelkovic, Dejan
<a href="mailto:Dejan.Andjelkovic@amd.com" moz-do-not-send="true"><Dejan.Andjelkovic@amd.com></a><br>
<b>Cc:</b> cao, lin <a href="mailto:lin.cao@amd.com" moz-do-not-send="true"><lin.cao@amd.com></a>;
Yin, ZhenGuo (Chris)
<a href="mailto:ZhenGuo.Yin@amd.com" moz-do-not-send="true"><ZhenGuo.Yin@amd.com></a>;
Zhang, Tiantian (Celine)
<a href="mailto:Tiantian.Zhang@amd.com" moz-do-not-send="true"><Tiantian.Zhang@amd.com></a>;
<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true" class="moz-txt-link-freetext">
amd-gfx@lists.freedesktop.org</a><br>
<b>Subject:</b> RE: [PATCH] drm/amd/amdgpu:
change the flush gpu tlb mode to sync mode.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi,
Christian.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">The
log file is too large to paste into the
email.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">I copied
the log file to the directory “</span><a href="file://ark/incoming/chong/log" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">\\ark\incoming\chong\log</span></a><span style="font-size:11.0pt;font-family:"Arial",sans-serif">”; the
log file name is “kern.log”.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Can
you access this directory?</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
Koenig, Christian <</span><a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Christian.Koenig@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>
<br>
<b>Sent:</b> Thursday, October 24, 2024 7:22
PM<br>
<b>To:</b> Li, Chong(Alan) <</span><a href="mailto:Chong.Li@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Chong.Li@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>;
Andjelkovic, Dejan <</span><a href="mailto:Dejan.Andjelkovic@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Dejan.Andjelkovic@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">><br>
<b>Cc:</b> cao, lin <</span><a href="mailto:lin.cao@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">lin.cao@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>;
Yin, ZhenGuo (Chris) <</span><a href="mailto:ZhenGuo.Yin@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">ZhenGuo.Yin@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>;
Zhang, Tiantian (Celine) <</span><a href="mailto:Tiantian.Zhang@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Tiantian.Zhang@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>;
Raina, Yera <</span><a href="mailto:Yera.Raina@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Yera.Raina@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">><br>
<b>Subject:</b> Re: [PATCH] drm/amd/amdgpu:
change the flush gpu tlb mode to sync mode.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt">Do
you have the full log as a text file? As an image it's
pretty much useless.<br>
<br>
Regards,<br>
Christian.<o:p></o:p></p>
<div>
<p class="MsoNormal">On 24.10.24 at 09:41,
Li, Chong(Alan) wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi,
</span>Christian.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">As we can
see in the dmesg log,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">even after
the ptes for address “7ef90be00” have been updated,
the page fault still happens.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img style="width:7.4333in;height:2.7083in" id="Picture_x0020_7" src="cid:part3.ZJwV2p8C.xhDSsZdU@amd.com" class="" width="714" height="260" border="0"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img style="width:8.2166in;height:2.1583in" id="Picture_x0020_6" src="cid:part4.43RZtJBg.w0te2POq@amd.com" class="" width="789" height="207" border="0"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
Koenig, Christian
</span><a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Christian.Koenig@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
<br>
<b>Sent:</b> Wednesday, October 23, 2024
5:26 PM<br>
<b>To:</b> Li, Chong(Alan) </span><a href="mailto:Chong.Li@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Chong.Li@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">;
Andjelkovic, Dejan
</span><a href="mailto:Dejan.Andjelkovic@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Dejan.Andjelkovic@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Cc:</b> cao, lin </span><a href="mailto:lin.cao@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><lin.cao@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">;
Yin, ZhenGuo (Chris)
</span><a href="mailto:ZhenGuo.Yin@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><ZhenGuo.Yin@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">;
Zhang, Tiantian (Celine)
</span><a href="mailto:Tiantian.Zhang@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Tiantian.Zhang@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">;
Raina, Yera
</span><a href="mailto:Yera.Raina@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Yera.Raina@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Subject:</b> Re: [PATCH]
drm/amd/amdgpu: change the flush gpu tlb
mode to sync mode.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt">Hi
Chong,<br>
<br>
Oh, that could indeed be.<br>
<br>
I suggest adding a tracepoint for the page
fault so that both events are guaranteed to use the
same time base.<br>
<br>
That should make it trivial to compare them.<br>
<br>
Regards,<br>
Christian.<o:p></o:p></p>
<div>
<p class="MsoNormal">Am 23.10.24 um 10:17
schrieb Li, Chong(Alan):<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p style="margin:5.0pt"><span style="font-size:10.0pt;font-family:"Calibri",sans-serif;color:blue">[AMD
Official Use Only - AMD Internal
Distribution Only]</span><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi,
Christian.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><b><span style="font-family:"Arial",sans-serif">I added a log in the kernel,
and it proves that the timestamps in the tracing log
are later than those in the dmesg log for the same event,
</span></b><o:p></o:p></p>
<p class="MsoNormal"><b><span style="font-family:"Arial",sans-serif">so we can’t
conclude that the issue is in rocm.</span></b><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">------------------------
the information I synced with Andjelkovic,
Dejan
----------------------------------------</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">dmesg
shows that the page fault happens at address
“0x000072e5f4401000” at time
“6587.772178”,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img style="width:13.7333in;height:1.2416in" id="_x0000_i1029" src="cid:part5.IJAonAWF.Jvu1V0W0@amd.com" class="" width="1318" height="119" border="0"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">the tracing
log shows that the function
“amdgpu_vm_update_ptes” is called at time
“6587.790869”,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img style="width:12.1583in;height:.25in" id="Picture_x0020_4" src="cid:part6.bpCvHIm1.tP5jAkKa@amd.com" class="" width="1167" height="24" border="0"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">------------------------
the information I synced with Andjelkovic,
Dejan
----------------------------------------</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">From
the log timestamps, you concluded
that “</span><span class="ui-provider">The
test tries to access memory before it is
properly mapped and that is provable by
looking into the tracelogs.</span><span style="font-size:11.0pt;font-family:"Arial",sans-serif">”.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">But
after reviewing the code, I see that the function
“amdgpu_vm_ptes_update” is called in the
function “svm_range_set_attr”,
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">So,
after this line in the dmesg output above, “[
6587.772136] amdgpu: pasid 0x8002 svms
0x000000008b03ff39 [0x72e5f4400
0x72e5fc3ff] done, r=0”,
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">the
function “svm_range_set_attr” returns;
by that time “amdgpu_vm_ptes_update” has
already been called, so the timestamp is not
plausible.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">I
think the timestamps in the tracing log
may have some delay, so I added a log line in the
kernel to verify my guess,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img style="width:8.25in;height:1.0166in" id="Picture_x0020_3" src="cid:part7.E9sWgtyL.jGjIZPYm@amd.com" class="" width="792" height="98" border="0"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">The
result is below:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">the tracing
log shows the address “ffffffc00” at time
“227.298607”,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">the dmesg
log prints the address “ffffffc00” at time
“226.756137”.</span><o:p></o:p></p>
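<p class="MsoNormal">A minimal sketch of the arithmetic behind this comparison, using only the timestamps quoted in this mail. It assumes that the offset between the tracing log and the dmesg log measured on the “ffffffc00” test line also applies to the earlier run, which is not verified here; the helper name apparent_gap is hypothetical.<o:p></o:p></p>
<pre>
from decimal import Decimal

# Timestamps quoted in this mail, in seconds.
# Decimal keeps the microsecond digits exact.
FAULT_DMESG = Decimal("6587.772178")        # page fault, from dmesg
UPDATE_PTES_TRACE = Decimal("6587.790869")  # amdgpu_vm_update_ptes, from the tracing log
MARKER_DMESG = Decimal("226.756137")        # test log line for ffffffc00, from dmesg
MARKER_TRACE = Decimal("227.298607")        # same address ffffffc00, from the tracing log


def apparent_gap(trace_ts, dmesg_ts):
    """Return how much later the tracing timestamp is than the dmesg one."""
    return trace_ts - dmesg_ts


# Offset between the two logs, measured on the single test event.
# Assumption: the same offset applies to the run with the page fault.
clock_offset = apparent_gap(MARKER_TRACE, MARKER_DMESG)

# Apparent delay between the page fault and the PTE update.
fault_to_update = apparent_gap(UPDATE_PTES_TRACE, FAULT_DMESG)

# The ordering is only conclusive if the apparent delay is larger than
# the offset between the two logs.
ordering_is_conclusive = abs(fault_to_update) > abs(clock_offset)

print("clock offset    :", clock_offset)
print("fault to update :", fault_to_update)
print("conclusive      :", ordering_is_conclusive)
</pre>
<p class="MsoNormal">When the apparent delay between the fault and the PTE update is smaller than the offset between the two logs, the timestamps alone do not settle which event came first.<o:p></o:p></p>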
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">tracing
log:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img style="width:13.9166in;height:1.4916in" id="Picture_x0020_2" src="cid:part8.0rUzFhL5.tW60mvYy@amd.com" class="" width="1336" height="143" border="0"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">dmesg
log:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"><img style="width:9.9666in;height:1.1666in" id="Picture_x0020_1" src="cid:part9.sKI2WdSU.eEU3nuG5@amd.com" class="" width="957" height="112" border="0"></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Li,
Chong(Alan)
<br>
<b>Sent:</b> Monday, October 21, 2024
6:38 PM<br>
<b>To:</b> Koenig, Christian </span><a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Christian.Koenig@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">;
Raina, Yera
</span><a href="mailto:Yera.Raina@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Yera.Raina@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">;
Andjelkovic, Dejan
</span><a href="mailto:Dejan.Andjelkovic@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Dejan.Andjelkovic@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Cc:</b> cao, lin </span><a href="mailto:lin.cao@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><lin.cao@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">;
Yin, ZhenGuo (Chris)
</span><a href="mailto:ZhenGuo.Yin@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><ZhenGuo.Yin@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">;
Zhang, Tiantian (Celine)
</span><a href="mailto:Tiantian.Zhang@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Tiantian.Zhang@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Subject:</b> RE: [PATCH]
drm/amd/amdgpu: change the flush gpu
tlb mode to sync mode.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi,
</span>Christian.<o:p></o:p></p>
<p class="MsoNormal">Thanks for your reply.<o:p></o:p></p>
<p class="MsoNormal">Do you have any
advice <span style="font-family:DengXian">
about</span> this issue?<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi,
Raina, Yera.<br>
Shall I assign this ticket </span><a href="https://ontrack-internal.amd.com/browse/SWDEV-459983" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">SWDEV-459983</span></a><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> to
the rocm team?</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
Koenig, Christian <</span><a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Christian.Koenig@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>
<br>
<b>Sent:</b> Monday, October 21, 2024
6:08 PM<br>
<b>To:</b> Li, Chong(Alan) <</span><a href="mailto:Chong.Li@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Chong.Li@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>;
Raina, Yera <</span><a href="mailto:Yera.Raina@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Yera.Raina@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">><br>
<b>Cc:</b> cao, lin <</span><a href="mailto:lin.cao@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">lin.cao@amd.com</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">>;
</span><a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">amd-gfx@lists.freedesktop.org</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Subject:</b> Re: [PATCH]
drm/amd/amdgpu: change the flush gpu
tlb mode to sync mode.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt">Hi Chong,<br>
<br>
Andjelkovic just shared a bunch of traces
from rocm with me on Teams, which I analyzed.<br>
<br>
Once you know what to look for, it's
actually pretty obvious what's going on.
Just look at the timestamp of the fault and
compare that with the timestamp of the
operation mapping something at the given
address.<br>
<br>
When an address is mapped only after
it has been accessed, then there is clearly
something wrong in the code which
coordinates this, and that is the ROCm stress
test tool in this case.<br>
<br>
Regards,<br>
Christian.<o:p></o:p></p>
<div>
<p class="MsoNormal">Am 21.10.24 um 11:02
schrieb Li, Chong(Alan):<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p style="margin:5.0pt"><span style="font-size:10.0pt;font-family:"Calibri",sans-serif;color:blue">[AMD
Official Use Only - AMD Internal
Distribution Only]</span><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Hi,
Christian,
</span> <span style="font-size:11.0pt;font-family:"Arial",sans-serif">Raina,
Yera.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">If
this issue is in rocm, I need to assign my
ticket
</span><a href="https://ontrack-internal.amd.com/browse/SWDEV-459983" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">SWDEV-459983</span></a><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> to
the rocm team.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Is
there anything to share with the rocm
PM?</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">For example,
the email, chat history, or
ticket you discussed with
</span><span class="ui-provider">Andjelkovic.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif">Chong.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Arial",sans-serif"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
Koenig, Christian
</span><a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Christian.Koenig@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
<br>
<b>Sent:</b> Monday, October 21,
2024 4:00 PM<br>
<b>To:</b> Li, Chong(Alan) </span><a href="mailto:Chong.Li@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><Chong.Li@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">;
</span><a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">amd-gfx@lists.freedesktop.org</span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Cc:</b> cao, lin </span><a href="mailto:lin.cao@amd.com" moz-do-not-send="true"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><lin.cao@amd.com></span></a><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Subject:</b> Re: [PATCH]
drm/amd/amdgpu: change the flush
gpu tlb mode to sync mode.</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt">Am 21.10.24
um 07:56 schrieb Chong Li:<br>
<br>
<br>
<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>change the gpu tlb flush mode to sync mode to<o:p></o:p></pre>
<pre>solve the issue in the rocm stress test.<o:p></o:p></pre>
</blockquote>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
And again complete NAK to this.<br>
<br>
I've already proven together with <span class="ui-provider">Andjelkovic that
the problem is that the rocm stress
test is broken.</span><br>
<br>
<span class="ui-provider">The test tries
to access memory before it is properly
mapped and that is provable by looking
into the tracelogs.</span><br>
<br>
<span class="ui-provider">Regards,</span><br>
<span class="ui-provider">Christian. </span><br>
<br>
<br>
<br>
<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre> <o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre>Signed-off-by: Chong Li <a href="mailto:chongli2@amd.com" moz-do-not-send="true"><chongli2@amd.com></a><o:p></o:p></pre>
<pre>---<o:p></o:p></pre>
<pre> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c | 4 ++--<o:p></o:p></pre>
<pre> 1 file changed, 2 insertions(+), 2 deletions(-)<o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c<o:p></o:p></pre>
<pre>index 51cddfa3f1e8..4d9ff7b31618 100644<o:p></o:p></pre>
<pre>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c<o:p></o:p></pre>
<pre>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c<o:p></o:p></pre>
<pre>@@ -98,7 +98,6 @@ void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev, struct amdgpu_vm *vm<o:p></o:p></pre>
<pre> f->adev = adev;<o:p></o:p></pre>
<pre> f->dependency = *fence;<o:p></o:p></pre>
<pre> f->pasid = vm->pasid;<o:p></o:p></pre>
<pre>- INIT_WORK(&f->work, amdgpu_tlb_fence_work);<o:p></o:p></pre>
<pre> spin_lock_init(&f->lock);<o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre> dma_fence_init(&f->base, &amdgpu_tlb_fence_ops, &f->lock,<o:p></o:p></pre>
<pre>@@ -106,7 +105,8 @@ void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev, struct amdgpu_vm *vm<o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre> /* TODO: We probably need a separate wq here */<o:p></o:p></pre>
<pre> dma_fence_get(&f->base);<o:p></o:p></pre>
<pre>- schedule_work(&f->work);<o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre> *fence = &f->base;<o:p></o:p></pre>
<pre>+<o:p></o:p></pre>
<pre>+ amdgpu_tlb_fence_work(&f->work);<o:p></o:p></pre>
<pre> }<o:p></o:p></pre>
</blockquote>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</blockquote>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</blockquote>
<br>
</body>
</html>