<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hey Christian, Dennis, see below.<br>
</p>
<div class="moz-cite-prefix">On 2021-04-06 6:34 a.m., Christian
König wrote:<br>
</div>
<blockquote type="cite" cite="mid:51d7873d-cf35-6be5-79c2-024937c67f6a@amd.com">
Hi Andrey,<br>
<br>
well, good question. My job is to watch over the implementation and design, and while I always help, I can adjust anybody's schedule.<br>
<br>
Is the patch to print a warning when the hardware is accessed
without holding the locks merged yet? If not then that would
probably be a good starting point.<br>
</blockquote>
<p><br>
</p>
<p>It's merged into amd-staging-drm-next, and since I work on drm-misc-next I will cherry-pick it over there.<br>
</p>
<p><br>
</p>
<blockquote type="cite" cite="mid:51d7873d-cf35-6be5-79c2-024937c67f6a@amd.com"> <br>
Then we would need to unify this with the SRCU to make sure that
we have both the reset lock as well as block the hotplug code from
reusing the MMIO space.<br>
</blockquote>
<br>
<p>In my understanding there is a significant difference between handling GPU reset and unplug: while the GPU reset use case requires any HW-accessing code to block, wait for the reset to finish and then proceed, hot-unplug is permanent, so there is no need to wait and proceed but rather to abort at once. This is why I think that in any place where we already check for device reset we should also add a check for hot-unplug, but the handling would differ in that for hot-unplug we would abort instead of keep waiting.</p>
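<p>To make the distinction concrete, here is a minimal userspace sketch of the check I have in mind (the names <code>in_gpu_reset</code>, <code>unplugged</code> and <code>hw_access_begin()</code> are illustrative, not the actual amdgpu code):</p>

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative device state; in amdgpu these would live in adev.
 * All names here are hypothetical, not the actual driver fields. */
static atomic_bool in_gpu_reset;
static atomic_bool unplugged;

/* Stub: real code would block on the reset lock or a completion. */
static void wait_for_reset_done(void) { }

/*
 * Entry check for any HW-accessing path: a GPU reset is transient,
 * so we wait for it to finish and then proceed; hot-unplug is
 * permanent, so we abort at once instead of waiting.
 */
static int hw_access_begin(void)
{
    if (atomic_load(&unplugged))
        return -ENODEV;        /* device is gone for good: abort */
    if (atomic_load(&in_gpu_reset))
        wait_for_reset_done(); /* transient: block, then proceed */
    return 0;
}
```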
<p>Similar to handling device reset, for unplug we obviously also need to stop and block any MMIO accesses once the device is unplugged, and, as Daniel Vetter mentioned, we have to do it before finishing pci_remove (early device fini) and not later (when the last device reference is dropped from user space), in order to prevent reuse of MMIO space we still access by other hot-plugged devices. As in the device reset case, we need to cancel all delayed works, stop the drm scheduler and complete all unfinished fences (both HW and scheduler fences). You stated a strong objection to force-signalling scheduler fences from GPU reset, quote:<br>
</p>
<p>"you can't signal the dma_fence waiting. Waiting for a dma_fence
also means you wait for the GPU reset to finish. When we would
signal the dma_fence during the GPU reset then we would run into
memory corruption because the hardware jobs running after the GPU
reset would access memory which is already freed."<br>
To my understating this is a key difference with hot-unplug, the
device is gone, all those concerns are irrelevant and hence we can
actually force signal scheduler fences (setting and error to them
before) to force completion of any<br>
waiting clients such as possibly IOCTLs or async page flips e.t.c.<br>
</p>
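<p>Roughly what I mean, with a stand-in struct instead of the real <code>struct dma_fence</code> (in the kernel this would be <code>dma_fence_set_error()</code> followed by <code>dma_fence_signal()</code> on each unfinished scheduler fence):</p>

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for a scheduler fence; in the kernel this would be a
 * struct dma_fence, and the force-completion below would roughly be
 * dma_fence_set_error() followed by dma_fence_signal(). */
struct fake_fence {
    bool signaled;
    int error;
};

/*
 * On hot-unplug the device is gone, so unlike GPU reset there is no
 * risk of post-reset HW jobs touching freed memory: we can set an
 * error and force-complete the fence to release any waiters
 * (IOCTLs, async page flips, etc.).
 */
static void force_signal_fence(struct fake_fence *f, int error)
{
    if (f->signaled)
        return;
    f->error = error;   /* e.g. -ENODEV */
    f->signaled = true; /* wakes up everyone waiting on the fence */
}
```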
<p>Beyond blocking all delayed works and scheduler threads, we also need to guarantee that no IOCTL can access MMIO post device unplug, OR that in-flight IOCTLs are done before we finish pci_remove (amdgpu_pci_remove for us).<br>
For this I suggest we do something like what we worked on with Takashi Iwai, the ALSA maintainer, recently when he helped implement PCI BAR move support for snd_hda_intel. Take a look at<br>
<a class="moz-txt-link-freetext" href="https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497">https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497</a>
and<br>
<a class="moz-txt-link-freetext" href="https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440">https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440</a><br>
We had the same issue there: how to prevent MMIO accesses while the BARs are migrating. What was done there is that a refcount was added to count all IOCTLs in flight; for any in-flight IOCTL the BAR migration handler would block until the refcount drops to 0 before proceeding, and any later IOCTL stops and waits if the device is in the migration state. Here we don't even need the wait part, since there is nothing to wait for; we just return -ENODEV in this case.</p>
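<p>A hedged sketch of that refcount pattern, using plain C11 atomics instead of the kernel primitives (all names are made up for illustration):</p>

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of the in-flight IOCTL accounting from the snd_hda_intel
 * BAR-move patches, adapted for unplug. All names are illustrative. */
static atomic_int ioctls_in_flight;
static atomic_bool removed;

/* Every IOCTL handler enters through here. */
static int ioctl_enter(void)
{
    atomic_fetch_add(&ioctls_in_flight, 1);
    if (atomic_load(&removed)) {
        /* Device unplugged: nothing to wait for, just bail out. */
        atomic_fetch_sub(&ioctls_in_flight, 1);
        return -ENODEV;
    }
    return 0;
}

static void ioctl_exit(void)
{
    atomic_fetch_sub(&ioctls_in_flight, 1);
}

/* Called from pci_remove: fence off new IOCTLs, then drain the
 * in-flight ones before tearing down MMIO. */
static void drain_ioctls(void)
{
    atomic_store(&removed, true);
    while (atomic_load(&ioctls_in_flight) > 0)
        ; /* real code would sleep on a completion, not spin */
}
```

<p>The real version would of course hook into the existing reset lock / SRCU machinery rather than raw atomics; this only shows the enter/exit/drain shape.</p>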
<p>The above approach should allow us to wait for all the IOCTLs in flight; together with stopping the scheduler threads and cancelling and flushing all in-flight work items and timers, I think it should give us a full solution for the hot-unplug case of preventing any MMIO accesses post device pci_remove.<br>
<br>
Let me know what you think, guys.</p>
<p>Andrey</p>
<p><br>
</p>
<blockquote type="cite" cite="mid:51d7873d-cf35-6be5-79c2-024937c67f6a@amd.com"> <br>
And then testing, testing, testing to see if we have missed
something.<br>
<br>
Christian.<br>
<br>
<div class="moz-cite-prefix">Am 05.04.21 um 19:58 schrieb Andrey
Grodzovsky:<br>
</div>
<blockquote type="cite" cite="mid:1e37bb4d-f54d-1b7e-4632-94cfcf749528@amd.com">
<p>Dennis, Christian, are there any updates on the plan for how to move on with this? As you know, I need very similar code for my up-streaming of device hot-unplug. My latest solution (<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html" moz-do-not-send="true">https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html</a>) was not acceptable because of the low-level guards at the register-accessor level, which hurt performance. Basically I need a way to prevent any MMIO write accesses from the kernel driver after the device is removed (UMD accesses are taken care of by the page-faulting dummy page). We are now using the hot-unplug code for the Freemont program, so up-streaming became more of a priority than before. This MMIO access issue is currently my main blocker for up-streaming. Is there any way I can assist in pushing this on?</p>
<p>Andrey <br>
</p>
<div class="moz-cite-prefix">On 2021-03-18 5:51 a.m., Christian
König wrote:<br>
</div>
<blockquote type="cite" cite="mid:378fdffb-99b5-2a14-736d-a06f310b040c@amd.com"> Am
18.03.21 um 10:30 schrieb Li, Dennis:<br>
<blockquote type="cite" cite="mid:DM5PR12MB253379E8C89D8A20C8A0245AED699@DM5PR12MB2533.namprd12.prod.outlook.com">
<meta name="Generator" content="Microsoft Word 15 (filtered
medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]-->
<style>@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:等线;
panose-1:2 1 6 0 3 1 1 1 1 1;}@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
{font-family:"\@等线";
panose-1:2 1 6 0 3 1 1 1 1 1;}@font-face
{font-family:"Segoe UI";
panose-1:2 11 5 2 4 2 4 2 2 3;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}div.WordSection1
{page:WordSection1;}</style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div class="WordSection1">
<p class="MsoNormal">>>> The GPU reset doesn't
complete the fences we wait for. It only completes the
hardware fences as part of the reset.<o:p></o:p></p>
<p class="MsoNormal">>>> So waiting for a fence
while holding the reset lock is illegal and needs to be
avoided.<o:p></o:p></p>
<p class="MsoNormal">I understood your concern. It is more
complex for DRM GFX, therefore I abandon adding lock
protection for DRM ioctls now. Maybe we can try to add
all kernel dma_fence waiting in a list, and signal all
in recovery threads. Do you have same concern for
compute cases? </p>
</div>
</blockquote>
<br>
Yes, compute (KFD) is even harder to handle.<br>
<br>
See you can't signal the dma_fence waiting. Waiting for a
dma_fence also means you wait for the GPU reset to finish.<br>
<br>
When we would signal the dma_fence during the GPU reset then
we would run into memory corruption because the hardware jobs
running after the GPU reset would access memory which is
already freed.<br>
<br>
<blockquote type="cite" cite="mid:DM5PR12MB253379E8C89D8A20C8A0245AED699@DM5PR12MB2533.namprd12.prod.outlook.com">
<div class="WordSection1">
<p class="MsoNormal"><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">>>> Lockdep also complains
about this when it is used correctly. The only reason it
doesn't complain here is because you use an
atomic+wait_event instead of a locking primitive.<o:p></o:p></p>
<p class="MsoNormal">Agree. This approach will escape the
monitor of lockdep. Its goal is to block other threads
when GPU recovery thread start. But I couldn’t find a
better method to solve this problem. Do you have some
suggestion? </p>
</div>
</blockquote>
<br>
Well, completely abandon those changes here.<br>
<br>
What we need to do is to identify where hardware access
happens and then insert taking the read side of the GPU reset
lock so that we don't wait for a dma_fence or allocate memory,
but still protect the hardware from concurrent access and
reset.<br>
<br>
Regards,<br>
Christian.<br>
<br>
<blockquote type="cite" cite="mid:DM5PR12MB253379E8C89D8A20C8A0245AED699@DM5PR12MB2533.namprd12.prod.outlook.com">
<div class="WordSection1">
<p class="MsoNormal"><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Best Regards<o:p></o:p></p>
<p class="MsoNormal">Dennis Li<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Koenig, Christian <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true"><Christian.Koenig@amd.com></a>
<br>
<b>Sent:</b> Thursday, March 18, 2021 4:59 PM<br>
<b>To:</b> Li, Dennis <a class="moz-txt-link-rfc2396E" href="mailto:Dennis.Li@amd.com" moz-do-not-send="true"><Dennis.Li@amd.com></a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>;
Deucher, Alexander <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true"><Alexander.Deucher@amd.com></a>;
Kuehling, Felix <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com" moz-do-not-send="true"><Felix.Kuehling@amd.com></a>;
Zhang, Hawking <a class="moz-txt-link-rfc2396E" href="mailto:Hawking.Zhang@amd.com" moz-do-not-send="true"><Hawking.Zhang@amd.com></a><br>
<b>Subject:</b> Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Segoe
UI",sans-serif;color:black">Exactly that's what
you don't seem to understand.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Segoe
UI",sans-serif;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Segoe
UI",sans-serif;color:black">The GPU reset
doesn't complete the fences we wait for. It only
completes the hardware fences as part of the reset.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Segoe
UI",sans-serif;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Segoe
UI",sans-serif;color:black">So waiting for a
fence while holding the reset lock is illegal and
needs to be avoided.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Segoe
UI",sans-serif;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Segoe
UI",sans-serif;color:black">Lockdep also
complains about this when it is used correctly. The
only reason it doesn't complain here is because you
use an atomic+wait_event instead of a locking
primitive.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Segoe
UI",sans-serif;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Segoe
UI",sans-serif;color:black">Regards,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Segoe
UI",sans-serif;color:black">Christian.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div class="MsoNormal" style="text-align:center" align="center">
<hr width="98%" size="2" align="center"> </div>
<div id="divRplyFwdMsg">
<p class="MsoNormal"><b><span style="color:black">Von:</span></b><span style="color:black"> Li, Dennis <<a href="mailto:Dennis.Li@amd.com" moz-do-not-send="true">Dennis.Li@amd.com</a>><br>
<b>Gesendet:</b> Donnerstag, 18. März 2021 09:28<br>
<b>An:</b> Koenig, Christian <<a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true">Christian.Koenig@amd.com</a>>;
<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>
<<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>>;
Deucher, Alexander <<a href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true">Alexander.Deucher@amd.com</a>>;
Kuehling, Felix <<a href="mailto:Felix.Kuehling@amd.com" moz-do-not-send="true">Felix.Kuehling@amd.com</a>>;
Zhang, Hawking <<a href="mailto:Hawking.Zhang@amd.com" moz-do-not-send="true">Hawking.Zhang@amd.com</a>><br>
<b>Betreff:</b> RE: [PATCH 0/4] Refine GPU recovery
sequence to enhance its stability</span> <o:p></o:p></p>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">>>> Those two steps need
to be exchanged or otherwise it is possible that new
delayed work items etc are started before the lock
is taken.<br>
What about adding check for adev->in_gpu_reset in
work item? If exchange the two steps, it maybe
introduce the deadlock. For example, the user
thread hold the read lock and waiting for the fence,
if recovery thread try to hold write lock and then
complete fences, in this case, recovery thread will
always be blocked.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
Best Regards<br>
Dennis Li<br>
-----Original Message-----<br>
From: Koenig, Christian <<a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true">Christian.Koenig@amd.com</a>>
<br>
Sent: Thursday, March 18, 2021 3:54 PM<br>
To: Li, Dennis <<a href="mailto:Dennis.Li@amd.com" moz-do-not-send="true">Dennis.Li@amd.com</a>>;
<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">
amd-gfx@lists.freedesktop.org</a>; Deucher,
Alexander <<a href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true">Alexander.Deucher@amd.com</a>>;
Kuehling, Felix <<a href="mailto:Felix.Kuehling@amd.com" moz-do-not-send="true">Felix.Kuehling@amd.com</a>>;
Zhang, Hawking <<a href="mailto:Hawking.Zhang@amd.com" moz-do-not-send="true">Hawking.Zhang@amd.com</a>><br>
Subject: Re: [PATCH 0/4] Refine GPU recovery
sequence to enhance its stability<br>
<br>
On 18.03.21 at 08:23, Dennis Li wrote:<br>
> We have defined two variables, in_gpu_reset and reset_sem, in the adev object. The atomic variable in_gpu_reset is used to avoid recovery-thread reentry and to make lower functions return earlier when recovery starts, but it couldn't block other threads while the recovery thread accesses hardware. The r/w semaphore reset_sem is used to solve these synchronization issues between the recovery thread and other threads.<br>
><br>
> The original solution locked register access in the lower functions, which introduces the following issues:<br>
><br>
> 1) many lower functions are used in both the recovery thread and other threads. Firstly, we must harvest these functions, and it is easy to miss some. Secondly, these functions need to select which lock (read lock or write lock) to use according to the thread they are running in; if the thread context isn't considered, the added lock will easily introduce deadlock. Besides that, most of the time developers easily forget to add locks for new functions.<br>
><br>
> 2) performance drop. The lower functions are called very frequently.<br>
><br>
> 3) it easily introduces false-positive lockdep complaints, because the write lock has a big range in the recovery thread, while the low level functions holding the read lock may be protected by other locks in other threads.<br>
><br>
> Therefore the new solution will try to add lock protection for the ioctls of kfd. Its goal is that no threads except for the recovery thread or its children (for xgmi) access hardware while doing GPU reset and resume. So refine the recovery thread as follows:<br>
><br>
> Step 0:
atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)<br>
> 1). if failed, it means system had a
recovery thread running, current thread exit
directly;<br>
> 2). if success, enter recovery thread;<br>
><br>
> Step 1: cancel all delayed works, stop the drm scheduler, complete all unreceived fences and so on. It tries to stop or pause other threads.<br>
><br>
> Step 2: call down_write(&adev->reset_sem) to hold the write lock, which will block the recovery thread until other threads release their read locks.<br>
<br>
Those two steps need to be exchanged or otherwise it
is possible that new delayed work items etc are
started before the lock is taken.<br>
<br>
Just to make it clear: until this is fixed, the whole patch set is a NAK.<br>
<br>
Regards,<br>
Christian.<br>
<br>
><br>
> Step 3: normally there are only recovery threads running to access hardware, so it is safe to do the gpu reset now.<br>
><br>
> Step 4: do post gpu reset, such as call all
ips' resume functions;<br>
><br>
> Step 5: atomically set adev->in_gpu_reset to 0, wake up other threads and release the write lock. The recovery thread exits normally.<br>
><br>
> Other threads call amdgpu_read_lock to synchronize with the recovery thread. If a thread finds that in_gpu_reset is 1, it should release the read lock if it holds one, and then block itself to wait for the recovery-finished event. If the thread successfully holds the read lock and in_gpu_reset is 0, it continues. It will exit normally or be stopped by the recovery thread in step 1.<br>
><br>
> Dennis Li (4):<br>
> drm/amdgpu: remove reset lock from low level
functions<br>
> drm/amdgpu: refine the GPU recovery sequence<br>
> drm/amdgpu: instead of using down/up_read
directly<br>
> drm/amdkfd: add reset lock protection for
kfd entry functions<br>
><br>
> drivers/gpu/drm/amd/amdgpu/amdgpu.h
| 6 +<br>
> drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
| 14 +-<br>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
| 173 +++++++++++++-----<br>
> .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
| 8 -<br>
> drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
| 4 +-<br>
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
| 9 +-<br>
> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
| 5 +-<br>
> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
| 5 +-<br>
> drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
| 172 ++++++++++++++++-<br>
> drivers/gpu/drm/amd/amdkfd/kfd_priv.h
| 3 +-<br>
> drivers/gpu/drm/amd/amdkfd/kfd_process.c
| 4 +<br>
> .../amd/amdkfd/kfd_process_queue_manager.c
| 17 ++<br>
> 12 files changed, 345 insertions(+), 75
deletions(-)<br>
><o:p></o:p></p>
</div>
</div>
</div>
</blockquote>
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
amd-gfx mailing list
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
</pre>
</blockquote>
</blockquote>
<br>
</blockquote>
</body>
</html>