<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<p style="font-family:Arial;font-size:10pt;color:#008000;margin:15pt;" align="Left">
[Public]<br>
</p>
<br>
<div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Will do.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Alex</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Kuehling, Felix <Felix.Kuehling@amd.com><br>
<b>Sent:</b> Monday, July 18, 2022 11:29 AM<br>
<b>To:</b> Mike Lothian <mike@fireburn.co.uk>; Christian König <ckoenig.leichtzumerken@gmail.com><br>
<b>Cc:</b> Pan, Xinhui <Xinhui.Pan@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com><br>
<b>Subject:</b> Re: [PATCH] drm/amdgpu: Fix a NULL pointer of fence</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">Xinhui submitted this patch instead, which should address the same
<br>
issue: "drm/amdgpu: Remove one duplicated ef removal"<br>
<br>
Alex, can you pick up that patch for drm-fixes for 5.19, if it's not too <br>
late?<br>
<br>
Thanks,<br>
Felix<br>
<br>
<br>
On 2022-07-18 10:58, Mike Lothian wrote:<br>
> Is this likely to land before 5.19 final? It's been nearly 2 weeks<br>
> since I said if fixed an issue I was seeing<br>
> <a href="https://gitlab.freedesktop.org/drm/amd/-/issues/2074">https://gitlab.freedesktop.org/drm/amd/-/issues/2074</a><br>
><br>
> On Fri, 8 Jul 2022 at 10:05, Christian König<br>
> <ckoenig.leichtzumerken@gmail.com> wrote:<br>
>> Hi guys,<br>
>><br>
>> well the practice to remove all fences by adding a NULL exclusive fence<br>
>> was pretty much illegal in the first place because this also removes<br>
>> kernel internal fences which can lead to freeing up memory which is<br>
>> still accessed.<br>
>><br>
>> I've just didn't noticed that this was used by the KFD code as well<br>
>> otherwise I would have pushed to clean that up much earlier.<br>
>><br>
>> Regards,<br>
>> Christian.<br>
>><br>
>> Am 08.07.22 um 03:08 schrieb Pan, Xinhui:<br>
>>> [AMD Official Use Only - General]<br>
>>><br>
>>> Felix,<br>
>>> Shared fences depend on exclusive fence, so add a new exclusive fence, say NULL would also remove all shared fences. That works before 5.18 . 😉<br>
>>> From 5.18, adding a new exclusive fence(the write usage fence) did not remove any shared fences(the read usage fence). So that is broken.<br>
>>><br>
>>> And I also try the debug_evictions parameter. No unexpected eviction shows anyway.<br>
>>> I did a quick check and found amdgpu implement BO release notify and it will remove kfd ef on pt/pd BO. So we don’t need this duplicated ef removal. The interesting thing is that is done by patch f4a3c42b5c52("drm/amdgpu: Remove kfd eviction fence before
release bo (v2)") which is from me 😉 I totally forgot it.<br>
>>><br>
>>> So I would make a new patch to remove this duplicated ef removal.<br>
>>><br>
>>> -----Original Message-----<br>
>>> From: Kuehling, Felix <Felix.Kuehling@amd.com><br>
>>> Sent: Thursday, July 7, 2022 11:47 PM<br>
>>> To: Christian König <ckoenig.leichtzumerken@gmail.com>; Pan, Xinhui <Xinhui.Pan@amd.com>; amd-gfx@lists.freedesktop.org<br>
>>> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com><br>
>>> Subject: Re: [PATCH] drm/amdgpu: Fix a NULL pointer of fence<br>
>>><br>
>>> Am 2022-07-07 um 05:54 schrieb Christian König:<br>
>>>> Am 07.07.22 um 11:50 schrieb xinhui pan:<br>
>>>>> Fence is accessed by dma_resv_add_fence() now.<br>
>>>>> Use amdgpu_amdkfd_remove_eviction_fence instead.<br>
>>>>><br>
>>>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com><br>
>>>>> ---<br>
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--<br>
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)<br>
>>>>><br>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c<br>
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c<br>
>>>>> index 0036c9e405af..1e25c400ce4f 100644<br>
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c<br>
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c<br>
>>>>> @@ -1558,10 +1558,10 @@ void amdgpu_amdkfd_gpuvm_destroy_cb(struct<br>
>>>>> amdgpu_device *adev,<br>
>>>>> if (!process_info)<br>
>>>>> return;<br>
>>>>> -<br>
>>>>> /* Release eviction fence from PD */<br>
>>>>> amdgpu_bo_reserve(pd, false);<br>
>>>>> - amdgpu_bo_fence(pd, NULL, false);<br>
>>>>> + amdgpu_amdkfd_remove_eviction_fence(pd,<br>
>>>>> + process_info->eviction_fence);<br>
>>>> Good catch as well, but Felix needs to take a look at this.<br>
>>> This is weird. We used amdgpu_bo_fence(pd, NULL, false) here, which would have removed an exclusive fence. But as far as I can tell we added the fence as a shared fence in init_kfd_vm and amdgpu_amdkfd_gpuvm_restore_process_bos. So this probably never worked
as intended.<br>
>>><br>
>>> You could try if this is really needed. Just remove the eviction fence removal. Then enable eviction debugging with<br>
>>><br>
>>> echo Y > /sys/module/amdgpu/parameters/debug_evictions<br>
>>><br>
>>> Run some simple tests and check the kernel log to see if process termination is causing any unexpected evictions.<br>
>>><br>
>>> Regards,<br>
>>> Felix<br>
>>><br>
>>><br>
>>>> Regards,<br>
>>>> Christian.<br>
>>>><br>
>>>>> amdgpu_bo_unreserve(pd);<br>
>>>>> /* Update process info */<br>
</div>
</span></font></div>
</div>
</body>
</html>