[PATCH] drm/amdgpu: Fix a NULL pointer of fence
Felix Kuehling
felix.kuehling at amd.com
Thu Jul 7 15:47:22 UTC 2022
On 2022-07-07 at 05:54, Christian König wrote:
> On 07.07.22 at 11:50, xinhui pan wrote:
>> Fence is accessed by dma_resv_add_fence() now.
>> Use amdgpu_amdkfd_remove_eviction_fence instead.
>>
>> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 0036c9e405af..1e25c400ce4f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1558,10 +1558,10 @@ void amdgpu_amdkfd_gpuvm_destroy_cb(struct
>> amdgpu_device *adev,
>> if (!process_info)
>> return;
>> -
>> /* Release eviction fence from PD */
>> amdgpu_bo_reserve(pd, false);
>> - amdgpu_bo_fence(pd, NULL, false);
>> + amdgpu_amdkfd_remove_eviction_fence(pd,
>> + process_info->eviction_fence);
>
> Good catch as well, but Felix needs to take a look at this.
This is weird. We used amdgpu_bo_fence(pd, NULL, false) here, which
would have removed an exclusive fence. But as far as I can tell, we added
the eviction fence as a shared fence in init_kfd_vm and
amdgpu_amdkfd_gpuvm_restore_process_bos. So this call probably never
worked as intended.
You could check whether this is really needed: just drop the eviction
fence removal, then enable eviction debugging with
echo Y > /sys/module/amdgpu/parameters/debug_evictions
Run some simple tests and check the kernel log to see whether process
termination causes any unexpected evictions.
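The check above could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and it assumes that the relevant kernel log lines contain the string "evict", which has not been verified against the actual debug_evictions output.

```shell
#!/bin/sh
# Sketch only. The parameter path is the one given above; the function
# names below are hypothetical helpers, not part of any existing tool.
PARAM=${PARAM:-/sys/module/amdgpu/parameters/debug_evictions}

# Turn eviction debugging on. Needs root and a loaded amdgpu module.
enable_eviction_debug() {
    if [ -w "$PARAM" ]; then
        echo Y > "$PARAM"
    else
        echo "cannot write $PARAM (not root, or amdgpu not loaded?)" >&2
        return 1
    fi
}

# Read log text on stdin and keep only lines mentioning evictions.
# Assumption: such lines contain "evict" (matched case-insensitively).
filter_evictions() {
    grep -i 'evict'
}

# Usage (as root), after running a test that starts and exits a KFD process:
#   enable_eviction_debug && dmesg | filter_evictions
```

If the filter prints nothing after a process has terminated, that would suggest the eviction fence removal is not needed, subject to the assumption about the log text.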
Regards,
Felix
>
> Regards,
> Christian.
>
>> amdgpu_bo_unreserve(pd);
>> /* Update process info */
>