[PATCH] drm/amdgpu: Fix a NULL pointer of fence

Felix Kuehling felix.kuehling at amd.com
Thu Jul 7 15:47:22 UTC 2022


On 2022-07-07 at 05:54, Christian König wrote:
> On 07.07.22 at 11:50, xinhui pan wrote:
>> Fence is accessed by dma_resv_add_fence() now.
>> Use amdgpu_amdkfd_remove_eviction_fence instead.
>>
>> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 0036c9e405af..1e25c400ce4f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1558,10 +1558,10 @@ void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
>>         if (!process_info)
>>           return;
>> -
>>       /* Release eviction fence from PD */
>>       amdgpu_bo_reserve(pd, false);
>> -    amdgpu_bo_fence(pd, NULL, false);
>> +    amdgpu_amdkfd_remove_eviction_fence(pd,
>> +                    process_info->eviction_fence);
>
> Good catch as well, but Felix needs to take a look at this.

This is weird. We used amdgpu_bo_fence(pd, NULL, false) here, which 
would have removed an exclusive fence. But as far as I can tell we added 
the fence as a shared fence in init_kfd_vm and 
amdgpu_amdkfd_gpuvm_restore_process_bos. So this probably never worked 
as intended.

You could test whether this is really needed: drop the eviction fence 
removal entirely, then enable eviction debugging with

     echo Y > /sys/module/amdgpu/parameters/debug_evictions

Run some simple tests and check the kernel log to see if process 
termination is causing any unexpected evictions.
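
As a rough sketch of that check, a small helper can count the kernel log 
lines that mention an eviction. The function name count_evictions, the 
placeholder test binary, and the search pattern "evict" are assumptions 
for illustration; the exact message text printed with debug_evictions 
enabled depends on the kernel version.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the number
# of lines that mention an eviction. The pattern "evict" is an assumption,
# not the exact message the driver prints.
count_evictions() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the helper usable under "set -e".
    grep -ci 'evict' || true
}

# Intended use on a test machine (needs root, so left commented out):
#   echo Y > /sys/module/amdgpu/parameters/debug_evictions
#   dmesg --clear
#   ./my_kfd_test          # placeholder for any simple KFD workload
#   dmesg | count_evictions
```

A count that stays at zero across process termination would suggest the 
removal is not needed; any hits are worth reading in full in the log 
rather than relying on the count alone.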

Regards,
   Felix


>
> Regards,
> Christian.
>
>>       amdgpu_bo_unreserve(pd);
>>         /* Update process info */
>

