[PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero

Felix Kuehling felix.kuehling at amd.com
Wed Jun 8 20:44:03 UTC 2022


On 2022-06-08 16:03, Errabolu, Ramesh wrote:
> [AMD Official Use Only - General]
>
> My response is inline.
>
> Regards,
> Ramesh
>
> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling at amd.com>
> Sent: Thursday, June 9, 2022 1:10 AM
> To: amd-gfx at lists.freedesktop.org; Errabolu, Ramesh <Ramesh.Errabolu at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Unpin MMIO and DOORBELL BOs only after map count goes to zero
>
>
> On 2022-06-08 07:51, Ramesh Errabolu wrote:
>> In existing code MMIO and DOORBELL BOs are unpinned without ensuring
>> the condition that their map count has reached zero. Unpinning without
>> checking this constraint could lead to an error while BO is being
>> freed. The patch fixes this issue.
>>
>> Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu at amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 +++++++--------
>>    1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index a1de900ba677..e5dc94b745b1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1832,13 +1832,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>    
>>    	mutex_lock(&mem->lock);
>>    
>> -	/* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
>> -	if (mem->alloc_flags &
>> -	    (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
>> -	     KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
>> -		amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
>> -	}
>> -
>>    	mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
>>    	is_imported = mem->is_imported;
>>    	mutex_unlock(&mem->lock);
>> @@ -1855,7 +1848,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>    	/* Make sure restore workers don't access the BO any more */
>>    	bo_list_entry = &mem->validate_list;
>>    	mutex_lock(&process_info->lock);
>> -	list_del(&bo_list_entry->head);
>> +	list_del_init(&bo_list_entry->head);
> Is this an unrelated fix? What is this needed for? I vaguely remember discussing this before, but can't remember the reason.
>
> Ramesh: This fix is unrelated to the P2P work. I raised this issue while working on IOMMU support on the DKMS branch. Basically, a user could call free() before the map count goes to zero. The patch is trying to fix that.

I get that, but I couldn't remember why I suggested list_del_init here. 
It has nothing to do with unpinning of BOs.

Now I recall that it had something to do with restarting the ioctl after
it was interrupted by a signal. reserve_bo_and_cond_vms can fail with
-ERESTARTSYS. In that case the ioctl is reentered. We need to make sure
it doesn't crash the second time around. list_del will remove
bo_list_entry from the list but leave its prev and next pointers
invalid (poisoned). The second time around it will probably cause
corruption or an oops. Using list_del_init avoids that by reinitializing
the prev and next pointers to point back at the entry itself, so
deleting the entry a second time is harmless.
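
The re-entry argument can be illustrated with a minimal userspace sketch.
It is not the kernel's list implementation: struct list_head is
re-declared locally, and the helper names (init_list_head, list_add_entry,
list_del_init_entry, delete_twice_is_safe) are hypothetical stand-ins
chosen for this example. It assumes only the documented behaviour that
list_del_init leaves the removed entry as an empty list pointing at itself.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

/* Make e an empty list: both pointers refer to e itself. */
static void init_list_head(struct list_head *e)
{
	e->next = e;
	e->prev = e;
}

/* Insert e right after head. */
static void list_add_entry(struct list_head *e, struct list_head *head)
{
	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
}

/* Unlink e from its neighbours, then reinitialize e to point at itself. */
static void list_del_init_entry(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
	init_list_head(e);
}

/* Returns 1 if the list is still intact after deleting one entry twice. */
static int delete_twice_is_safe(void)
{
	struct list_head head, a, b;

	init_list_head(&head);
	list_add_entry(&a, &head);
	list_add_entry(&b, &head);	/* list is now: head -> b -> a */

	list_del_init_entry(&b);	/* first pass through the ioctl */
	assert(b.next == &b && b.prev == &b);

	list_del_init_entry(&b);	/* re-entered pass: only touches b itself */

	return head.next == &a && head.prev == &a &&
	       a.next == &head && a.prev == &head &&
	       b.next == &b && b.prev == &b;
}
```

Because the removed entry points at itself, the second unlink only
rewrites the entry's own pointers and leaves the rest of the list alone.
With a plain delete that leaves invalid pointers behind, the second
unlink would dereference them instead.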

See one more little fix below.


>
> Regards,
>     Felix
>
>
>>    	mutex_unlock(&process_info->lock);
>>    
>>    	/* No more MMU notifiers */
>> @@ -1880,6 +1873,12 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>    
>>    	ret = unreserve_bo_and_vms(&ctx, false, false);

This unreserve_bo_and_vms call cannot fail because the wait parameter is 
false. If it did fail, the error handling would be broken. I'd add a 
WARN_ONCE to make that assumption explicit, and change the return at the 
end of this function to return 0. Basically, if we got this far, we are 
not turning back, and we should return success.
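
A rough userspace sketch of that suggestion follows. Everything in it is
a hypothetical stand-in: WARN_ONCE_SKETCH imitates the warn-once idea
with a static flag rather than using the kernel macro, unreserve_stub
replaces unreserve_bo_and_vms, and free_memory_tail_sketch models only
the tail of the function. It shows the control flow, not the actual patch.

```c
#include <assert.h>
#include <stdio.h>

/* Counts how many warnings were actually emitted (for illustration). */
static int warn_count;

/* Userspace imitation of a warn-once macro: fires at most once per call site. */
#define WARN_ONCE_SKETCH(cond, ...)				\
	do {							\
		static int warned;				\
		if ((cond) && !warned) {			\
			warned = 1;				\
			warn_count++;				\
			fprintf(stderr, __VA_ARGS__);		\
		}						\
	} while (0)

/* Hypothetical stub for the unreserve call; returns the injected value. */
static int unreserve_stub(int injected_ret)
{
	return injected_ret;
}

/* Models the tail of the free path: past this point there is no turning back. */
static int free_memory_tail_sketch(int injected_ret)
{
	int ret = unreserve_stub(injected_ret);

	/* Make the "cannot fail" assumption explicit instead of propagating ret. */
	WARN_ONCE_SKETCH(ret, "unexpected unreserve failure: %d\n", ret);

	/* ... unpin the BO, free the sync object, drop references ... */

	return 0;	/* the free has effectively happened, so report success */
}
```

In this sketch, calling free_memory_tail_sketch(-5) twice returns 0 both
times and emits a single warning, while a zero return from the stub
emits none.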

You could update the commit headline to be more general. Something like: 
Fix error handling in amdgpu_amdkfd_gpuvm_free_memory_of_gpu.

Regards,
   Felix


>>    
>> +	/* Unpin MMIO/DOORBELL BO's that were pinned during allocation */
>> +	if (mem->alloc_flags &
>> +	    (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
>> +	     KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))
>> +		amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
>> +
>>    	/* Free the sync object */
>>    	amdgpu_sync_free(&mem->sync);
>>    

