[PATCH] drm/amdkfd: Fix lock dependency warning

Tue Jan 2 16:46:14 UTC 2024

On 2023-12-28 18:11, Philip Yang wrote:
>
>
> On 2023-12-21 15:40, Felix Kuehling wrote:
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 6.5.0-kfd-fkuehlin #276 Not tainted
>> ------------------------------------------------------
>> kworker/8:2/2676 is trying to acquire lock:
>> ffff9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550
>>
>> but task is already holding lock:
>> ffff9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]
>>
>> which lock already depends on the new lock.
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 (&svms->lock){+.+.}-{3:3}:
>>         __mutex_lock+0x97/0xd30
>>         kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
>>         kfd_ioctl+0x1b2/0x5d0 [amdgpu]
>>         __x64_sys_ioctl+0x86/0xc0
>>         do_syscall_64+0x39/0x80
>>         entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> -> #1 (&mm->mmap_lock){++++}-{3:3}:
>>         down_read+0x42/0x160
>>         svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
>>         process_one_work+0x27a/0x540
>>         worker_thread+0x53/0x3e0
>>         kthread+0xeb/0x120
>>         ret_from_fork+0x31/0x50
>>         ret_from_fork_asm+0x11/0x20
>>
>> -> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
>>         __lock_acquire+0x1426/0x2200
>>         lock_acquire+0xc1/0x2b0
>>         __flush_work+0x80/0x550
>>         __cancel_work_timer+0x109/0x190
>>         svm_range_bo_release+0xdc/0x1c0 [amdgpu]
>>         svm_range_free+0x175/0x180 [amdgpu]
>>         svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
>>         process_one_work+0x27a/0x540
>>         worker_thread+0x53/0x3e0
>>         kthread+0xeb/0x120
>>         ret_from_fork+0x31/0x50
>>         ret_from_fork_asm+0x11/0x20
>>
>> other info that might help us debug this:
>>
>> Chain exists of:
>>    (work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock
>>
>>   Possible unsafe locking scenario:
>>
>>         CPU0                    CPU1
>>         ----                    ----
>>    lock(&svms->lock);
>>                                 lock(&mm->mmap_lock);
>>                                 lock(&svms->lock);
>>    lock((work_completion)(&svm_bo->eviction_work));
>>
>> I believe this cannot really lead to a deadlock in practice, because
>> svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO
>> refcount is non-0. That means it's impossible that svm_range_bo_release
>> is running concurrently. However, there is no good way to annotate this.
>>
>> To avoid the problem, take a BO reference in
>> svm_range_schedule_evict_svm_bo instead of in the worker. That way it's
>> impossible for a BO to get freed while eviction work is pending and the
>> cancel_work_sync call in svm_range_bo_release can be eliminated.
>>
>> Signed-off-by: Felix Kuehling<Felix.Kuehling at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 13 ++++---------
>>   1 file changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index afd98aace065..7683c96f0cbd 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -404,14 +404,9 @@ static void svm_range_bo_release(struct kref *kref)
>>   		spin_lock(&svm_bo->list_lock);
>>   	}
>>   	spin_unlock(&svm_bo->list_lock);
>> -	if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base)) {
>> -		/* We're not in the eviction worker.
>> -		 * Signal the fence and synchronize with any
>> -		 * pending eviction work.
>> -		 */
>> +	if (!dma_fence_is_signaled(&svm_bo->eviction_fence->base))
>> +		/* We're not in the eviction worker. Signal the fence. */
>>   		dma_fence_signal(&svm_bo->eviction_fence->base);
>> -		cancel_work_sync(&svm_bo->eviction_work);
>> -	}
>>   	dma_fence_put(&svm_bo->eviction_fence->base);
>>   	amdgpu_bo_unref(&svm_bo->bo);
>>   	kfree(svm_bo);
>> @@ -3444,6 +3439,8 @@ int svm_range_schedule_evict_svm_bo(struct amdgpu_amdkfd_fence *fence)
>>   		return 0;
>>   
>>   	if (fence->svm_bo) {
>> +		/* Reference is dropped in svm_range_evict_svm_bo_worker */
>> +		kref_get(&fence->svm_bo->kref);
>
> I think there maybe racing condition if bo is already releasing,  get 
> ref unless zero
>
If that's a concern, we should probably make sure that the fence->svm_bo 
pointer holds a reference itself. We shouldn't have eviction fences with 
dangling svm_bo pointers. And we should not let a BO refount go to 0 
while there exists an unsignaled eviction fence pointing to it. I'll 
look into that.

> 	/* if svm_bo is getting released, eviction fence will be signaled */
> 	if (!svm_bo_ref_unless_zero(svm_bo))
> 		return 0;
>
> Another solution is to call svm_range_bo_unref_async from svm_range_vram_node_free.

That's not ideal either. The eviction worker is already a worker thread. 
Then we'd need to schedule another worker to actually release the 
memory, adding more latency and scheduling overhead.

Regards,
   Felix

>
> Regards,
> Philip
>>   		WRITE_ONCE(fence->svm_bo->evicting, 1);
>>   		schedule_work(&fence->svm_bo->eviction_work);
>>   	}
>> @@ -3458,8 +3455,6 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
>>   	int r = 0;
>>   
>>   	svm_bo = container_of(work, struct svm_range_bo, eviction_work);
>> -	if (!svm_bo_ref_unless_zero(svm_bo))
>> -		return; /* svm_bo was freed while eviction was pending */
>>   
>>   	if (mmget_not_zero(svm_bo->eviction_fence->mm)) {
>>   		mm = svm_bo->eviction_fence->mm;