[PATCH] drm/amdgpu: stop allocating PDs/PTs with the eviction lock held

Felix Kuehling felix.kuehling at amd.com
Thu Feb 27 18:43:17 UTC 2020


On 2020-02-27 9:28, Christian König wrote:
> Hi Felix,
>
> so coming back to this after two weeks of distraction.
>
> Am 14.02.20 um 22:12 schrieb Felix Kuehling:
>> Now you allow eviction of page tables while you allocate page tables. 
>> Isn't the whole point of the eviction lock to prevent page table 
>> evictions while manipulating page tables?
>>
>> Or does this only apply to PTE invalidations which never allocated 
>> memory? Is that the only case that doesn't reserve page tables?
>
> Yes, exactly. We essentially have to distinct two cases here:
>
> 1. We are validating PTEs and so eventually need to allocate page tables.
>     For this the root PD is reserved and we actually know that the VM 
> can't be evicted.
>     But we still need to hold the lock while pushing the actual 
> updates the hardware to make sure we don't mess up the data structures 
> with concurrent invalidations.
>
> 2. We are invalidating PTEs.
>     Here we might or might not have the root PD reserved and so need 
> to make sure that nobody is evicting page tables while we are 
> invalidating.
>
> Quite a complicated dance, but of hand I also don't see much other 
> choice.

Sounds good. Maybe we should have an assert that the root PD is reserved 
in amdgpu_vm_alloc_pts. With that modification the patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>


>
> Regards,
> Christian.
>
>>
>> Regards,
>>   Felix
>>
>> On 2020-02-12 10:14 a.m., Christian König wrote:
>>> We need to make sure to not allocate PDs/PTs while holding
>>> the eviction lock or otherwise we will run into lock inversion
>>> in the MM as soon as we enable the MMU notifier.
>>>
>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 31 
>>> +++++++++++++++++++++-----
>>>   1 file changed, 25 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 77c400675b79..e7ab0c1e2793 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -897,27 +897,42 @@ static int amdgpu_vm_alloc_pts(struct 
>>> amdgpu_device *adev,
>>>       struct amdgpu_vm_pt *entry = cursor->entry;
>>>       struct amdgpu_bo_param bp;
>>>       struct amdgpu_bo *pt;
>>> +    bool need_entries;
>>>       int r;
>>>   -    if (cursor->level < AMDGPU_VM_PTB && !entry->entries) {
>>> +    need_entries = cursor->level < AMDGPU_VM_PTB && !entry->entries;
>>> +    if (!need_entries && entry->base.bo)
>>> +        return 0;
>>> +
>>> +    /* We need to make sure that we don't allocate PDs/PTs while 
>>> holding the
>>> +     * eviction lock or we run into lock recursion in the MM.
>>> +     */
>>> +    amdgpu_vm_eviction_unlock(vm);
>>> +
>>> +    if (need_entries) {
>>>           unsigned num_entries;
>>>             num_entries = amdgpu_vm_num_entries(adev, cursor->level);
>>>           entry->entries = kvmalloc_array(num_entries,
>>>                           sizeof(*entry->entries),
>>>                           GFP_KERNEL | __GFP_ZERO);
>>> -        if (!entry->entries)
>>> -            return -ENOMEM;
>>> +        if (!entry->entries) {
>>> +            r = -ENOMEM;
>>> +            goto error_lock;
>>> +        }
>>>       }
>>>   -    if (entry->base.bo)
>>> -        return 0;
>>> +    if (entry->base.bo) {
>>> +        r = 0;
>>> +        goto error_lock;
>>> +    }
>>>         amdgpu_vm_bo_param(adev, vm, cursor->level, direct, &bp);
>>>         r = amdgpu_bo_create(adev, &bp, &pt);
>>> +    amdgpu_vm_eviction_lock(vm);
>>>       if (r)
>>> -        return r;
>>> +        goto error_free_pt;
>>>         /* Keep a reference to the root directory to avoid
>>>        * freeing them up in the wrong order.
>>> @@ -936,6 +951,10 @@ static int amdgpu_vm_alloc_pts(struct 
>>> amdgpu_device *adev,
>>>       amdgpu_bo_unref(&pt);
>>>       entry->base.bo = NULL;
>>>       return r;
>>> +
>>> +error_lock:
>>> +    amdgpu_vm_eviction_lock(vm);
>>> +    return r;
>>>   }
>>>     /**
>


More information about the amd-gfx mailing list