[PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

Mon Sep 9 17:15:41 UTC 2019

Am 09.09.19 um 15:58 schrieb Yang, Philip:
>
> On 2019-09-09 8:03 a.m., Christian König wrote:
>> Am 04.09.19 um 22:12 schrieb Yang, Philip:
>>> This series looks nice and clear for me, two questions embedded below.
>>>
>>> Are we going to use dedicated sdma page queue for direct VM update path
>>> during a fault?
>>>
>>> Thanks,
>>> Philip
>>>
>>> On 2019-09-04 11:02 a.m., Christian König wrote:
>>>> Next step towards HMM support. For now just silence the retry fault and
>>>> optionally redirect the request to the dummy page.
>>>>
>>>> v2: make sure the VM is not destroyed while we handle the fault.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> ---
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74
>>>> ++++++++++++++++++++++++++
>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>>>     drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>>>     3 files changed, 80 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index 951608fc1925..410d89966a66 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm
>>>> *vm)
>>>>             }
>>>>         }
>>>>     }
>>>> +
>>>> +/**
>>>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>>>> + * @adev: amdgpu device pointer
>>>> + * @pasid: PASID of the VM
>>>> + * @addr: Address of the fault
>>>> + *
>>>> + * Try to gracefully handle a VM fault. Return true if the fault was
>>>> handled and
>>>> + * shouldn't be reported any more.
>>>> + */
>>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int
>>>> pasid,
>>>> +                uint64_t addr)
>>>> +{
>>>> +    struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>>>> +    struct amdgpu_bo *root;
>>>> +    uint64_t value, flags;
>>>> +    struct amdgpu_vm *vm;
>>>> +    long r;
>>>> +
>>>> +    if (!ring->sched.ready)
>>>> +        return false;
>>>> +
>>>> +    spin_lock(&adev->vm_manager.pasid_lock);
>>>> +    vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>>> +    if (vm)
>>>> +        root = amdgpu_bo_ref(vm->root.base.bo);
>>>> +    else
>>>> +        root = NULL;
>>>> +    spin_unlock(&adev->vm_manager.pasid_lock);
>>>> +
>>>> +    if (!root)
>>>> +        return false;
>>>> +
>>>> +    r = amdgpu_bo_reserve(root, true);
>>>> +    if (r)
>>>> +        goto error_unref;
>>>> +
>>>> +    spin_lock(&adev->vm_manager.pasid_lock);
>>>> +    vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>>> +    spin_unlock(&adev->vm_manager.pasid_lock);
>>>> +
>>> Here get vm from pasid second time, and check if PD bo is changed, is
>>> this to handle vm fault race with vm destory?
>> Yes, exactly.
>>
>>>> +    if (!vm || vm->root.base.bo != root)
>>>> +        goto error_unlock;
>>>> +
>>>> +    addr /= AMDGPU_GPU_PAGE_SIZE;
>>>> +    flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>>>> +        AMDGPU_PTE_SYSTEM;
>>>> +
>>>> +    if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>>>> +        /* Redirect the access to the dummy page */
>>>> +        value = adev->dummy_page_addr;
>>>> +        flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>>>> +            AMDGPU_PTE_WRITEABLE;
>>>> +    } else {
>>>> +        value = 0;
>>>> +    }
>>>> +
>>>> +    r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr
>>>> + 1,
>>>> +                    flags, value, NULL, NULL);
>>>> +    if (r)
>>>> +        goto error_unlock;
>>>> +
>>> After fault address redirect to dummy page, will the fault recover and
>>> retry continue to execute?
>> Yes, the read/write operation will just retry and use the value from the
>> dummy page instead.
>>
>>> Is this dangerous to update PTE to use system
>>> memory address 0?
>> What are you talking about? The dummy page is a page allocate by TTM
>> where we redirect faulty accesses to.
>>
> For amdgpu_vm_fault_stop equals to AMDGPU_VM_FAULT_STOP_FIRST/ALWAYS
> case, value is 0, this will redirect to system memory 0. Maybe redirect
> is only needed for AMDGPU_VM_FAULT_STOP_NEVER?

The value 0 doesn't redirect to system memory, it results in a silence 
retry when neither the R nor the W bit is set in a PTE.

Regards,
Christian.

>
> Regards,
> Philip
>
>> Regards,
>> Christian.
>>
>>>> +    r = amdgpu_vm_update_pdes(adev, vm, true);
>>>> +
>>>> +error_unlock:
>>>> +    amdgpu_bo_unreserve(root);
>>>> +    if (r < 0)
>>>> +        DRM_ERROR("Can't handle page fault (%ld)\n", r);
>>>> +
>>>> +error_unref:
>>>> +    amdgpu_bo_unref(&root);
>>>> +
>>>> +    return false;
>>>> +}
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> index 0a97dc839f3b..4dbbe1b6b413 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct
>>>> amdgpu_device *adev);
>>>>     void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned
>>>> int pasid,
>>>>                      struct amdgpu_task_info *task_info);
>>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int
>>>> pasid,
>>>> +                uint64_t addr);
>>>>     void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>> index 9d15679df6e0..15a1ce51befa 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct
>>>> amdgpu_device *adev,
>>>>         }
>>>>         /* If it's the first fault for this address, process it
>>>> normally */
>>>> +    if (retry_fault && !in_interrupt() &&
>>>> +        amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>>>> +        return 1; /* This also prevents sending it to KFD */
>>>> +
>>>>         if (!amdgpu_sriov_vf(adev)) {
>>>>             /*
>>>>              * Issue a dummy read to wait for the status register to
>>>>