[PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

Mon Sep 9 12:03:23 UTC 2019

Am 04.09.19 um 22:12 schrieb Yang, Philip:
> This series looks nice and clear for me, two questions embedded below.
>
> Are we going to use dedicated sdma page queue for direct VM update path
> during a fault?
>
> Thanks,
> Philip
>
> On 2019-09-04 11:02 a.m., Christian König wrote:
>> Next step towards HMM support. For now just silence the retry fault and
>> optionally redirect the request to the dummy page.
>>
>> v2: make sure the VM is not destroyed while we handle the fault.
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>    3 files changed, 80 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 951608fc1925..410d89966a66 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>    		}
>>    	}
>>    }
>> +
>> +/**
>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>> + * @adev: amdgpu device pointer
>> + * @pasid: PASID of the VM
>> + * @addr: Address of the fault
>> + *
>> + * Try to gracefully handle a VM fault. Return true if the fault was handled and
>> + * shouldn't be reported any more.
>> + */
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +			    uint64_t addr)
>> +{
>> +	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>> +	struct amdgpu_bo *root;
>> +	uint64_t value, flags;
>> +	struct amdgpu_vm *vm;
>> +	long r;
>> +
>> +	if (!ring->sched.ready)
>> +		return false;
>> +
>> +	spin_lock(&adev->vm_manager.pasid_lock);
>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +	if (vm)
>> +		root = amdgpu_bo_ref(vm->root.base.bo);
>> +	else
>> +		root = NULL;
>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>> +
>> +	if (!root)
>> +		return false;
>> +
>> +	r = amdgpu_bo_reserve(root, true);
>> +	if (r)
>> +		goto error_unref;
>> +
>> +	spin_lock(&adev->vm_manager.pasid_lock);
>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>> +
> Here get vm from pasid second time, and check if PD bo is changed, is
> this to handle vm fault race with vm destory?

Yes, exactly.

>
>> +	if (!vm || vm->root.base.bo != root)
>> +		goto error_unlock;
>> +
>> +	addr /= AMDGPU_GPU_PAGE_SIZE;
>> +	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>> +		AMDGPU_PTE_SYSTEM;
>> +
>> +	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>> +		/* Redirect the access to the dummy page */
>> +		value = adev->dummy_page_addr;
>> +		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>> +			AMDGPU_PTE_WRITEABLE;
>> +	} else {
>> +		value = 0;
>> +	}
>> +
>> +	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>> +					flags, value, NULL, NULL);
>> +	if (r)
>> +		goto error_unlock;
>> +
> After fault address redirect to dummy page, will the fault recover and
> retry continue to execute?

Yes, the read/write operation will just retry and use the value from the 
dummy page instead.

> Is this dangerous to update PTE to use system
> memory address 0?

What are you talking about? The dummy page is a page allocate by TTM 
where we redirect faulty accesses to.

Regards,
Christian.

>
>> +	r = amdgpu_vm_update_pdes(adev, vm, true);
>> +
>> +error_unlock:
>> +	amdgpu_bo_unreserve(root);
>> +	if (r < 0)
>> +		DRM_ERROR("Can't handle page fault (%ld)\n", r);
>> +
>> +error_unref:
>> +	amdgpu_bo_unref(&root);
>> +
>> +	return false;
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index 0a97dc839f3b..4dbbe1b6b413 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct amdgpu_device *adev);
>>    
>>    void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>>    			     struct amdgpu_task_info *task_info);
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +			    uint64_t addr);
>>    
>>    void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>    
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index 9d15679df6e0..15a1ce51befa 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>>    	}
>>    
>>    	/* If it's the first fault for this address, process it normally */
>> +	if (retry_fault && !in_interrupt() &&
>> +	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>> +		return 1; /* This also prevents sending it to KFD */
>> +
>>    	if (!amdgpu_sriov_vf(adev)) {
>>    		/*
>>    		 * Issue a dummy read to wait for the status register to
>>