[PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

Mon Sep 9 12:07:33 UTC 2019

Am 05.09.19 um 00:47 schrieb Kuehling, Felix:
> On 2019-09-04 11:02 a.m., Christian König wrote:
>> Next step towards HMM support. For now just silence the retry fault and
>> optionally redirect the request to the dummy page.
>>
>> v2: make sure the VM is not destroyed while we handle the fault.
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>    3 files changed, 80 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 951608fc1925..410d89966a66 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>    		}
>>    	}
>>    }
>> +
>> +/**
>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>> + * @adev: amdgpu device pointer
>> + * @pasid: PASID of the VM
>> + * @addr: Address of the fault
>> + *
>> + * Try to gracefully handle a VM fault. Return true if the fault was handled and
>> + * shouldn't be reported any more.
>> + */
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +			    uint64_t addr)
>> +{
>> +	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>> +	struct amdgpu_bo *root;
>> +	uint64_t value, flags;
>> +	struct amdgpu_vm *vm;
>> +	long r;
>> +
>> +	if (!ring->sched.ready)
>> +		return false;
>> +
>> +	spin_lock(&adev->vm_manager.pasid_lock);
>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +	if (vm)
>> +		root = amdgpu_bo_ref(vm->root.base.bo);
>> +	else
>> +		root = NULL;
>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>> +
>> +	if (!root)
>> +		return false;
>> +
>> +	r = amdgpu_bo_reserve(root, true);
>> +	if (r)
>> +		goto error_unref;
>> +
>> +	spin_lock(&adev->vm_manager.pasid_lock);
>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +	spin_unlock(&adev->vm_manager.pasid_lock);
> I think this deserves a comment. If I understand it correctly, you're
> looking up the vm twice so that you have the VM root reservation to
> protect against user-after-free. Otherwise the vm pointer is only valid
> as long as you're holding the spin-lock.
>
>
>> +
>> +	if (!vm || vm->root.base.bo != root)
> The check of vm->root.base.bo should probably still be under the
> spin_lock. Because you're not sure yet it's the right VM, you can't rely
> on the reservation here to prevent use-after-free.

Good point, going to fix that.

>
>
>> +		goto error_unlock;
>> +
>> +	addr /= AMDGPU_GPU_PAGE_SIZE;
>> +	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>> +		AMDGPU_PTE_SYSTEM;
>> +
>> +	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>> +		/* Redirect the access to the dummy page */
>> +		value = adev->dummy_page_addr;
>> +		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>> +			AMDGPU_PTE_WRITEABLE;
>> +	} else {
>> +		value = 0;
>> +	}
>> +
>> +	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>> +					flags, value, NULL, NULL);
>> +	if (r)
>> +		goto error_unlock;
>> +
>> +	r = amdgpu_vm_update_pdes(adev, vm, true);
>> +
>> +error_unlock:
>> +	amdgpu_bo_unreserve(root);
>> +	if (r < 0)
>> +		DRM_ERROR("Can't handle page fault (%ld)\n", r);
>> +
>> +error_unref:
>> +	amdgpu_bo_unref(&root);
>> +
>> +	return false;
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index 0a97dc839f3b..4dbbe1b6b413 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct amdgpu_device *adev);
>>    
>>    void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>>    			     struct amdgpu_task_info *task_info);
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +			    uint64_t addr);
>>    
>>    void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>    
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index 9d15679df6e0..15a1ce51befa 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>>    	}
>>    
>>    	/* If it's the first fault for this address, process it normally */
>> +	if (retry_fault && !in_interrupt() &&
>> +	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>> +		return 1; /* This also prevents sending it to KFD */
> The !in_interrupt() is meant to only do this on the rerouted interrupt
> ring that's handled by a worker function?

Yes, exactly. But I plan to add a workaround where the CPU redirects the 
fault to the other ring buffer for firmware versions which doesn't have 
that.

Adds quite a bunch of overhead on Vega10, because of the fault storm but 
should work fine on Vega20.

Key point is that we already released firmware without the redirection, 
but it's still better to have that than to run into the storm.

> Looks like amdgpu_vm_handle_fault never returns true for now. So we'll
> never get to the "return 1" here.

Oh, yes that actually belongs into a follow up patch.

Thanks,
Christian.

>
> Regards,
>     Felix
>
>
>> +
>>    	if (!amdgpu_sriov_vf(adev)) {
>>    		/*
>>    		 * Issue a dummy read to wait for the status register to