[PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

Mon Sep 9 17:14:01 UTC 2019

Well first of all we are not in interrupt context here, this is handled 
by a work item or otherwise we couldn't do all the locking.

But even in interrupt context another CPU can easily destroy the VM when 
we just handle a stale fault or the process was killed.

So this extra double checking is strictly necessary.

Regards,
Christian.

Am 09.09.19 um 16:08 schrieb Zeng, Oak:
> Is looking up vm twice necessary? I think we are in interrupt context, is it possible that the user space application can be switched in between? My understanding is, if user space application is can't kick in during interrupt handling, application shouldn't have chance to exit (then their vm being destroyed).
>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Christian König
> Sent: Monday, September 9, 2019 8:08 AM
> To: Kuehling, Felix <Felix.Kuehling at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
>
> Am 05.09.19 um 00:47 schrieb Kuehling, Felix:
>> On 2019-09-04 11:02 a.m., Christian König wrote:
>>> Next step towards HMM support. For now just silence the retry fault
>>> and optionally redirect the request to the dummy page.
>>>
>>> v2: make sure the VM is not destroyed while we handle the fault.
>>>
>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++++++++++++++++++++++++++
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>>     drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>>     3 files changed, 80 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 951608fc1925..410d89966a66 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>>     		}
>>>     	}
>>>     }
>>> +
>>> +/**
>>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>>> + * @adev: amdgpu device pointer
>>> + * @pasid: PASID of the VM
>>> + * @addr: Address of the fault
>>> + *
>>> + * Try to gracefully handle a VM fault. Return true if the fault was
>>> +handled and
>>> + * shouldn't be reported any more.
>>> + */
>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>>> +			    uint64_t addr)
>>> +{
>>> +	struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>>> +	struct amdgpu_bo *root;
>>> +	uint64_t value, flags;
>>> +	struct amdgpu_vm *vm;
>>> +	long r;
>>> +
>>> +	if (!ring->sched.ready)
>>> +		return false;
>>> +
>>> +	spin_lock(&adev->vm_manager.pasid_lock);
>>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +	if (vm)
>>> +		root = amdgpu_bo_ref(vm->root.base.bo);
>>> +	else
>>> +		root = NULL;
>>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>>> +
>>> +	if (!root)
>>> +		return false;
>>> +
>>> +	r = amdgpu_bo_reserve(root, true);
>>> +	if (r)
>>> +		goto error_unref;
>>> +
>>> +	spin_lock(&adev->vm_manager.pasid_lock);
>>> +	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +	spin_unlock(&adev->vm_manager.pasid_lock);
>> I think this deserves a comment. If I understand it correctly, you're
>> looking up the vm twice so that you have the VM root reservation to
>> protect against user-after-free. Otherwise the vm pointer is only
>> valid as long as you're holding the spin-lock.
>>
>>
>>> +
>>> +	if (!vm || vm->root.base.bo != root)
>> The check of vm->root.base.bo should probably still be under the
>> spin_lock. Because you're not sure yet it's the right VM, you can't
>> rely on the reservation here to prevent use-after-free.
> Good point, going to fix that.
>
>>
>>> +		goto error_unlock;
>>> +
>>> +	addr /= AMDGPU_GPU_PAGE_SIZE;
>>> +	flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>>> +		AMDGPU_PTE_SYSTEM;
>>> +
>>> +	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>>> +		/* Redirect the access to the dummy page */
>>> +		value = adev->dummy_page_addr;
>>> +		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>>> +			AMDGPU_PTE_WRITEABLE;
>>> +	} else {
>>> +		value = 0;
>>> +	}
>>> +
>>> +	r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>>> +					flags, value, NULL, NULL);
>>> +	if (r)
>>> +		goto error_unlock;
>>> +
>>> +	r = amdgpu_vm_update_pdes(adev, vm, true);
>>> +
>>> +error_unlock:
>>> +	amdgpu_bo_unreserve(root);
>>> +	if (r < 0)
>>> +		DRM_ERROR("Can't handle page fault (%ld)\n", r);
>>> +
>>> +error_unref:
>>> +	amdgpu_bo_unref(&root);
>>> +
>>> +	return false;
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index 0a97dc839f3b..4dbbe1b6b413 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct
>>> amdgpu_device *adev);
>>>     
>>>     void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
>>>     			     struct amdgpu_task_info *task_info);
>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>>> +			    uint64_t addr);
>>>     
>>>     void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>>     
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index 9d15679df6e0..15a1ce51befa 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>>>     	}
>>>     
>>>     	/* If it's the first fault for this address, process it normally
>>> */
>>> +	if (retry_fault && !in_interrupt() &&
>>> +	    amdgpu_vm_handle_fault(adev, entry->pasid, addr))
>>> +		return 1; /* This also prevents sending it to KFD */
>> The !in_interrupt() is meant to only do this on the rerouted interrupt
>> ring that's handled by a worker function?
> Yes, exactly. But I plan to add a workaround where the CPU redirects the fault to the other ring buffer for firmware versions which doesn't have that.
>
> Adds quite a bunch of overhead on Vega10, because of the fault storm but should work fine on Vega20.
>
> Key point is that we already released firmware without the redirection, but it's still better to have that than to run into the storm.
>
>> Looks like amdgpu_vm_handle_fault never returns true for now. So we'll
>> never get to the "return 1" here.
> Oh, yes that actually belongs into a follow up patch.
>
> Thanks,
> Christian.
>
>> Regards,
>>      Felix
>>
>>
>>> +
>>>     	if (!amdgpu_sriov_vf(adev)) {
>>>     		/*
>>>     		 * Issue a dummy read to wait for the status register to
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx