[PATCH 5/6] drm/amdkfd: enable subsequent retry fault

Wed Apr 21 01:22:16 UTC 2021

On 2021-04-20 at 4:21 p.m., Philip Yang wrote:
> After draining the stale retry fault, or after failing to validate the
> range to recover, the fault address has to be removed from the fault
> filter ring so that a subsequent retry interrupt on the same address can
> be handled. Otherwise the retry fault will not be processed for recovery
> until the timeout has passed.
>
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>

Patches 1-3 and patch 5 are

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>

I didn't see a patch 6. Was the email lost, or was it intentionally not sent?

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 45dd055118eb..d90e0cb6e573 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -2262,8 +2262,10 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
>  
>  	mutex_lock(&prange->migrate_mutex);
>  
> -	if (svm_range_skip_recover(prange))
> +	if (svm_range_skip_recover(prange)) {
> +		amdgpu_gmc_filter_faults_remove(adev, addr, pasid);
>  		goto out_unlock_range;
> +	}
>  
>  	timestamp = ktime_to_us(ktime_get()) - prange->validate_timestamp;
>  	/* skip duplicate vm fault on different pages of same range */
> @@ -2325,6 +2327,7 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
>  
>  	if (r == -EAGAIN) {
>  		pr_debug("recover vm fault later\n");
> +		amdgpu_gmc_filter_faults_remove(adev, addr, pasid);
>  		r = 0;
>  	}
>  	return r;
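
For readers unfamiliar with the mechanism described in the commit message, the following is a minimal standalone sketch of a fault filter with a removal step. It is not the amdgpu implementation: the ring size, the timeout value, the key layout, and the names fault_filter, filter_faults and filter_faults_remove are all hypothetical and chosen only for illustration. It shows why a fault that was recorded but not recovered keeps being dropped until the timeout expires, unless its entry is removed explicitly.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FILTER_SLOTS      8      /* hypothetical ring size */
#define FILTER_TIMEOUT_US 5000   /* hypothetical entry lifetime */

struct fault_entry {
	uint64_t key;        /* page address combined with pasid */
	uint64_t timestamp;  /* time the fault was recorded, in us */
	bool valid;
};

struct fault_filter {
	struct fault_entry ring[FILTER_SLOTS];
	unsigned int next;   /* next slot to overwrite */
};

/* Build a lookup key; assumes 4 KiB pages and addresses below 2^48. */
static uint64_t fault_key(uint64_t addr, uint16_t pasid)
{
	return ((addr >> 12) << 16) | pasid;
}

/*
 * Return true if the same fault is already recorded and has not timed
 * out, meaning the caller should drop the interrupt. Otherwise record
 * the fault in the ring and return false so the caller handles it.
 */
static bool filter_faults(struct fault_filter *f, uint64_t addr,
			  uint16_t pasid, uint64_t now_us)
{
	uint64_t key = fault_key(addr, pasid);
	struct fault_entry *e;
	unsigned int i;

	for (i = 0; i < FILTER_SLOTS; i++) {
		e = &f->ring[i];
		if (e->valid && e->key == key &&
		    now_us - e->timestamp < FILTER_TIMEOUT_US)
			return true;
	}

	e = &f->ring[f->next];
	e->key = key;
	e->timestamp = now_us;
	e->valid = true;
	f->next = (f->next + 1) % FILTER_SLOTS;
	return false;
}

/*
 * Invalidate every entry for this address and pasid so that the next
 * retry interrupt on the same page is handled instead of filtered.
 */
static void filter_faults_remove(struct fault_filter *f, uint64_t addr,
				 uint16_t pasid)
{
	uint64_t key = fault_key(addr, pasid);
	unsigned int i;

	for (i = 0; i < FILTER_SLOTS; i++) {
		if (f->ring[i].valid && f->ring[i].key == key)
			f->ring[i].valid = false;
	}
}
```

In this sketch, a second filter_faults() call for the same address and pasid inside the timeout window returns true and the fault is dropped. Calling filter_faults_remove() first makes the next call return false again, which corresponds to the two places where the patch calls amdgpu_gmc_filter_faults_remove() before leaving svm_range_restore_pages() without having recovered the range.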