[PATCH 05/12] drm/amdgpu: Send no-retry XNACK for all fault types

Felix Kuehling felix.kuehling at amd.com
Wed Jul 12 05:40:42 UTC 2017


Any comments?

I believe this is a nice stability improvement. With this change, VM
faults no longer take down the whole GPU with an interrupt storm. With
KFD we can recover without a GPU reset in many cases just by unmapping
the offending process's queues.

Regards,
  Felix


On 17-07-03 05:11 PM, Felix Kuehling wrote:
> From: Jay Cornwall <Jay.Cornwall at amd.com>
>
> A subset of VM fault types currently send retry XNACK to the client.
> This causes a storm of interrupts from the VM to the host.
>
> Until the storm is throttled by other means, send no-retry XNACK for
> all fault types instead. There is no change in behavior for the client,
> which will stall indefinitely with the current configuration in any
> case. This improves system stability under GC or MMHUB faults.
>
> Signed-off-by: Jay Cornwall <Jay.Cornwall at amd.com>
> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 3 +++
>  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> index a42f483..f957b18 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> @@ -206,6 +206,9 @@ static void gfxhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>  		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>  				PAGE_TABLE_BLOCK_SIZE,
>  				adev->vm_manager.block_size - 9);
> +		/* Send no-retry XNACK on fault to suppress VM fault storm. */
> +		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
> +				    RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>  		WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>  		WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>  		WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> index 01918dc..b760018 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> @@ -222,6 +222,9 @@ static void mmhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>  		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>  				PAGE_TABLE_BLOCK_SIZE,
>  				adev->vm_manager.block_size - 9);
> +		/* Send no-retry XNACK on fault to suppress VM fault storm. */
> +		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
> +				    RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>  		WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>  		WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>  		WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
