[PATCH 05/12] drm/amdgpu: Send no-retry XNACK for all fault types

Bridgman, John John.Bridgman at amd.com
Wed Jul 12 16:13:10 UTC 2017


Agreed... I thought we had already made this change but if not then... 

Reviewed-by: John Bridgman <John.Bridgman at amd.com>

>-----Original Message-----
>From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
>Of Felix Kuehling
>Sent: Wednesday, July 12, 2017 1:41 AM
>To: amd-gfx at lists.freedesktop.org
>Subject: Re: [PATCH 05/12] drm/amdgpu: Send no-retry XNACK for all fault
>types
>
>Any comments?
>
>I believe this is a nice stability improvement. With this change, VM faults no
>longer take down the whole GPU with an interrupt storm. With KFD we can recover
>without a GPU reset in many cases just by unmapping the offending process'
>queues.
>
>Regards,
>  Felix
>
>
>On 17-07-03 05:11 PM, Felix Kuehling wrote:
>> From: Jay Cornwall <Jay.Cornwall at amd.com>
>>
>> A subset of VM fault types currently sends retry XNACK to the client.
>> This causes a storm of interrupts from the VM to the host.
>>
>> Until the storm is throttled by other means, send no-retry XNACK for
>> all fault types instead. This does not change behavior for the client,
>> which will stall indefinitely with the current configuration in any case.
>> Improves system stability under GC or MMHUB faults.
>>
>> Signed-off-by: Jay Cornwall <Jay.Cornwall at amd.com>
>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 3 +++
>>  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 3 +++
>>  2 files changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>> index a42f483..f957b18 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>> @@ -206,6 +206,9 @@ static void gfxhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>>  		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>  				PAGE_TABLE_BLOCK_SIZE,
>>  				adev->vm_manager.block_size - 9);
>> +		/* Send no-retry XNACK on fault to suppress VM fault storm. */
>> +		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>> +				RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>>  		WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>>  		WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>>  		WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>> index 01918dc..b760018 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>> @@ -222,6 +222,9 @@ static void mmhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>>  		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>  				PAGE_TABLE_BLOCK_SIZE,
>>  				adev->vm_manager.block_size - 9);
>> +		/* Send no-retry XNACK on fault to suppress VM fault storm. */
>> +		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>> +				RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>>  		WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>>  		WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>>  		WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
>
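
For readers unfamiliar with the register-field idiom used in the patch, the
following is a minimal standalone sketch of what clearing a single field in a
control register value looks like. It assumes the usual mask-and-shift
pattern; the names ex_set_field, ex_disable_fault_retry, EX_RETRY_FAULT_SHIFT
and EX_RETRY_FAULT_MASK are hypothetical, and the bit position is made up for
illustration. The real layout of RETRY_PERMISSION_OR_INVALID_PAGE_FAULT comes
from the driver's register headers and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout, chosen only for illustration. The real
 * mask and shift for RETRY_PERMISSION_OR_INVALID_PAGE_FAULT live in
 * the driver's register headers. */
#define EX_RETRY_FAULT_SHIFT 10u
#define EX_RETRY_FAULT_MASK  (1u << EX_RETRY_FAULT_SHIFT)

/* Replace one field of a 32-bit register value and leave every other
 * bit untouched: clear the field, then OR in the shifted new value. */
static inline uint32_t ex_set_field(uint32_t orig, uint32_t mask,
				    unsigned int shift, uint32_t val)
{
	assert(shift < 32);
	return (orig & ~mask) | ((val << shift) & mask);
}

/* Clear the (hypothetical) retry bit, analogous to the patch writing 0
 * to the retry field of VM_CONTEXT1_CNTL. */
static inline uint32_t ex_disable_fault_retry(uint32_t cntl)
{
	return ex_set_field(cntl, EX_RETRY_FAULT_MASK,
			    EX_RETRY_FAULT_SHIFT, 0);
}
```

The point of the read-modify-write shape is that other fields already set in
the same value, such as PAGE_TABLE_BLOCK_SIZE in the patch, are preserved
when the retry field is cleared.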