[PATCH 05/12] drm/amdgpu: Send no-retry XNACK for all fault types

Christian König deathsimple at vodafone.de
Wed Jul 12 16:49:32 UTC 2017


On 12.07.2017 at 18:15, Bridgman, John wrote:
>> -----Original Message-----
>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
>> Of Alex Deucher
>> Sent: Wednesday, July 12, 2017 11:59 AM
>> To: Kuehling, Felix
>> Cc: amd-gfx list
>> Subject: Re: [PATCH 05/12] drm/amdgpu: Send no-retry XNACK for all fault
>> types
>>
>> On Wed, Jul 12, 2017 at 1:40 AM, Felix Kuehling <felix.kuehling at amd.com>
>> wrote:
>>> Any comments?
>>>
>>> I believe this is a nice stability improvement. With this change, VM
>>> faults no longer take down the whole GPU with an interrupt storm. With
>>> KFD we can recover without a GPU reset in many cases just by unmapping
>>> the offending process' queues.
>> Will this cause any problems with enabling recoverable page faults later?
>> If not,
>> Acked-by: Alex Deucher <alexander.deucher at amd.com>
> We will need to back this out in order to enable recoverable page faults later, but probably still worth doing in the short term IMO.

Yeah, agreed. Avoiding the interrupt ring overflow in particular sounds
like a good idea to me.
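
For readers following along, here is a minimal sketch of the read-modify-write
that the quoted patch performs on VM_CONTEXT1_CNTL. It is not the driver code:
the shift and mask values are hypothetical placeholders (the real ones come
from the amdgpu register headers), and demo_set_field() is only a simplified
stand-in for the kernel's REG_SET_FIELD macro.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: placeholder field layout for illustration only. The real
 * position of RETRY_PERMISSION_OR_INVALID_PAGE_FAULT is defined in the
 * amdgpu register headers and is not reproduced here. */
#define DEMO_RETRY_FAULT_SHIFT 1u
#define DEMO_RETRY_FAULT_MASK  (1u << DEMO_RETRY_FAULT_SHIFT)

/* Simplified stand-in for REG_SET_FIELD: clear the field's bits, then OR
 * in the new value shifted into position. */
static uint32_t demo_set_field(uint32_t reg, uint32_t mask,
                               uint32_t shift, uint32_t val)
{
    /* The new value must fit inside the field. */
    assert(((val << shift) & ~mask) == 0);
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Hypothetical helper: clear the retry field so that faults are answered
 * with a no-retry XNACK, leaving all other bits of the register untouched. */
static uint32_t demo_disable_retry(uint32_t ctx_cntl)
{
    return demo_set_field(ctx_cntl, DEMO_RETRY_FAULT_MASK,
                          DEMO_RETRY_FAULT_SHIFT, 0u);
}
```

Under the assumed layout, demo_disable_retry(0xffffffff) returns 0xfffffffd:
only the retry bit is cleared. In the patch itself the result is then written
back once per VMID context with WREG32_SOC15_OFFSET.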

Patch is Acked-by: Christian König <christian.koenig at amd.com> as well.

Christian.

>
>>> Regards,
>>>    Felix
>>>
>>>
>>> On 17-07-03 05:11 PM, Felix Kuehling wrote:
>>>> From: Jay Cornwall <Jay.Cornwall at amd.com>
>>>>
>>>> A subset of VM fault types currently send retry XNACK to the client.
>>>> This causes a storm of interrupts from the VM to the host.
>>>>
>>>> Until the storm is throttled by other means, send no-retry XNACK for
>>>> all fault types instead. This does not change behavior for the client,
>>>> which will stall indefinitely with the current configuration in any case.
>>>> Improves system stability under GC or MMHUB faults.
>>>>
>>>> Signed-off-by: Jay Cornwall <Jay.Cornwall at amd.com>
>>>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 3 +++
>>>>   drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 3 +++
>>>>   2 files changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>>> index a42f483..f957b18 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>>> @@ -206,6 +206,9 @@ static void gfxhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>>>>                tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>>                                PAGE_TABLE_BLOCK_SIZE,
>>>>                                adev->vm_manager.block_size - 9);
>>>> +             /* Send no-retry XNACK on fault to suppress VM fault storm. */
>>>> +             tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>> +                             RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>>>>                WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>>>>                WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>>>>                WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>>> index 01918dc..b760018 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>>> @@ -222,6 +222,9 @@ static void mmhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>>>>                tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>>                                PAGE_TABLE_BLOCK_SIZE,
>>>>                                adev->vm_manager.block_size - 9);
>>>> +             /* Send no-retry XNACK on fault to suppress VM fault storm. */
>>>> +             tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>> +                             RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>>>>                WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>>>>                WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>>>>                WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



