[PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

Andrey Grodzovsky andrey.grodzovsky at amd.com
Mon Jan 24 17:08:00 UTC 2022


It's just infrastructure that you use when you need it.
I don't think I ever tested it during a reset, but we deliberately made
it very self-reliant: you simply iterate over the dump FIFO through the
PMI3 register interface and dump out the content. It is currently
supposed to work for the NV family.
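
To illustrate the idea of draining a FIFO through a register interface,
here is a minimal userspace sketch. It is not the driver code: the
struct fake_stb type and the fake_stb_rewind(), fake_stb_read_data() and
stb_dump() helpers are hypothetical names made up for this example, and
the "registers" are modeled by a plain array with an auto-incrementing
read pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware: a FIFO exposed via one data register. */
struct fake_stb {
	const uint32_t *fifo;	/* backing contents of the FIFO */
	size_t depth;		/* number of dwords in the FIFO */
	size_t rd;		/* read pointer, advanced by every data read */
};

/* Models a write to a (hypothetical) "reset read pointer" register. */
static void fake_stb_rewind(struct fake_stb *stb)
{
	stb->rd = 0;
}

/* Models a read of the data register: returns one dword, then auto-increments. */
static uint32_t fake_stb_read_data(struct fake_stb *stb)
{
	uint32_t val = stb->fifo[stb->rd];

	stb->rd = (stb->rd + 1) % stb->depth;
	return val;
}

/* Copy at most max dwords of the FIFO into buf; returns the number copied. */
static size_t stb_dump(struct fake_stb *stb, uint32_t *buf, size_t max)
{
	size_t n = stb->depth < max ? stb->depth : max;
	size_t i;

	fake_stb_rewind(stb);
	for (i = 0; i < n; i++)
		buf[i] = fake_stb_read_data(stb);

	return n;
}
```

The dump loop only rewinds and reads, so it does not depend on any other
driver state, which is what makes this kind of interface attractive
around a reset.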

In case you encounter issues during reset let me know and I will do my 
best to resolve them.

Andrey

On 2022-01-24 11:38, Sharma, Shashank wrote:
> Hey Andrey,
> That seems like a good idea. May I know if there is a trigger for the
> STB dump, or is it just infrastructure that one can use when they
> feel a need to dump info? Also, how reliable is the STB infra during
> a reset?
>
> Regards
> Shashank
> On 1/24/2022 5:32 PM, Andrey Grodzovsky wrote:
>> You can probably add the STB dump we worked on a while ago to your
>> info dump. A reminder of the feature is here:
>> https://www.spinics.net/lists/amd-gfx/msg70751.html
>>
>> Andrey
>>
>> On 2022-01-21 15:34, Sharma, Shashank wrote:
>>> From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001
>>> From: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
>>> Date: Fri, 21 Jan 2022 14:19:42 +0530
>>> Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>>>
>>> This patch adds a GPU reset handler for the Navi ASIC family, which
>>> typically dumps some of the registers and sends a trace event.
>>>
>>> V2: Accommodated call to work function to send uevent
>>>
>>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
>>>  1 file changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>>> index 01efda4398e5..ada35d4c5245 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>> @@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>>      }
>>>  }
>>>
>>> +static void amdgpu_reset_dumps(struct amdgpu_device *adev)
>>> +{
>>> +    int r = 0, i;
>>> +
>>> +    /* original raven doesn't have full asic reset */
>>> +    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>>> +        !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>>> +        return;
>>> +    for (i = 0; i < adev->num_ip_blocks; i++) {
>>> +        if (!adev->ip_blocks[i].status.valid)
>>> +            continue;
>>> +        if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>>> +            continue;
>>> +        r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>>> +
>>> +        if (r)
>>> +            DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>>> +                  adev->ip_blocks[i].version->funcs->name, r);
>>> +    }
>>> +
>>> +    /* Schedule work to send uevent */
>>> +    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
>>> +        DRM_ERROR("failed to add GPU reset work\n");
>>> +
>>> +    dump_stack();
>>> +}
>>> +
>>>  static int nv_asic_reset(struct amdgpu_device *adev)
>>>  {
>>>      int ret = 0;
>>>
>>> +    amdgpu_reset_dumps(adev);
>>>      switch (nv_asic_reset_method(adev)) {
>>>      case AMD_RESET_METHOD_PCI:
>>>          dev_info(adev->dev, "PCI reset\n");
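
For readers who want to try the control flow of the dump loop outside the
kernel, the following is a rough userspace model of it. It is only a
sketch: struct ip_funcs, struct ip_block, run_reset_dumps() and the two
sample callbacks are hypothetical names for this example, and
fprintf() stands in for DRM_ERROR(). It keeps the same three decisions as
the patch: skip invalid blocks, skip blocks without a reset_reg_dumps
callback, and keep going when one callback fails.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified per-IP function table with an optional callback. */
struct ip_funcs {
	const char *name;
	int (*reset_reg_dumps)(void *ctx);	/* may be NULL */
};

/* Hypothetical, simplified IP block descriptor. */
struct ip_block {
	int valid;
	const struct ip_funcs *funcs;
};

/* Sample callback: counts the call in *ctx and succeeds. */
static int dump_ok(void *ctx)
{
	(*(int *)ctx)++;
	return 0;
}

/* Sample callback: counts the call in *ctx and reports an error. */
static int dump_fail(void *ctx)
{
	(*(int *)ctx)++;
	return -5;
}

/* Run every available dump callback; returns the number of failed callbacks. */
static int run_reset_dumps(const struct ip_block *blocks, size_t n, void *ctx)
{
	int failures = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int r;

		if (!blocks[i].valid)
			continue;
		if (!blocks[i].funcs || !blocks[i].funcs->reset_reg_dumps)
			continue;

		r = blocks[i].funcs->reset_reg_dumps(ctx);
		if (r) {
			/* Log and continue so one bad block does not hide the rest. */
			fprintf(stderr, "reset_reg_dumps of IP block <%s> failed %d\n",
				blocks[i].funcs->name, r);
			failures++;
		}
	}

	return failures;
}
```

Unlike the patch, this model returns a failure count so that a caller can
check the outcome; the patch itself only logs the error.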

