[PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

Mon Jan 24 17:11:31 UTC 2022

On 1/24/2022 6:08 PM, Andrey Grodzovsky wrote:
> It's just infrastructure that you use when you need it.
> I don't think I ever tested it during reset, but we deliberately made it very
> self-reliant: you simply iterate over a FIFO of the dump through the PMI3
> register interface and dump out the content. It is currently supposed to
> work for the NV family.
>

Got it, thanks for the suggestion. Let me check the feasibility of
plugging STB into our existing design and use case.

- Shashank

> In case you encounter issues during reset, let me know and I will do my
> best to resolve them.
> 
> Andrey
> 
> On 2022-01-24 11:38, Sharma, Shashank wrote:
>> Hey Andrey,
>> That seems like a good idea. Is there a trigger for the STB dump, or is
>> it just infrastructure that one can use whenever they feel the need to
>> dump info? Also, how reliable is the STB infrastructure during a reset?
>>
>> Regards
>> Shashank
>> On 1/24/2022 5:32 PM, Andrey Grodzovsky wrote:
>>> You could probably add the STB dump we worked on a while ago to your
>>> info dump. A reminder on the feature is here:
>>> https://www.spinics.net/lists/amd-gfx/msg70751.html
>>>
>>> Andrey
>>>
>>> On 2022-01-21 15:34, Sharma, Shashank wrote:
>>>> From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001
>>>> From: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
>>>> Date: Fri, 21 Jan 2022 14:19:42 +0530
>>>> Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>>>>
>>>> This patch adds a GPU reset handler for the Navi ASIC family, which
>>>> typically dumps some of the registers and sends a trace event.
>>>>
>>>> V2: Accommodated a call to the work function that sends a uevent
>>>>
>>>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
>>>>  1 file changed, 28 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> index 01efda4398e5..ada35d4c5245 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> @@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>>>      }
>>>>  }
>>>>
>>>> +static void amdgpu_reset_dumps(struct amdgpu_device *adev)
>>>> +{
>>>> +    int r = 0, i;
>>>> +
>>>> +    /* original raven doesn't have full asic reset */
>>>> +    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>>>> +        !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>>>> +        return;
>>>> +    for (i = 0; i < adev->num_ip_blocks; i++) {
>>>> +        if (!adev->ip_blocks[i].status.valid)
>>>> +            continue;
>>>> +        if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>>>> +            continue;
>>>> +        r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>>>> +
>>>> +        if (r)
>>>> +            DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>>>> +                      adev->ip_blocks[i].version->funcs->name, r);
>>>> +    }
>>>> +
>>>> +    /* Schedule work to send uevent */
>>>> +    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
>>>> +        DRM_ERROR("failed to add GPU reset work\n");
>>>> +
>>>> +    dump_stack();
>>>> +}
>>>> +
>>>>  static int nv_asic_reset(struct amdgpu_device *adev)
>>>>  {
>>>>      int ret = 0;
>>>>
>>>> +    amdgpu_reset_dumps(adev);
>>>>      switch (nv_asic_reset_method(adev)) {
>>>>      case AMD_RESET_METHOD_PCI:
>>>>          dev_info(adev->dev, "PCI reset\n");