[PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

Sharma, Shashank shashank.sharma at amd.com
Mon Jan 24 16:50:45 UTC 2022



On 1/24/2022 8:18 AM, Christian König wrote:
> 
> 
> Am 21.01.22 um 21:34 schrieb Sharma, Shashank:
>> From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001
>> From: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
>> Date: Fri, 21 Jan 2022 14:19:42 +0530
>> Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>>
>> This patch adds a GPU reset handler for Navi ASIC family, which
>> typically dumps some of the registersand sends a trace event.
>>
>> V2: Accomodated call to work function to send uevent
>>
>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c 
>> b/drivers/gpu/drm/amd/amdgpu/nv.c
>> index 01efda4398e5..ada35d4c5245 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> @@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>      }
>>  }
>>
>> +static void amdgpu_reset_dumps(struct amdgpu_device *adev)
>> +{
>> +    int r = 0, i;
> 
> Please don't initialize variables if it isn't absolutely necessary.
> 

Agree, @Amar please check this out.

>> +
>> +    /* original raven doesn't have full asic reset */
>> +    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>> +        !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>> +        return;
>> +    for (i = 0; i < adev->num_ip_blocks; i++) {
>> +        if (!adev->ip_blocks[i].status.valid)
>> +            continue;
>> +        if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>> +            continue;
>> +        r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>> +
>> +        if (r)
>> +            DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>> + adev->ip_blocks[i].version->funcs->name, r);
>> +    }
>> +
>> +    /* Schedule work to send uevent */
>> +    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
>> +        DRM_ERROR("failed to add GPU reset work\n");
>> +
>> +    dump_stack();
>> +}
>> +
> 
> I'm pretty sure that should be inside common code and not here.
> 
> In other words that is absolutely not ASIC specific at all.

Yeah, I agree. I think the initial problem for Amar was that some 
offsets were only valid for particular ASICs, but nothing is stopping us 
to make this in a generic function followed by a asic special function.

We might have to move the asic.fptr to a more generic higher level fptr.

- Shashank

> 
> Regards,
> Christian.
> 
>>  static int nv_asic_reset(struct amdgpu_device *adev)
>>  {
>>      int ret = 0;
>>
>> +    amdgpu_reset_dumps(adev);
>>      switch (nv_asic_reset_method(adev)) {
>>      case AMD_RESET_METHOD_PCI:
>>          dev_info(adev->dev, "PCI reset\n");
> 


More information about the amd-gfx mailing list