[PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

Lazar, Lijo lijo.lazar at amd.com
Sat Jan 22 06:42:28 UTC 2022



On 1/22/2022 2:04 AM, Sharma, Shashank wrote:
>  From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001
> From: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
> Date: Fri, 21 Jan 2022 14:19:42 +0530
> Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
> 
> This patch adds a GPU reset handler for Navi ASIC family, which
> typically dumps some of the registersand sends a trace event.
> 
> V2: Accomodated call to work function to send uevent
> 
> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
>   1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c 
> b/drivers/gpu/drm/amd/amdgpu/nv.c
> index 01efda4398e5..ada35d4c5245 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>       }
>   }
> 
> +static void amdgpu_reset_dumps(struct amdgpu_device *adev)
> +{
> +    int r = 0, i;
> +
> +    /* original raven doesn't have full asic reset */
> +    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
> +        !(adev->apu_flags & AMD_APU_IS_RAVEN2))
> +        return;
> +    for (i = 0; i < adev->num_ip_blocks; i++) {
> +        if (!adev->ip_blocks[i].status.valid)
> +            continue;
> +        if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
> +            continue;
> +        r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
> +
> +        if (r)
> +            DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
> +                    adev->ip_blocks[i].version->funcs->name, r);
> +    }
> +
> +    /* Schedule work to send uevent */
> +    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
> +        DRM_ERROR("failed to add GPU reset work\n");
> +
> +    dump_stack();
> +}
> +
>   static int nv_asic_reset(struct amdgpu_device *adev)
>   {
>       int ret = 0;
> 
> +    amdgpu_reset_dumps(adev);

Had a comment on this before. Now there are different reasons (or even 
no reason like a precautionary reset) to perform reset. A user would be 
interested in a trace only if the reason is valid.

To clarify on why a work shouldn't be scheduled on every reset, check here -

https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188


Thanks,
Lijo

>       switch (nv_asic_reset_method(adev)) {
>       case AMD_RESET_METHOD_PCI:
>           dev_info(adev->dev, "PCI reset\n");


More information about the amd-gfx mailing list