[PATCH] drm/amdgpu: add the command AMDGPU_INFO_QUEUE_RESET to query queue reset

Zhang, Jesse(Jie) Jesse.Zhang at amd.com
Fri Oct 18 09:33:53 UTC 2024


[AMD Official Use Only - AMD Internal Distribution Only]

Hi Christian,

-----Original Message-----
From: Koenig, Christian <Christian.Koenig at amd.com>
Sent: Friday, October 18, 2024 4:47 PM
To: Zhang, Jesse(Jie) <Jesse.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
Subject: Re: [PATCH] drm/amdgpu: add the command AMDGPU_INFO_QUEUE_RESET to query queue reset

Am 18.10.24 um 10:19 schrieb Jesse.zhang at amd.com:
> Not all ASICs support the queue reset feature.
> Therefore, userspace can query this feature via
> AMDGPU_INFO_QUEUE_RESET before validating a queue reset.

Why would UMDs need that information?

To verify queue reset.
Now the igt uses many asic filters to hard code if queue reset is ready.
Alex suggested that we can get the information directly from the driver.

Another problem is that our driver has added code about queue reset.
The reset function is complete (adev->gfx.gfx_ring[0].funcs->reset),
but fw partially supports it.
For example navi31,   KCQ invalid opcode case can be recovered by queue reset,
but KCQ invalid packet length cannot be recovered now.

So for this case, I am not sure whether we can return true for this function.

More information can be obtained from the link:
https://confluence.amd.com/display/AMDGPU/Phase+1+-+Validation+of+Per+Queue+Reset+for+Kernel+Queue

Thanks
Jesse
>
> Signed-off-by: Jesse Zhang <jesse.zhang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 27 +++++++++++++++++++++++++
>   include/uapi/drm/amdgpu_drm.h           |  2 ++
>   2 files changed, 29 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index b53c35992152..87dee858fb4c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -577,6 +577,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>       uint64_t ui64 = 0;
>       int i, found, ret;
>       int ui32_size = sizeof(ui32);
> +     bool queue_reset;
>
>       if (!info->return_size || !info->return_pointer)
>               return -EINVAL;
> @@ -1282,6 +1283,32 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>               return copy_to_user(out, &gpuvm_fault,
>                                   min((size_t)size, sizeof(gpuvm_fault))) ? -EFAULT : 0;
>       }
> +     case AMDGPU_INFO_QUEUE_RESET: {
> +             fpriv = (struct amdgpu_fpriv *)filp->driver_priv;
> +             type = amdgpu_ip_get_block_type(adev, info->query_hw_ip.type);
> +             ip_block = amdgpu_device_ip_get_ip_block(adev, type);
> +
> +             if (!ip_block || !ip_block->status.valid)
> +                     return -EINVAL;
> +
> +             switch (info->query_hw_ip.type) {
> +             case AMDGPU_HW_IP_GFX:
> +                     queue_reset = adev->gfx.gfx_ring[0].funcs->reset ? true : false;

Please never ever use this coding style.

If you want to convert anything into a boolean use the !! operator.

Regards,
Christian.

> +                     break;
> +             case AMDGPU_HW_IP_COMPUTE:
> +                     queue_reset = adev->gfx.compute_ring[0].funcs->reset ? true : false;
> +                     break;
> +             case AMDGPU_HW_IP_DMA:
> +                     queue_reset = adev->sdma.instance[0].ring.funcs->reset ? true : false;
> +                     break;
> +             case AMDGPU_HW_IP_UVD_ENC:
> +             case AMDGPU_HW_IP_VCN_DEC:
> +             default:
> +                     queue_reset = false;
> +             }
> +
> +             return copy_to_user(out, &queue_reset, min(size, 4u)) ? -EFAULT : 0;
> +     }
>       default:
>               DRM_DEBUG_KMS("Invalid request %d\n", info->query);
>               return -EINVAL;
> diff --git a/include/uapi/drm/amdgpu_drm.h
> b/include/uapi/drm/amdgpu_drm.h index d9bff1c3b326..3b17d82fd1ee
> 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -1052,6 +1052,8 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
>   #define AMDGPU_INFO_MAX_IBS                 0x22
>   /* query last page fault info */
>   #define AMDGPU_INFO_GPUVM_FAULT                     0x23
> +/* query queue reset */
> +#define AMDGPU_INFO_QUEUE_RESET                      0x24
>
>   #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT      0
>   #define AMDGPU_INFO_MMR_SE_INDEX_MASK       0xff



More information about the amd-gfx mailing list