[PATCH] drm/amdgpu: add the command AMDGPU_INFO_QUEUE_RESET to query queue reset
Alex Deucher
alexdeucher at gmail.com
Fri Oct 18 13:23:00 UTC 2024
On Fri, Oct 18, 2024 at 7:19 AM Christian König
<christian.koenig at amd.com> wrote:
>
> Am 18.10.24 um 11:33 schrieb Zhang, Jesse(Jie):
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> > Hi Christian,
> >
> > -----Original Message-----
> > From: Koenig, Christian <Christian.Koenig at amd.com>
> > Sent: Friday, October 18, 2024 4:47 PM
> > To: Zhang, Jesse(Jie) <Jesse.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
> > Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> > Subject: Re: [PATCH] drm/amdgpu: add the command AMDGPU_INFO_QUEUE_RESET to query queue reset
> >
> > Am 18.10.24 um 10:19 schrieb Jesse.zhang at amd.com:
> >> Not all ASICs support the queue reset feature.
> >> Therefore, userspace can query this feature via
> >> AMDGPU_INFO_QUEUE_RESET before validating a queue reset.
> > Why would UMDs need that information?
> >
> > To verify queue reset.
> > Now the igt uses many asic filters to hard code if queue reset is ready.
> > Alex suggested that we can get the information directly from the driver.
>
> Ah, ok got it. Mhm in general sounds like the approach is cleaner, but
> the IOCTL interface is supposed to be used by the UMD and tested by the
> IGT tests.
>
> The problem is now that it's documented that the justification for
> having the IOCTLs is the UMD and not the IGT tests.
>
> Could we also do this as debugfs interface?
>
> > Another problem is that our driver has added code about queue reset.
> > The reset function is complete (adev->gfx.gfx_ring[0].funcs->reset),
> > but fw partially supports it.
> > For example navi31, KCQ invalid opcode case can be recovered by queue reset,
> > but KCQ invalid packet length cannot be recovered now.
> >
> > So for this case, I am not sure whether we can return true for this function.
> >
> > More information can be obtained from the link:
> > https://confluence.amd.com/display/AMDGPU/Phase+1+-+Validation+of+Per+Queue+Reset+for+Kernel+Queue
>
> Oh, yeah good question. Alex should probably answer that.
Maybe a better option would be a reset_mask sysfs file for each IP
which we could have a bit for each reset type. E.g.,
gc_reset_mask
sdma_reset_mask
vcn_reset_mask
jpeg_reset_mask
vpe_reset_mask
AMDGPU_RESET_TYPE_FULL (1 << 0) /* full adapter reset, mode1/mode2/BACO/etc. */
AMDGPU_RESET_TYPE_SOFT_RESET (1 << 1) /* IP level soft reset */
AMDGPU_RESET_TYPE_PER_QUEUE (1 << 2) /* per queue */
AMDGPU_RESET_TYPE_PER_PIPE (1 << 3) /* per pipe */
Then the IGT tests could query the mask and determine which tests to run.
Alex
>
> Regards,
> Christian.
>
> >
> > Thanks
> > Jesse
> >> Signed-off-by: Jesse Zhang <jesse.zhang at amd.com>
> >> ---
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 27 +++++++++++++++++++++++++
> >> include/uapi/drm/amdgpu_drm.h | 2 ++
> >> 2 files changed, 29 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >> index b53c35992152..87dee858fb4c 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >> @@ -577,6 +577,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
> >> uint64_t ui64 = 0;
> >> int i, found, ret;
> >> int ui32_size = sizeof(ui32);
> >> + bool queue_reset;
> >>
> >> if (!info->return_size || !info->return_pointer)
> >> return -EINVAL;
> >> @@ -1282,6 +1283,32 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
> >> return copy_to_user(out, &gpuvm_fault,
> >> min((size_t)size, sizeof(gpuvm_fault))) ? -EFAULT : 0;
> >> }
> >> + case AMDGPU_INFO_QUEUE_RESET: {
> >> + fpriv = (struct amdgpu_fpriv *)filp->driver_priv;
> >> + type = amdgpu_ip_get_block_type(adev, info->query_hw_ip.type);
> >> + ip_block = amdgpu_device_ip_get_ip_block(adev, type);
> >> +
> >> + if (!ip_block || !ip_block->status.valid)
> >> + return -EINVAL;
> >> +
> >> + switch (info->query_hw_ip.type) {
> >> + case AMDGPU_HW_IP_GFX:
> >> + queue_reset = adev->gfx.gfx_ring[0].funcs->reset ? true : false;
> > Please never ever use this coding style.
> >
> > If you want to convert anything into a boolean use the !! operator.
> >
> > Regards,
> > Christian.
> >
> >> + break;
> >> + case AMDGPU_HW_IP_COMPUTE:
> >> + queue_reset = adev->gfx.compute_ring[0].funcs->reset ? true : false;
> >> + break;
> >> + case AMDGPU_HW_IP_DMA:
> >> + queue_reset = adev->sdma.instance[0].ring.funcs->reset ? true : false;
> >> + break;
> >> + case AMDGPU_HW_IP_UVD_ENC:
> >> + case AMDGPU_HW_IP_VCN_DEC:
> >> + default:
> >> + queue_reset = false;
> >> + }
> >> +
> >> + return copy_to_user(out, &queue_reset, min(size, 4u)) ? -EFAULT : 0;
> >> + }
> >> default:
> >> DRM_DEBUG_KMS("Invalid request %d\n", info->query);
> >> return -EINVAL;
> >> diff --git a/include/uapi/drm/amdgpu_drm.h
> >> b/include/uapi/drm/amdgpu_drm.h index d9bff1c3b326..3b17d82fd1ee
> >> 100644
> >> --- a/include/uapi/drm/amdgpu_drm.h
> >> +++ b/include/uapi/drm/amdgpu_drm.h
> >> @@ -1052,6 +1052,8 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
> >> #define AMDGPU_INFO_MAX_IBS 0x22
> >> /* query last page fault info */
> >> #define AMDGPU_INFO_GPUVM_FAULT 0x23
> >> +/* query queue reset */
> >> +#define AMDGPU_INFO_QUEUE_RESET 0x24
> >>
> >> #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT 0
> >> #define AMDGPU_INFO_MMR_SE_INDEX_MASK 0xff
>
More information about the amd-gfx
mailing list