[PATCH 2/9] drm/amdgpu: bail out for userq unmap error

Thu Jun 5 13:24:28 UTC 2025

On Thu, Jun 5, 2025 at 3:54 AM Liang, Prike <Prike.Liang at amd.com> wrote:
>
> [Public]
>
> > From: Alex Deucher <alexdeucher at gmail.com>
> > Sent: Saturday, May 31, 2025 5:34 AM
> > To: Liang, Prike <Prike.Liang at amd.com>
> > Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
> > <Alexander.Deucher at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>;
> > Lazar, Lijo <Lijo.Lazar at amd.com>
> > Subject: Re: [PATCH 2/9] drm/amdgpu: bail out for userq unmap error
> >
> > On Fri, May 30, 2025 at 3:55 AM Prike Liang <Prike.Liang at amd.com> wrote:
> > >
> > > Validate the userq unmap status before destroying the userq buffer
> > > objects.
> > >
> > > Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > > index f67969312c39..8eea0e1e1b6a 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > > @@ -319,6 +319,12 @@ amdgpu_userq_destroy(struct drm_file *filp, int queue_id)
> > >         }
> > >         amdgpu_bo_unref(&queue->db_obj.obj);
> > >         r = amdgpu_userq_unmap_helper(uq_mgr, queue);
> > > +       if (r != AMDGPU_USERQ_STATE_UNMAPPED) {
> > > +               drm_dbg_driver(adev_to_drm(uq_mgr->adev), "Can't unmap the queue for destroying.\n");
> > > +               mutex_unlock(&uq_mgr->userq_mutex);
> > > +               /* TODO: may need to reset the queue before returning */
> > > +               return r;
> >
> > If we return early here, we'll leak memory.  Presumably if the unmap failed, the
> > queue is hung, so continuing with the cleanup shouldn't cause any problems.
>
> [Prike] Yeah, maybe we only need a warning here and can then continue destroying the userq software resources.
> Do we need to reset the queue when unmap fails during userq destroy?

Probably.  Otherwise I suspect the next time the MES tries to use that
queue it will already be hung.
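
Roughly the flow I have in mind, as a userspace sketch rather than real
driver code.  Every name below (userq_unmap, userq_reset, userq_cleanup,
the struct fields) is a hypothetical stand-in, not an existing amdgpu
helper: log the failed unmap, try a reset, and still free the software
state instead of returning early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's queue state; not the amdgpu types. */
enum userq_state { USERQ_MAPPED, USERQ_UNMAPPED, USERQ_HUNG };

struct userq {
	enum userq_state state;
	bool hw_unmap_fails;	/* knob to simulate a hung queue */
	void *sw_resources;	/* stands in for BOs, fences, id entry */
};

static int resets_issued;

/* Pretend hardware unmap: fails and leaves the queue hung when asked to. */
static int userq_unmap(struct userq *q)
{
	if (q->hw_unmap_fails) {
		q->state = USERQ_HUNG;
		return -1;
	}
	q->state = USERQ_UNMAPPED;
	return 0;
}

/* Pretend queue reset: assumed here to always bring the queue back. */
static void userq_reset(struct userq *q)
{
	resets_issued++;
	q->state = USERQ_UNMAPPED;
}

/* Frees the software-side state; must run on every destroy path. */
static void userq_cleanup(struct userq *q)
{
	free(q->sw_resources);
	q->sw_resources = NULL;
}

/* Reports the unmap result to the caller, but never skips the cleanup. */
static int userq_destroy(struct userq *q)
{
	int r;

	assert(q != NULL);
	r = userq_unmap(q);
	if (r) {
		fprintf(stderr, "unmap failed (%d), resetting before cleanup\n", r);
		userq_reset(q);
	}
	userq_cleanup(q);
	return r;
}
```

Whether destroy should still return the unmap error to userspace after a
successful reset is a separate question; the sketch just passes it through.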

Alex

>
> > Alex
> >
> > > +       }
> > >         amdgpu_userq_cleanup(uq_mgr, queue, queue_id);
> > >         mutex_unlock(&uq_mgr->userq_mutex);
> > >
> > > --
> > > 2.34.1
> > >