[PATCH] drm/amdgpu: clean up psp ip if hw_init failed v2

Alex Deucher alexdeucher at gmail.com
Fri Apr 22 21:00:34 UTC 2022


On Fri, Apr 22, 2022 at 3:54 PM Wong, Alice <Shiwei.Wong at amd.com> wrote:
>
> [AMD Official Use Only]
>
> Hi Alex,
>
> The attached patch freed most of the allocated memory except for one allocated by psp_tmr_init during psp_load_fw.
> Combination of the attached patch and calling psp_hw_fini when psp_hw_init failed would fix the issue.
>
> As a proper fix, we can call psp_tmr_terminate in psp_load_fw when psp_load_non_psp_fw failed. (attached patch)
> We can't move psp_tmr_init to sw_init because we need to load toc to get the TMR size.
> Do you have any concerns with this approach?

That only covers failures in hw_init().  It doesn't handle resume().
Looks like all of the ta functions also potentially leak.  I'm working
on a cleanup to handle all of these.

Alex

>
> Alice
>
>
> -----Original Message-----
> From: Alex Deucher <alexdeucher at gmail.com>
> Sent: April 21, 2022 1:31 AM
> To: Wong, Alice <Shiwei.Wong at amd.com>
> Cc: amd-gfx list <amd-gfx at lists.freedesktop.org>
> Subject: Re: [PATCH] drm/amdgpu: clean up psp ip if hw_init failed v2
>
> On Wed, Apr 20, 2022 at 5:48 PM Alice Wong <shiwei.wong at amd.com> wrote:
> >
> > If at any point psp_hw_init failed, psp_hw_fini would not be called
> > during unload due to ip_blocks[PSP].status.hw not being set to true.
> > This could cause a memory leak when the driver unloads.
> > As a rule of thumb, each IP block should cleanup themselves when their
> > hw_init fails. Only previously intialized blocks should be cleaned up
> > by the common framework.
> >
> > v1: Call IP blocks' respective hw_fini when hw_init failed from
> >     the common framework
> > v2: Call psp_hw_fini when psp_hw_init failed.
> >
> > Signed-off-by: Alice Wong <shiwei.wong at amd.com>
>
> I don't think this is a good idea.  The hw programming in hw_fini makes no sense if the driver never set it up in the first place if hw_init failed before initializing the hw.  It would be better to just properly unwind the relevant functions.  Presumably the problem you are seeing is the failure to free the GPU memory allocated in fw_fini, depending on where it fails.  We should just allocate the memory in sw_init; that is what we do in other IPs.  Does the attached patch fix the issue you are seeing?
>
> Alex
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 57
> > +++++++++++++------------
> >  1 file changed, 29 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index 5b9e48d44f5b..52b14efa848a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -2537,6 +2537,34 @@ static int psp_load_fw(struct amdgpu_device *adev)
> >         return ret;
> >  }
> >
> > +static int psp_hw_fini(void *handle)
> > +{
> > +       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > +       struct psp_context *psp = &adev->psp;
> > +
> > +       if (psp->ta_fw) {
> > +               psp_ras_terminate(psp);
> > +               psp_securedisplay_terminate(psp);
> > +               psp_rap_terminate(psp);
> > +               psp_dtm_terminate(psp);
> > +               psp_hdcp_terminate(psp);
> > +       }
> > +
> > +       psp_asd_terminate(psp);
> > +
> > +       psp_tmr_terminate(psp);
> > +       psp_ring_destroy(psp, PSP_RING_TYPE__KM);
> > +
> > +       amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> > +                             &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> > +       amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> > +                             &psp->fence_buf_mc_addr, &psp->fence_buf);
> > +       amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
> > +                             (void **)&psp->cmd_buf_mem);
> > +
> > +       return 0;
> > +}
> > +
> >  static int psp_hw_init(void *handle)
> >  {
> >         int ret;
> > @@ -2563,37 +2591,10 @@ static int psp_hw_init(void *handle)
> >  failed:
> >         adev->firmware.load_type = AMDGPU_FW_LOAD_DIRECT;
> >         mutex_unlock(&adev->firmware.mutex);
> > +       psp_hw_fini(handle);
> >         return -EINVAL;
> >  }
> >
> > -static int psp_hw_fini(void *handle)
> > -{
> > -       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > -       struct psp_context *psp = &adev->psp;
> > -
> > -       if (psp->ta_fw) {
> > -               psp_ras_terminate(psp);
> > -               psp_securedisplay_terminate(psp);
> > -               psp_rap_terminate(psp);
> > -               psp_dtm_terminate(psp);
> > -               psp_hdcp_terminate(psp);
> > -       }
> > -
> > -       psp_asd_terminate(psp);
> > -
> > -       psp_tmr_terminate(psp);
> > -       psp_ring_destroy(psp, PSP_RING_TYPE__KM);
> > -
> > -       amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> > -                             &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> > -       amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> > -                             &psp->fence_buf_mc_addr, &psp->fence_buf);
> > -       amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
> > -                             (void **)&psp->cmd_buf_mem);
> > -
> > -       return 0;
> > -}
> > -
> >  static int psp_suspend(void *handle)
> >  {
> >         int ret;
> > --
> > 2.25.1
> >


More information about the amd-gfx mailing list