[PATCH] drm/amdgpu: fix S3 issue if MQD in VRAM

Alex Deucher alexdeucher at gmail.com
Mon May 15 15:10:10 UTC 2023


On Mon, May 15, 2023 at 11:02 AM Xiao, Jack <Jack.Xiao at amd.com> wrote:
>
> [AMD Official Use Only - General]
>
>
> Yes, it should work. If that, the routine would behave like *reset*. That means driver would behave like to create a new queue instead of restoring a saved queue. I can have a try with this solution.
>

I think it should work as is.  At least suspend/resume worked when I
tested it.  The logic in gfx_v11_0_gfx_init_queue() is:

if (!amdgpu_in_reset(adev) && !adev->in_suspend) {
    ...
    // Save the original image after init at driver load time
    memcpy(adev->gfx.me.mqd_backup[mqd_idx], mqd, sizeof(*mqd));
} else {
    // restore the saved image at resume or reset time
    memcpy(mqd, adev->gfx.me.mqd_backup[mqd_idx], sizeof(*mqd));
    ...
}

Alex

> Regards,
> Jack
> ________________________________
> From: Alex Deucher <alexdeucher at gmail.com>
> Sent: Monday, 15 May 2023 21:08
> To: Xiao, Jack <Jack.Xiao at amd.com>
> Cc: amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: fix S3 issue if MQD in VRAM
>
> On Mon, May 15, 2023 at 6:40 AM Xiao, Jack <Jack.Xiao at amd.com> wrote:
> >
> > [AMD Official Use Only - General]
> >
> > The MQD data in VRAM would be lost after S3,  for the MQD bo is pinned down as kernel bo and can't be evicted to system memory.
> > AFAIK, driver should not to do allocate/free memory during S3, as there are issues observed to do memory allocation during S3.
>
> We restore the contents of the MQD in gfx_v*_0_gfx_init_queue() and
> gfx_v*_0_kcq_init_queue().
>
> Alex
>
> >
> > Regards,
> > Jack
> >
> > -----Original Message-----
> > From: Alex Deucher <alexdeucher at gmail.com>
> > Sent: Friday, May 12, 2023 9:13 PM
> > To: Xiao, Jack <Jack.Xiao at amd.com>
> > Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher at amd.com>
> > Subject: Re: [PATCH] drm/amdgpu: fix S3 issue if MQD in VRAM
> >
> > On Fri, May 12, 2023 at 4:16 AM Jack Xiao <Jack.Xiao at amd.com> wrote:
> > >
> > > Make the preemption optimization effect only for SRIOV, for it caused
> > > failure to resume from S3.
> >
> > Can you elaborate?  We ultimately want MQDs in VRAM for performance reasons even for bare metal.
> >
> > Alex
> >
> > >
> > > Signed-off-by: Jack Xiao <Jack.Xiao at amd.com>
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 ++-
> > > drivers/gpu/drm/amd/amdgpu/mes_v10_1.c  | 7 +++++--
> > > drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 7 +++++--
> > >  3 files changed, 12 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > > index a22d88a4178a..1b795b7bbf38 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > > @@ -385,7 +385,8 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device *adev,
> > >         u32 domain = AMDGPU_GEM_DOMAIN_GTT;
> > >
> > >         /* Only enable on gfx10 and 11 for now to avoid changing behavior on older chips */
> > > -       if (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(10, 0, 0))
> > > +       if (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(10, 0, 0) &&
> > > +           amdgpu_sriov_vf(adev))
> > >                 domain |= AMDGPU_GEM_DOMAIN_VRAM;
> > >
> > >         /* create MQD for KIQ */
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
> > > b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
> > > index 4560476c7c31..5c3d3f6c7ebd 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
> > > @@ -889,6 +889,7 @@ static int mes_v10_1_mqd_sw_init(struct
> > > amdgpu_device *adev,  {
> > >         int r, mqd_size = sizeof(struct v10_compute_mqd);
> > >         struct amdgpu_ring *ring;
> > > +       u32 domain = AMDGPU_GEM_DOMAIN_GTT;
> > >
> > >         if (pipe == AMDGPU_MES_KIQ_PIPE)
> > >                 ring = &adev->gfx.kiq[0].ring; @@ -900,9 +901,11 @@
> > > static int mes_v10_1_mqd_sw_init(struct amdgpu_device *adev,
> > >         if (ring->mqd_obj)
> > >                 return 0;
> > >
> > > +       if (amdgpu_sriov_vf(adev))
> > > +               domain |= AMDGPU_GEM_DOMAIN_VRAM;
> > > +
> > >         r = amdgpu_bo_create_kernel(adev, mqd_size, PAGE_SIZE,
> > > -                                   AMDGPU_GEM_DOMAIN_VRAM |
> > > -                                   AMDGPU_GEM_DOMAIN_GTT, &ring->mqd_obj,
> > > +                                   domain, &ring->mqd_obj,
> > >                                     &ring->mqd_gpu_addr, &ring->mqd_ptr);
> > >         if (r) {
> > >                 dev_warn(adev->dev, "failed to create ring mqd bo
> > > (%d)", r); diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > > b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > > index 3adb450eec07..79a4d2bfd94a 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > > @@ -987,6 +987,7 @@ static int mes_v11_0_mqd_sw_init(struct
> > > amdgpu_device *adev,  {
> > >         int r, mqd_size = sizeof(struct v11_compute_mqd);
> > >         struct amdgpu_ring *ring;
> > > +       u32 domain = AMDGPU_GEM_DOMAIN_GTT;
> > >
> > >         if (pipe == AMDGPU_MES_KIQ_PIPE)
> > >                 ring = &adev->gfx.kiq[0].ring; @@ -998,9 +999,11 @@
> > > static int mes_v11_0_mqd_sw_init(struct amdgpu_device *adev,
> > >         if (ring->mqd_obj)
> > >                 return 0;
> > >
> > > +       if (amdgpu_sriov_vf(adev))
> > > +               domain |= AMDGPU_GEM_DOMAIN_VRAM;
> > > +
> > >         r = amdgpu_bo_create_kernel(adev, mqd_size, PAGE_SIZE,
> > > -                                   AMDGPU_GEM_DOMAIN_VRAM |
> > > -                                   AMDGPU_GEM_DOMAIN_GTT, &ring->mqd_obj,
> > > +                                   domain, &ring->mqd_obj,
> > >                                     &ring->mqd_gpu_addr, &ring->mqd_ptr);
> > >         if (r) {
> > >                 dev_warn(adev->dev, "failed to create ring mqd bo
> > > (%d)", r);
> > > --
> > > 2.37.3
> > >


More information about the amd-gfx mailing list