[Intel-xe] [PATCH] drm/xe: Grab runtime PM over MMIO ioctl

Matthew Auld matthew.william.auld at gmail.com
Fri Jun 16 12:10:49 UTC 2023


On Thu, 15 Jun 2023 at 21:20, Matt Roper <matthew.d.roper at intel.com> wrote:
>
> On Thu, Jun 15, 2023 at 09:42:03AM +0100, Matthew Auld wrote:
> > On Wed, 14 Jun 2023 at 21:04, Matt Roper <matthew.d.roper at intel.com> wrote:
> > >
> > > The MMIO ioctl (which is generally used from IGT's xe_reg tool) may be
> > > called while the device is in D3, resulting in invalid results (i.e.,
> > > all registers read back as 0xFFFFFFFF, writes are dropped).  Make sure
> > > we grab runtime PM wakeref before performing register reads/writes.
> > >
> > > Signed-off-by: Matt Roper <matthew.d.roper at intel.com>
> >
> > AFAIK we should use mem_access_get() for MMIO:
>
> I thought mem_access_get() was specifically for memory access (like we
> needed for a special workaround on some early steppings of PVC); is it
> intended to be used for all devices accesses, not just memory now?  If
> so we might want to rename it to something more accurate like
> 'mmio_access_get' (and also add some kerneldoc explaining when/why it
> would be used instead of the standard runtime PM).

AFAIK Rodrigo is planning to add some docs to explain its usage. I was
also under the impression that it was more for stuff like VRAM memory
access, but it seems MMIO should also be included or really any device
level memory access.

https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/281

>
> Actually, I'm having trouble understanding what mem_access_get is trying
> to do.  It seems like it's trying to avoid using the runtime PM's
> refcounting and is turning it into a boolean on/off instead, which makes
> things a bit racy.  It's not obvious to me why we're trying to work
> around the refcount design?

I don't know the full history here, so I'm not completely sure tbh.

AFAICT the racy bit is in xe_pm_runtime_resume_if_suspended(), when it
calls pm_runtime_suspended(), which is a TOCTOU bug and also doesn't
work if we are currently in the process of suspending/resuming, but
the reason for having the check is to prevent deadlocking when
pm_runtime_get_sync() is called from our runtime pm callbacks. So
dropping that check fixes the races but then uncovers all those
deadlocks. Parking that to one side, I don't think there is any issue
with always calling pm_runtime_get_sync(), we just need to have some
way of asserting at various places in the driver that we have ref > 0
or perhaps that the device is PM active. But if we end up having our
own mem_access.ref then there is no strict need to call into PM every
time.

>
>
> Matt
>
> > https://patchwork.freedesktop.org/patch/542124/?series=119213&rev=1
> >
> > But that requires first fixing the races with mem_access which is
> > quite involved, so perhaps using runtime_get() directly here is fine
> > for now. We can just replace the instance here with mem_access once
> > that's ready.
> >
> > Reviewed-by: Matthew Auld <matthew.auld at intel.com>
> >
> > > ---
> > >  drivers/gpu/drm/xe/xe_mmio.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
> > > index f7a7f996b37f..5f4c8ae23771 100644
> > > --- a/drivers/gpu/drm/xe/xe_mmio.c
> > > +++ b/drivers/gpu/drm/xe/xe_mmio.c
> > > @@ -19,6 +19,7 @@
> > >  #include "xe_gt_mcr.h"
> > >  #include "xe_macros.h"
> > >  #include "xe_module.h"
> > > +#include "xe_pm.h"
> > >
> > >  #define XEHP_MTCFG_ADDR                XE_REG(0x101800)
> > >  #define TILE_COUNT             REG_GENMASK(15, 8)
> > > @@ -476,6 +477,7 @@ int xe_mmio_ioctl(struct drm_device *dev, void *data,
> > >          */
> > >         reg = XE_REG(args->addr);
> > >
> > > +       xe_pm_runtime_get(xe);
> > >         xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
> > >
> > >         if (args->flags & DRM_XE_MMIO_WRITE) {
> > > @@ -519,6 +521,7 @@ int xe_mmio_ioctl(struct drm_device *dev, void *data,
> > >
> > >  exit:
> > >         xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL);
> > > +       xe_pm_runtime_put(xe);
> > >
> > >         return ret;
> > >  }
> > > --
> > > 2.40.1
> > >
>
> --
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation


More information about the Intel-xe mailing list