[Intel-gfx] [PATCH v2] drm/i915/tgl: Magic interrupt shadow to relieve some random lockups

Chris Wilson chris at chris-wilson.co.uk
Mon Sep 30 12:09:52 UTC 2019


Quoting Mika Kuoppala (2019-09-30 13:02:49)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > My current theory is that this masks interrupt delivery to the local CPU
> > during a critical phase. Purely papering over the symptoms with a delay
> > plucked out of thin air from testing on tgl1-gem, refined slightly by
> > just waiting for the next ack (though technically the next CS event may
> > not be the corresponding event for this submit, but an intermediate
> > completion).
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Cc: Andi Shyti <andi.shyti at intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_lrc.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index ab725a6ca0ac..35410d647b52 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -1155,6 +1155,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
> >  static void execlists_submit_ports(struct intel_engine_cs *engine)
> >  {
> >       struct intel_engine_execlists *execlists = &engine->execlists;
> > +     unsigned int tail = READ_ONCE(*execlists->csb_write);
> >       unsigned int n;
> >  
> >       GEM_BUG_ON(!assert_pending_valid(execlists, "submit"));
> > @@ -1186,6 +1187,14 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
> >       /* we need to manually load the submit queue */
> >       if (execlists->ctrl_reg)
> >               writel(EL_CTRL_LOAD, execlists->ctrl_reg);
> > +
> 
> I don't doubt we forget this but a comment would still be
> good that we limit the intr generation for reasons still
> unknown.
> 
> > +     if (IS_TIGERLAKE(engine->i915)) {
> > +             u64 start = local_clock();
> > +             do
> > +                     cpu_relax();
> > +             while (tail == READ_ONCE(*execlists->csb_write) &&
> > +                    (local_clock() - start) >> 20 == 0);
> 
> 1ms..well, if it works :O
> 
> Need to track this closely and pls consider limiting this to current revid
> just so that we would notice the diff immediately when we get updates.
> 
> Acked-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>

Yeah, this one I'm not so keen to put in drm-intel-next-queued itself.
It's clear that we are no longer pruning features to achieve a stable
base, but are now into hax territory.

If possible, we'll get a topic/tgl-snafu branch where we can soak test
such hax on CI. But failing that, I'll try soaking it in core-for-CI --
once that is rebased on to v5.4-rc1.
-Chris
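
For readers following the quoted hunk: the added loop is a bounded spin-wait.
It samples the CSB write pointer before submission, then spins until either
that pointer moves or `(local_clock() - start) >> 20` becomes non-zero, i.e.
until 2^20 ns (1,048,576 ns, roughly the "1ms" noted in the review) have
elapsed. Below is a minimal userspace sketch of the same pattern, not the
driver code: `now_ns()` and `wait_for_change()` are hypothetical names,
`clock_gettime(CLOCK_MONOTONIC)` stands in for the kernel's `local_clock()`,
and a C11 atomic counter stands in for `*execlists->csb_write`.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for local_clock(): monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until *counter differs from 'seen', or until 2^20 ns (~1.05 ms)
 * have elapsed. Returns true if the counter moved, false on timeout.
 * The shift by 20 is a cheap "elapsed >= 1048576 ns" test.
 */
static bool wait_for_change(_Atomic unsigned int *counter, unsigned int seen)
{
	uint64_t start;

	assert(counter);
	start = now_ns();
	do {
		if (atomic_load_explicit(counter, memory_order_relaxed) != seen)
			return true;
	} while (((now_ns() - start) >> 20) == 0);

	return false;
}
```

Unlike the patch, this sketch reports whether the wait timed out; the quoted
loop simply falls through in both cases. As the commit message notes, a
change in the counter only shows that some event arrived, not necessarily
the one matching this submission.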

