[Intel-gfx] [PATCH 3/3] drm/i915/gt: Apply the CSB w/a for all

Fri Aug 14 19:41:32 UTC 2020

Quoting Mika Kuoppala (2020-08-14 19:41:14)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > Since we expect to inline the csb_parse() routines, the w/a for the
> > stale CSB data on Tigerlake will be pulled into process_csb(), and so we
> > might as well simply reuse the logic for all, and so will hopefully
> > avoid any strange behaviour on Icelake that was not covered by our
> > previous w/a.
> >
> > References: d8f505311717 ("drm/i915/icl: Forcibly evict stale csb entries")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Cc: Bruce Chang <yu.bruce.chang at intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_lrc.c | 70 +++++++++++++++++------------
> >  1 file changed, 42 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 3b8161c6b601..c176a029f27b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -2496,25 +2496,11 @@ invalidate_csb_entries(const u64 *first, const u64 *last)
> >   *     bits 47-57: sw context id of the lrc the GT switched away from
> >   *     bits 58-63: sw counter of the lrc the GT switched away from
> >   */
> > -static inline bool gen12_csb_parse(const u64 *csb)
> > +static inline bool gen12_csb_parse(const u64 csb)
> >  {
> > -     bool ctx_away_valid;
> > -     bool new_queue;
> > -     u64 entry;
> > -
> > -     /* XXX HSD */
> > -     entry = READ_ONCE(*csb);
> > -     if (unlikely(entry == -1)) {
> > -             preempt_disable();
> > -             if (wait_for_atomic_us((entry = READ_ONCE(*csb)) != -1, 50))
> 
> If we get this deep into desperation, should we start to apply more
> pressure. Ie, rmb instead of just instructing the compiler. And could also
> start to invalidate the entry which obviously if of no use.

I had a rmb() here; removing it did not appear to make any difference
whatsoever to the average delay. The extreme case would be a full
mb(); clflush(); mb() read. I haven't timed the average for that....

> It could even be that the invalidate pays out as the correct value
> bubbles throught hierarchy faster?

I had the same thought... But atm my feeling is the issue is not on the
CPU side (or at least controllable from our code on the CPU).
-Chris