[Intel-gfx] [PATCH] drm/i915/gt: Schedule request retirement when submission idles

Mon Nov 18 15:18:56 UTC 2019

Quoting Tvrtko Ursulin (2019-11-18 15:12:17)
> 
> On 18/11/2019 14:46, Chris Wilson wrote:
> > The major drawback of commit 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX
> > corruption WA") is that it disables RC6 while Skylake (and friends) is
> > active, and we do not consider the GPU idle until all outstanding
> > requests have been retired and the engine switched over to the kernel
> > context. If userspace is idle, this task falls onto our background idle
> > worker, which only runs roughly once a second, meaning that userspace has
> > to have been idle for a couple of seconds before we enable RC6 again.
> > Naturally, this causes us to consume considerably more energy than
> > before as powersaving is effectively disabled while a display server
> > (here's looking at you Xorg) is running.
> > 
> > As execlists will get a completion event as the last context is
> > completed and the GPU goes idle, we can use our submission tasklet to
> > notice when the GPU is idle and kick the retire worker. Thus during
> > light workloads, we will do much more work to idle the GPU faster...
> > Hopefully with commensurate power saving!
> > 
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=112315
> > References: 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_gt_requests.h |  9 ++++++++-
> >   drivers/gpu/drm/i915/gt/intel_lrc.c         | 11 +++++++++++
> >   2 files changed, 19 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > index fde546424c63..94b8758a45d9 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > @@ -7,7 +7,9 @@
> >   #ifndef INTEL_GT_REQUESTS_H
> >   #define INTEL_GT_REQUESTS_H
> >   
> > -struct intel_gt;
> > +#include <linux/workqueue.h>
> > +
> > +#include "intel_gt_types.h"
> >   
> >   long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
> >   static inline void intel_gt_retire_requests(struct intel_gt *gt)
> > @@ -15,6 +17,11 @@ static inline void intel_gt_retire_requests(struct intel_gt *gt)
> >       intel_gt_retire_requests_timeout(gt, 0);
> >   }
> >   
> > +static inline void intel_gt_schedule_retire_requests(struct intel_gt *gt)
> > +{
> > +     mod_delayed_work(system_wq, &gt->requests.retire_work, 0);
> > +}
> > +
> >   int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> >   
> >   void intel_gt_init_requests(struct intel_gt *gt);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 33ce258d484f..4485fe3e5066 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -142,6 +142,7 @@
> >   #include "intel_engine_pm.h"
> >   #include "intel_gt.h"
> >   #include "intel_gt_pm.h"
> > +#include "intel_gt_requests.h"
> >   #include "intel_lrc_reg.h"
> >   #include "intel_mocs.h"
> >   #include "intel_reset.h"
> > @@ -2278,6 +2279,16 @@ static void execlists_submission_tasklet(unsigned long data)
> >               if (timeout && preempt_timeout(engine))
> >                       preempt_reset(engine);
> >       }
> > +
> > +     /*
> > +      * If the GPU is currently idle, retire the outstanding completed
> > +      * requests. This will allow us to enter soft-rc6 as soon as possible,
> > +      * albeit at the cost of running the retire worker much more frequently
> > +      * (over the entire GT not just this engine) and emitting more idle
> > +      * barriers (i.e. kernel context switches) which may some extra 
> 
> May add, or something like that.
> 
> latency.
> > +      */
> > +     if (!execlists_active(&engine->execlists))
> > +             intel_gt_schedule_retire_requests(engine->gt);
> 
> Looking at "drm/i915/gen8+: Add RC6 CTX corruption WA" it seems it 
> should be possible to only do this if tha RC6 WA is active. So why not 
> limit the impact?

My starting pov is: If the impact is intolerable for one, it is
intolerable for all.

At the moment, is is reasonably consistently triggering

 __i915_request_add_to_timeline:1167 GEM_BUG_ON(timeline->seqno != rq->fence.seqno)

which is a bit of a wtf. Once we get past the pebkac, we can then start
to see if this is noticeable on interesting workloads. Speaking of which
we should ping Dmitry to see if he's noticed a regression in gen9 power
figures.
-Chris