[Intel-gfx] [PATCH 06/19] drm/i915/gt: Schedule request retirement when submission idles
Chris Wilson
chris at chris-wilson.co.uk
Tue Nov 19 16:42:28 UTC 2019
Quoting Tvrtko Ursulin (2019-11-19 16:33:18)
>
> On 19/11/2019 16:20, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-11-19 15:04:46)
> >>
> >> On 18/11/2019 23:02, Chris Wilson wrote:
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index 33ce258d484f..f7c8fec436a9 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -142,6 +142,7 @@
> >>>  #include "intel_engine_pm.h"
> >>>  #include "intel_gt.h"
> >>>  #include "intel_gt_pm.h"
> >>> +#include "intel_gt_requests.h"
> >>>  #include "intel_lrc_reg.h"
> >>>  #include "intel_mocs.h"
> >>>  #include "intel_reset.h"
> >>> @@ -2278,6 +2279,18 @@ static void execlists_submission_tasklet(unsigned long data)
> >>>                  if (timeout && preempt_timeout(engine))
> >>>                          preempt_reset(engine);
> >>>          }
> >>> +
> >>> +        /*
> >>> +         * If the GPU is currently idle, retire the outstanding completed
> >>> +         * requests. This will allow us to enter soft-rc6 as soon as possible,
> >>> +         * albeit at the cost of running the retire worker much more frequently
> >>> +         * (over the entire GT not just this engine) and emitting more idle
> >>> +         * barriers (i.e. kernel context switches unpinning all that went
> >>> +         * before) which may add some extra latency.
> >>> +         */
> >>> +        if (intel_engine_pm_is_awake(engine) &&
> >>> +            !execlists_active(&engine->execlists))
> >>> +                intel_gt_schedule_retire_requests(engine->gt);
> >>
> >> I am still not a fan of doing this for all platforms.
> >
> > I understand. I think it makes a fair amount of sense to do early
> > retires, and I wish to pursue that if I can show there is no harm.
>
> It's also a bit of a layering problem.
Them's fighting words! :)
> >> It's not just the cost of retirement, but there is
> >> intel_engine_flush_submission on all engines in there as well, which we
> >> cannot avoid triggering from this path.
> >>
> >> Would it be worth experimenting with additional per-engine retire
> >> workers? Most of the code could be shared, just a little bit of
> >> specialization to filter on engine.
> >
> > I haven't sketched out anything more than peeking at the last request on
> > the timeline and doing a rq->engine == engine filter. Walking the global
> > timeline.active_list in that case is also a nuisance.
>
> That together with:
>
> flush_submission(gt, engine ? engine->mask : ALL_ENGINES);
>
> Might be enough? At least to satisfy my concern.
Aye, flushing all the other engines when we know we only care about this
one being idle is definitely a weak point of the current scheme.
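To make that concrete, something like the below is the shape I have in
mind (a rough sketch only, not the actual patch: retire_requests_for_engine()
and last_request_on() are made-up names, and the lock-dropping/reference
dance that intel_gt_retire_requests() has to do around the active_list
walk is elided):

static void flush_submission(struct intel_gt *gt, unsigned int mask)
{
        struct intel_engine_cs *engine;
        enum intel_engine_id id;

        /* Only kick the tasklets we actually need to wait upon. */
        for_each_engine(engine, gt, id)
                if (engine->mask & mask)
                        intel_engine_flush_submission(engine);
}

static void retire_requests_for_engine(struct intel_gt *gt,
                                       struct intel_engine_cs *engine)
{
        struct intel_timeline *tl, *tn;

        /* Flush only the engine we are trying to park, not the world. */
        flush_submission(gt, engine ? engine->mask : ALL_ENGINES);

        spin_lock(&gt->timelines.lock);
        list_for_each_entry_safe(tl, tn, &gt->timelines.active_list, link) {
                struct i915_request *rq = last_request_on(tl); /* invented */

                /* Skip timelines whose most recent work ran elsewhere. */
                if (engine && rq && rq->engine != engine)
                        continue;

                retire_requests(tl); /* as in intel_gt_requests.c today */
        }
        spin_unlock(&gt->timelines.lock);
}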
> Apart from that, the layering is still bad.. And I'd still limit it to
> when the RC6 WA is active, unless it can be shown there is no perf/power
> impact across GPU/CPU to do this everywhere.
Bah, keep tuning until it's a win for everyone!
> At which point it becomes easier to just limit it because we have to
> have it there.
>
> I also wonder if the current flush_submission wasn't the reason for the
> performance regression you were seeing with this? It makes this tasklet
> wait for all other engines, if they are busy. But I am not sure.. perhaps
> it is work which would have been done anyway.
I haven't finished yet, but the baseline took a big nose dive, so it
might be enough to hide a lot of evil.
Too bad I don't have an Icelake here to cross-check against an
unaffected platform.
> > There's definitely scope here for us using some more information from
> > process_csb() about which context completed and limit work to that
> > timeline. Hmm, something along those lines maybe...
>
> But you want to retire all timelines which have work on this particular
> physical engine. Otherwise it doesn't get parked, no?
There I was suggesting being even more proactive and, say, keeping an
llist of completed timelines. Nothing concrete yet; plenty of existing
races found already that need fixing.
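For flavour, roughly the below (completed_timelines, retire_llist and
note_completed_timeline() are all invented for illustration; guarding
against double-adding a timeline that is already on the list is exactly
one of the races I mean):

/* process_csb() notes each timeline it sees a context complete on... */
static void note_completed_timeline(struct intel_gt *gt,
                                    struct intel_timeline *tl)
{
        /*
         * llist_add() is lockless, so safe from the submission tasklet,
         * but a node must not be added twice before it is drained.
         */
        llist_add(&tl->retire_llist, &gt->completed_timelines);
}

/* ...and the retire worker then only drains the completed timelines. */
static void retire_completed_timelines(struct intel_gt *gt)
{
        struct llist_node *first;
        struct intel_timeline *tl, *tn;

        first = llist_del_all(&gt->completed_timelines);
        llist_for_each_entry_safe(tl, tn, first, retire_llist)
                retire_requests(tl);
}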
-Chris