[Intel-gfx] [PATCH i-g-t v6] benchmarks/gem_wsim: Command submission workload simulator
Chris Wilson
chris at chris-wilson.co.uk
Tue Apr 25 12:25:48 UTC 2017
On Tue, Apr 25, 2017 at 01:10:34PM +0100, Tvrtko Ursulin wrote:
>
> On 25/04/2017 12:35, Chris Wilson wrote:
> >On Tue, Apr 25, 2017 at 12:13:04PM +0100, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
[snip]
> >>+static enum intel_engine_id
> >>+rt_balance(const struct workload_balancer *balancer,
> >>+ struct workload *wrk, struct w_step *w)
> >>+{
> >>+ enum intel_engine_id engine;
> >>+ long qd[NUM_ENGINES];
> >>+ unsigned int n;
> >>+
> >>+ igt_assert(w->engine == VCS);
> >>+
> >>+ /* Estimate the "speed" of the most recent batch
> >>+ * (finish time - submit time)
> >>+ * and use that as an approximate for the total remaining time for
> >>+ * all batches on that engine. We try to keep the total remaining
> >>+ * balanced between the engines.
> >>+ */
> >
> >Next steps for this would be to move from an instantaneous speed, to an
> >average. I'm thinking something like a exponential decay moving average
> >just to make the estimation more robust.
>
> Do you think it would be OK to merge these two tools at this point
> and continue improving them in place?
Yes. Although there's no excuse no to make this NO_RELOC from the start,
especially if we want to demonstrate how it should be done! Hopefully
attached the delta.
> Your balancer already looks a solid step up from the queue-depth
> one. I checked today myself and, what looks like a worst case of a
> VCS1 hog and a balancing workloads running together, it gets the
> VCS2 utilisation to impressive 85%.
Yup. Just thinking of the danger in using an instantaneous value as our
estimate and how to improve.
> As mentioned before those stats can now be collected easily with:
>
> trace.pl --trace gem_wsim ...; perf script | trace.pl
>
> I need to start pining the relevant people for help with creating
> relevant workloads and am also entertaining the idea of trying
> balancing via exporting the stats from i915 directly. Just to see if
> true vs estimated numbers would make a difference here.
Aye.
> >>+ if (qd_throttle > 0 && balancer && balancer->get_qd) {
> >>+ unsigned int target;
> >>+
> >>+ for (target = wrk->nr_steps - 1; target > 0;
> >>+ target--) {
> >
> >I think this should skip other engines.
> >
> >if (target->engine != engine)
> > continue;
>
> If you say so. I don't have an opinion on it. Would it be useful to
> perhaps have both options - to throttle globally and per-engine? I
> could easily add two different workload commands for that.
I'm leaning towards making it a balancer->callback().
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list