[Intel-gfx] [PATCH i-g-t v6] benchmarks/gem_wsim: Command submission workload simulator

Tue Apr 25 12:25:48 UTC 2017

On Tue, Apr 25, 2017 at 01:10:34PM +0100, Tvrtko Ursulin wrote:
> 
> On 25/04/2017 12:35, Chris Wilson wrote:
> >On Tue, Apr 25, 2017 at 12:13:04PM +0100, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
[snip]
> >>+static enum intel_engine_id
> >>+rt_balance(const struct workload_balancer *balancer,
> >>+	   struct workload *wrk, struct w_step *w)
> >>+{
> >>+	enum intel_engine_id engine;
> >>+	long qd[NUM_ENGINES];
> >>+	unsigned int n;
> >>+
> >>+	igt_assert(w->engine == VCS);
> >>+
> >>+	/* Estimate the "speed" of the most recent batch
> >>+	 *    (finish time - submit time)
> >>+	 * and use that as an approximate for the total remaining time for
> >>+	 * all batches on that engine. We try to keep the total remaining
> >>+	 * balanced between the engines.
> >>+	 */
> >
> >Next steps for this would be to move from an instantaneous speed, to an
> >average. I'm thinking something like a exponential decay moving average
> >just to make the estimation more robust.
> 
> Do you think it would be OK to merge these two tools at this point
> and continue improving them in place?

Yes. Although there's no excuse no to make this NO_RELOC from the start,
especially if we want to demonstrate how it should be done! Hopefully
attached the delta.

> Your balancer already looks a solid step up from the queue-depth
> one. I checked today myself and, what looks like a worst case of a
> VCS1 hog and a balancing workloads running together, it gets the
> VCS2 utilisation to impressive 85%.

Yup. Just thinking of the danger in using an instantaneous value as our
estimate and how to improve.

> As mentioned before those stats can now be collected easily with:
> 
>   trace.pl --trace gem_wsim ...; perf script | trace.pl
> 
> I need to start pining the relevant people for help with creating
> relevant workloads and am also entertaining the idea of trying
> balancing via exporting the stats from i915 directly. Just to see if
> true vs estimated numbers would make a difference here.

Aye.

> >>+			if (qd_throttle > 0 && balancer && balancer->get_qd) {
> >>+				unsigned int target;
> >>+
> >>+				for (target = wrk->nr_steps - 1; target > 0;
> >>+				     target--) {
> >
> >I think this should skip other engines.
> >
> >if (target->engine != engine)
> >	continue;
> 
> If you say so. I don't have an opinion on it. Would it be useful to
> perhaps have both options - to throttle globally and per-engine? I
> could easily add two different workload commands for that.

I'm leaning towards making it a balancer->callback().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre