[Intel-gfx] [PATCH i-g-t v6] benchmarks/gem_wsim: Command submission workload simulator

Tue Apr 25 11:35:34 UTC 2017

On Tue, Apr 25, 2017 at 12:13:04PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> Tool which emits batch buffers to engines with configurable
> sequences, durations, contexts, dependencies and userspace waits.
> 
> Unfinished but shows promise so sending out for early feedback.
> 
> v2:
>  * Load workload descriptors from files. (also -w)
>  * Help text.
>  * Calibration control if needed. (-t)
>  * NORELOC | LUT to eb flags.
>  * Added sample workload to wsim/workload1.
> 
> v3:
>  * Multiple parallel different workloads (-w -w ...).
>  * Multi-context workloads.
>  * Variable (random) batch length.
>  * Load balancing (round robin and queue depth estimation).
>  * Workloads delays and explicit sync steps.
>  * Workload frequency (period) control.
> 
> v4:
>  * Fixed queue-depth estimation by creating separate batches
>    per engine when qd load balancing is on.
>  * Dropped separate -s cmd line option. It can turn itself on
>    automatically when needed.
>  * Keep a single status page and lie about the write hazard
>    as suggested by Chris.
>  * Use batch_start_offset for controlling the batch duration.
>    (Chris)
>  * Set status page object cache level. (Chris)
>  * Moved workload description to a README.
>  * Tidied example workloads.
>  * Some other cleanups and refactorings.
> 
> v5:
>  * Master and background workloads (-W / -w).
>  * Single batch per step is enough even when balancing. (Chris)
>  * Use hars_petruska_f54_1_random IGT functions and seed to zero
>    at start. (Chris)
>  * Use WC cache domain when WC mapping. (Chris)
>  * Keep seqnos 64-bytes apart in the status page. (Chris)
>  * Add workload throttling and queue-depth throttling commands.
>    (Chris)
> 
> v6:
>  * Added two more workloads.
>  * Merged RT balancer from Chris.
> 
> TODO list:

* No reloc!
* bb caching/reuse

>  * Fence support.
>  * Better error handling.
>  * Less 1980's workload parsing.
>  * More workloads.
>  * Threads?
>  * ... ?
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin at intel.com>
> ---

> +static enum intel_engine_id
> +rt_balance(const struct workload_balancer *balancer,
> +	   struct workload *wrk, struct w_step *w)
> +{
> +	enum intel_engine_id engine;
> +	long qd[NUM_ENGINES];
> +	unsigned int n;
> +
> +	igt_assert(w->engine == VCS);
> +
> +	/* Estimate the "speed" of the most recent batch
> +	 *    (finish time - submit time)
> +	 * and use that as an approximate for the total remaining time for
> +	 * all batches on that engine. We try to keep the total remaining
> +	 * balanced between the engines.
> +	 */

The next step here would be to move from an instantaneous speed to an
average. I'm thinking of something like an exponentially decaying moving
average, just to make the estimate more robust.
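
To make the idea concrete, here is a minimal standalone sketch of such an
average. It is not taken from the patch: struct ema, ema_update() and the
alpha weight are hypothetical names, and the sample is assumed to be the
(finish time - submit time) of the most recent batch on an engine.

```c
#include <assert.h>

/* Hypothetical per-engine estimator; names are illustrative only. */
struct ema {
	double avg;	/* smoothed batch latency */
	int primed;	/* 0 until the first sample has been seen */
};

/*
 * Fold a new latency sample into the running average.
 * alpha is the weight given to the newest sample, in (0, 1]:
 * alpha == 1 degenerates to the instantaneous value.
 */
static double ema_update(struct ema *e, double sample, double alpha)
{
	assert(alpha > 0.0 && alpha <= 1.0);

	if (!e->primed) {
		/* Seed with the first sample to avoid a bias towards zero. */
		e->avg = sample;
		e->primed = 1;
	} else {
		e->avg += alpha * (sample - e->avg);
	}

	return e->avg;
}
```

A zero-initialised struct ema per engine would be enough to start with. The
alpha value is a tuning choice; something around 0.25 would damp a single
outlier batch while still tracking a real change within a few submissions.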

> +			if (qd_throttle > 0 && balancer && balancer->get_qd) {
> +				unsigned int target;
> +
> +				for (target = wrk->nr_steps - 1; target > 0;
> +				     target--) {

I think this loop should skip steps queued on other engines, roughly
(treating target as the step it indexes):

if (target->engine != engine)
	continue;

> +					if (balancer->get_qd(balancer, wrk,
> +							     engine) <
> +					    qd_throttle)
> +						break;
> +					w_sync_to(wrk, w, i - target);
> +				}
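
The engine filter suggested above can be sketched in isolation as below.
This is only an illustration under assumptions: enum engine, struct step and
sync_candidates() are hypothetical stand-ins, not the types from the patch,
and the walk mirrors the quoted loop in stopping before index 0.

```c
/* Hypothetical, simplified stand-ins for the patch's types. */
enum engine { RCS, BCS, VCS, VECS };

struct step {
	enum engine engine;
};

/*
 * Walk the steps from last to first (stopping before index 0, as the
 * quoted loop does) and record only those queued on the given engine.
 * Returns the number of indices written to out.
 */
static unsigned int
sync_candidates(const struct step *steps, unsigned int nr_steps,
		enum engine engine, unsigned int *out)
{
	unsigned int target, n = 0;

	if (!nr_steps)
		return 0; /* avoid unsigned wrap of nr_steps - 1 */

	for (target = nr_steps - 1; target > 0; target--) {
		if (steps[target].engine != engine)
			continue; /* a step on another engine cannot drain our queue */

		out[n++] = target;
	}

	return n;
}
```

Only the steps that survive the filter would then be worth syncing to when
trying to bring the queue depth of that engine back under the throttle.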

-- 
Chris Wilson, Intel Open Source Technology Centre