[Intel-gfx] [PATCH i-g-t v6] benchmarks/gem_wsim: Command submission workload simulator
Chris Wilson
chris at chris-wilson.co.uk
Tue Apr 25 11:35:34 UTC 2017
On Tue, Apr 25, 2017 at 12:13:04PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> Tool which emits batch buffers to engines with configurable
> sequences, durations, contexts, dependencies and userspace waits.
>
> Unfinished but shows promise so sending out for early feedback.
>
> v2:
> * Load workload descriptors from files. (also -w)
> * Help text.
> * Calibration control if needed. (-t)
> * NORELOC | LUT to eb flags.
> * Added sample workload to wsim/workload1.
>
> v3:
> * Multiple parallel different workloads (-w -w ...).
> * Multi-context workloads.
> * Variable (random) batch length.
> * Load balancing (round robin and queue depth estimation).
> * Workload delays and explicit sync steps.
> * Workload frequency (period) control.
>
> v4:
> * Fixed queue-depth estimation by creating separate batches
> per engine when qd load balancing is on.
> * Dropped separate -s cmd line option. It can turn itself on
> automatically when needed.
> * Keep a single status page and lie about the write hazard
> as suggested by Chris.
> * Use batch_start_offset for controlling the batch duration.
> (Chris)
> * Set status page object cache level. (Chris)
> * Moved workload description to a README.
> * Tidied example workloads.
> * Some other cleanups and refactorings.
>
> v5:
> * Master and background workloads (-W / -w).
> * Single batch per step is enough even when balancing. (Chris)
> * Use hars_petruska_f54_1_random IGT functions and seed to zero
> at start. (Chris)
> * Use WC cache domain when WC mapping. (Chris)
> * Keep seqnos 64-bytes apart in the status page. (Chris)
> * Add workload throttling and queue-depth throttling commands.
> (Chris)
>
> v6:
> * Added two more workloads.
> * Merged RT balancer from Chris.
>
> TODO list:
* No reloc!
* bb caching/reuse
> * Fence support.
> * Better error handling.
> * Less 1980's workload parsing.
> * More workloads.
> * Threads?
> * ... ?
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin at intel.com>
> ---
> +static enum intel_engine_id
> +rt_balance(const struct workload_balancer *balancer,
> + struct workload *wrk, struct w_step *w)
> +{
> + enum intel_engine_id engine;
> + long qd[NUM_ENGINES];
> + unsigned int n;
> +
> + igt_assert(w->engine == VCS);
> +
> + /* Estimate the "speed" of the most recent batch
> + * (finish time - submit time)
> + * and use that as an approximate for the total remaining time for
> + * all batches on that engine. We try to keep the total remaining
> + * balanced between the engines.
> + */
The next step for this would be to move from an instantaneous speed to an
average. I'm thinking of something like an exponentially decaying moving
average, just to make the estimation more robust.
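As a rough illustration of the moving-average idea, here is a minimal sketch of an integer exponentially weighted average that could be fed one (finish time - submit time) sample per completed batch. The names `struct ewma`, `ewma_init` and `ewma_add` are hypothetical and not part of the patch, and the power-of-two weight is an assumption chosen only to keep the arithmetic simple.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from gem_wsim. */
struct ewma {
	uint64_t avg;		/* current estimate, same units as the samples */
	unsigned int shift;	/* each new sample has weight 1 / 2^shift */
	int primed;		/* zero until the first sample has been seen */
};

static void ewma_init(struct ewma *e, unsigned int shift)
{
	assert(shift < 32);
	e->avg = 0;
	e->shift = shift;
	e->primed = 0;
}

/* Fold one sample into the average and return the new estimate. */
static uint64_t ewma_add(struct ewma *e, uint64_t sample)
{
	if (!e->primed) {
		/* Seed with the first sample instead of decaying up from zero. */
		e->avg = sample;
		e->primed = 1;
	} else {
		/* avg += (sample - avg) / 2^shift, done in signed arithmetic. */
		int64_t delta = (int64_t)sample - (int64_t)e->avg;

		e->avg = (uint64_t)((int64_t)e->avg +
				    delta / ((int64_t)1 << e->shift));
	}

	return e->avg;
}
```

A larger shift smooths out more noise but reacts more slowly to a genuine change in batch duration, so the value would need tuning against real workloads.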
> + if (qd_throttle > 0 && balancer && balancer->get_qd) {
> + unsigned int target;
> +
> + for (target = wrk->nr_steps - 1; target > 0;
> + target--) {
I think this should skip other engines:
	if (target->engine != engine)
		continue;
> + if (balancer->get_qd(balancer, wrk,
> + engine) <
> + qd_throttle)
> + break;
> + w_sync_to(wrk, w, i - target);
> + }
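To make the "skip other engines" suggestion concrete, here is a self-contained sketch of a backwards walk over earlier steps that ignores anything submitted to a different engine. The `struct step` and `enum engine_id` types and the `prev_step_on_engine` helper are hypothetical stand-ins for illustration, not the types used in the patch.

```c
/* Hypothetical stand-ins for the workload step types. */
enum engine_id { ENGINE_RCS, ENGINE_BCS, ENGINE_VCS1, ENGINE_VCS2 };

struct step {
	enum engine_id engine;
};

/*
 * Walk backwards from the step before `cur` and return the index of the
 * most recent step submitted to `engine`, or -1 if there is none.
 */
static int prev_step_on_engine(const struct step *steps, int cur,
			       enum engine_id engine)
{
	int i;

	for (i = cur - 1; i >= 0; i--) {
		/* Steps on other engines do not add to this engine's queue. */
		if (steps[i].engine != engine)
			continue;

		return i;
	}

	return -1;
}
```

In the throttling loop, only the steps this filter accepts would be candidates for a sync, since waiting on a batch queued to another engine does nothing to drain the engine being throttled.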
--
Chris Wilson, Intel Open Source Technology Centre