[Intel-gfx] [PATCH 08/17] drm/i915/selftests: Add request throughput measurement to perf

Tue Mar 10 12:06:41 UTC 2020

Quoting Tvrtko Ursulin (2020-03-10 11:58:26)
> 
> On 10/03/2020 11:09, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-03-10 10:38:21)
> >>
> >> On 06/03/2020 13:38, Chris Wilson wrote:
> >>> +                     intel_engine_pm_get(engine);
> >>> +
> >>> +                     memset(&engines[idx].p, 0, sizeof(engines[idx].p));
> >>> +                     engines[idx].p.engine = engine;
> >>> +
> >>> +                     engines[idx].tsk = kthread_run(*fn, &engines[idx].p,
> >>> +                                                    "igt:%s", engine->name);
> >>
> >> The test will be affected by the host CPU core count. How about we only
> >> measure num_cpu engines? That might be even more important with discrete.
> > 
> > No. We want to be able to fill the GPU with the different processors.
> > Comparing glk to kbl helps highlight any inefficiencies we have -- we
> > have to be efficient enough that core count is simply not a critical
> > factor in offsetting our submission overhead.
> > 
> > So we can run the same test and see how it scales with engines vs CPUs
> > just by running it on different machines and looking for problems.
> 
> Normally you would expect one core per engine to be enough to saturate the
> engine. I am afraid adding more combinations will be confusing when
> reading test results (same GPU, same engine count, different CPU core
> count). How about two subtest variants? One is 1:1 CPU core to engine,
> and the other uses all engines like here?

Each machine will have its own consistent configuration. The question I
have in mind is "can we saturate this machine?" This machine remains
constant for all the runs, and our goal is that the driver is not the
bottleneck on any machine.
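To make the "can we saturate this machine?" question concrete, here is a
minimal userspace sketch of the fill-every-engine approach, assuming
pthreads as a stand-in for the patch's per-engine kthread_run() workers.
The struct engine_worker, mock_submit() and run_all_engines() names are
hypothetical; mock_submit() only counts, where the real selftest would
build and queue a request.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-engine worker state; a userspace stand-in, not i915 code. */
struct engine_worker {
	pthread_t thread;
	unsigned long iterations; /* requests this worker should submit */
	unsigned long submitted;  /* requests it actually submitted */
};

/* Hypothetical mock of building and queuing one request on an engine. */
static void mock_submit(struct engine_worker *w)
{
	w->submitted++;
}

/* Thread body: submit a fixed number of requests to this worker's engine. */
static void *engine_thread(void *arg)
{
	struct engine_worker *w = arg;

	for (unsigned long i = 0; i < w->iterations; i++)
		mock_submit(w);
	return NULL;
}

/*
 * Start one submitter thread per engine, independent of the CPU count, join
 * them all, and return the total number of requests submitted by the threads
 * that started. The wall-clock duration is stored in *elapsed_s.
 */
static unsigned long run_all_engines(unsigned int num_engines,
				     unsigned long iterations,
				     double *elapsed_s)
{
	struct engine_worker *w;
	struct timespec t0, t1;
	unsigned long total = 0;
	unsigned int started;

	assert(num_engines > 0);
	*elapsed_s = 0.0;
	w = calloc(num_engines, sizeof(*w));
	if (!w)
		return 0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (started = 0; started < num_engines; started++) {
		w[started].iterations = iterations;
		if (pthread_create(&w[started].thread, NULL,
				   engine_thread, &w[started]))
			break; /* stop spawning; still join those already running */
	}
	for (unsigned int i = 0; i < started; i++) {
		pthread_join(w[i].thread, NULL);
		total += w[i].submitted;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*elapsed_s = (double)(t1.tv_sec - t0.tv_sec) +
		     (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	free(w);
	return total;
}
```

Dividing the returned total by *elapsed_s (when it is non-zero) gives a
requests-per-second figure. Spawning one thread per engine regardless of
CPU count is deliberate in this sketch: on a machine with fewer cores than
engines the threads compete for CPU time, so any submission overhead shows
up directly as lost throughput.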

> Or possibly:
> 
> 1. 1 CPU core - 1 engine - purest latency/overhead
> 2. 1 CPU core - N engines (N = all engines) - more
> 3. N CPU cores - N engines (N = min(engines, cores)) - global lock
> contention, stable setup
> 4. M CPU cores - N engines (N, M = max) - lock contention stress
> 5. N CPU cores - 1 engine (N = all cores) - more extreme lock contention
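The five configurations in that list can each be derived from just the
machine's CPU and engine counts. A minimal sketch, assuming a hypothetical
struct submit_config and build_configs() helper (neither exists in the
patch), which only computes how many submitting threads and target engines
each variant would use:

```c
#include <assert.h>

/* One CPU:engine pairing from the list above (names are hypothetical). */
struct submit_config {
	const char *name;
	unsigned int cpus;    /* number of submitting threads */
	unsigned int engines; /* number of engines targeted */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Fill out[0..4] with variants 1-5 for this machine's CPU and engine counts. */
static void build_configs(unsigned int num_cpus, unsigned int num_engines,
			  struct submit_config out[5])
{
	unsigned int n;

	assert(num_cpus > 0 && num_engines > 0);
	n = min_u(num_cpus, num_engines);

	out[0] = (struct submit_config){ "1 cpu : 1 engine", 1, 1 };
	out[1] = (struct submit_config){ "1 cpu : all engines", 1, num_engines };
	out[2] = (struct submit_config){ "N cpus : N engines", n, n };
	out[3] = (struct submit_config){ "all cpus : all engines",
					 num_cpus, num_engines };
	out[4] = (struct submit_config){ "all cpus : 1 engine", num_cpus, 1 };
}
```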

I hear you in that you would like to have a serial test as well, where
we use just 1 CPU thread to submit to all engines as fast as we can and
see how close we get with just "1 core". (One hopes there will still be
some parallelism from our interrupt handler.)
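A minimal sketch of that serial variant, assuming a single userspace thread
and hypothetical mock_engine/mock_submit() stand-ins for real request
submission: the one thread walks every engine round-robin, so comparing
its rate against a one-thread-per-engine run shows how much the extra CPU
threads actually buy.

```c
#include <assert.h>

/* Hypothetical per-engine counter standing in for a real engine's queue. */
struct mock_engine {
	unsigned long submitted;
};

/* Hypothetical mock of submitting one request to an engine. */
static void mock_submit(struct mock_engine *e)
{
	e->submitted++;
}

/*
 * Serial variant: one thread visits every engine in turn, submitting one
 * request to each per pass, for `passes` passes. Returns the total submitted.
 */
static unsigned long submit_serial(struct mock_engine *engines,
				   unsigned int num_engines,
				   unsigned long passes)
{
	unsigned long total = 0;

	assert(num_engines > 0);
	for (unsigned long p = 0; p < passes; p++) {
		for (unsigned int i = 0; i < num_engines; i++) {
			mock_submit(&engines[i]);
			total++;
		}
	}
	return total;
}
```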
-Chris