[Bug 103804] igt/benchmark/gem_exec_nop does not permit to select execution ring
bugzilla-daemon at freedesktop.org
Fri Nov 17 22:30:37 UTC 2017
https://bugs.freedesktop.org/show_bug.cgi?id=103804
Bug ID: 103804
Summary: igt/benchmark/gem_exec_nop does not permit to select execution ring
Product: DRI
Version: unspecified
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: IGT
Assignee: dri-devel at lists.freedesktop.org
Reporter: dmitry.v.rogozhkin at intel.com
Looking into the code, igt/benchmark/gem_exec_nop should permit selecting a ring
to load. However, this feature is not functional. For example, assuming that the
i915 PMU patches https://patchwork.freedesktop.org/series/29735/ are applied to
the kernel, try:
# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e rcs
4.433
Performance counter stats for 'system wide':
2,002,891,967 ns i915/rcs0-busy/
280,244 ns i915/vcs0-busy/
118,222 ns i915/vcs1-busy/
361,440 ns i915/vecs0-busy/
365,253 ns i915/bcs0-busy/
3.033127723 seconds time elapsed
# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e vcs
4.531
Performance counter stats for 'system wide':
2,005,028,005 ns i915/rcs0-busy/
304,735 ns i915/vcs0-busy/
100,476 ns i915/vcs1-busy/
348,364 ns i915/vecs0-busy/
383,365 ns i915/bcs0-busy/
3.048972240 seconds time elapsed
# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e bcs
4.548
Performance counter stats for 'system wide':
2,003,302,067 ns i915/rcs0-busy/
229,991 ns i915/vcs0-busy/
50,410 ns i915/vcs1-busy/
249,257 ns i915/vecs0-busy/
267,072 ns i915/bcs0-busy/
3.050740036 seconds time elapsed
# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e vecs
4.547
Performance counter stats for 'system wide':
2,002,918,507 ns i915/rcs0-busy/
251,940 ns i915/vcs0-busy/
134,314 ns i915/vcs1-busy/
345,163 ns i915/vecs0-busy/
366,121 ns i915/bcs0-busy/
3.054508956 seconds time elapsed
# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e all
4.488
Performance counter stats for 'system wide':
2,004,461,103 ns i915/rcs0-busy/
194,267 ns i915/vcs0-busy/
104,581 ns i915/vcs1-busy/
306,019 ns i915/vecs0-busy/
291,113 ns i915/bcs0-busy/
3.061850018 seconds time elapsed
So you can see that, regardless of the -e option, the load always goes to rcs0.
The reason seems to be this commit:
commit 05ca171aa9a6902614241f9685de2f62f30126d8
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date: Fri Jun 3 10:43:09 2016 +0100
benchmarks/gem_exec_nop: Extend submission to check write inter-engine sync
Currently, we look at the throughput for submitting a read batch to a
single engine or any. The kernel optimises for this by allowing multiple
engine to read at the same time, but writes are exclusive to a single
engine. So lets try to measure the impact of inserting the barriers
between writes on different engines.
Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
which reuses, and thereby clobbers, the ring parameter of the loop() function:
static int loop(unsigned ring, int reps, int ncpus, unsigned flags)
{
    all_nengine = 0;
    for (ring = 1; ring < 16; ring++) {  /* overwrites the 'ring' argument */
        execbuf.flags &= ~ENGINE_FLAGS;
        execbuf.flags |= ring;
        if (__gem_execbuf(fd, &execbuf) == 0)
            all_engines[all_nengine++] = ring;
    }
    if (ring == -1) {  /* never true here: 'ring' is unsigned and equals 16 */
        nengine = all_nengine;
        memcpy(engines, all_engines, all_nengine*sizeof(engines[0]));
    } else {
        nengine = 1;
        engines[0] = ring;  /* i.e. 16, not the engine requested with -e */
    }
}
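Because the parameter is unsigned, ring always equals 16 once the probe loop
finishes, so the (ring == -1) test can never be true and the engine requested
with -e is ignored. A minimal sketch of one possible fix is to probe with a
dedicated loop variable so the argument survives (the name 'eng' below is my
own choice, not from the benchmark; the surrounding globals are assumed to be
as in the excerpt above):

    unsigned eng;  /* hypothetical probe variable; 'ring' stays untouched */

    all_nengine = 0;
    for (eng = 1; eng < 16; eng++) {
        execbuf.flags &= ~ENGINE_FLAGS;
        execbuf.flags |= eng;
        if (__gem_execbuf(fd, &execbuf) == 0)
            all_engines[all_nengine++] = eng;
    }

With ring left intact, the existing (ring == -1) path still selects all probed
engines, and the else branch puts the requested engine into engines[0], so a
run like ./gem_exec_nop -e vcs should then show the load on vcs0 in the PMU
counters above instead of rcs0.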