[Intel-gfx] cmdparser overhead reduction

Chris Wilson chris at chris-wilson.co.uk
Fri Nov 20 02:55:55 PST 2015


I spent yonks trying to define tests that reliably demonstrate the
impact of the cmdparser without requiring inspection of a perf profile.
So far, the only thing I can demonstrate with any reliability (gen7
thermal throttling makes life difficult) is the impact of using
vmap + WC. Improving the hash function still relies on inspecting the
perf profile of real applications (i.e. games), where the easiest metrics
to gather, such as frame times, are dominated by the render time. Nor do I
have a metric that is sensitive to timing, as would be needed for the bug
reported in

"libva decoding performance regression with kernel 4.0-rc"
1428627643.3417.22.camel at collabora.com

What I can demonstrate is that eliminating the vmap overhead improves
throughput by about 2x on small batches, and that using WC on byt
improves throughput by a further 30% or so. And in that bug report
thread, applying the patches prevented the missed deadlines.

Despite all of this, the cmdparser still imposes severe overhead (e.g. a
throughput reduction of 2x on batches).
-Chris


