[Intel-gfx] intel_gpu_top profiles
pcjc2 at cam.ac.uk
Sat Oct 31 11:51:27 PDT 2009
On Fri, 2009-10-30 at 11:25 -0700, Eric Anholt wrote:
> Does this help? Are there parts that are unclear and confusing?
It is helpful, yes.
> > Screen-captures of the profiles available here:
> > http://www2.eng.cam.ac.uk/~pcjc2/geda/intel_gpu_top_profiles/
> > http://www2.eng.cam.ac.uk/~pcjc2/geda/intel_gpu_top_profiles/intel_gpu_load_displaylist_no_compiz_PCB.png
> > http://www2.eng.cam.ac.uk/~pcjc2/geda/intel_gpu_top_profiles/intel_gpu_top_displaylist_compiz_PCB.png
> > http://www2.eng.cam.ac.uk/~pcjc2/geda/intel_gpu_top_profiles/intel_gpu_top_no_displaylist_compiz_PCB.png
> > http://www2.eng.cam.ac.uk/~pcjc2/geda/intel_gpu_top_profiles/intel_gpu_top_no_displaylist_no_compiz_PCB.png
> OK, so this shows that the GPU is almost entirely idle, time to pull out
> perf and see what's going wrong.
For the "no_displaylist" profiles, the GPU is mostly idle because my
geometry generation code isn't able to keep up with the frame rate.
Unless I misread them, the "displaylist" profiles leave the ring idle
only about 2% of the time.
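For reference, here is a rough sketch of how a "ring idle" percentage can be derived. It assumes, without confirmation here, that intel_gpu_top periodically samples the render ring's HEAD and TAIL pointers and counts a sample as idle when the two are equal. The struct and function names are hypothetical and are not the tool's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample of the render ring's pointers. Assumes any
 * non-address bits in the registers have already been masked off. */
struct ring_sample {
    uint32_t head; /* offset the GPU is currently executing from */
    uint32_t tail; /* offset just past the last command the CPU queued */
};

/* Percentage of samples in which the ring was empty (HEAD == TAIL),
 * i.e. the GPU had caught up with everything the CPU had submitted. */
double ring_idle_percent(const struct ring_sample *samples, size_t n)
{
    size_t idle = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (samples[i].head == samples[i].tail)
            idle++;
    }
    return 100.0 * (double)idle / (double)n;
}
```

Under that reading, about 2% idle means the GPU almost always had queued work in the "displaylist" runs. That fits those runs being GPU-bound rather than CPU-bound.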
> > The "displaylist" versions are where I get the rendering code to build
> > all its rendering into a single displaylist, then just draw the
> > display-list from the expose event handler. (Prodded repeatedly by our
> > "benchmark" action).
> > The "no_displaylist" versions show what happens with our geometry
> > generation overheads in place - obviously there is room for improvement
> > in that case!
> If you're using VBOs, no_displaylist should beat displaylist.
Just vertex arrays, unfortunately. I've got some test code for using
VBOs, and it barely made a difference; there are other overheads I need
to fix before that becomes worthwhile. (I'll start by adding a cache for
triangulated polygon geometry.)
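Below is a minimal sketch of such a cache. It assumes each polygon has a stable index and a generation counter that the caller bumps whenever the polygon is edited. All names are hypothetical. Fan triangulation, which is valid only for convex outlines, stands in for whatever triangulator PCB actually uses.

```c
#include <assert.h>
#include <stdlib.h>

/* One cached triangulation, stored as vertex indices into the polygon outline. */
struct tri_cache_entry {
    int valid;           /* nonzero once this polygon has been triangulated */
    unsigned generation; /* polygon generation the cached indices were built for */
    int n_indices;       /* 3 indices per triangle */
    unsigned *indices;
};

struct tri_cache {
    int n_polygons;
    struct tri_cache_entry *entries;
    int misses;          /* number of times the triangulator actually ran */
};

/* Stand-in for the expensive step: fan-triangulate a convex outline of
 * n_vertices points into (n_vertices - 2) triangles. Concave polygons or
 * polygons with holes would need a real triangulator here. */
static unsigned *fan_triangulate(int n_vertices, int *n_indices_out)
{
    assert(n_vertices >= 3);
    int n_tris = n_vertices - 2;
    unsigned *idx = malloc(sizeof *idx * 3 * (size_t)n_tris);
    if (idx == NULL)
        return NULL;
    for (int t = 0; t < n_tris; t++) {
        idx[3 * t + 0] = 0;
        idx[3 * t + 1] = (unsigned)t + 1;
        idx[3 * t + 2] = (unsigned)t + 2;
    }
    *n_indices_out = 3 * n_tris;
    return idx;
}

int tri_cache_init(struct tri_cache *c, int n_polygons)
{
    c->n_polygons = n_polygons;
    c->misses = 0;
    c->entries = calloc((size_t)n_polygons, sizeof *c->entries);
    return c->entries != NULL ? 0 : -1;
}

/* Return cached indices for polygon `poly`, re-triangulating only when the
 * entry is empty or was built for an older generation of the polygon. */
const unsigned *tri_cache_get(struct tri_cache *c, int poly, unsigned generation,
                              int n_vertices, int *n_indices_out)
{
    assert(poly >= 0 && poly < c->n_polygons);
    struct tri_cache_entry *e = &c->entries[poly];

    if (!e->valid || e->generation != generation) {
        int n = 0;
        unsigned *idx = fan_triangulate(n_vertices, &n);
        if (idx == NULL)
            return NULL;
        free(e->indices);
        e->indices = idx;
        e->n_indices = n;
        e->generation = generation;
        e->valid = 1;
        c->misses++;
    }
    *n_indices_out = e->n_indices;
    return e->indices;
}

void tri_cache_free(struct tri_cache *c)
{
    for (int i = 0; i < c->n_polygons; i++)
        free(c->entries[i].indices);
    free(c->entries);
    c->entries = NULL;
    c->n_polygons = 0;
}
```

Each frame, the cached index lists could then be passed to glDrawElements, or uploaded once into a VBO. Triangulation would then cost something only when a polygon actually changes.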
> Otherwise, what a display list performance win probably means is that
> you're getting VBOs made for you.
In this case, it is just because I'm cutting out a load of CPU-bound
geometry generation.
> > What is "MASM CS CR" and "CL CS"? Those seem to vary in the compiz /
> > non-compiz case. Is there a guide with these names somewhere (other than
> > the GPU PRM?)
> If we know what the names mean, we should improve the names in the
> tool :) Those came straight out of the docs.
It is interesting to see so many parts of the chip lit up. From the
docs, I guess the "fixed-function" units really just spawn tasks on the
execution units, so I get the feeling there is no shortcut to bypass
any GPU bottleneck in this case.
Since VF (vertex fetch?) is high on my profile, I'm wondering if I need
to reduce the number of primitives entering the GPU. A lot of the
primitives on the circuit boards can be described by about 3 parameters,
yet generate a lot of triangles; is there any way to do that expansion
on the GPU?
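Here is a back-of-the-envelope sketch of that expansion. It assumes round pads and vias are drawn as filled circles, tessellated into a fixed number of segments as an unindexed GL_TRIANGLES list. It also assumes tracks are drawn as a rectangle plus two semicircular caps. The segment count, vertex size and function names are hypothetical and are not taken from PCB's renderer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tessellation settings; PCB's real values may differ. */
struct tess_params {
    int circle_segments;     /* triangles used to approximate a full circle */
    size_t bytes_per_vertex; /* e.g. 2 floats of position = 8 bytes */
};

/* A filled circle (cx, cy, r: 3 parameters) as a GL_TRIANGLES list:
 * one triangle per segment, 3 vertices each, no vertex sharing. */
size_t circle_vertices(const struct tess_params *p)
{
    assert(p->circle_segments >= 3);
    return (size_t)p->circle_segments * 3;
}

/* A round-capped track (x1, y1, x2, y2, width: 5 parameters): a rectangular
 * body of 2 triangles plus two caps, each half a circle's worth of triangles. */
size_t track_vertices(const struct tess_params *p)
{
    assert(p->circle_segments >= 4 && p->circle_segments % 2 == 0);
    size_t body = 2 * 3;
    size_t caps = 2 * (size_t)(p->circle_segments / 2) * 3;
    return body + caps;
}

/* Bytes the vertex-fetch stage has to read to draw one frame. */
size_t frame_vertex_bytes(const struct tess_params *p,
                          size_t n_circles, size_t n_tracks)
{
    size_t verts = n_circles * circle_vertices(p) + n_tracks * track_vertices(p);
    return verts * p->bytes_per_vertex;
}
```

With 32 segments, a 3-parameter via becomes 96 vertices (768 bytes at 8 bytes per vertex). The vertex-fetch load therefore grows with tessellation detail rather than with the number of objects on the board.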