[Intel-gfx] intel_gpu_top profiles

Peter Clifton pcjc2 at cam.ac.uk
Sat Oct 31 19:51:27 CET 2009


On Fri, 2009-10-30 at 11:25 -0700, Eric Anholt wrote:

> Does this help?  Are there parts that are unclear and confusing?

It is helpful, yes.

> > Screen-captures of the profiles available here:
> > 
> > http://www2.eng.cam.ac.uk/~pcjc2/geda/intel_gpu_top_profiles/
> > http://www2.eng.cam.ac.uk/~pcjc2/geda/intel_gpu_top_profiles/intel_gpu_load_displaylist_no_compiz_PCB.png
> > http://www2.eng.cam.ac.uk/~pcjc2/geda/intel_gpu_top_profiles/intel_gpu_top_displaylist_compiz_PCB.png
> > http://www2.eng.cam.ac.uk/~pcjc2/geda/intel_gpu_top_profiles/intel_gpu_top_no_displaylist_compiz_PCB.png
> > http://www2.eng.cam.ac.uk/~pcjc2/geda/intel_gpu_top_profiles/intel_gpu_top_no_displaylist_no_compiz_PCB.png
> 
> OK, so this shows that the GPU is almost entirely idle, time to pull out
> perf and see what's going wrong.

For the "no_displaylist" profiles, the GPU is mostly idle because my
geometry generation code isn't able to keep up with the frame-rate.

Unless I misread them, the "displaylist" profiles are only leaving
about 2% idle in the ring.

> > The "displaylist" versions are where I get the rendering code to build
> > all its rendering into a single displaylist, then just draw the
> > display-list from the expose event handler. (Prodded repeatedly by our
> > "benchmark" action).
> > 
> > The "no_displaylist" versions show what happens with our geometry
> > generation overheads in place - obviously there is room for improvement
> > in that case!
> 
> If you're using VBOs, no_displaylist should beat displaylist.

Just vertex arrays, unfortunately. I've got some test code for using
VBOs, and it barely made a difference. There are other overheads I need
to fix before that becomes worthwhile. (I'll start by adding a cache for
triangulated polygon geometry.)

> Otherwise, what a display list performance win probably means is that
> you're getting VBOs made for you.

In this case, it is just because I'm cutting out a load of CPU-bound
computation.

> > What is "MASM CS CR" and "CL CS"? Those seem to vary in the compiz /
> > non-compiz case. Is there a guide with these names somewhere (other than
> > the GPU PRM?)
> 
> If we know what the names mean, we should improve the names in the
> tool :)  Those came straight out of the docs.

It is interesting to see so many parts of the chip lit up. From the
docs, I guess the "fixed-function" stuff really just spawns tasks in the
execution units, so I get the feeling there is no cheating way to bypass
any GPU bottleneck in this case.

Since VF (vertex fetch?) is high on my profile, I'm wondering if I need
to reduce the quantity of primitives entering the GPU. A lot of the
primitives on the circuit boards can be described in about 3 parameters,
yet generate a lot of triangles - is there any way to do that on the
GPU?
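
To put a rough number on that expansion, here is a sketch (Python purely
for illustration; the function name and the 32-segment figure are
assumptions, not taken from our code) of a round pad described by three
parameters being turned into unindexed triangles on the CPU:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def circle_to_triangles(cx: float, cy: float, r: float,
                        segments: int = 32) -> List[Triangle]:
    """Expand a circle (centre x, centre y, radius) into a triangle fan."""
    rim = [(cx + r * math.cos(2 * math.pi * i / segments),
            cy + r * math.sin(2 * math.pi * i / segments))
           for i in range(segments)]
    centre = (cx, cy)
    # One triangle per rim segment, wrapping round to close the circle.
    return [(centre, rim[i], rim[(i + 1) % segments])
            for i in range(segments)]


tris = circle_to_triangles(0.0, 0.0, 1.0)
floats_in = 3                    # cx, cy, r
floats_out = len(tris) * 3 * 2   # triangles * vertices * (x, y)
```

With those assumed settings, three input floats become 192 floats of
vertex data for a single pad, which is the kind of ratio I'd like to
avoid pushing through vertex fetch.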


Best wishes,

Peter C.
