[Intel-gfx] intel_gpu_top decode..

Eric Anholt eric at anholt.net
Thu Oct 7 00:27:53 CEST 2010


On Wed, 06 Oct 2010 18:55:55 +0100, Peter Clifton <pcjc2 at cam.ac.uk> wrote:
> Hi,
> 
> Can anyone point me at what this intel_gpu_top output (below) indicates
> regarding what is limiting the frame-rate of my drawing?

I'll take a stab at what I can, but honestly I find the status reports
of the chip fairly mystifying myself.

> Primarily I'm throwing a lot of triangles and texture coordinates into a
> vertex array, compiling the lot into a display list and benchmarking the
> frame-rate I can achieve at given window sizes (not vblank limited). For
> the data below, I'm just drawing lines (with two triangles each, then
> two more at each end for caps). I have some colour changes, but these
> are handled with a flush of my vertex array and a glColor call. Colour
> changes should be relatively infrequent though.
> 
> Round line caps are being drawn using two triangles (to make a square),
> with texture coordinates spaced between -1 and 1 to span the square. The
> round object is drawn with an implicit texture using this shader:
> 
> void main()
> {
>   float sqdist;
> 
>   sqdist = dot (gl_TexCoord[0].st, gl_TexCoord[0].st);
>   if (sqdist > 1.0)
>     discard;
> 
>   gl_FragColor = gl_Color;
> }
> 
> The line geometry is also hitting this same shader, but with texture
> coordinates set to 0.0, 0.0 so it is not clipped.
> 
> This is on a GM45. Am I correct in thinking the geometry transfer is the
> indicated bottle-neck? (VF CS is vertex fetch command stream, right?)
> 
> From the fact the pixel shader is at 70%, I presume I'm not (yet)
> fill-rate limited, but not that far from it either.

Generally, a unit also appears to report busy when it's stalled waiting
to hand its data downstream.  So I read your output as probably VF is 9%
busy and 82% waiting for VS, and CL (clipper) is 14% busy and 68%
waiting on the windowizer (fragment shader).  That's only approximate,
since we don't have metrics here for how much time is actually stalled
vs. accomplishing something.  Ideally, every unit would report busy all
the time because it is getting work done, but another debug tool for
Ironlake that I'm working on getting released has strongly pointed to
"units are either starved or stalled, and rarely doing real work."
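
To spell out that reading, here is a rough sketch of the subtraction,
using the VS0 CS figure of 82% and the other percentages from the output
quoted below.  The upstream/downstream pairing and the helper name
split_busy are assumptions made for illustration only; the hardware does
not report a stalled-vs-working split, so treat the result as a guess.

```python
# Busy percentages copied from the intel_gpu_top output quoted in this thread.
reported_busy = {"VF CS": 91, "VS0 CS": 82, "CL CS": 82, "Windowizer": 68}

# Assumed upstream -> downstream pairs (a simplification of the 3D pipeline).
downstream_of = {"VF CS": "VS0 CS", "CL CS": "Windowizer"}


def split_busy(unit):
    """Return (own_work, stalled) percentages under the subtraction heuristic.

    Heuristic (an assumption, not a documented metric): a unit counts as
    busy while its downstream neighbour is busy, so its own work is roughly
    its reported figure minus the downstream figure.
    """
    stalled = reported_busy[downstream_of[unit]]
    own_work = reported_busy[unit] - stalled
    return own_work, stalled


for unit in downstream_of:
    own, stalled = split_busy(unit)
    print(f"{unit}: ~{own}% working, ~{stalled}% waiting on {downstream_of[unit]}")
```

The point is only that the large "busy" numbers at the top of the list
shrink a lot once the downstream unit's share is taken out.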

> I've no idea what the other acronyms are, and the PRM doesn't help
> immediately. Is UC0 related to clipping? Can I reduce it?

Not sure what that one is.

> core clock: 400 Mhz
>                      ring idle:   1%: ▌                                        
>                     ring space: 256/126976 (0%)
>                           task  percent busy
> 
>                          VF CS:  91%: ████████████████████████████████████▌    
>                         UC0 CS:  88%: ███████████████████████████████████▍     
>                         ISC CS:  88%: ███████████████████████████████████▍     
>                          GS CS:  88%: ███████████████████████████████████▍     
>                         VS0 CS:  82%: █████████████████████████████████        
>                          CL CS:  82%: █████████████████████████████████        
>                     MASM CS CR:  80%: ████████████████████████████████▏        
>                    Row 1, EU 3:  78%: ███████████████████████████████▍         
>                    Row 0, EU 3:  71%: ████████████████████████████▌            
>                   Pixel shader:  70%: ████████████████████████████▏            
>                    Bypass FIFO:  69%: ███████████████████████████▊             
>                     Windowizer:  68%: ███████████████████████████▍             
>                    Row 1, EU 2:  63%: █████████████████████████▍               
>                      Filtering:  62%: █████████████████████████                
>                    Row 0, EU 2:  60%: ████████████████████████▏                
>                         URB CS:  57%: ███████████████████████                  
>                   Setup Engine:  55%: ██████████████████████▏                  
>                     Map filter:  54%: █████████████████████▊                   
>                    Row 1, EU 1:  50%: ████████████████████▏                    
>                    Row 0, EU 1:  47%: ███████████████████                      
>             Texture decompress:  45%: ██████████████████▏                      
>                  Sampler cache:  44%: █████████████████▊                       
>                  Texture fetch:  44%: █████████████████▊                       
>                    Row 1, EU 0:  43%: █████████████████▍                       
>             Projection and LOD:  24%: █████████▊                               
>    Dependent address generator:  22%: █████████                                
>                     Dispatcher:  18%: ███████▍                                 
>          Message Arbiter row 1:  11%: ████▌                                    
>                     SVDR CS CR:   6%: ██▌                                      
>                      EM1 CS CR:   5%: ██▏                                      
>                     SVSM CS CR:   2%: █     

Of this, I'd say that you're spending a surprising amount of time in
texture fetch.  Finding ways to reduce texture bandwidth may pay off,
assuming that (texture fetch / sampler cache) is the fraction of the
time you're missing the cache.  I'm not sure whether that's true,
though.  And you said that this data was just for the line drawing,
which didn't appear to have any texturing going on at all, so I'm just
confused.
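
For what it's worth, the ratio mentioned above works out as follows with
the two figures from the quoted output (both 44%).  Reading the ratio as
a cache-miss fraction is only the assumption already stated, not
something confirmed here, and the variable names are just illustrative.

```python
# Percentages copied from the intel_gpu_top output quoted above.
texture_fetch_busy = 44
sampler_cache_busy = 44

# Assumed interpretation: fetch-busy time as a share of sampler-cache-busy time.
miss_ratio = texture_fetch_busy / sampler_cache_busy

print(f"texture fetch / sampler cache = {miss_ratio:.2f}")
```

If that interpretation held, a ratio this close to 1 would mean nearly
every sampler cache access goes out to fetch.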
