[Intel-gfx] Strange performance cliff...

Peter Clifton pcjc2 at cam.ac.uk
Tue Oct 12 19:45:25 CEST 2010


Using glxgears as a tool to exercise the GPU with some simple rendering,
I have noticed a strange performance cliff in the intel_gpu_top output
when resizing the glxgears window:

Below a certain window size, e.g.:

  -geometry 576x868+0+29


core clock: 400 Mhz
                   render busy:  21%: ████▎                                  render space: 10/126976 (0%)
                bitstream busy:   0%:                                     bitstream space: 0/126976 (0%)

                          task  percent busy
                            CS:  20%: ████                    vert fetch: 35071068 (137619/sec)
               RC Render cache:  17%: ███▌                    prim fetch: 15422476 (60993/sec)
                            VF:  13%: ██▋                  VS invocations: 31634160 (128160/sec)
                            GS:  12%: ██▌                  GS invocations: 11848000 (48000/sec)
                   Windower IZ:  12%: ██▌                       GS prims: 0 (0/sec)
              SF (strip / fan):  12%: ██▌                  CL invocations: 0 (0/sec)
                   Row 1, EU 3:  10%: ██                        CL prims: 28373497 (108993/sec)
                   Row 0, EU 3:  10%: ██                   PS invocations: 7810009 (600886/sec)
                   Bypass FIFO:  10%: ██                   PS depth pass: 17981565671 (73371256/sec)
                  Pixel shader:  10%: ██                   
             Windower / Masker:   9%: █▉                   
                     Filtering:   9%: █▉                   
                   Row 1, EU 2:   9%: █▉                   
                   Row 0, EU 2:   9%: █▉                   
                  Setup Engine:   8%: █▋                   
                          MASF:   8%: █▋                   
                   Row 1, EU 1:   8%: █▋                   
                   Row 0, EU 1:   7%: █▌                   
                   Row 1, EU 0:   7%: █▌                   
                    Map filter:   6%: █▎                   
                           DAP:   6%: █▎                   
            Texture decompress:   6%: █▎                   
                 Sampler cache:   6%: █▎                   
                 Texture fetch:   5%: █                    
                          SVRW:   5%: █                    
                          SVRR:   4%: ▉                    
                           URB:   4%: ▉                    
            Projection and LOD:   3%: ▋                    
   Dependent address generator:   3%: ▋                    
                    Dispatcher:   2%: ▌                    
                  CL (clipper):   2%: ▌                    
                          SVDW:   2%: ▌                    
                           VS0:   2%: ▌                    
                           ISC:   1%: ▎                    
         Message Arbiter row 1:   1%: ▎                    
                          MASM:   1%: ▎                    
      SI (system instruction?):   0%:                      
                            DM:   0%:                      
                            SC:   0%:   


I get the trace above.

When I increase the window height by just three pixels, to:

  -geometry 576x871+0+29



The CS (command streamer) unit jumps to 100% busy, along with the render
busy graph, while the other units stay at roughly the same levels. Does
anyone have any ideas why?

core clock: 400 Mhz
                   render busy: 100%: ████████████████████                   render space: 61/126976 (0%)
                bitstream busy:   1%: ▎                                   bitstream space: 0/126976 (0%)

                          task  percent busy
                            CS: 100%: ████████████████████    vert fetch: 15165204 (133386/sec)
               RC Render cache:  18%: ███▋                    prim fetch: 6654764 (59078/sec)
                            VF:  12%: ██▌                  VS invocations: 13559328 (123888/sec)
                   Windower IZ:  12%: ██▌                  GS invocations: 5078400 (46400/sec)
                            GS:  11%: ██▎                       GS prims: 0 (0/sec)
              SF (strip / fan):  11%: ██▎                  CL invocations: 0 (0/sec)
                   Row 1, EU 3:   9%: █▉                        CL prims: 12836185 (105478/sec)
                  Pixel shader:   9%: █▉                   PS invocations: 6019756 (-1805377/sec)
                   Bypass FIFO:   9%: █▉                   PS depth pass: 7529067710 (71131031/sec)
                   Row 0, EU 3:   9%: █▉                   
             Windower / Masker:   9%: █▉                   
                   Row 1, EU 2:   8%: █▋                   
                     Filtering:   8%: █▋                   
                   Row 0, EU 2:   8%: █▋                   
                          MASF:   8%: █▋                   
                  Setup Engine:   7%: █▌                   
                   Row 1, EU 1:   7%: █▌                   
                   Row 0, EU 1:   7%: █▌                   
                           DAP:   6%: █▎                   
                   Row 1, EU 0:   6%: █▎                   
                    Map filter:   6%: █▎                   
            Texture decompress:   5%: █                    
                 Sampler cache:   5%: █                    
                 Texture fetch:   5%: █                    
                          SVRW:   4%: ▉                    
                          SVRR:   4%: ▉                    
                           URB:   3%: ▋                    
            Projection and LOD:   3%: ▋                    
   Dependent address generator:   2%: ▌                    
                    Dispatcher:   2%: ▌                    
                          SVDW:   2%: ▌                    
                  CL (clipper):   2%: ▌                    
                           ISC:   2%: ▌                    
                           VS0:   1%: ▎                    
         Message Arbiter row 1:   1%: ▎                    
                          MASM:   1%: ▎                    
      SI (system instruction?):   0%:                      
                            DM:   0%:                      
                            SC:   0%:          
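
To put the size step in numbers, here is a small sketch of the arithmetic
for the two geometries. The helper names are made up for illustration, and
the 4 bytes per pixel figure used in the note below is an assumption (a
32-bit colour buffer), not something measured here.

```c
/* Number of pixels in a window of the given size. */
static unsigned long window_pixels(unsigned long width, unsigned long height)
{
	return width * height;
}

/*
 * Size in bytes of one buffer covering the window.
 * ASSUMPTION: bytes_per_pixel is supplied by the caller; the value 4
 * (32-bit colour) is a guess and is not confirmed by the traces.
 */
static unsigned long buffer_bytes(unsigned long width, unsigned long height,
				  unsigned long bytes_per_pixel)
{
	return window_pixels(width, height) * bytes_per_pixel;
}
```

For 576x868 and 576x871 this gives 499968 and 501696 pixels, a difference
of 1728 pixels (three rows of 576). At the assumed 4 bytes per pixel that
is 1999872 versus 2006784 bytes per buffer.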


NB: I've patched intel_gpu_top to make the unit names in the output a
little more readable. In case I got any of the names wrong, these are
the changes I applied:


diff --git a/lib/instdone.c b/lib/instdone.c
index 722fb03..f908a79 100644
--- a/lib/instdone.c
+++ b/lib/instdone.c
@@ -100,7 +100,7 @@ init_g965_instdone1(void)
 static void
 init_g4x_instdone1(void)
 {
-       gen4_instdone1_bit(G4X_BCS_DONE, "BCS");
+       gen4_instdone1_bit(G4X_BCS_DONE, "AVC_FE Command Streamer");
        gen4_instdone1_bit(G4X_CS_DONE, "CS");
        gen4_instdone1_bit(G4X_MASF_DONE, "MASF");
        gen4_instdone1_bit(G4X_SVDW_DONE, "SVDW");
@@ -108,11 +108,11 @@ init_g4x_instdone1(void)
        gen4_instdone1_bit(G4X_SVRW_DONE, "SVRW");
        gen4_instdone1_bit(G4X_SVRR_DONE, "SVRR");
        gen4_instdone1_bit(G4X_ISC_DONE, "ISC");
-       gen4_instdone1_bit(G4X_MT_DONE, "MT");
-       gen4_instdone1_bit(G4X_RC_DONE, "RC");
+       gen4_instdone1_bit(G4X_MT_DONE, "MT Texture cache");
+       gen4_instdone1_bit(G4X_RC_DONE, "RC Render cache");
        gen4_instdone1_bit(G4X_DAP_DONE, "DAP");
        gen4_instdone1_bit(G4X_MAWB_DONE, "MAWB");
-       gen4_instdone1_bit(G4X_MT_IDLE, "MT idle");
+       gen4_instdone1_bit(G4X_MT_IDLE, "MT (texture cache) idle");
        //gen4_instdone1_bit(G4X_GBLT_BUSY, "GBLT");
        gen4_instdone1_bit(G4X_SVSM_DONE, "SVSM");
        gen4_instdone1_bit(G4X_MASM_DONE, "MASM");
@@ -122,13 +122,13 @@ init_g4x_instdone1(void)
        gen4_instdone1_bit(G4X_DM_DONE, "DM");
        gen4_instdone1_bit(G4X_FT_DONE, "FT");
        gen4_instdone1_bit(G4X_DG_DONE, "DG");
-       gen4_instdone1_bit(G4X_SI_DONE, "SI");
+       gen4_instdone1_bit(G4X_SI_DONE, "SI (system instruction?)");
        gen4_instdone1_bit(G4X_SO_DONE, "SO");
        gen4_instdone1_bit(G4X_PL_DONE, "PL");
-       gen4_instdone1_bit(G4X_WIZ_DONE, "WIZ");
+       gen4_instdone1_bit(G4X_WIZ_DONE, "Windower IZ");
        gen4_instdone1_bit(G4X_URB_DONE, "URB");
-       gen4_instdone1_bit(G4X_SF_DONE, "SF");
-       gen4_instdone1_bit(G4X_CL_DONE, "CL");
+       gen4_instdone1_bit(G4X_SF_DONE, "SF (strip / fan)");
+       gen4_instdone1_bit(G4X_CL_DONE, "CL (clipper)");
        gen4_instdone1_bit(G4X_GS_DONE, "GS");
        gen4_instdone1_bit(G4X_VS0_DONE, "VS0");
        gen4_instdone1_bit(G4X_VF_DONE, "VF");
@@ -250,7 +250,7 @@ init_instdone_definitions(uint32_t devid)
                gen4_instdone_bit(I965_ROW_1_EU_3_DONE, "Row 1, EU 3");
                gen4_instdone_bit(I965_SF_DONE, "Strips and Fans");
                gen4_instdone_bit(I965_SE_DONE, "Setup Engine");
-               gen4_instdone_bit(I965_WM_DONE, "Windowizer");
+               gen4_instdone_bit(I965_WM_DONE, "Windower / Masker");
                gen4_instdone_bit(I965_DISPATCHER_DONE, "Dispatcher");
                gen4_instdone_bit(I965_PROJECTION_DONE, "Projection and LOD");
                gen4_instdone_bit(I965_DG_DONE, "Dependent address generator");
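
For context on why a label-only change should not affect the numbers: as I
understand it, each "percent busy" figure comes from repeatedly sampling a
register of per-unit "done" bits and counting how often a unit's bit is
clear. The following is a minimal sketch of that idea, not the actual
intel_gpu_top code; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry pairing a "done" bit with its display label. */
struct unit_label {
	uint32_t bit;      /* mask of the unit's "done" bit in a sample */
	const char *name;  /* label shown in the output; display only */
};

/*
 * Percentage of samples in which the unit was busy, i.e. its "done" bit
 * was clear. The label is never consulted, so renaming a unit cannot
 * change the result. Returns 0 when there are no samples.
 */
static unsigned percent_busy(const struct unit_label *unit,
			     const uint32_t *samples, size_t count)
{
	size_t busy = 0;

	for (size_t i = 0; i < count; i++)
		if (!(samples[i] & unit->bit))
			busy++;

	return count ? (unsigned)(busy * 100 / count) : 0;
}
```

Under this model, a unit whose bit is clear in every sample reads 100%
busy, whatever name the table gives it.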


Best wishes,

-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)



