[Intel-gfx] Strange performance cliff...
Peter Clifton
pcjc2 at cam.ac.uk
Tue Oct 12 19:45:25 CEST 2010
Using glxgears as a tool to exercise the GPU with some simple rendering,
I have noted a strange cliff in the intel_gpu_top output when resizing
the glxgears window:
Below a certain size, e.g.:
-geometry 576x868+0+29
core clock: 400 Mhz
render busy: 21%: ████▎ render space: 10/126976 (0%)
bitstream busy: 0%: bitstream space: 0/126976 (0%)
task percent busy
CS: 20%: ████ vert fetch: 35071068 (137619/sec)
RC Render cache: 17%: ███▌ prim fetch: 15422476 (60993/sec)
VF: 13%: ██▋ VS invocations: 31634160 (128160/sec)
GS: 12%: ██▌ GS invocations: 11848000 (48000/sec)
Windower IZ: 12%: ██▌ GS prims: 0 (0/sec)
SF (strip / fan): 12%: ██▌ CL invocations: 0 (0/sec)
Row 1, EU 3: 10%: ██ CL prims: 28373497 (108993/sec)
Row 0, EU 3: 10%: ██ PS invocations: 7810009 (600886/sec)
Bypass FIFO: 10%: ██ PS depth pass: 17981565671 (73371256/sec)
Pixel shader: 10%: ██
Windower / Masker: 9%: █▉
Filtering: 9%: █▉
Row 1, EU 2: 9%: █▉
Row 0, EU 2: 9%: █▉
Setup Engine: 8%: █▋
MASF: 8%: █▋
Row 1, EU 1: 8%: █▋
Row 0, EU 1: 7%: █▌
Row 1, EU 0: 7%: █▌
Map filter: 6%: █▎
DAP: 6%: █▎
Texture decompress: 6%: █▎
Sampler cache: 6%: █▎
Texture fetch: 5%: █
SVRW: 5%: █
SVRR: 4%: ▉
URB: 4%: ▉
Projection and LOD: 3%: ▋
Dependent address generator: 3%: ▋
Dispatcher: 2%: ▌
CL (clipper): 2%: ▌
SVDW: 2%: ▌
VS0: 2%: ▌
ISC: 1%: ▎
Message Arbiter row 1: 1%: ▎
MASM: 1%: ▎
SI (system instruction?): 0%:
DM: 0%:
SC: 0%:
I get the trace above.
When I increase the window size just a fraction, to:
-geometry 576x871+0+29
The CS (command streamer) unit jumps to 100% busy, along with the render
busy graph. Does anyone have any ideas why?
core clock: 400 Mhz
render busy: 100%: ████████████████████ render space: 61/126976 (0%)
bitstream busy: 1%: ▎ bitstream space: 0/126976 (0%)
task percent busy
CS: 100%: ████████████████████ vert fetch: 15165204 (133386/sec)
RC Render cache: 18%: ███▋ prim fetch: 6654764 (59078/sec)
VF: 12%: ██▌ VS invocations: 13559328 (123888/sec)
Windower IZ: 12%: ██▌ GS invocations: 5078400 (46400/sec)
GS: 11%: ██▎ GS prims: 0 (0/sec)
SF (strip / fan): 11%: ██▎ CL invocations: 0 (0/sec)
Row 1, EU 3: 9%: █▉ CL prims: 12836185 (105478/sec)
Pixel shader: 9%: █▉ PS invocations: 6019756 (-1805377/sec)
Bypass FIFO: 9%: █▉ PS depth pass: 7529067710 (71131031/sec)
Row 0, EU 3: 9%: █▉
Windower / Masker: 9%: █▉
Row 1, EU 2: 8%: █▋
Filtering: 8%: █▋
Row 0, EU 2: 8%: █▋
MASF: 8%: █▋
Setup Engine: 7%: █▌
Row 1, EU 1: 7%: █▌
Row 0, EU 1: 7%: █▌
DAP: 6%: █▎
Row 1, EU 0: 6%: █▎
Map filter: 6%: █▎
Texture decompress: 5%: █
Sampler cache: 5%: █
Texture fetch: 5%: █
SVRW: 4%: ▉
SVRR: 4%: ▉
URB: 3%: ▋
Projection and LOD: 3%: ▋
Dependent address generator: 2%: ▌
Dispatcher: 2%: ▌
SVDW: 2%: ▌
CL (clipper): 2%: ▌
ISC: 2%: ▌
VS0: 1%: ▎
Message Arbiter row 1: 1%: ▎
MASM: 1%: ▎
SI (system instruction?): 0%:
DM: 0%:
SC: 0%:
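For scale, the two geometries differ by only three rows of pixels. A rough sketch of that arithmetic follows, assuming a 32-bit (4 bytes per pixel) colour buffer, which is not stated anywhere above. The helper names are made up for illustration, and the numbers only quantify how small the step is; they do not explain the cliff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of pixels in a width x height window. */
uint64_t window_pixels(uint32_t width, uint32_t height)
{
	return (uint64_t)width * height;
}

/*
 * Hypothetical helper: size in bytes of one colour buffer for the window.
 * bytes_per_pixel is an assumption supplied by the caller (e.g. 4 for 32-bit).
 */
uint64_t buffer_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
	assert(bytes_per_pixel > 0);
	return window_pixels(width, height) * bytes_per_pixel;
}
```

With these helpers, 576x868 comes to 499,968 pixels and 576x871 to 501,696 pixels, a difference of 1,728 pixels, or 6,912 bytes per buffer at the assumed 4 bytes per pixel.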
NB: I've patched intel_gpu_top to add a little more human readability to
the output. In case I got it wrong, note that these are the changes I
applied:
diff --git a/lib/instdone.c b/lib/instdone.c
index 722fb03..f908a79 100644
--- a/lib/instdone.c
+++ b/lib/instdone.c
@@ -100,7 +100,7 @@ init_g965_instdone1(void)
 static void
 init_g4x_instdone1(void)
 {
-	gen4_instdone1_bit(G4X_BCS_DONE, "BCS");
+	gen4_instdone1_bit(G4X_BCS_DONE, "AVC_FE Command Streamer");
 	gen4_instdone1_bit(G4X_CS_DONE, "CS");
 	gen4_instdone1_bit(G4X_MASF_DONE, "MASF");
 	gen4_instdone1_bit(G4X_SVDW_DONE, "SVDW");
@@ -108,11 +108,11 @@ init_g4x_instdone1(void)
 	gen4_instdone1_bit(G4X_SVRW_DONE, "SVRW");
 	gen4_instdone1_bit(G4X_SVRR_DONE, "SVRR");
 	gen4_instdone1_bit(G4X_ISC_DONE, "ISC");
-	gen4_instdone1_bit(G4X_MT_DONE, "MT");
-	gen4_instdone1_bit(G4X_RC_DONE, "RC");
+	gen4_instdone1_bit(G4X_MT_DONE, "MT Texture cache");
+	gen4_instdone1_bit(G4X_RC_DONE, "RC Render cache");
 	gen4_instdone1_bit(G4X_DAP_DONE, "DAP");
 	gen4_instdone1_bit(G4X_MAWB_DONE, "MAWB");
-	gen4_instdone1_bit(G4X_MT_IDLE, "MT idle");
+	gen4_instdone1_bit(G4X_MT_IDLE, "MT (texture cache) idle");
 	//gen4_instdone1_bit(G4X_GBLT_BUSY, "GBLT");
 	gen4_instdone1_bit(G4X_SVSM_DONE, "SVSM");
 	gen4_instdone1_bit(G4X_MASM_DONE, "MASM");
@@ -122,13 +122,13 @@ init_g4x_instdone1(void)
 	gen4_instdone1_bit(G4X_DM_DONE, "DM");
 	gen4_instdone1_bit(G4X_FT_DONE, "FT");
 	gen4_instdone1_bit(G4X_DG_DONE, "DG");
-	gen4_instdone1_bit(G4X_SI_DONE, "SI");
+	gen4_instdone1_bit(G4X_SI_DONE, "SI (system instruction?)");
 	gen4_instdone1_bit(G4X_SO_DONE, "SO");
 	gen4_instdone1_bit(G4X_PL_DONE, "PL");
-	gen4_instdone1_bit(G4X_WIZ_DONE, "WIZ");
+	gen4_instdone1_bit(G4X_WIZ_DONE, "Windower IZ");
 	gen4_instdone1_bit(G4X_URB_DONE, "URB");
-	gen4_instdone1_bit(G4X_SF_DONE, "SF");
-	gen4_instdone1_bit(G4X_CL_DONE, "CL");
+	gen4_instdone1_bit(G4X_SF_DONE, "SF (strip / fan)");
+	gen4_instdone1_bit(G4X_CL_DONE, "CL (clipper)");
 	gen4_instdone1_bit(G4X_GS_DONE, "GS");
 	gen4_instdone1_bit(G4X_VS0_DONE, "VS0");
 	gen4_instdone1_bit(G4X_VF_DONE, "VF");
@@ -250,7 +250,7 @@ init_instdone_definitions(uint32_t devid)
 		gen4_instdone_bit(I965_ROW_1_EU_3_DONE, "Row 1, EU 3");
 		gen4_instdone_bit(I965_SF_DONE, "Strips and Fans");
 		gen4_instdone_bit(I965_SE_DONE, "Setup Engine");
-		gen4_instdone_bit(I965_WM_DONE, "Windowizer");
+		gen4_instdone_bit(I965_WM_DONE, "Windower / Masker");
 		gen4_instdone_bit(I965_DISPATCHER_DONE, "Dispatcher");
 		gen4_instdone_bit(I965_PROJECTION_DONE, "Projection and LOD");
 		gen4_instdone_bit(I965_DG_DONE, "Dependent address generator");
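For readers unfamiliar with what the patch relabels: each gen4_instdone1_bit() or gen4_instdone_bit() call pairs one "done" bit with the name shown in the left-hand column of the output. A minimal sketch of how a per-unit "percent busy" figure could be derived from repeated register samples follows, assuming a unit counts as busy whenever its done bit is clear in a sample. The struct, the function name and the bit position used in any example are hypothetical and are not taken from intel_gpu_top.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of one "done" bit with a label, for illustration only. */
struct unit_bit {
	uint32_t mask;     /* single bit within the sampled register value */
	const char *name;  /* label shown next to the busy figure */
};

/*
 * Count the samples in which the unit's "done" bit is clear (assumed here to
 * mean "busy") and return that share as an integer percentage, rounded down.
 */
unsigned int busy_percent(const struct unit_bit *unit,
			  const uint32_t *samples, size_t n_samples)
{
	size_t busy = 0;

	assert(unit != NULL && samples != NULL && n_samples > 0);
	for (size_t i = 0; i < n_samples; i++) {
		if ((samples[i] & unit->mask) == 0)
			busy++;
	}
	return (unsigned int)(busy * 100 / n_samples);
}
```

Under this reading, a CS figure of 100% would mean the command streamer's done bit was never observed set during the sampling interval, which says the unit was not idle but not what it was doing or waiting on.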
Best wishes,
--
Peter Clifton
Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA
Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)