[Mesa-dev] [PATCH] i965: Sort array elements to increase chances of reusing buffer relocation

Neil Roberts neil at linux.intel.com
Tue Dec 2 08:17:35 PST 2014


Ok, I've written a somewhat contrived test case here:

https://github.com/bpeel/glthing/tree/time-attribs

(Make sure to use the time-attribs branch)

The example draws 1000 single-pixel points, each with a separate draw
call. Each call uses a separate but identical VAO so that the driver
will be forced to emit the vertex state for each point. Each vertex
uses the maximum number of vertex attributes as returned by
GL_MAX_VERTEX_ATTRIBS. All of the attributes are used to determine a
color value in the vertex shader. Normally it will order the attributes
in memory so that the first one is in generic attribute 0, the second in
1 and so on. However if you pass the option ‘backwards’ on the command
line it will put them in reverse order. With git master, if the
attributes are given in order then it will generate a
3DSTATE_VERTEX_BUFFERS command with a single buffer and a single
relocation; otherwise it will generate one for each attribute.
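
To illustrate why the order matters, here is a toy model of the reuse
check. It is only a sketch and not the driver code: it assumes an
attribute can share an earlier attribute's buffer when its offset lies
within one stride after that earlier attribute, and the name
count_buffers and the offsets used with it are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model (assumption, not the real i965 code): an attribute reuses an
 * earlier attribute's buffer when its offset is at most one stride past
 * that earlier attribute. Returns how many separate buffers (and hence
 * relocations) this model would emit for the given attribute order. */
static size_t
count_buffers(const uintptr_t *offsets, size_t n, uintptr_t stride)
{
   size_t buffers = 0;

   for (size_t i = 0; i < n; i++) {
      int reused = 0;

      for (size_t k = 0; k < i; k++) {
         /* Unsigned subtraction: if offsets[i] is below offsets[k] the
          * result wraps to a huge value and the test fails. */
         if (offsets[i] - offsets[k] < stride) {
            reused = 1;
            break;
         }
      }

      if (!reused)
         buffers++;
   }

   return buffers;
}
```

In this model, four interleaved attributes at offsets 0, 16, 32 and 48
with a stride of 64 need one buffer when visited in that order and four
when visited in reverse, which is the situation that sorting by offset
is meant to avoid.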

I ran the test with each of these three versions of Mesa and noted the
FPS. This is based on top of commit 29c7cf2b2 with -O3 and
--disable-debug. libdrm is 00847fa4 with -O3.

1) Mesa master

2) Master with my patch applied

3) The original optimization removed completely so that it will always
   generate a buffer relocation for every attribute.

The test was run with LIBGL_SHOW_FPS=1 vblank_mode=0 on my Haswell
laptop. The results are:

attributes are  │  master      with patch      optimization removed
────────────────┼──────────────────────────────────────────────────
in order        │   820           560                  325
out of order    │   325           540                  325

The FPS fluctuated by around 20 FPS either side, so I've just noted down
what looked like a representative value for each case.

So I guess the results are that yes, in this extreme case having more
relocations makes a big difference but also doing the qsort is quite
expensive. The original optimization does seem worth doing.

It might be worth making a simpler hard-coded implementation of
quicksort because calling qsort is probably not very sensible for such a
small array, and the function call overhead for each comparison is
probably quite significant.
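
As a rough sketch of what a hard-coded sort could look like, the
following uses insertion sort rather than quicksort, since it is simple
and tends to do well on very small or nearly sorted arrays. The struct
attrib type and its fields are hypothetical stand-ins for whatever the
driver actually sorts, and I haven't measured this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type for illustration only. */
struct attrib {
   uintptr_t offset; /* where the attribute starts in the buffer */
   int index;        /* generic attribute number */
};

/* Insertion sort by ascending offset. The comparison is inlined, so
 * there is no per-comparison callback as there is with qsort. */
static void
sort_attribs_by_offset(struct attrib *a, size_t n)
{
   for (size_t i = 1; i < n; i++) {
      struct attrib key = a[i];
      size_t j = i;

      /* Shift larger elements up to open a slot for key. */
      while (j > 0 && a[j - 1].offset > key.offset) {
         a[j] = a[j - 1];
         j--;
      }

      a[j] = key;
   }
}
```

For input that is already in order the inner loop never runs, so the
common in-order case costs only one comparison per element.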

It would also probably be good to see if this difference is noticeable
in a real use case.

- Neil

