[Mesa-dev] [PATCH] i965: Sort array elements to increase chances of reusing buffer relocation

Tue Dec 2 10:27:32 PST 2014

On Tue, Dec 02, 2014 at 04:17:35PM +0000, Neil Roberts wrote:
> Ok, I've written a somewhat contrived test case here:
> 
> https://github.com/bpeel/glthing/tree/time-attribs
> 
> (Make sure to use the time-attribs branch)
> 
> The example draws 1000 single-pixel points, each with a separate draw
> call. Each call uses a separate but identical VAO so that the driver
> will be forced to emit the vertices state for each point. Each vertex
> uses the maximum number of vertex attributes as returned by
> GL_MAX_VERTEX_ATTRIBS. All of the attributes are used to determine a
> color value in the vertex shader. Normally it will order the attributes
> in memory so that the first one is in generic attribute 0, the second in
> 1 and so on. However if you pass the option ‘backwards’ on the command
> line it will put them in reverse order. With git master, if the
> attributes are given in order then it will generate a
> 3DSTATE_VERTEX_BUFFERS command with a single buffer and a single
> relocation; otherwise it will generate one for each attribute.
> 
> I ran the test with each of these three versions of Mesa and noted the
> FPS. This is based on top of commit 29c7cf2b2 with -O3 and
> --disable-debug. libdrm is 00847fa4 with -O3.
> 
> 1) Mesa master
> 
> 2) Master with my patch applied
> 
> 3) The original optimization removed completely so that it will always
>    generate a buffer relocation for every attribute.
> 
> The test was run with LIBGL_SHOW_FPS=1 vblank_mode=0 on my Haswell
> laptop. The results are:
> 
> attributes are  │  master      with patch      optimization removed
> ────────────────┼──────────────────────────────────────────────────
> in order        │   820           560                  325
> out of order    │   325           540                  325
> 
> The FPS fluctuated by around 20 FPS either side so I've just noted down
> what looked like a representative value for each case.
> 
> So I guess the results are that yes, in this extreme case having more
> relocations makes a big difference but also doing the qsort is quite
> expensive. The original optimization does seem worth doing.
> 
> It might be worth making a simpler hard-coded implementation of
> quicksort because calling qsort is probably not very sensible for such a
> small array and the function call overhead for each comparison is
> probably quite a bit.
> 
> It would also probably be good to see if this difference is noticeable
> in a real use case.
> 
> - Neil

Cool. What my statement was really getting at is that there normally won't be
many duplicated relocations. What kind of numbers do you see as you scale down
the number of points?
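
On the idea of replacing the qsort call with something hard-coded: below is a
minimal sketch of what an inlined sort for such a small array could look like.
It uses insertion sort rather than quicksort, since for a handful of elements
it avoids the per-comparison function call and only does n-1 comparisons when
the input is already in order. The struct elem type, its fields, and the
function names are hypothetical stand-ins, not the driver's actual data
structures, and whether this recovers the in-order FPS would need measuring.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vertex element; not the real i965 struct. */
struct elem {
   unsigned buffer;   /* index of the buffer the element reads from */
   unsigned offset;   /* byte offset of the element within that buffer */
};

/* Returns non-zero if a should sort before b: by buffer, then by offset. */
static inline int
elem_less(const struct elem *a, const struct elem *b)
{
   if (a->buffer != b->buffer)
      return a->buffer < b->buffer;
   return a->offset < b->offset;
}

/* In-place insertion sort; the comparison can be inlined by the compiler,
 * unlike the callback passed to qsort. */
static void
sort_elems(struct elem *e, size_t n)
{
   for (size_t i = 1; i < n; i++) {
      struct elem key = e[i];
      size_t j = i;

      /* Shift larger elements up by one slot to make room for key. */
      while (j > 0 && elem_less(&key, &e[j - 1])) {
         e[j] = e[j - 1];
         j--;
      }
      e[j] = key;
   }
}
```

The worst case is still quadratic, so this only makes sense because the array
is bounded by the maximum number of vertex attributes.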