[Mesa-dev] [PATCH] i965: Sort array elements to increase chances of reusing buffer relocation

Tue Dec 2 10:39:18 PST 2014

On 12/02/2014 08:17 AM, Neil Roberts wrote:
> Ok, I've written a somewhat contrived test case here:
> 
> https://github.com/bpeel/glthing/tree/time-attribs
> 
> (Make sure to use the time-attribs branch)
> 
> The example draws 1000 single-pixel points, each with a separate draw
> call. Each call uses a separate but identical VAO so that the driver
> will be forced to emit the vertex state for each point. Each vertex
> uses the maximum number of vertex attributes as returned by
> GL_MAX_VERTEX_ATTRIBS. All of the attributes are used to determine a
> color value in the vertex shader. Normally it will order the attributes
> in memory so that the first one is in generic attribute 0, the second in
> 1 and so on. However if you pass the option ‘backwards’ on the command
> line it will put them in reverse order. With git master, if the
> attributes are given in order then it will generate a
> 3DSTATE_VERTEX_BUFFERS command with a single buffer and a single
> relocation; otherwise it will generate one for each attribute.
> 
> I ran the test with each of these three versions of Mesa and noted the
> FPS. This is based on top of commit 29c7cf2b2 with -O3 and
> --disable-debug. libdrm is 00847fa4 with -O3.
> 
> 1) Mesa master
> 
> 2) Master with my patch applied
> 
> 3) The original optimization removed completely so that it will always
>    generate a buffer relocation for every attribute.
> 
> The test was run with LIBGL_SHOW_FPS=1 vblank_mode=0 on my Haswell
> laptop. The results are:
> 
> attributes are  │  master      with patch      optimization removed
> ────────────────┼──────────────────────────────────────────────────
> in order        │   820           560                  325
> out of order    │   325           540                  325

Also... what is the effect of the optimization when the relocations
cannot be merged?  It should be easy enough to modify the test to get
that data as well.

> The FPS fluctuated by around 20 FPS either side so I've just noted down
> what looked like an approximate representation.
> 
> So I guess the results are that yes, in this extreme case having more
> relocations makes a big difference but also doing the qsort is quite
> expensive. The original optimization does seem worth doing.
> 
> It might be worth making a simpler hard-coded implementation of
> quicksort because calling qsort is probably not very sensible for such a
> small array and the function call overhead for each comparison is
> probably quite a bit.
> 
> It would also probably be good to see if this difference is noticeable
> in a real use case.
> 
> - Neil
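
As a rough illustration of the "simpler hard-coded implementation" idea quoted
above, here is a minimal sketch of an inlined sort for a small element array.
It uses insertion sort rather than quicksort, since insertion sort does a
single pass with no swaps when the input is already in order (the "in order"
case in the table). The names struct elem, MAX_ELEMS and sort_elems_by_offset
are hypothetical and are not taken from the i965 driver or from the patch.
Whether this actually recovers the lost FPS would need to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vertex element; not the real i965 struct. */
struct elem {
   uint32_t offset;  /* byte offset of the attribute within its buffer */
   uint8_t attrib;   /* generic attribute index */
};

#define MAX_ELEMS 32  /* assumed upper bound, for illustration only */

/* Sort elems[0..count) by ascending offset, in place.
 *
 * Insertion sort: the comparison is inlined, so there is no per-comparison
 * function call as with qsort(). Already-sorted input costs count - 1
 * comparisons and no element moves. Equal offsets keep their relative order.
 */
static void
sort_elems_by_offset(struct elem *elems, size_t count)
{
   assert(count <= MAX_ELEMS);

   for (size_t i = 1; i < count; i++) {
      struct elem key = elems[i];
      size_t j = i;

      /* Shift larger elements one slot right to make room for key. */
      while (j > 0 && elems[j - 1].offset > key.offset) {
         elems[j] = elems[j - 1];
         j--;
      }
      elems[j] = key;
   }
}
```

Calling sort_elems_by_offset(elems, count) before emitting the vertex buffer
state would take the place of the qsort() call. For arrays this small the
quadratic worst case (fully reversed input, as with the "backwards" option) is
bounded by MAX_ELEMS, but that case is exactly the one worth timing.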