[Mesa-dev] Possible ideas for optimisations in Mesa

Timothy Arceri t_arceri at yahoo.com.au
Wed May 13 06:10:13 PDT 2015


On Tue, 2015-05-12 at 23:09 -0700, Ian Romanick wrote:
> On 05/12/2015 03:12 PM, Timothy Arceri wrote:
> > On Sat, 2015-04-18 at 12:26 +0200, Marek Olšák wrote:
> >> On Fri, Apr 17, 2015 at 1:21 PM, Timothy Arceri <t_arceri at yahoo.com.au> wrote:
> >>> Hi all,
> >>>
> >>> Last year I spent a whole bunch of time profiling Mesa looking for areas
> >>> where improvements could be made. Anyway I thought I'd point out a
> >>> couple of things, and see if anyone thinks these are worthwhile
> >>> following up.
> >>>
> >>> 1. While the hash table has been getting a lot of attention lately,
> >>> after running the TF2 benchmark one place that showed up as using more
> >>> CPU than the hash table was the GLSL parser. I guess this can be mostly
> >>> solved once Mesa has a disk cache for shaders.
> >>>
> >>> But something I came across at the time was this paper describing
> >>> modifying (with apparently little effort) bison to generate a hardcoded
> >>> parser that is 2.5-6.5 times faster, while generating a slightly bigger
> >>> binary [1].
> >>>
> >>> The resulting project has been lost in the sands of time unfortunately
> >>> so I couldn't try it out.
> >>>
> >>> 2. On most of the old quake engine benchmarks the Intel driver spends
> >>> between 3-4.5% of its time, or 400 million calls, in glibc's memcpy(),
> >>> since the call can't be inlined in this bit of code from
> >>> copy_array_to_vbo_array():
> >>>
> >>>       while (count--) {
> >>>          memcpy(dst, src, dst_stride);
> >>>          src += src_stride;
> >>>          dst += dst_stride;
> >>>       }
> >>>
> >>> I looked in other drivers but I couldn't see them doing this kind of
> >>> thing. I'd imagine that by its nature this code could be a bottleneck.
> >>> Are there any easy ways to avoid doing this type of copy? Or would
> >>> the only option be to write a complex optimisation?
> >>
> >> Yeah, other drivers don't do this. In Gallium, we don't change the
> >> stride when uploading buffers, so in our case src_stride ==
> >> dst_stride.
> >>
> > 
> > Thanks Marek. Looking at the history of the Intel code in git, it seems
> > that when the code was first written memcpy() wasn't used and the data
> > was just copied 8 bits at a time. In that case you can see the advantage
> > of doing the copy this way; however, with the use of memcpy() there
> > doesn't seem to be much of a difference between the code paths.
> > 
> > Out of interest I implemented my own version of memcpy() that can do
> > copies with mismatched strides. I did this by aligning the memory to
> > 8 bytes, doing some shifts in temporaries if needed, and then doing
> > 64-bit copies.
> > It was made simpler for my test case because the strides were always
> > 12 bytes for dst and 16 bytes for src.
> > In the end my memcpy() used slightly less CPU and could give a
> > measurable boost in frame rate in the UrbanTerror benchmark, although
> > the boost isn't always reproducible and results are mostly about the
> > same. I suspect the boost only happens when the memory isn't aligned
> > to 8 bytes.
> > 
> > On average there seem to be around 150 to 200 of these copies done each
> > time this loop is hit in UrbanTerror, so in theory my memcpy() could be
> > made even faster with SSE using load/store and some shuffling. I did
> > attempt this but haven't got it working yet.
> 
> What kind of system were you measuring on?  You might measure a bigger
> delta on a Bay Trail system, for example.  You might also try locking
> the CPU clock low.

I'm using an Ivy Bridge laptop, to be exact:

Processor: Intel Core i5-3317U @ 2.60GHz (4 Cores)
Graphics: Intel HD 4000 (1050MHz)

I'll try locking the CPU to a lower clock and see what happens.
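For anyone curious, the general shape of the stride-converting copy I described above is something like the sketch below. This is a simplified illustration, not the exact code I benchmarked; the function names and the dst_stride == src_stride fast path are just for clarity here:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a stride-converting copy: each element carries dst_stride
 * bytes of payload, read from a source laid out with a (possibly
 * larger) src_stride. Illustrative only. */
static void
copy_strided(uint8_t *dst, const uint8_t *src,
             size_t dst_stride, size_t src_stride, size_t count)
{
   if (dst_stride == src_stride) {
      /* Matching strides (the Gallium case): one big copy suffices. */
      memcpy(dst, src, count * dst_stride);
      return;
   }

   while (count--) {
      memcpy(dst, src, dst_stride);
      src += src_stride;
      dst += dst_stride;
   }
}

/* Specialisation for the case I measured (src_stride = 16,
 * dst_stride = 12): fixed-size memcpy()s of 8 + 4 bytes, which the
 * compiler can lower to single 64-bit and 32-bit moves. */
static void
copy_16_to_12(uint8_t *dst, const uint8_t *src, size_t count)
{
   while (count--) {
      memcpy(dst, src, 8);          /* bytes 0-7  */
      memcpy(dst + 8, src + 8, 4);  /* bytes 8-11 */
      src += 16;
      dst += 12;
   }
}
```

Using fixed-size memcpy() for the loads and stores sidesteps alignment issues while still letting the compiler emit plain register moves; my actual code did the alignment and shifting by hand, which is where the mess comes from.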

> 
> I know Eero has some tips for measuring small changes in CPU usage.  It
> can be... annoying. :)
> 
> > In the end I'm not sure if implementing a custom memcpy() is worth all
> > the effort, but I thought I'd post my findings. My memcpy() code is a
> > bit of a mess at the moment, but if anyone is interested I can clean it
> > up and push it to my GitHub repo; just let me know.
> > 
> > Tim 
> > 
> >> Marek
> > 
> > 
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 



