[Mesa-dev] [PATCH RFC] mesa: add SSE optimisation for glDrawElements
brianp at vmware.com
Fri Oct 24 08:12:51 PDT 2014
On 10/24/2014 06:28 AM, Timothy Arceri wrote:
> On Thu, 2014-10-23 at 09:20 -0600, Brian Paul wrote:
>> Nice, just a few comments below.
>> On 10/22/2014 10:02 PM, Timothy Arceri wrote:
>>> Makes use of SSE to speed up compute of min and max elements
>>> Callgrind cpu usage results from pts benchmarks:
>>> Openarena 0.8.8: 3.67% -> 1.03%
>>> UrbanTerror: 2.36% -> 0.81%
>>> Signed-off-by: Timothy Arceri <t_arceri at yahoo.com.au>
>>> src/mesa/Makefile.am | 3 +-
>>> src/mesa/main/sse_minmax.c | 75 +++++++++++++++++++++++++++++++++++++++++++
>>> src/mesa/main/sse_minmax.h | 29 +++++++++++++++++
>>> src/mesa/vbo/vbo_exec_array.c | 8 +++++
>>> 4 files changed, 114 insertions(+), 1 deletion(-)
>>> create mode 100644 src/mesa/main/sse_minmax.c
>>> create mode 100644 src/mesa/main/sse_minmax.h
>>> I almost wasn't going to bother sending this out since it uses SSE4.1
>>> and its recommended to use glDrawRangeElements anyway. But since these games
>>> are still ofter used for benchmarking I thought I'd see if anyone is
>>> interested in this. I only optimised GL_UNSIGNED_INT as that was the
>>> only place these games were hitting but I guess it wouldn't hurt
>>> to optimse the other cases too.
>>> I think it would probably make sense too to just combine
>>> streaming-load-memcpy.c and sse_minmax.c into a single file
>>> something like sse_opt.c for example.
>>> As far a frame rates go I couldn't see any concrete improments
>>> on my Ivybridge machine. Maybe it would help more on a high end radeon
>>> card where cpu apparently is more of a concern??
>>> Finally its seems to run fine but its only the second time I've tried
>>> these type of optimisations so let me know if there are any obvious
>>> mistakes or improvements.
>>> diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
>>> index e71bccb..932db4f 100644
>>> --- a/src/mesa/Makefile.am
>>> +++ b/src/mesa/Makefile.am
>>> @@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \
>>> libmesa_sse41_la_SOURCES = \
>>> - main/streaming-load-memcpy.c
>>> + main/streaming-load-memcpy.c \
>>> + main/sse_minmax.c
>>> libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
>>> pkgconfigdir = $(libdir)/pkgconfig
>> I think you also need to add the new file to Makefile.sources for SCons
>> (also check Android).
>> Alternately, since the function is pretty small, and I'm not sure if it
>> would be used elsewhere, you might put it in the vbo_exec_array.c file.
> Correct me if I'm wrong but I don't think I can do this or the compiler
> might end up generating SSE4.1 instructions for the other functions in
> vbo_exec_array.c the same would apply with your later suggestion for
> moving the c code into the sse_minmax.c file.
I see, but if we're concerned about that, shouldn't we at least check
for SSE4.1 support at runtime for the new code? "if
(_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)" should do it.
Otherwise, I think we should at least respect the USE_SSE_ASM switch:
#if defined(__SSE4_1__) && defined(USE_SSE_ASM)
More information about the mesa-dev