[Mesa-dev] [PATCH RFC] mesa: add SSE optimisation for glDrawElements

Brian Paul brianp at vmware.com
Fri Oct 24 08:12:51 PDT 2014


On 10/24/2014 06:28 AM, Timothy Arceri wrote:
> On Thu, 2014-10-23 at 09:20 -0600, Brian Paul wrote:
>> Nice, just a few comments below.
>>
>>
>> On 10/22/2014 10:02 PM, Timothy Arceri wrote:
>>> Makes use of SSE to speed up compute of min and max elements
>>>
>>> Callgrind cpu usage results from pts benchmarks:
>>>
>>> Openarena 0.8.8: 3.67% -> 1.03%
>>> UrbanTerror: 2.36% -> 0.81%
>>>
>>> Signed-off-by: Timothy Arceri <t_arceri at yahoo.com.au>
>>> ---
>>>    src/mesa/Makefile.am          |  3 +-
>>>    src/mesa/main/sse_minmax.c    | 75 +++++++++++++++++++++++++++++++++++++++++++
>>>    src/mesa/main/sse_minmax.h    | 29 +++++++++++++++++
>>>    src/mesa/vbo/vbo_exec_array.c |  8 +++++
>>>    4 files changed, 114 insertions(+), 1 deletion(-)
>>>    create mode 100644 src/mesa/main/sse_minmax.c
>>>    create mode 100644 src/mesa/main/sse_minmax.h
>>>
>>> I almost wasn't going to bother sending this out since it uses SSE4.1
>>> and its recommended to use glDrawRangeElements anyway. But since these games
>>> are still ofter used for benchmarking I thought I'd see if anyone is
>>> interested in this. I only optimised GL_UNSIGNED_INT as that was the
>>> only place these games were hitting but I guess it wouldn't hurt
>>> to optimse the other cases too.
>>>
>>> I think it would probably make sense too to just combine
>>> streaming-load-memcpy.c and sse_minmax.c into a single file
>>> something like sse_opt.c for example.
>>>
>>> As far a frame rates go I couldn't see any concrete improments
>>> on my Ivybridge machine. Maybe it would help more on a high end radeon
>>> card where cpu apparently is more of a concern??
>>>
>>> Finally its seems to run fine but its only the second time I've tried
>>> these type of optimisations so let me know if there are any obvious
>>> mistakes or improvements.
>>>
>>> diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
>>> index e71bccb..932db4f 100644
>>> --- a/src/mesa/Makefile.am
>>> +++ b/src/mesa/Makefile.am
>>> @@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \
>>>    	$(ARCH_LIBS)
>>>
>>>    libmesa_sse41_la_SOURCES = \
>>> -	main/streaming-load-memcpy.c
>>> +	main/streaming-load-memcpy.c \
>>> +	main/sse_minmax.c
>>>    libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
>>>
>>>    pkgconfigdir = $(libdir)/pkgconfig
>>
>> I think you also need to add the new file to Makefile.sources for SCons
>> (also check Android).
>>
>> Alternately, since the function is pretty small, and I'm not sure if it
>> would be used elsewhere, you might put it in the vbo_exec_array.c file.
>
> Correct me if I'm wrong but I don't think I can do this or the compiler
> might end up generating SSE4.1 instructions for the other functions in
> vbo_exec_array.c the same would apply with your later suggestion for
> moving the c code into the sse_minmax.c file.

I see, but if we're concerned about that, shouldn't we at least check 
for SSE4.1 support at runtime for the new code?   "if 
(_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)" should do it.

Otherwise, I think we should at least respect the USE_SSE_ASM switch: 
#if defined(__SSE4_1__) && defined(USE_SSE_ASM)

-Brian



More information about the mesa-dev mailing list