[Mesa-dev] [PATCH 2/4] i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear

Tapani Pälli tapani.palli at intel.com
Mon Sep 17 06:14:16 UTC 2018


On 09/13/2018 06:52 PM, Dylan Baker wrote:
> Quoting D Scott Phillips (2018-09-13 08:28:29)
>> Tapani Pälli <tapani.palli at intel.com> writes:
>>
>>> From: D Scott Phillips <d.scott.phillips at intel.com>
>>>
>>> The reference for MOVNTDQA says:
>>>
>>>      For WC memory type, the nontemporal hint may be implemented by
>>>      loading a temporary internal buffer with the equivalent of an
>>>      aligned cache line without filling this data to the cache.
>>>      [...] Subsequent MOVNTDQA reads to unread portions of the WC
>>>      cache line will receive data from the temporary internal
>>>      buffer if data is available.
>>>
>>> This hidden cache line sized temporary buffer can improve the
>>> read performance from wc maps.
>>>
>>> v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
>>> v3: add Android build support (Tapani)
>>>
>>> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>>> Acked-by: Kenneth Graunke <kenneth at whitecape.org>
>>> ---
>>>   src/mesa/drivers/dri/i965/Android.mk           | 22 +++++++++
>>>   src/mesa/drivers/dri/i965/Makefile.am          |  7 +++
>>>   src/mesa/drivers/dri/i965/Makefile.sources     |  6 ++-
>>>   src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 62 ++++++++++++++++++++++++++
>>>   src/mesa/drivers/dri/i965/meson.build          | 18 ++++++--
>>>   5 files changed, 110 insertions(+), 5 deletions(-)
>>>
>>
>> .. snip ..
>>
>>> diff --git a/src/mesa/drivers/dri/i965/Makefile.am b/src/mesa/drivers/dri/i965/Makefile.am
>>> index 0afa7a2f216..d9e06930d38 100644
>>> --- a/src/mesa/drivers/dri/i965/Makefile.am
>>> +++ b/src/mesa/drivers/dri/i965/Makefile.am
>>> @@ -92,8 +92,14 @@ libi965_gen11_la_CFLAGS = $(AM_CFLAGS) -DGEN_VERSIONx10=110
>>>   
>>>   noinst_LTLIBRARIES = \
>>>        libi965_dri.la \
>>> +     libintel_tiled_memcpy.la \
>>>        $(I965_PERGEN_LIBS)
>>>   
>>> +libintel_tiled_memcpy_la_SOURCES = \
>>> +     $(intel_tiled_memcpy_FILES)
>>> +libintel_tiled_memcpy_la_CFLAGS = \
>>> +     $(AM_CFLAGS) $(SSE41_CFLAGS)
>>> +
>>
>> The issue here is that SSE41_CFLAGS includes -msse4.1, which (1) allows
>> us to use sse4.1 intrinsics and (2) allows the compiler to use sse4.1
>> instructions in whatever way it wants.
>>
>> 1 is the desired behavior here and 2 is an unfortunate side-effect. The
>> intrinsics we use are properly guarded by runtime checks so they are
>> only exercised on systems with support. The other uses of sse4.1 by the
>> compiler outside the intrinsics is unpredictable. When I made this
>> change there actually were none, which is why everything worked. But the
>> compiler has permission to change its mind at any point later.
>>
>> The sse4.1 code needs isolated from everything else somehow, either
>> split into separate files or compile the same file multiple times with
>> different flags, or something.
> 
> The way we use sse4.1 in mesa core (the only place I know of that we use it) is
> to compile a separate static library using the sse4.1 flags.
> 

OK, I'll try isolating functions requiring sse41 in to own library and 
link against that. TBH I'm not 100% sure how this will work right with 
all the flatten + inlining used, the surrounding code needs to be 
compiled with sse41 as well to be able to always inline a function that 
uses intrinsics. Perhaps we need to have full blown 
intel_tiled_memcpy_sse41 and intel_tiled_memcpy and then choose one or 
another based on support.

// Tapani


More information about the mesa-dev mailing list