[Mesa-dev] [PATCH 2/4] i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
tapani.palli at intel.com
Mon Sep 17 06:14:16 UTC 2018
On 09/13/2018 06:52 PM, Dylan Baker wrote:
> Quoting D Scott Phillips (2018-09-13 08:28:29)
>> Tapani Pälli <tapani.palli at intel.com> writes:
>>> From: D Scott Phillips <d.scott.phillips at intel.com>
>>> The reference for MOVNTDQA says:
>>> For WC memory type, the nontemporal hint may be implemented by
>>> loading a temporary internal buffer with the equivalent of an
>>> aligned cache line without filling this data to the cache.
>>> [...] Subsequent MOVNTDQA reads to unread portions of the WC
>>> cache line will receive data from the temporary internal
>>> buffer if data is available.
>>> This hidden cache line sized temporary buffer can improve the
>>> read performance from wc maps.
>>> v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
>>> v3: add Android build support (Tapani)
>>> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>>> Acked-by: Kenneth Graunke <kenneth at whitecape.org>
>>> src/mesa/drivers/dri/i965/Android.mk | 22 +++++++++
>>> src/mesa/drivers/dri/i965/Makefile.am | 7 +++
>>> src/mesa/drivers/dri/i965/Makefile.sources | 6 ++-
>>> src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 62 ++++++++++++++++++++++++++
>>> src/mesa/drivers/dri/i965/meson.build | 18 ++++++--
>>> 5 files changed, 110 insertions(+), 5 deletions(-)
>> .. snip ..
>>> diff --git a/src/mesa/drivers/dri/i965/Makefile.am b/src/mesa/drivers/dri/i965/Makefile.am
>>> index 0afa7a2f216..d9e06930d38 100644
>>> --- a/src/mesa/drivers/dri/i965/Makefile.am
>>> +++ b/src/mesa/drivers/dri/i965/Makefile.am
>>> @@ -92,8 +92,14 @@ libi965_gen11_la_CFLAGS = $(AM_CFLAGS) -DGEN_VERSIONx10=110
>>> noinst_LTLIBRARIES = \
>>> libi965_dri.la \
>>> + libintel_tiled_memcpy.la \
>>> +libintel_tiled_memcpy_la_SOURCES = \
>>> + $(intel_tiled_memcpy_FILES)
>>> +libintel_tiled_memcpy_la_CFLAGS = \
>>> + $(AM_CFLAGS) $(SSE41_CFLAGS)
>> The issue here is that SSE41_CFLAGS includes -msse4.1, which (1) allows
>> us to use sse4.1 intrinsics and (2) allows the compiler to use sse4.1
>> instructions in whatever way it wants.
>> 1 is the desired behavior here and 2 is an unfortunate side-effect. The
>> intrinsics we use are properly guarded by runtime checks so they are
>> only exercised on systems with support. The other uses of sse4.1 by the
>> compiler outside the intrinsics is unpredictable. When I made this
>> change there actually were none, which is why everything worked. But the
>> compiler has permission to change its mind at any point later.
>> The sse4.1 code needs isolated from everything else somehow, either
>> split into separate files or compile the same file multiple times with
>> different flags, or something.
> The way we use sse4.1 in mesa core (the only place I know of that we use it) is
> to compile a separate static library using the sse4.1 flags.
OK, I'll try isolating functions requiring sse41 in to own library and
link against that. TBH I'm not 100% sure how this will work right with
all the flatten + inlining used, the surrounding code needs to be
compiled with sse41 as well to be able to always inline a function that
uses intrinsics. Perhaps we need to have full blown
intel_tiled_memcpy_sse41 and intel_tiled_memcpy and then choose one or
another based on support.
More information about the mesa-dev