[Mesa-dev] [PATCH 2/4] i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear

D Scott Phillips d.scott.phillips at intel.com
Thu Sep 13 15:28:29 UTC 2018


Tapani Pälli <tapani.palli at intel.com> writes:

> From: D Scott Phillips <d.scott.phillips at intel.com>
>
> The reference for MOVNTDQA says:
>
>     For WC memory type, the nontemporal hint may be implemented by
>     loading a temporary internal buffer with the equivalent of an
>     aligned cache line without filling this data to the cache.
>     [...] Subsequent MOVNTDQA reads to unread portions of the WC
>     cache line will receive data from the temporary internal
>     buffer if data is available.
>
> This hidden cache line sized temporary buffer can improve the
> read performance from wc maps.
>
> v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
> v3: add Android build support (Tapani)
>
> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
> Reviewed-by: Matt Turner <mattst88 at gmail.com>
> Acked-by: Kenneth Graunke <kenneth at whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/Android.mk           | 22 +++++++++
>  src/mesa/drivers/dri/i965/Makefile.am          |  7 +++
>  src/mesa/drivers/dri/i965/Makefile.sources     |  6 ++-
>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 62 ++++++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/meson.build          | 18 ++++++--
>  5 files changed, 110 insertions(+), 5 deletions(-)
>

.. snip ..

> diff --git a/src/mesa/drivers/dri/i965/Makefile.am b/src/mesa/drivers/dri/i965/Makefile.am
> index 0afa7a2f216..d9e06930d38 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.am
> +++ b/src/mesa/drivers/dri/i965/Makefile.am
> @@ -92,8 +92,14 @@ libi965_gen11_la_CFLAGS = $(AM_CFLAGS) -DGEN_VERSIONx10=110
>  
>  noinst_LTLIBRARIES = \
>  	libi965_dri.la \
> +	libintel_tiled_memcpy.la \
>  	$(I965_PERGEN_LIBS)
>  
> +libintel_tiled_memcpy_la_SOURCES = \
> +	$(intel_tiled_memcpy_FILES)
> +libintel_tiled_memcpy_la_CFLAGS = \
> +	$(AM_CFLAGS) $(SSE41_CFLAGS)
> +

The issue here is that SSE41_CFLAGS includes -msse4.1, which (1) allows
us to use sse4.1 intrinsics and (2) allows the compiler to use sse4.1
instructions in whatever way it wants.

1 is the desired behavior here and 2 is an unfortunate side-effect. The
intrinsics we use are properly guarded by runtime checks so they are
only exercised on systems with support. The other uses of sse4.1 by the
compiler outside the intrinsics is unpredictable. When I made this
change there actually were none, which is why everything worked. But the
compiler has permission to change its mind at any point later.

The sse4.1 code needs isolated from everything else somehow, either
split into separate files or compile the same file multiple times with
different flags, or something.

>  libi965_dri_la_SOURCES = \
>  	$(i965_FILES) \
>  	$(i965_oa_GENERATED_FILES)
> @@ -104,6 +110,7 @@ libi965_dri_la_LIBADD = \
>  	$(top_builddir)/src/intel/compiler/libintel_compiler.la \
>  	$(top_builddir)/src/intel/blorp/libblorp.la \
>  	$(I965_PERGEN_LIBS) \
> +	libintel_tiled_memcpy.la
>  	$(LIBDRM_LIBS)
>  
>  BUILT_SOURCES = $(i965_oa_GENERATED_FILES)


More information about the mesa-dev mailing list