[Mesa-dev] [PATCH 2/4] i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
Dylan Baker
dylan at pnwbakers.com
Thu Sep 13 15:52:07 UTC 2018
Quoting D Scott Phillips (2018-09-13 08:28:29)
> Tapani Pälli <tapani.palli at intel.com> writes:
>
> > From: D Scott Phillips <d.scott.phillips at intel.com>
> >
> > The reference for MOVNTDQA says:
> >
> > For WC memory type, the nontemporal hint may be implemented by
> > loading a temporary internal buffer with the equivalent of an
> > aligned cache line without filling this data to the cache.
> > [...] Subsequent MOVNTDQA reads to unread portions of the WC
> > cache line will receive data from the temporary internal
> > buffer if data is available.
> >
> > This hidden cache line sized temporary buffer can improve the
> > read performance from wc maps.
> >
> > v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
> > v3: add Android build support (Tapani)
> >
> > Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Reviewed-by: Matt Turner <mattst88 at gmail.com>
> > Acked-by: Kenneth Graunke <kenneth at whitecape.org>
> > ---
> > src/mesa/drivers/dri/i965/Android.mk | 22 +++++++++
> > src/mesa/drivers/dri/i965/Makefile.am | 7 +++
> > src/mesa/drivers/dri/i965/Makefile.sources | 6 ++-
> > src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 62 ++++++++++++++++++++++++++
> > src/mesa/drivers/dri/i965/meson.build | 18 ++++++--
> > 5 files changed, 110 insertions(+), 5 deletions(-)
> >
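A minimal C sketch of the streaming-load idea in the commit message above, assuming x86-64 with GCC or Clang. `stream_copy_from_wc` is a hypothetical helper, not the code from the patch, which inlines the loads inside `tiled_to_linear`. It issues an `mfence` first (as in v2), then reads 16 bytes at a time with `_mm_stream_load_si128`, the intrinsic for MOVNTDQA. It assumes a 16-byte-aligned source and a length that is a multiple of 16.

```c
#include <emmintrin.h> /* SSE2: _mm_mfence, _mm_storeu_si128 */
#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128 (MOVNTDQA) */
#include <stddef.h>

/* Hypothetical helper, not the patch's code. Copies len bytes from src to dst
 * with MOVNTDQA loads. Assumes src is 16-byte aligned (MOVNTDQA faults
 * otherwise) and len is a multiple of 16. The target attribute lets only this
 * function use SSE4.1; callers must check CPU support first. */
__attribute__((target("sse4.1")))
static void
stream_copy_from_wc(void *dst, const void *src, size_t len)
{
   /* Order all earlier loads and stores before the weakly ordered streaming
    * loads, in the spirit of the v2 "mfence at start" change. */
   _mm_mfence();

   const __m128i *s = src;
   __m128i *d = dst;
   for (size_t i = 0; i < len / 16; i++) {
      /* On WC memory, the first load touching a 64-byte line may fill the
       * hidden line buffer; the remaining 16-byte loads of that line can then
       * be served from it. */
      __m128i v = _mm_stream_load_si128((__m128i *)&s[i]);
      _mm_storeu_si128(&d[i], v);
   }
}
```

On ordinary write-back memory MOVNTDQA returns the same data as a normal load, so the helper can be exercised on heap buffers; the speedup described in the commit message only applies to WC mappings.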
>
> .. snip ..
>
> > diff --git a/src/mesa/drivers/dri/i965/Makefile.am b/src/mesa/drivers/dri/i965/Makefile.am
> > index 0afa7a2f216..d9e06930d38 100644
> > --- a/src/mesa/drivers/dri/i965/Makefile.am
> > +++ b/src/mesa/drivers/dri/i965/Makefile.am
> > @@ -92,8 +92,14 @@ libi965_gen11_la_CFLAGS = $(AM_CFLAGS) -DGEN_VERSIONx10=110
> >
> > noinst_LTLIBRARIES = \
> > libi965_dri.la \
> > + libintel_tiled_memcpy.la \
> > $(I965_PERGEN_LIBS)
> >
> > +libintel_tiled_memcpy_la_SOURCES = \
> > + $(intel_tiled_memcpy_FILES)
> > +libintel_tiled_memcpy_la_CFLAGS = \
> > + $(AM_CFLAGS) $(SSE41_CFLAGS)
> > +
>
> The issue here is that SSE41_CFLAGS includes -msse4.1, which (1) allows
> us to use sse4.1 intrinsics and (2) allows the compiler to use sse4.1
> instructions in whatever way it wants.
>
> 1 is the desired behavior here and 2 is an unfortunate side-effect. The
> intrinsics we use are properly guarded by runtime checks so they are
> only exercised on systems with support. Any other use of sse4.1 by the
> compiler outside the intrinsics is unpredictable. When I made this
> change there actually were none, which is why everything worked. But the
> compiler has permission to change its mind at any point later.
>
> The sse4.1 code needs to be isolated from everything else somehow, either
> by splitting it into separate files or by compiling the same file multiple
> times with different flags, or something similar.

The way we use sse4.1 in mesa core (the only place I know of where we use it) is
to compile a separate static library using the sse4.1 flags.
Dylan
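Scott's concern is that `-msse4.1` applies to the whole compilation unit, and the separate static library is how mesa core confines it. A different way to get similar isolation inside a single file, with GCC or Clang, is the `target` function attribute, which grants SSE4.1 code generation to one function only. This is a swapped-in technique, not what the patch or mesa core does. The sketch below pairs it with a `__builtin_cpu_supports` runtime check; `copy_sse41`, `copy_generic` and `select_copy` are hypothetical names.

```c
#include <emmintrin.h> /* SSE2: _mm_mfence, _mm_storeu_si128 */
#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128 */
#include <stddef.h>
#include <string.h>

/* Only this function may contain SSE4.1 instructions: the target attribute
 * grants the permission that -msse4.1 would otherwise give the whole file.
 * Assumes 16-byte-aligned src and len a multiple of 16. */
__attribute__((target("sse4.1")))
static void
copy_sse41(void *dst, const void *src, size_t len)
{
   _mm_mfence();
   const __m128i *s = src;
   __m128i *d = dst;
   for (size_t i = 0; i < len / 16; i++)
      _mm_storeu_si128(&d[i], _mm_stream_load_si128((__m128i *)&s[i]));
}

/* Baseline path. Assuming the file is built without -msse4.1, the compiler
 * cannot emit SSE4.1 instructions here. */
static void
copy_generic(void *dst, const void *src, size_t len)
{
   memcpy(dst, src, len);
}

typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/* Runtime check: use the SSE4.1 path only if the running CPU supports it. */
static copy_fn
select_copy(void)
{
   return __builtin_cpu_supports("sse4.1") ? copy_sse41 : copy_generic;
}
```

GCC and Clang will not inline a function with extra target features into a caller built for the baseline target, so the SSE4.1 instructions stay behind the runtime check. The separate-library approach described above reaches the same isolation at the build-system level instead, which also works for code that is not neatly contained in one function.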