[Mesa-dev] [PATCH 2/4] i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear

Dylan Baker dylan at pnwbakers.com
Thu Sep 13 15:52:07 UTC 2018


Quoting D Scott Phillips (2018-09-13 08:28:29)
> Tapani Pälli <tapani.palli at intel.com> writes:
> 
> > From: D Scott Phillips <d.scott.phillips at intel.com>
> >
> > The reference for MOVNTDQA says:
> >
> >     For WC memory type, the nontemporal hint may be implemented by
> >     loading a temporary internal buffer with the equivalent of an
> >     aligned cache line without filling this data to the cache.
> >     [...] Subsequent MOVNTDQA reads to unread portions of the WC
> >     cache line will receive data from the temporary internal
> >     buffer if data is available.
> >
> > This hidden cache line sized temporary buffer can improve the
> > read performance from wc maps.
> >
> > v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
> > v3: add Android build support (Tapani)
> >
> > Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Reviewed-by: Matt Turner <mattst88 at gmail.com>
> > Acked-by: Kenneth Graunke <kenneth at whitecape.org>
> > ---
> >  src/mesa/drivers/dri/i965/Android.mk           | 22 +++++++++
> >  src/mesa/drivers/dri/i965/Makefile.am          |  7 +++
> >  src/mesa/drivers/dri/i965/Makefile.sources     |  6 ++-
> >  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 62 ++++++++++++++++++++++++++
> >  src/mesa/drivers/dri/i965/meson.build          | 18 ++++++--
> >  5 files changed, 110 insertions(+), 5 deletions(-)
> >
> 
> .. snip ..
> 
> > diff --git a/src/mesa/drivers/dri/i965/Makefile.am b/src/mesa/drivers/dri/i965/Makefile.am
> > index 0afa7a2f216..d9e06930d38 100644
> > --- a/src/mesa/drivers/dri/i965/Makefile.am
> > +++ b/src/mesa/drivers/dri/i965/Makefile.am
> > @@ -92,8 +92,14 @@ libi965_gen11_la_CFLAGS = $(AM_CFLAGS) -DGEN_VERSIONx10=110
> >  
> >  noinst_LTLIBRARIES = \
> >       libi965_dri.la \
> > +     libintel_tiled_memcpy.la \
> >       $(I965_PERGEN_LIBS)
> >  
> > +libintel_tiled_memcpy_la_SOURCES = \
> > +     $(intel_tiled_memcpy_FILES)
> > +libintel_tiled_memcpy_la_CFLAGS = \
> > +     $(AM_CFLAGS) $(SSE41_CFLAGS)
> > +
> 
> The issue here is that SSE41_CFLAGS includes -msse4.1, which (1) allows
> us to use sse4.1 intrinsics and (2) allows the compiler to use sse4.1
> instructions in whatever way it wants.
> 
> 1 is the desired behavior here and 2 is an unfortunate side-effect. The
> intrinsics we use are properly guarded by runtime checks so they are
> only exercised on systems with support. The other uses of sse4.1 by the
> compiler outside the intrinsics is unpredictable. When I made this
> change there actually were none, which is why everything worked. But the
> compiler has permission to change its mind at any point later.
> 
> The sse4.1 code needs isolated from everything else somehow, either
> split into separate files or compile the same file multiple times with
> different flags, or something.

The way we use sse4.1 in mesa core (the only place I know of that we use it) is
to compile a separate static library using the sse4.1 flags.

Dylan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180913/0f0ec690/attachment.sig>


More information about the mesa-dev mailing list