[Mesa-dev] [PATCH] i965: Disable L3 cache allocation for external buffers

Ville Syrjälä ville.syrjala at linux.intel.com
Wed Nov 1 21:33:28 UTC 2017


On Wed, Nov 01, 2017 at 04:24:21PM +0100, Daniel Vetter wrote:
> On Tue, Oct 24, 2017 at 05:06:33PM +0100, Chris Wilson wrote:
> > Through the use of mocs, we can define the cache usage for any surface
> > used by the GPU. In particular, we can request that L3 cache be
> > allocated for either a read/write miss so that subsequent reads can be
> > fetched from cache rather than memory. A consequence of this is that if
> > we allocate a L3/LLC cacheline for a read and the object is changed in
> > main memory (e.g. a PCIe write bypassing the CPU) then the next read
> > will be serviced from the stale cache and not from the new data in
> > memory. This is an issue for external PRIME buffers where we may miss
> > the updates entirely if the image is small enough to fit within our
> > cache.
> > 
> > Currently, we have a single bit to mark all external buffers so use that
> > to tell us when it is unsafe to use a cache override in mocs and
> > fallback to the PTE value instead (which should be set to the correct
> > cache level to be coherent amongst all active parties: PRIME, scanout and
> > render). This may be refined in future to limit the override to buffers
> > outside the control of mesa; as buffers being shared between mesa
> > clients should be able to coordinate themselves without resolves.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101691
> > Cc: Kenneth Graunke <kenneth at whitecape.org>
> > Cc: Jason Ekstrand <jason at jlekstrand.net>
> > Cc: Lyude Paul <lyude at redhat.com>
> > Cc: Timo Aalton <tjaalton at ubuntu.com>
> > Cc: Ben Widawsky <ben at bwidawsk.net>
> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > ---
> >  src/intel/blorp/blorp.c                          |  1 +
> >  src/intel/blorp/blorp.h                          |  1 +
> >  src/intel/blorp/blorp_genX_exec.h                |  2 +-
> >  src/intel/blorp/blorp_priv.h                     |  1 +
> >  src/mesa/drivers/dri/i965/brw_blorp.c            |  1 +
> >  src/mesa/drivers/dri/i965/brw_state.h            |  3 ++-
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 16 +++++++++++-----
> >  7 files changed, 18 insertions(+), 7 deletions(-)
> > 
> > diff --git a/src/intel/blorp/blorp.c b/src/intel/blorp/blorp.c
> > index 7cc6335f2f..459ad66652 100644
> > --- a/src/intel/blorp/blorp.c
> > +++ b/src/intel/blorp/blorp.c
> > @@ -71,6 +71,7 @@ brw_blorp_surface_info_init(struct blorp_context *blorp,
> >                         surf->surf->logical_level0_px.array_len));
> >  
> >     info->enabled = true;
> > +   info->external = surf->external;
> >  
> >     if (format == ISL_FORMAT_UNSUPPORTED)
> >        format = surf->surf->format;
> > diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h
> > index 9716c66302..af056c9d52 100644
> > --- a/src/intel/blorp/blorp.h
> > +++ b/src/intel/blorp/blorp.h
> > @@ -106,6 +106,7 @@ struct blorp_surf
> >     enum isl_aux_usage aux_usage;
> >  
> >     union isl_color_value clear_color;
> > +   bool external;
> >  };
> >  
> >  void
> > diff --git a/src/intel/blorp/blorp_genX_exec.h b/src/intel/blorp/blorp_genX_exec.h
> > index 5389262098..18715788ff 100644
> > --- a/src/intel/blorp/blorp_genX_exec.h
> > +++ b/src/intel/blorp/blorp_genX_exec.h
> > @@ -1328,7 +1328,7 @@ blorp_emit_surface_states(struct blorp_batch *batch,
> >           blorp_emit_surface_state(batch, &params->src,
> >                                    surface_maps[BLORP_TEXTURE_BT_INDEX],
> >                                    surface_offsets[BLORP_TEXTURE_BT_INDEX],
> > -                                  NULL, false);
> > +                                  NULL, params->src.external);
> >        }
> >     }
> >  
> > diff --git a/src/intel/blorp/blorp_priv.h b/src/intel/blorp/blorp_priv.h
> > index c7d5d308da..f841aa7cdc 100644
> > --- a/src/intel/blorp/blorp_priv.h
> > +++ b/src/intel/blorp/blorp_priv.h
> > @@ -47,6 +47,7 @@ enum {
> >  struct brw_blorp_surface_info
> >  {
> >     bool enabled;
> > +   bool external;
> >  
> >     struct isl_surf surf;
> >     struct blorp_address addr;
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
> > index ed4f9870f2..563d13a037 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> > @@ -160,6 +160,7 @@ blorp_surf_for_miptree(struct brw_context *brw,
> >        .offset = mt->offset,
> >        .reloc_flags = is_render_target ? EXEC_OBJECT_WRITE : 0,
> >     };
> > +   surf->external = mt->bo->external;
> >  
> >     surf->aux_usage = aux_usage;
> >  
> > diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
> > index 8db354cf23..01c0cd12cb 100644
> > --- a/src/mesa/drivers/dri/i965/brw_state.h
> > +++ b/src/mesa/drivers/dri/i965/brw_state.h
> > @@ -342,6 +342,7 @@ void gen10_init_atoms(struct brw_context *brw);
> >   * may still respect that.
> >   */
> >  #define GEN7_MOCS_L3                    1
> > +#define GEN7_MOCS_PTE                   0
> 
> I think we want to keep the current bitfield (it obviously works with
> uncached scanout buffers), but with a big comment:
> 
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
> index 8db354cf232d..4e418f5e8ea5 100644
> --- a/src/mesa/drivers/dri/i965/brw_state.h
> +++ b/src/mesa/drivers/dri/i965/brw_state.h
> @@ -342,6 +342,12 @@ void gen10_init_atoms(struct brw_context *brw);
>   * may still respect that.
>   */
>  #define GEN7_MOCS_L3                    1
> +/* Even when we obey the kernel's cacheability setting from the PTE we still
> + * want to explicitly enable L3 caching. L3$ seems to be automatically disabled
> + * when the PTE request uncached mode - the kernel never explicitly flushed L3
> + * on gen7.
> + */

I doubt the the LLC caheability control has any effect on L3 (except on
CHV but that one doesn't even have LLC, and I think they just messed up
the bits for fun). I can't remmeber if I ever properly tested the
L3+LLC=UC combination on other platforms though.

Also there's no way to flush L3. You can flush/invalidate caches that
may or may not be backed by L3. And we do seem to flush/invalidate all
of them. IIRC we were missing the data cache flush bit from the
PIPE_CONTROL at some point, but now it seems to be there. Also I
doubt the data cache would be involved in your typical render
target accesses.

The spec does say that if something gets evicted from L3 it may end up
in LLC even if LLC cacheability is UC. I assume that must be talking
about the data cache since IIRC that is the only RW cache backed by L3,
and I can't see why evicting something from a RO cache would end up
anywhere.

So if we don't have to worry about the eviction issue I think always
using L3 is the correct answer.

-- 
Ville Syrjälä
Intel OTC


More information about the mesa-dev mailing list