[Mesa-dev] [PATCH 7/7] i965: Fix asynchronous mappings on !LLC platforms.

Chris Wilson chris at chris-wilson.co.uk
Fri Jul 7 11:09:11 UTC 2017


Quoting Kenneth Graunke (2017-07-07 07:08:16)
> On Thursday, July 6, 2017 10:51:49 PM PDT Kenneth Graunke wrote:
> > On Wednesday, July 5, 2017 2:24:55 PM PDT Chris Wilson wrote:
> > > Quoting Kenneth Graunke (2017-07-05 21:56:54)
> > > > ---
> > > >  src/mesa/drivers/dri/i965/brw_bufmgr.c | 15 +++++++++++++--
> > > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > > > index 7756e2b5f6c..46696be3577 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > > > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > > > @@ -56,6 +56,7 @@
> > > >  #ifndef ETIME
> > > >  #define ETIME ETIMEDOUT
> > > >  #endif
> > > > +#include "common/gen_clflush.h"
> > > >  #include "common/gen_debug.h"
> > > >  #include "common/gen_device_info.h"
> > > >  #include "libdrm_macros.h"
> > > > @@ -698,12 +699,22 @@ brw_bo_map_cpu(struct brw_context *brw, struct brw_bo *bo, unsigned flags)
> > > >           VG(VALGRIND_FREELIKE_BLOCK(map, 0));
> > > >           drm_munmap(map, bo->size);
> > > >        }
> > > > +   } else if (!bo->cache_coherent) {
> > > > +      /* If we're reusing an existing CPU mapping, the CPU caches may
> > > > +       * contain stale data from the last time we read from that mapping.
> > > > +       * (With the BO cache, it might even be data from a previous buffer!)
> > > > +       *
> > > > +       * We need to invalidate those cachelines so that we see the latest
> > > > +       * contents.
> > > > +       */
> > > > +      gen_invalidate_range(bo->map_cpu, bo->size);
> > > >     }
> > > 
> > > This leaves us in trouble on the first invocation with MAP_ASYNC, where
> > > we create the cpu mmaping but don't change any of its domains. (Not that
> > > GL allows for READ | UNSYNCRONIZED if my reading of the spec was
> > > correct.) However, if you use
> > > 
> > >     if (!(flags & MAP_ASYNC))
> > >             wait_rendering(); /* teach me to use gem_wait! */
> > > 
> > >     if (!bo->cache_coherent)
> > >             gen_invalidate_range();
> > > 
> > > and a set_domain(GTT, 0) on creation that should cover everything.
> > 
> > I think I meant to set_domain(GTT, 0) on creation in this series, and
> > lost that patch somehow.  It seems like a good idea to add that.
> > 
> > You're correct, READ | UNSYNCHRONIZED is not allowed.  Since we disallow
> > CPU maps for writes, we should never see MAP_ASYNC here.  Which means we'll
> > always do gen_invalidate_range() and then set_domain(CPU).
> > 
> > I'm struggling to see how wait_rendering (aka set_domain(GTT)) helps us
> > here...there are no CPU writes to wait for...we disallowed them...
> > 
> > > In the meantime, s/else if (!bo->cache_coherent)/if (!bo->cache_coherent)/
> > > -Chris
> > 
> > Oh?  I can do that.  I figured that when we asked the kernel to create a
> > brand new CPU map for us, it would guarantee that the new virtual address
> > range didn't have any stale data in the CPU caches.  But, if it doesn't,
> > then we definitely need to clflush them out.
> 
> On second thought, that seems pretty scary:
> 
> We never throw away CPU maps...so if we don't have one, this is the first
> time we've CPU mapped the buffer.  We never write via the CPU, either.
> There cannot possibly be valid data in the CPU cache for this buffer.
> 
> So the only data that could possibly be there...is from some other buffer.
> Wouldn't that be an information leak...?

You've forgotten that the kernel wrote to the BO to zero it out (and
previously you may have used pwrite), and you are choosing to bypass the
set_domain that ensured the data was where you expected it. (But that
applies to flushing after write before GTT access, not quite this
situation.)

But in this case, it is not about clflushing the contents of the cache
away, but ensuring that speculative cpu access is invalidated prior to
refreshing the contents from memory. (You cannot prevent bsw+ from
loading stale data into the cpu cache ahead of time and it won't notice
that the memory changes behind the cacheline.) Since you bypass
set_domain, you need to pick up all the pieces ;)
-Chris


More information about the mesa-dev mailing list