[Mesa-dev] [PATCH 11/16] i965: Replace brw_bo_map_unsynchronized with MAP_ASYNC
Daniel Vetter
daniel at ffwll.ch
Tue May 30 19:57:15 UTC 2017
On Tue, May 30, 2017 at 10:59:19AM -0700, Kenneth Graunke wrote:
> On Wednesday, May 24, 2017 1:04:53 PM PDT Matt Turner wrote:
> > This way we can let brw_bo_map() choose the best mapping type.
>
> Sounds good.
>
> > Part of the patch inlines map_gtt() into brw_bo_map_gtt() (and removes
> > map_gtt()). brw_bo_map_gtt() just wrapped map_gtt() with locking and a
> > call to set_domain(). map_gtt() is called by brw_bo_map_unsynchronized()
> > to avoid the call to set_domain(). With the MAP_ASYNC flag, we now have
> > the same behavior previously provided by brw_bo_map_unsynchronized().
>
> I'm confused about what you mean by "we now have the same behavior"...
>
> On non-LLC platforms, brw_bo_map_unsynchronized(brw, bo) would always
> do a set_domain and stall. Now, brw_bo_map_gtt(brw, bo, MAP_ASYNC)
> will elide the set_domain (while still doing a GTT map). That's a
> substantial change in behavior.
>
> At the end you also change:
> - map = brw_bo_map_unsynchronized(brw, intel_obj->buffer);
> + void *map = brw_bo_map(brw, intel_obj->buffer, access);
>
> which makes glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT use the
> can_map_cpu() logic, possibly changing things from a GTT map to a CPU
> mapping. That's also a substantial change in behavior. We also start
> using glMapBufferRange's access modes, rather than MAP_READ | MAP_WRITE.
>
> I think both of those changes are desirable.
>
> Would it make sense to split this up a bit...
>
> 1. Make brw_bo_map_cpu / brw_bo_map_gtt take a MAP_ASYNC flag and
> conditionally call set_domain(). Should have no functional effect
> today because nobody passes MAP_ASYNC to those functions.
>
> 2. Make brw_bo_map_unsynchronized call brw_bo_map_gtt() with the
> MAP_ASYNC bit set if bufmgr->has_llc. Inline map_gtt() into
> brw_bo_map_gtt(). Pure refactor, no functional changes.
>
> 3? Make brw_bo_map_unsynchronized always pass MAP_ASYNC.
> (would make non-LLC platforms start using async GTT maps)
>
> 4? Delete brw_bo_map_unsynchronized and switch to brw_bo_map with the
> access flags, as you do in the last hunk of this patch, which enables
> unsynchronized CPU maps on LLC platforms, or whenever can_map_cpu()
> says it's OK.
>
> Maybe it makes sense to keep 3-4 together. Would be nice to split out
> the map_gtt inlining from the functional changes, at least.
If you want to maximize bisectability here, I think you also need to take
the previous patch into account. Switching a few mappings from CPU to GTT
is the required step to make this patch correct: CPU mmaps without a
set_domain or an in-userspace clflush wouldn't just be unsynchronized,
they wouldn't be coherent at all.
Again, this really only matters for reads.
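To make the caveat concrete: on non-LLC parts the CPU cache is not snooped, so reading through an async CPU map can return stale cachelines unless userspace either does the set_domain (which MAP_ASYNC skips) or flushes the range itself. A sketch of such a userspace flush, with a hypothetical helper name not taken from the patch:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper, illustration only: flush the CPU cachelines
 * covering [ptr, ptr + size) before reading through an async CPU map
 * on a non-LLC platform. Uses x86 clflush via GCC built-ins. */
static void intel_invalidate_range(void *ptr, size_t size)
{
   const uintptr_t cacheline = 64;
   uintptr_t p = (uintptr_t)ptr & ~(cacheline - 1);
   const uintptr_t end = (uintptr_t)ptr + size;

   for (; p < end; p += cacheline)
      __builtin_ia32_clflush((void *)p);
   __builtin_ia32_mfence(); /* order the flushes before the reads */
}
```

Skipping both the set_domain and this kind of flush is what turns "merely unsynchronized" into "incoherent" for reads.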
-Daniel
>
> > ---
> > src/mesa/drivers/dri/i965/brw_bufmgr.c | 91 ++++--------------------
> > src/mesa/drivers/dri/i965/brw_program_cache.c | 4 +-
> > src/mesa/drivers/dri/i965/intel_buffer_objects.c | 27 +++----
> > 3 files changed, 25 insertions(+), 97 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > index ec9611f..aec07e1 100644
> > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > @@ -689,8 +689,9 @@ brw_bo_map_cpu(struct brw_context *brw, struct brw_bo *bo, unsigned flags)
> > DBG("brw_bo_map_cpu: %d (%s) -> %p\n", bo->gem_handle, bo->name,
> > bo->map_cpu);
> >
> > - set_domain(brw, "CPU mapping", bo, I915_GEM_DOMAIN_CPU,
> > - flags & MAP_WRITE ? I915_GEM_DOMAIN_CPU : 0);
> > + if (!(flags & MAP_ASYNC))
> > + set_domain(brw, "CPU mapping", bo, I915_GEM_DOMAIN_CPU,
> > + flags & MAP_WRITE ? I915_GEM_DOMAIN_CPU : 0);
>
> Curly braces, please!
>
> >
> > bo_mark_mmaps_incoherent(bo);
> > VG(VALGRIND_MAKE_MEM_DEFINED(bo->map_cpu, bo->size));
> > @@ -700,15 +701,17 @@ brw_bo_map_cpu(struct brw_context *brw, struct brw_bo *bo, unsigned flags)
> > }
> >
> > static void *
> > -map_gtt(struct brw_bo *bo)
> > +brw_bo_map_gtt(struct brw_context *brw, struct brw_bo *bo, unsigned flags)
> > {
> > struct brw_bufmgr *bufmgr = bo->bufmgr;
> >
> > + pthread_mutex_lock(&bufmgr->lock);
> > +
> > /* Get a mapping of the buffer if we haven't before. */
> > if (bo->map_gtt == NULL) {
> > struct drm_i915_gem_mmap_gtt mmap_arg;
> >
> > - DBG("bo_map_gtt: mmap %d (%s), map_count=%d\n",
> > + DBG("brw_bo_map_gtt: mmap %d (%s), map_count=%d\n",
> > bo->gem_handle, bo->name, bo->map_count);
> >
> > memclear(mmap_arg);
> > @@ -719,6 +722,7 @@ map_gtt(struct brw_bo *bo)
> > if (ret != 0) {
> > DBG("%s:%d: Error preparing buffer map %d (%s): %s .\n",
> > __FILE__, __LINE__, bo->gem_handle, bo->name, strerror(errno));
> > + pthread_mutex_unlock(&bufmgr->lock);
> > return NULL;
> > }
> >
> > @@ -729,89 +733,24 @@ map_gtt(struct brw_bo *bo)
> > bo->map_gtt = NULL;
> > DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
> > __FILE__, __LINE__, bo->gem_handle, bo->name, strerror(errno));
> > + pthread_mutex_unlock(&bufmgr->lock);
> > return NULL;
> > }
> > + bo->map_count++;
>
> This looks like a separate change. I'm not sure how much I care - we
> probably ought to just nuke the map_count stuff altogether, as Chris
> suggested. It looks like it only controls the valgrind annotations.
>
> > }
> >
> > - DBG("bo_map_gtt: %d (%s) -> %p\n", bo->gem_handle, bo->name,
> > + DBG("brw_bo_map_gtt: %d (%s) -> %p\n", bo->gem_handle, bo->name,
> > bo->map_gtt);
> >
> > - bo->map_count++;
> > - return bo->map_gtt;
> > -}
> > -
> > -static void *
> > -brw_bo_map_gtt(struct brw_context *brw, struct brw_bo *bo, unsigned flags)
> > -{
> > - struct brw_bufmgr *bufmgr = bo->bufmgr;
> > -
> > - pthread_mutex_lock(&bufmgr->lock);
> > -
> > - void *map = map_gtt(bo);
> > - if (map == NULL) {
> > - pthread_mutex_unlock(&bufmgr->lock);
> > - return NULL;
> > - }
> > -
> > - /* Now move it to the GTT domain so that the GPU and CPU
> > - * caches are flushed and the GPU isn't actively using the
> > - * buffer.
> > - *
> > - * The pagefault handler does this domain change for us when
> > - * it has unbound the BO from the GTT, but it's up to us to
> > - * tell it when we're about to use things if we had done
> > - * rendering and it still happens to be bound to the GTT.
> > - */
> > - set_domain(brw, "GTT mapping", bo,
> > - I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
> > + if (!(flags & MAP_ASYNC))
> > + set_domain(brw, "GTT mapping", bo,
> > + I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
>
> Curly braces, please!
>
> > bo_mark_mmaps_incoherent(bo);
> > VG(VALGRIND_MAKE_MEM_DEFINED(bo->map_gtt, bo->size));
> > pthread_mutex_unlock(&bufmgr->lock);
> >
> > - return map;
> > -}
> > -
> > -/**
> > - * Performs a mapping of the buffer object like the normal GTT
> > - * mapping, but avoids waiting for the GPU to be done reading from or
> > - * rendering to the buffer.
> > - *
> > - * This is used in the implementation of GL_ARB_map_buffer_range: The
> > - * user asks to create a buffer, then does a mapping, fills some
> > - * space, runs a drawing command, then asks to map it again without
> > - * synchronizing because it guarantees that it won't write over the
> > - * data that the GPU is busy using (or, more specifically, that if it
> > - * does write over the data, it acknowledges that rendering is
> > - * undefined).
> > - */
> > -
> > -void *
> > -brw_bo_map_unsynchronized(struct brw_context *brw, struct brw_bo *bo)
> > -{
> > - struct brw_bufmgr *bufmgr = bo->bufmgr;
> > -
> > - /* If the CPU cache isn't coherent with the GTT, then use a
> > - * regular synchronized mapping. The problem is that we don't
> > - * track where the buffer was last used on the CPU side in
> > - * terms of brw_bo_map_cpu vs brw_bo_map_gtt, so
> > - * we would potentially corrupt the buffer even when the user
> > - * does reasonable things.
> > - */
> > - if (!bufmgr->has_llc)
> > - return brw_bo_map_gtt(brw, bo, MAP_READ | MAP_WRITE);
> > -
> > - pthread_mutex_lock(&bufmgr->lock);
> > -
> > - void *map = map_gtt(bo);
> > - if (map != NULL) {
> > - bo_mark_mmaps_incoherent(bo);
> > - VG(VALGRIND_MAKE_MEM_DEFINED(bo->map_gtt, bo->size));
> > - }
> > -
> > - pthread_mutex_unlock(&bufmgr->lock);
> > -
> > - return map;
> > + return bo->map_gtt;
> > }
> >
> > static bool
> > diff --git a/src/mesa/drivers/dri/i965/brw_program_cache.c b/src/mesa/drivers/dri/i965/brw_program_cache.c
> > index ab03969..2fd989a 100644
> > --- a/src/mesa/drivers/dri/i965/brw_program_cache.c
> > +++ b/src/mesa/drivers/dri/i965/brw_program_cache.c
> > @@ -220,7 +220,7 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size)
> > if (can_do_exec_capture(brw->screen))
> > new_bo->kflags = EXEC_OBJECT_CAPTURE;
> > if (brw->has_llc)
> > - llc_map = brw_bo_map_unsynchronized(brw, new_bo);
> > + llc_map = brw_bo_map(brw, new_bo, MAP_READ | MAP_ASYNC);
> >
> > /* Copy any existing data that needs to be saved. */
> > if (cache->next_offset != 0) {
> > @@ -417,7 +417,7 @@ brw_init_caches(struct brw_context *brw)
> > if (can_do_exec_capture(brw->screen))
> > cache->bo->kflags = EXEC_OBJECT_CAPTURE;
> > if (brw->has_llc)
> > - cache->map = brw_bo_map_unsynchronized(brw, cache->bo);
> > + cache->map = brw_bo_map(brw, cache->bo, MAP_READ | MAP_WRITE | MAP_ASYNC);
> > }
> >
> > static void
> > diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> > index 5813989..a9ac29a 100644
> > --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> > +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> > @@ -216,17 +216,13 @@ brw_buffer_subdata(struct gl_context *ctx,
> > */
> > if (offset + size <= intel_obj->gpu_active_start ||
> > intel_obj->gpu_active_end <= offset) {
> > - if (brw->has_llc) {
> > - void *map = brw_bo_map_unsynchronized(brw, intel_obj->buffer);
> > - memcpy(map + offset, data, size);
> > - brw_bo_unmap(intel_obj->buffer);
> > + void *map = brw_bo_map(brw, intel_obj->buffer, MAP_WRITE | MAP_ASYNC);
> > + memcpy(map + offset, data, size);
> > + brw_bo_unmap(intel_obj->buffer);
> >
> > - if (intel_obj->gpu_active_end > intel_obj->gpu_active_start)
> > - intel_obj->prefer_stall_to_blit = true;
> > - return;
> > - } else {
> > - perf_debug("BufferSubData could be unsynchronized, but !LLC doesn't support it yet\n");
> > - }
> > + if (intel_obj->gpu_active_end > intel_obj->gpu_active_start)
> > + intel_obj->prefer_stall_to_blit = true;
> > + return;
> > }
> >
> > busy =
> > @@ -400,15 +396,8 @@ brw_map_buffer_range(struct gl_context *ctx,
> > return obj->Mappings[index].Pointer;
> > }
> >
> > - void *map;
> > - if (access & GL_MAP_UNSYNCHRONIZED_BIT) {
> > - if (!brw->has_llc && brw->perf_debug &&
> > - brw_bo_busy(intel_obj->buffer)) {
> > - perf_debug("MapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT stalling (it's actually synchronized on non-LLC platforms)\n");
> > - }
> > - map = brw_bo_map_unsynchronized(brw, intel_obj->buffer);
> > - } else {
> > - map = brw_bo_map(brw, intel_obj->buffer, access);
> > + void *map = brw_bo_map(brw, intel_obj->buffer, access);
> > + if (!(access & GL_MAP_UNSYNCHRONIZED_BIT)) {
> > mark_buffer_inactive(intel_obj);
> > }
> >
> >
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch