[Mesa-dev] [PATCH 40/53] i965/drm: Rewrite relocation handling.

Kenneth Graunke kenneth at whitecape.org
Wed Apr 5 23:56:42 UTC 2017


On Wednesday, April 5, 2017 4:46:27 AM PDT Chris Wilson wrote:
> On Tue, Apr 04, 2017 at 05:10:30PM -0700, Kenneth Graunke wrote:
> > The execbuf2 kernel API requires us to construct two kinds of lists.
> > First is a "validation list" (struct drm_i915_gem_exec_object2[])
> > containing each BO referenced by the batch.  (The batch buffer itself
> > must be the last entry in this list.)  Each validation list entry
> > contains a pointer to the second kind of list: a relocation list.
> > The relocation list contains information about pointers to BOs that
> > the kernel may need to patch up if it relocates objects within the VMA.
> > 
> > This is a very general mechanism, allowing every BO to contain pointers
> > to other BOs.  libdrm_intel models this by giving each drm_intel_bo a
> > list of relocations to other BOs.  Together, these form "reloc trees".
> > 
> > Processing relocations involves a depth-first-search of the relocation
> > trees, starting from the batch buffer.  Care has to be taken not to
> > double-visit buffers.  Creating the validation list has to be deferred
> > until the last minute, after all relocations are emitted, so we have the
> > full tree present.  Calculating the amount of aperture space required to
> > pin those BOs also involves tree walking, which is expensive, so libdrm
> > has hacks to try and perform less expensive estimates.
> > 
> > For some reason, it also stored the validation list in the global
> > (per-screen) bufmgr structure, rather than as a local variable in the
> > execbuffer function, requiring locking for no good reason.
> > 
> > It also assumed that the batch would probably contain a relocation
> > every 2 DWords - which is absurdly high - and simply aborted if there
> > were more relocations than the max.  This meant the first relocation
> > from a BO would allocate 180kB of data structures!
> > 
> > This is way too complicated for our needs.  i965 only emits relocations
> > from the batchbuffer - all GPU commands and state such as SURFACE_STATE
> > live in the batch BO.  No other buffer uses relocations.  This means we
> > can have a single relocation list for the batchbuffer.  We can add a BO
> > to the validation list (set) the first time we emit a relocation to it.
> > We can easily keep a running tally of the aperture space required for
> > that list by adding the BO size when we add it to the validation list.
> > 
> > This patch overhauls the relocation system to do exactly that.  There
> > are many nice benefits:
> > 
> > - We have a flat relocation list instead of trees.
> > - We can produce the validation list up front.
> > - We can allocate smaller arrays and dynamically grow them.
> > - Aperture space checks are now (a + b <= c) instead of a tree walk.
> > - brw_batch_references() is a trivial validation list walk.
> >   It should be straightforward to make it O(1) in the future.
> > - We don't need to bloat each drm_bacon_bo with 32B of reloc data.
> > - We don't need to lock in execbuffer, as the data structures are
> >   context-local, and not per-screen.
> > - Significantly less code and a better match for what we're doing.
> > - The simpler system should make it easier to take advantage of
> >   I915_EXEC_NO_RELOC in a future patch.
> > ---
> >  src/mesa/drivers/dri/i965/brw_bufmgr.h        |  53 +-
> >  src/mesa/drivers/dri/i965/brw_compute.c       |   2 +-
> >  src/mesa/drivers/dri/i965/brw_context.h       |  12 +
> >  src/mesa/drivers/dri/i965/brw_draw.c          |   2 +-
> >  src/mesa/drivers/dri/i965/genX_blorp_exec.c   |   2 +-
> >  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 215 +++++++-
> >  src/mesa/drivers/dri/i965/intel_batchbuffer.h |   3 +
> >  src/mesa/drivers/dri/i965/intel_blit.c        |  30 +-
> >  src/mesa/drivers/dri/i965/intel_bufmgr_gem.c  | 718 +-------------------------
> >  9 files changed, 229 insertions(+), 808 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.h b/src/mesa/drivers/dri/i965/brw_bufmgr.h
> > index 237f39bb078..d3db6a3967b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.h
> > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.h
> > @@ -88,6 +88,15 @@ struct _drm_bacon_bo {
> >  	 * entries when calling drm_bacon_bo_emit_reloc()
> >  	 */
> >  	uint64_t offset64;
> > +
> > +	/**
> > +	 * Boolean of whether the GPU is definitely not accessing the buffer.
> > +	 *
> > +	 * This is only valid when reusable, since non-reusable
> > +	 * buffers are those that have been shared with other
> > +	 * processes, so we don't know their state.
> > +	 */
> > +	bool idle;
> >  };
> >  
> >  #define BO_ALLOC_FOR_RENDER (1<<0)
> > @@ -178,37 +187,6 @@ void drm_bacon_bo_wait_rendering(drm_bacon_bo *bo);
> >   */
> >  void drm_bacon_bufmgr_destroy(drm_bacon_bufmgr *bufmgr);
> >  
> > -/** Executes the command buffer pointed to by bo. */
> > -int drm_bacon_bo_exec(drm_bacon_bo *bo, int used);
> > -
> > -/** Executes the command buffer pointed to by bo on the selected ring buffer */
> > -int drm_bacon_bo_mrb_exec(drm_bacon_bo *bo, int used, unsigned int flags);
> > -int drm_bacon_bufmgr_check_aperture_space(drm_bacon_bo ** bo_array, int count);
> > -
> > -/**
> > - * Add relocation entry in reloc_buf, which will be updated with the
> > - * target buffer's real offset on command submission.
> > - *
> > - * Relocations remain in place for the lifetime of the buffer object.
> > - *
> > - * \param bo Buffer to write the relocation into.
> > - * \param offset Byte offset within reloc_bo of the pointer to
> > - *                      target_bo.
> > - * \param target_bo Buffer whose offset should be written into the
> > - *                  relocation entry.
> > - * \param target_offset Constant value to be added to target_bo's
> > - *                      offset in relocation entry.
> > - * \param read_domains GEM read domains which the buffer will be
> > - *                      read into by the command that this relocation
> > - *                      is part of.
> > - * \param write_domains GEM read domains which the buffer will be
> > - *                      dirtied in by the command that this
> > - *                      relocation is part of.
> > - */
> > -int drm_bacon_bo_emit_reloc(drm_bacon_bo *bo, uint32_t offset,
> > -			    drm_bacon_bo *target_bo, uint32_t target_offset,
> > -			    uint32_t read_domains, uint32_t write_domain);
> > -
> >  /**
> >   * Ask that the buffer be placed in tiling mode
> >   *
> > @@ -271,9 +249,6 @@ int drm_bacon_bo_disable_reuse(drm_bacon_bo *bo);
> >   */
> >  int drm_bacon_bo_is_reusable(drm_bacon_bo *bo);
> >  
> > -/** Returns true if target_bo is in the relocation tree rooted at bo. */
> > -int drm_bacon_bo_references(drm_bacon_bo *bo, drm_bacon_bo *target_bo);
> > -
> >  /* drm_bacon_bufmgr_gem.c */
> >  drm_bacon_bufmgr *drm_bacon_bufmgr_gem_init(struct gen_device_info *devinfo,
> >  					    int fd, int batch_size);
> > @@ -290,8 +265,6 @@ void *drm_bacon_gem_bo_map__cpu(drm_bacon_bo *bo);
> >  void *drm_bacon_gem_bo_map__gtt(drm_bacon_bo *bo);
> >  void *drm_bacon_gem_bo_map__wc(drm_bacon_bo *bo);
> >  
> > -int drm_bacon_gem_bo_get_reloc_count(drm_bacon_bo *bo);
> > -void drm_bacon_gem_bo_clear_relocs(drm_bacon_bo *bo, int start);
> >  void drm_bacon_gem_bo_start_gtt_access(drm_bacon_bo *bo, int write_enable);
> >  
> >  int drm_bacon_gem_bo_wait(drm_bacon_bo *bo, int64_t timeout_ns);
> > @@ -300,14 +273,6 @@ drm_bacon_context *drm_bacon_gem_context_create(drm_bacon_bufmgr *bufmgr);
> >  int drm_bacon_gem_context_get_id(drm_bacon_context *ctx,
> >                                   uint32_t *ctx_id);
> >  void drm_bacon_gem_context_destroy(drm_bacon_context *ctx);
> > -int drm_bacon_gem_bo_context_exec(drm_bacon_bo *bo, drm_bacon_context *ctx,
> > -				  int used, unsigned int flags);
> > -int drm_bacon_gem_bo_fence_exec(drm_bacon_bo *bo,
> > -				drm_bacon_context *ctx,
> > -				int used,
> > -				int in_fence,
> > -				int *out_fence,
> > -				unsigned int flags);
> >  
> >  int drm_bacon_bo_gem_export_to_prime(drm_bacon_bo *bo, int *prime_fd);
> >  drm_bacon_bo *drm_bacon_bo_gem_create_from_prime(drm_bacon_bufmgr *bufmgr,
> > diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c
> > index d816d056aad..e924401c3af 100644
> > --- a/src/mesa/drivers/dri/i965/brw_compute.c
> > +++ b/src/mesa/drivers/dri/i965/brw_compute.c
> > @@ -212,7 +212,7 @@ brw_dispatch_compute_common(struct gl_context *ctx)
> >  
> >     brw->no_batch_wrap = false;
> >  
> > -   if (drm_bacon_bufmgr_check_aperture_space(&brw->batch.bo, 1)) {
> > +   if (!brw_batch_has_aperture_space(brw, 0)) {
> >        if (!fail_next) {
> >           intel_batchbuffer_reset_to_saved(brw);
> >           intel_batchbuffer_flush(brw);
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> > index 00e9224d7d7..186ce826801 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -477,9 +477,21 @@ struct intel_batchbuffer {
> >     bool needs_sol_reset;
> >     bool state_base_address_emitted;
> >  
> > +   struct drm_i915_gem_relocation_entry *relocs;
> > +   int reloc_count;
> > +   int reloc_array_size;
> > +   /** The validation list */
> > +   struct drm_i915_gem_exec_object2 *exec_objects;
> > +   drm_bacon_bo **exec_bos;
> > +   int exec_count;
> > +   int exec_array_size;
> > +   /** The amount of aperture space (in bytes) used by all exec_bos */
> > +   int aperture_space;
> 
> You are not using the aperture (that is the CPU addressable BAR). This
> is just the batch_size, and not to be confused with the rss that you may
> also want to track in future.
> 
> > +
> >     struct {
> >        uint32_t *map_next;
> >        int reloc_count;
> > +      int exec_count;
> >     } saved;
> >  
> >     /** Map from batch offset to brw_state_batch data (with DEBUG_BATCH) */
> > diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
> > index a6e229f2210..bf09915d0c3 100644
> > --- a/src/mesa/drivers/dri/i965/brw_draw.c
> > +++ b/src/mesa/drivers/dri/i965/brw_draw.c
> > @@ -601,7 +601,7 @@ retry:
> >  
> >        brw->no_batch_wrap = false;
> >  
> > -      if (drm_bacon_bufmgr_check_aperture_space(&brw->batch.bo, 1)) {
> > +      if (!brw_batch_has_aperture_space(brw, 0)) {
> >           if (!fail_next) {
> >              intel_batchbuffer_reset_to_saved(brw);
> >              intel_batchbuffer_flush(brw);
> > diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exec.c b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
> > index dc1c1ea2993..e1a799457a2 100644
> > --- a/src/mesa/drivers/dri/i965/genX_blorp_exec.c
> > +++ b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
> > @@ -231,7 +231,7 @@ retry:
> >      * map all the BOs into the GPU at batch exec time later.  If so, flush the
> >      * batch and try again with nothing else in the batch.
> >      */
> > -   if (drm_bacon_bufmgr_check_aperture_space(&brw->batch.bo, 1)) {
> > +   if (!brw_batch_has_aperture_space(brw, 0)) {
> >        if (!check_aperture_failed_once) {
> >           check_aperture_failed_once = true;
> >           intel_batchbuffer_reset_to_saved(brw);
> > diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > index 59ab55d31f7..444e33c48ee 100644
> > --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > @@ -38,6 +38,8 @@
> >  #include <xf86drm.h>
> >  #include <i915_drm.h>
> >  
> > +#define FILE_DEBUG_FLAG DEBUG_BUFMGR
> > +
> >  static void
> >  intel_batchbuffer_reset(struct intel_batchbuffer *batch,
> >                          drm_bacon_bufmgr *bufmgr,
> > @@ -68,6 +70,17 @@ intel_batchbuffer_init(struct intel_batchbuffer *batch,
> >        batch->map_next = batch->cpu_map;
> >     }
> >  
> > +   batch->reloc_count = 0;
> > +   batch->reloc_array_size = 768;
> 
> Wowser!

I ran a couple of programs when trying to decide how big to make these
arrays.  The Unreal Elemental demo had 749 relocations in a draw, and
200 BOs used by a batch.  Other simpler games had 264 relocations and
55 BOs.

These entries are 32 bytes, so 768 is 24kB.  It might make sense to pick
a smaller size and let it adjust up once or twice.  It shouldn't be that
costly to realloc as long as we aren't doing it all the time...

> > +   batch->relocs = malloc(batch->reloc_array_size *
> > +                          sizeof(struct drm_i915_gem_relocation_entry));
> 
> Allocation failure handling? Deferred to use? Even so, you will need to
> record you didn't allocate the array_size.

Frankly, if you run out of VMA such that malloc doesn't work, things are
going to catch fire and there's not much we can do about it.  Obviously
you need to be careful in the kernel, but we tend to play fast and loose
with this in userspace.

It looks like the old libdrm_intel code checked for NULL pointers and
failed the execbuffer call with -ENOMEM.  Of course, Mesa would then
happily call abort().

libdrm_intel would also assert fail if you used more than 4094 relocs.

So I'm not sure this is actually worse.

> >  void
> >  intel_batchbuffer_reset_to_saved(struct brw_context *brw)
> >  {
> > -   drm_bacon_gem_bo_clear_relocs(brw->batch.bo, brw->batch.saved.reloc_count);
> > +   for (int i = brw->batch.saved.exec_count;
> > +        i < brw->batch.exec_count; i++) {
> > +      if (brw->batch.exec_bos[i] != brw->batch.bo) {
> > +         drm_bacon_bo_unreference(brw->batch.exec_bos[i]);
> > +      }
> 
> There's a trick to add to your todo list - treat being busy as an
> implicit reference.

Neat, that would definitely save a lot of refcounting churn and list
walking.  I might take a look at that after we improve the busy
tracking.

> 
> > +static void
> > +add_exec_bo(struct intel_batchbuffer *batch, drm_bacon_bo *bo)
> > +{
> > +   if (bo != batch->bo) {
> > +      for (int i = 0; i < batch->exec_count; i++) {
> > +         if (batch->exec_bos[i] == bo)
> > +            return;
> > +      }
> > +
> > +      drm_bacon_bo_reference(bo);
> > +   }
> > +
> > +   if (batch->exec_count == batch->exec_array_size) {
> > +      batch->exec_array_size *= 2;
> > +      batch->exec_bos =
> > +         realloc(batch->exec_bos,
> > +                 batch->exec_array_size * sizeof(batch->exec_bos[0]));
> > +      batch->exec_objects =
> > +         realloc(batch->exec_objects,
> > +                 batch->exec_array_size * sizeof(batch->exec_objects[0]));
> 
> This has to be robust!

It wasn't robust in the old code either.  It would realloc and immediately
write to the result with no NULL check.  So, not worse than the status quo.

> What I suggested was to install a setjmp during
> batch_begin so that you could unwind and escape after an allocation
> failure and have it reported back to GL. Those patches to refactor the
> aperture checking / error checking around batch buffer handling can be
> applied to this series.

That might be a reasonable solution to the problem.  It's basically the
same problem as the estimated_max_batch_space issue, which we definitely
need to solve.

> > +   }
> > +
> > +   struct drm_i915_gem_exec_object2 *validation_entry =
> > +      &batch->exec_objects[batch->exec_count];
> > +   validation_entry->handle = bo->handle;
> > +   if (bo == batch->bo) {
> > +      validation_entry->relocation_count = batch->reloc_count;
> > +      validation_entry->relocs_ptr = (uintptr_t) batch->relocs;
> > +   } else {
> > +      validation_entry->relocation_count = 0;
> > +      validation_entry->relocs_ptr = 0;
> > +   }
> > +   validation_entry->alignment = bo->align;
> > +   validation_entry->offset = bo->offset64;
> > +   validation_entry->flags = 0;
> > +   validation_entry->rsvd1 = 0;
> > +   validation_entry->rsvd2 = 0;
> > +
> > +   batch->exec_bos[batch->exec_count] = bo;
> > +   batch->exec_count++;
> > +   batch->aperture_space += bo->size;
> > +}
> > +
> > +static int
> > +execbuffer(int fd,
> > +           struct intel_batchbuffer *batch,
> > +           drm_bacon_context *ctx,
> > +           int used,
> > +           int in_fence,
> > +           int *out_fence,
> > +           int flags)
> > +{
> > +   struct drm_i915_gem_execbuffer2 execbuf = {
> > +      .buffers_ptr = (uintptr_t) batch->exec_objects,
> > +      .buffer_count = batch->exec_count,
> > +      .batch_start_offset = 0,
> > +      .batch_len = used,
> > +      .cliprects_ptr = 0,
> > +      .num_cliprects = 0,
> > +      .DR1 = 0,
> > +      .DR4 = 0,
> 
> Just don't mention the garbage.

Good call.

> > +      .flags = flags,
> > +      .rsvd2 = 0,
> > +   };
> > +
> > +   uint32_t ctx_id = 0;
> > +   drm_bacon_gem_context_get_id(ctx, &ctx_id);
> > +   i915_execbuffer2_set_context_id(execbuf, ctx_id);
> 
> Urgh.

I hadn't looked at what that did - urgh indeed.  Should I just do

      .rsvd1 = ctx_id, /* rsvd1 is actually the context ID */

instead?

> > +
> > +   if (in_fence != -1) {
> > +      execbuf.rsvd2 = in_fence;
> > +      execbuf.flags |= I915_EXEC_FENCE_IN;
> > +   }
> > +
> > +   if (out_fence != NULL) {
> > +      *out_fence = -1;
> > +      execbuf.flags |= I915_EXEC_FENCE_OUT;
> > +   }
> > +
> > +   int ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, &execbuf);
> 
> You could improve this by
> 
> cmd = DRM_IOCTL_I915_GEM_EXECBUFFER2;
> if (out_fence)
> 	cmd |= _IOC_WRITE << _IOC_DIRSHIFT;
> or
> 	cmd = DRM_IOCTL_I915_GEM_EXECBUFFER2_WR;

Okay, I'll use DRM_IOCTL_I915_GEM_EXECBUFFER2_WR if out_fence != NULL
and DRM_IOCTL_I915_GEM_EXECBUFFER2 otherwise.

> > +   if (ret != 0)
> > +      ret = -errno;
> > +
> > +   for (int i = 0; i < batch->exec_count; i++) {
> > +      drm_bacon_bo *bo = batch->exec_bos[i];
> > +
> > +      bo->idle = false;
> > +
> > +      /* Update drm_bacon_bo::offset64 */
> > +      if (batch->exec_objects[i].offset != bo->offset64) {
> > +         DBG("BO %d migrated: 0x%" PRIx64 " -> 0x%llx\n",
> > +             bo->handle, bo->offset64, batch->exec_objects[i].offset);
> > +         bo->offset64 = batch->exec_objects[i].offset;
> > +      }
> > +   }
> > +
> > +   if (ret == 0 && out_fence != NULL)
> > +      *out_fence = execbuf.rsvd2 >> 32;
> > +
> > +   return ret;
> > +}
> > +
> >  static int
> >  do_flush_locked(struct brw_context *brw, int in_fence_fd, int *out_fence_fd)
> >  {
> > +   __DRIscreen *dri_screen = brw->screen->driScrnPriv;
> >     struct intel_batchbuffer *batch = &brw->batch;
> >     int ret = 0;
> >  
> > @@ -484,17 +625,18 @@ do_flush_locked(struct brw_context *brw, int in_fence_fd, int *out_fence_fd)
> >  	 flags |= I915_EXEC_GEN7_SOL_RESET;
> >  
> >        if (ret == 0) {
> > -	 if (brw->hw_ctx == NULL || batch->ring != RENDER_RING) {
> > +         void *hw_ctx = batch->ring != RENDER_RING ? NULL : brw->hw_ctx;
> > +	 if (hw_ctx == NULL) {
> >              assert(in_fence_fd == -1);
> >              assert(out_fence_fd == NULL);
> 
> These assertions are no longer related. They only applied to the
> libdrm_intel API.

Okay, I'll drop them.

> > -            ret = drm_bacon_bo_mrb_exec(batch->bo, 4 * USED_BATCH(*batch),
> > -                                        flags);
> > -	 } else {
> > -	    ret = drm_bacon_gem_bo_fence_exec(batch->bo, brw->hw_ctx,
> > -                                                4 * USED_BATCH(*batch),
> > -                                                in_fence_fd, out_fence_fd,
> > -                                                flags);
> > -	 }
> > +         }
> > +
> > +         /* Add the batch itself to the end of the validation list */
> > +         add_exec_bo(batch, batch->bo);
> > +
> > +         ret = execbuffer(dri_screen->fd, batch, hw_ctx,
> > +                          4 * USED_BATCH(*batch),
> > +                          in_fence_fd, out_fence_fd, flags);
> >        }
> >  
> >        throttle(brw);
> > @@ -577,9 +719,20 @@ _intel_batchbuffer_flush_fence(struct brw_context *brw,
> >  }
> >  
> >  bool
> > +brw_batch_has_aperture_space(struct brw_context *brw, unsigned extra_space)
> > +{
> > +   return brw->batch.aperture_space + extra_space <=
> > +          brw->screen->aperture_threshold;
> > +}
> > +
> > +bool
> >  brw_batch_references(struct intel_batchbuffer *batch, drm_bacon_bo *bo)
> >  {
> > -   return drm_bacon_bo_references(batch->bo, bo);
> > +   for (int i = 0; i < batch->exec_count; i++) {
> > +      if (batch->exec_bos[i] == bo)
> > +         return true;
> > +   }
> > +   return false;
> >  }
> >  
> >  /*  This is the only way buffers get added to the validate list.
> > @@ -589,13 +742,31 @@ brw_emit_reloc(struct intel_batchbuffer *batch, uint32_t batch_offset,
> >                 drm_bacon_bo *target, uint32_t target_offset,
> >                 uint32_t read_domains, uint32_t write_domain)
> >  {
> > -   int ret;
> > +   if (batch->reloc_count == batch->reloc_array_size) {
> > +      batch->reloc_array_size *= 2;
> > +      batch->relocs = realloc(batch->relocs,
> > +                              batch->reloc_array_size *
> > +                              sizeof(struct drm_i915_gem_relocation_entry));
> 
> See above for ideas on handling malloc failures.
> 
> > +   }
> > +
> > +   /* Check args */
> > +   assert(batch_offset <= BATCH_SZ - sizeof(uint32_t));
> > +   assert(_mesa_bitcount(write_domain) <= 1);
> > +
> > +   if (target != batch->bo)
> > +      add_exec_bo(batch, target);
> > +
> > +   struct drm_i915_gem_relocation_entry *reloc =
> > +      &batch->relocs[batch->reloc_count];
> > +
> > +   batch->reloc_count++;
> >  
> > -   ret = drm_bacon_bo_emit_reloc(batch->bo, batch_offset,
> > -                                 target, target_offset,
> > -                                 read_domains, write_domain);
> > -   assert(ret == 0);
> > -   (void)ret;
> > +   reloc->offset = batch_offset;
> > +   reloc->delta = target_offset;
> > +   reloc->target_handle = target->handle;
> > +   reloc->read_domains = read_domains;
> > +   reloc->write_domain = write_domain;
> > +   reloc->presumed_offset = target->offset64;
> 
> Hmm, this is the wrong offset (its global not context). This is not the
> final version of relocation handling...
> -Chris

Correct.  This is still using the bogus global offset64.

That's one of the first things I plan to work on after this series.
This patch is mainly just moving to a flat list.