[Mesa-dev] [PATCH] i965: Allow 48-bit addressing on Gen8+.

Chris Wilson chris at chris-wilson.co.uk
Tue Feb 27 21:12:08 UTC 2018


Quoting Kenneth Graunke (2018-02-27 20:56:29)
> On Tuesday, February 27, 2018 12:35:32 AM PST Chris Wilson wrote:
> > Quoting Kenneth Graunke (2018-02-27 00:05:46)
> > > +static bool
> > > +gem_supports_48b_addresses(int fd)
> > > +{
> > > +   struct drm_i915_gem_exec_object2 obj = {
> > > +      .flags = EXEC_OBJECT_SUPPORTS_48B_ADDRESS,
> > > +   };
> > > +
> > > +   struct drm_i915_gem_execbuffer2 execbuf = {
> > > +      .buffers_ptr = (uintptr_t)&obj,
> > > +      .buffer_count = 1,
> > > +      .rsvd1 = 0xffffffu,
> > > +   };
> > > +
> > > +   int ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
> > > +
> > > +   return ret == -1 && errno == ENOENT;
> > 
> > Note that this reports ENOENT due to the unknown ctx id which is
> > evaluated before the object now. But since you are only looking for the
> > flag, that's ok as the behaviour changed after the flag was introduced.
> 
> Alright.  I just stole this from anv, so I figured it was correct.

It does the job. I was worrying that it works despite not doing the
intended test. :)

One thing that is unambiguous is that only kernels with 48b ppgtt
support 48B address, so you could just query the context's gtt size and
only set the flag when dealing with a large ppgtt.

> > >     if (devinfo->gen >= 8) {
> > > +      /* STATE_BASE_ADDRESS has issues with 48-bit address spaces.  If the
> > > +       * address + size as seen by STATE_BASE_ADDRESS overflows 48 bits,
> > > +       * the GPU appears to treat all accesses to the buffer as being out
> > > +       * of bounds and returns zero.  To work around this, we pin all SBAs
> > > +       * to the bottom 4GB.
> > > +       */
> > 
> > We could do with a quick explanation as to why we can't program the max
> > size. Aiui, it's because of dynamic resize and so you cannot program
> > here the size of object as reported to the kernel.
> 
> I don't think it has anything to do with dynamic resize.  I think Jason
> told me that it seems to compute an "upper bound address" by doing base
> address + size and if that's at the very end, it can wrap from 2**48 to
> 0, causing the upper bound to be less than the base, at which point it
> assumes zero size and all data reads back as 0.
> 
> If that's the actual problem, we don't need to restrict it to 4GB, we
> just need to restrict it away from the very end...

Ok, I have a patch around to exclude the last page in ppgtt which would
accomplish this. (The pain being having to check kernel versions, at
which point you will probably have finished softpinning before the
kernel is deployed.)

> > > diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c > index 26718e0d1a2..0a8d3a80b64 100644
> > > --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > > +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> > > @@ -1093,6 +1093,21 @@ emit_reloc(struct intel_batchbuffer *batch,
> > >     unsigned int index = add_exec_bo(batch, target);
> > >     struct drm_i915_gem_exec_object2 *entry = &batch->validation_list[index];
> > >  
> > > +   if (reloc_flags & RELOC_32BIT) {
> > > +      /* Restrict this buffer to the low 32 bits of the address space.
> > > +       *
> > > +       * Altering the validation list flags restricts it for this batch,
> > > +       * but we also alter the BO's kflags to restrict it permanently
> > > +       * (until the BO is destroyed and put back in the cache).  Buffers
> > > +       * may stay bound across batches, and we want keep it constrained.
> > > +       */
> > 
> > Hmm, I think the intent is more along the lines of the buffer may be
> > reused for non-r32b targets between batches, but we want to avoid
> > ping-pong migrations into r32b (as moving an active buffer will stall,
> > hmm, can fix if needs must), so once we go r32b we never go back.
> 
> Yeah, that's a better explanation.  How about:
> 
>       /* Restrict this buffer to the low 32 bits of the address space,
>        * both now, and forever.  Altering the validation list's flags
>        * restricts it for this batch, but we also modify the BO's kflags
>        * to restrict it permanently.  This avoids ping-ponging the BO
>        * between the low 32 bits and the full address space.  Moving an
>        * active buffer can cause stalls.
>        */

Ok. Trying to explain the migration would only be on active buffers for
48b->32b transitions is too painful. The gist is as in the comment, we
expect to use it as r32b in future, so if we allow the kernel to move it
(for swap out/in), we will expect to stall to fit it in 32b.
 
> > > +      target->kflags &= ~EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
> > > +      entry->flags &= ~EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
> > > +
> > > +      /* RELOC_32BIT is not an EXEC_OBJECT_* flag, so get rid of it. */
> > > +      reloc_flags &= ~RELOC_32BIT;
> > > +   }
> > > +
> > >     if (reloc_flags)
> > >        entry->flags |= reloc_flags & batch->valid_reloc_flags;
> > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > > index fb180289a0c..2e54adb3ed2 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > > @@ -119,6 +119,7 @@ struct brw_bufmgr {
> > >     bool has_llc:1;
> > >     bool has_mmap_wc:1;
> > >     bool bo_reuse:1;
> > > +   bool supports_48b_addresses:1;
> > >  };
> > >  
> > >  static int bo_set_tiling_internal(struct brw_bo *bo, uint32_t tiling_mode,
> > > @@ -409,6 +410,8 @@ retry:
> > >     bo->reusable = true;
> > >     bo->cache_coherent = bufmgr->has_llc;
> > >     bo->index = -1;
> > > +   if (bufmgr->supports_48b_addresses)
> > > +      bo->kflags = EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
> > 
> > bo->kflags = bufmgr->supports_48b_addresses << 3 ?
> > (hidden behind a bit of syntactic sugar)
> 
> I don't see how that's clearer than using the actual EXEC_OBJECT_*
> define from the kernel uABI, or what the point of the new syntatic
> sugar would be...this is already only two lines of obvious code...

Clearer when reading asm? ;)

Whilst here, could you move bo->kflags = 0 from bo_free to here?
Everytime we touch bo->kflags I have to track down where it is reset.
-Chris


More information about the mesa-dev mailing list