[Mesa-dev] [RFC 3/3] anv: Do relocations in userspace before execbuf ioctl

Jason Ekstrand jason at jlekstrand.net
Tue Nov 1 15:39:16 UTC 2016


On Tue, Nov 1, 2016 at 1:07 AM, Chris Wilson <chris at chris-wilson.co.uk>
wrote:

> On Mon, Oct 31, 2016 at 09:48:15PM -0700, Jason Ekstrand wrote:
> > From: Kristian Høgsberg Kristensen <kristian.h.kristensen at intel.com>
> >
> > This reduces the amount of stalling that the kernel does between batches
> > and improves the performance of Dota 2 on a Sky Lake GT2 desktop by
> around
> > 30%.  Ideally, we would have a simple execbuf2 flag that would let us
> tell
> > the kernel "Trust me, you don't need to stall to relocate" but the kernel
> > has no such flag at the moment.
>
> The NO_RELOC approach should work out to be faster since you then avoid
> all the relocation handling in the kernel. You already have the buffers
> coherently mapped, the kernel doesn't, the kernel needs to copy the
> entire relocation tree if even one relocation is wrong, etc.
>
> > Signed-off-by: Jason Ekstrand <jason at jlekstrand.net>
> > Cc: Kristian Høgsberg <krh at bitplanet.net>
> > Cc: Ben Widawsky <ben at bwidawsk.net>
> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >
> > ---
> >  src/intel/vulkan/anv_device.c | 96 ++++++++++++++++++++++++++++++
> +++++++++++--
> >  1 file changed, 93 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/intel/vulkan/anv_device.c
> b/src/intel/vulkan/anv_device.c
> > index baa767e..ef371a7 100644
> > --- a/src/intel/vulkan/anv_device.c
> > +++ b/src/intel/vulkan/anv_device.c
> > @@ -1068,6 +1068,89 @@ void anv_GetDeviceQueue(
> >     *pQueue = anv_queue_to_handle(&device->queue);
> >  }
> >
> > +static void
> > +write_reloc(const struct anv_device *device, void *p, uint64_t v)
> > +{
> > +   if (device->info.gen >= 8)
> > +      *(uint64_t *)p = v;
>
> Something to remember here is that some MI (I think) instructions
> require canonical format. (bits 63:48 = bit 47).
>

Is there anything you can point me to on this?  Looking at the i915
sources, it appears that it just writes in the relocation verbatim.


> > +   else
> > +      *(uint32_t *)p = v;
> > +}
> > +
> > +static void
> > +anv_reloc_list_apply(struct anv_reloc_list *list,
> > +                     struct anv_device *device, struct anv_bo *bo)
> > +{
> > +   for (size_t i = 0; i < list->num_relocs; i++) {
> > +      void *p = bo->map + list->relocs[i].offset;
> > +
> > +      struct anv_bo *target_bo = list->reloc_bos[i];
> > +      write_reloc(device, p, target_bo->offset + list->relocs[i].delta);
> > +      list->relocs[i].presumed_offset = bo->offset;
> > +   }
> > +}
> > +
> > +/**
> > + * This function applies the relocation for a command buffer and writes
> the
> > + * actual addresses into the buffers as per what we were told by the
> kernel on
> > + * the previous execbuf2 call.  This should be safe to do because, for
> each
> > + * relocated address, we have two cases:
> > + *
> > + *  1) The buffer to which the address points is not in use by the
> GPU.  In
> > + *     this case, neither is the 32 or 64-bit chunk of GPU memory
> storing the
> > + *     address so it's completely safe to update the address.
>
> A caveat here is that if the kernel believes the buffer is not in use,
> assume the kernel moved it. Hence you always need to tell the kernel
> about the buffers for the batch and your assumed locations. This is very
> important to remember when dealing with workloads that cause GTT
> thrashing or swap thrashing. (In the past you would have also caused
> corruption here if you forgot to set the write markers.)
>

Yes, I believe I am doing this properly.  Above, after doing the userspace
reloc, I set the presumed offset based on the reloc that was just applied.
If the kernel shuffles things around due to GTT thrashing, it has the
information it needs to do the relocations correctly in that case.


> > + *
> > + *  2) The buffer to which the address points is currently in use by
> the GPU.
> > + *     If it's still in use, then some execbuf2 call before this one
> used it
> > + *     and it has been in use ever since.  In this case, it can't have
> moved
> > + *     since the last execbuffer2 call.  We then have two subcases:
> > + *
> > + *      a) The 32 or 64-bit chunk of GPU memory storing the address is
> not in
> > + *         use by the GPU.  In this case, it is safe to write the
> relocation.
> > + *
> > + *      a) The 32 or 64-bit chunk of GPU memory storing the address is
> > + *         currently in use by the GPU.  In this case the the relocated
> > + *         address we will write is exactly the same as the last
> address we
> > + *         wrote there.
>
> (But I don't see you checking on the kernel to confirm, e.g.
> https://cgit.freedesktop.org/~ickle/mesa/tree/src/mesa/
> drivers/dri/i965/brw_batch.c?h=brw-batch#n814
> )
>

The point here is that the BO containing the image data is in one of two
states (as seen by the kernel): active or inactive.  If it's active then so
is the surface state buffer so nothing is moving around and any address
that I write into the surface state buffer will be exactly what is already
there.  If it's inactive then I know the GPU isn't using it so changing
that value won't cause the GPU any problems.  If the kernel comes by and
has to change it again due to GTT thrashing, that's just fine but the point
is that changing it now is ok.


> > + *
> > + * By doing relocations on the CPU, we can tell the kernel that it
> doesn't
> > + * need to bother.  We want to do this because the surface state buffer
> is
> > + * used by every command buffer so, if the kernel does the relocations,
> it
> > + * will always be busy and the kernel will always stall.
>
> Userspace relocation + NO_RELOC processing should also work out to be
> fastest of the available approaches today. If you want to try an ignore
> stalls flag, it shouldn't take too long to write (I'll get that done
> today) and then you can compare the overhead of the kernel having to copy
> the relocations from userspace. (You also want the fast execbuf relocation
> processing patches from a few years ago...)
>

Good point!  I'll add that to the comment.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20161101/82fd2f7a/attachment-0001.html>


More information about the mesa-dev mailing list