[Mesa-dev] [PATCH 6/6] anv: Do relocations in userspace before execbuf ioctl
Jason Ekstrand
jason at jlekstrand.net
Wed Nov 2 18:42:41 UTC 2016
On Wed, Nov 2, 2016 at 9:43 AM, Chris Wilson <chris at chris-wilson.co.uk>
wrote:
> On Tue, Nov 01, 2016 at 03:35:13PM -0700, Jason Ekstrand wrote:
> > +static void
> > +write_reloc(const struct anv_device *device, void *p, uint64_t v)
> > +{
> > + unsigned reloc_size = 0;
> > + if (device->info.gen >= 8) {
> > + /* From the Broadwell PRM Vol. 2a, MI_LOAD_REGISTER_MEM::MemoryAd
> dress:
> > + *
> > + * "This field specifies the address of the memory location
> where the
> > + * register value specified in the DWord above will read from.
> The
> > + * address specifies the DWord location of the data. Range =
> > + * GraphicsVirtualAddress[63:2] for a DWord register
> GraphicsAddress
> > + * [63:48] are ignored by the HW and assumed to be in correct
> > + * canonical form [63:48] == [47]."
> > + */
>
> const int shift = 63 - 47; /* != 8 :) */
>
Drp... I'll use the const int. :-)
> > + reloc_size = sizeof(uint64_t);
> > + *(uint64_t *)p = (((int64_t)v) << 8) >> 8;
> > + } else {
> > + reloc_size = sizeof(uint32_t);
> > + *(uint32_t *)p = v;
> > + }
> > +
> > + if (!device->info.has_llc)
> > + anv_clflush_range(p, reloc_size);
> > +}
> > +
> > +static void
> > +anv_reloc_list_apply(struct anv_reloc_list *list,
> > + struct anv_device *device, struct anv_bo *bo)
> > +{
> > + for (size_t i = 0; i < list->num_relocs; i++) {
> > + void *p = bo->map + list->relocs[i].offset;
> > +
> > + struct anv_bo *target_bo = list->reloc_bos[i];
> > + write_reloc(device, p, target_bo->offset + list->relocs[i].delta);
> > + list->relocs[i].presumed_offset = bo->offset;
> > + }
> > +}
> > +
> > +/**
> > + * This function applies the relocation for a command buffer and writes
> the
> > + * actual addresses into the buffers as per what we were told by the
> kernel on
> > + * the previous execbuf2 call. This should be safe to do because, for
> each
> > + * relocated address, we have two cases:
> > + *
> > + * 1) The target BO is inactive (as seen by the kernel). In this
> case, it is
> > + * not in use by the GPU so updating the address is 100% ok. It
> won't be
> > + * in-use by the GPU (from our context) again until the next
> execbuf2
> > + * happens. If the kernel decides to move it in the next execbuf2,
> it
> > + * will have to do the relocations itself, but that's ok because it
> should
> > + * have all of the information needed to do so.
> > + *
> > + * 2) The target BO is active (as seen by the kernel). In this case,
> it
> > + * hasn't moved since the last execbuffer2 call because GTT
> shuffling
> > + * *only* happens inside the execbuffer2 ioctl. Since the target BO
> > + * hasn't moved, our anv_bo::offset exactly matches the BO's GTT
> address
> > + * and the relocated value we are writing into the BO will be the
> same as
> > + * the value that is already there.
> > + *
> > + * There is also a possibility that the target BO is active but the
> exact
> > + * RENDER_SURFACE_STATE object we are writing the relocation into
> isn't in
> > + * use. In this case, the address currently in the
> RENDER_SURFACE_STATE
> > + * may be stale but it's still safe to write the relocation because
> that
> > + * particular RENDER_SURFACE_STATE object isn't in-use by the GPU
> and
> > + * won't be until the next execbuf2 call.
> > + *
> > + * By doing relocations on the CPU, we can tell the kernel that it
> doesn't
> > + * need to bother. We want to do this because the surface state buffer
> is
> > + * used by every command buffer so, if the kernel does the relocations,
> it
> > + * will always be busy and the kernel will always stall. This is also
> > + * probably the fastest mechanism for doing relocations since the
> kernel would
> > + * have to make a full copy of all the relocations lists.
> > + */
> > +static void
> > +relocate_cmd_buffer(struct anv_cmd_buffer *cmd_buffer)
> > +{
> > + for (uint32_t i = 0; i < cmd_buffer->execbuf2.bo_count; i++) {
> > + if (cmd_buffer->execbuf2.bos[i]->offset == (uint64_t)-1)
> > + return;
> > + }
>
> Note you still have to make sure the reloc.presumed_offset matches the
> value used to compute the contents of the buffer. Old relocations will
> retain their value from the previous pass, new allocations will have
> presumed_offset = 0 and hopefully the content will be delta. (Otherwise
> you run into trouble if the kernel places the buffer at 0, it will then
> skip the relocations pointing to it.)
>
So, I *thought* we were doing this correctly, but I think the current code
is a bit on the busted side. I'm going to be sending out a v3 that has
some general relocation fixes as well as more comments (along the line of
what you said above)
> That deserves a comment (if only to contrast the two paths).
>
> > + anv_reloc_list_apply(&cmd_buffer->surface_relocs,
> > + cmd_buffer->device,
> > + &cmd_buffer->device->surface_s
> tate_block_pool.bo);
> > +
> > + struct anv_batch_bo **bbo;
> > + u_vector_foreach(bbo, &cmd_buffer->seen_bbos) {
> > + anv_reloc_list_apply(&(*bbo)->relocs,
> > + cmd_buffer->device, &(*bbo)->bo);
> > + }
> > +
> > + for (uint32_t i = 0; i < cmd_buffer->execbuf2.bo_count; i++) {
> > + struct anv_bo *bo = cmd_buffer->execbuf2.bos[i];
> > +
> > + cmd_buffer->execbuf2.objects[i].offset = bo->offset;
>
> /* bo->offset = last execobj.offset */
>
> *reloc.offset = bo->offset + delta;
> reloc.presumed_offset = bo->offset;
> execobj.offset = bo->offset;
>
> Looks good.
> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20161102/9b2b4d03/attachment-0001.html>
More information about the mesa-dev
mailing list