[Freedreno] MSM-DRM: Help in understanding the role of relocs in command submission.

Rob Clark robdclark at gmail.com
Thu Feb 27 11:29:18 PST 2014


On Thu, Feb 27, 2014 at 1:14 PM, Aravind Ganesan
<aravindg at codeaurora.org> wrote:
> Hi Guys,
>
> I'm trying to understand why we need relocs while submitting
> commands, and what the shift and offset represent.  I couldn't find any
> explanation for this other than the comment in msm_drm.h and some
> intel-specific comments in http://lwn.net/Articles/283798/.  Can anyone
> clarify this or point me to some better resources?

It might be useful to compare to the kgsl backend in libdrm, since that
is doing the equivalent thing with the kgsl kernel interface, which you
may already be familiar with:

-------
static void kgsl_ringbuffer_emit_reloc(struct fd_ringbuffer *ring,
        const struct fd_reloc *r)
{
    struct kgsl_bo *kgsl_bo = to_kgsl_bo(r->bo);
    /* resolve the bo's gpu address at the given offset: */
    uint32_t addr = kgsl_bo_gpuaddr(kgsl_bo, r->offset);
    assert(addr);
    /* negative shift means shift right, positive means shift left: */
    if (r->shift < 0)
        addr >>= -r->shift;
    else
        addr <<= r->shift;
    /* emit the address with any extra low bits OR'd in: */
    (*ring->cur++) = addr | r->or;
    /* track the bo so it is included in this submit's bo list: */
    kgsl_pipe_add_submit(to_kgsl_pipe(ring->pipe), kgsl_bo);
}
-------

Basically, for msm drm, that address calculation moves to the kernel.
Userspace writes what it *assumes* is the correct address into the
cmdstream, but that is just an optimization to avoid cmdstream patching
in the kernel in the common case, so you can ignore it for now.

So, to answer one part of your question, the value that ends up in the
cmdstream that the gpu sees is:

  ((bo->gpuaddr + offset) << shift) | or

where a negative shift means a right shift by -shift, same as in the
kgsl code above.  That lets us accommodate the various ways that a gpu
addr ends up in the cmdstream.  I.e. there are a handful of places
where it is left or right shifted by a few bits, or has some other
flags OR'd into the low bits (which would otherwise always be zero),
etc.
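
To make this concrete, here is a rough sketch of the patching the
kernel does for each reloc.  It is simplified from the real logic (no
bounds checking or error handling); the struct is the real
drm_msm_gem_submit_reloc from msm_drm.h, but the helper name is made
up:

-------
/* sketch of kernel-side reloc patching (simplified; the helper name
 * is made up).  'gpuaddr' is the iova the kernel actually mapped the
 * bo at: */
static uint32_t reloc_value(uint32_t gpuaddr,
        const struct drm_msm_gem_submit_reloc *r)
{
    uint32_t iova = gpuaddr + r->reloc_offset;
    if (r->shift < 0)
        iova >>= -r->shift;    /* negative shift means shift right */
    else
        iova <<= r->shift;
    return iova | r->or;
}
-------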


But I'm guessing the other part of the question is "why relocs?".  The
short version is that it gives the kernel more information for memory
management, and more room to play some nice tricks:

1) The kernel knows *all* bo's referenced in the cmdstream.. it is
then able to hold an extra reference to buffers referenced by
in-flight submits.  Userspace can always immediately free a buffer
without waiting (a *very* common pattern for x11 pixmaps,
vertex/texture upload buffers, etc.), without needing any
free_at_timestamp type ioctl.  And cleanup for a crashed process does
not cause any GPU fault.

Also, since the kernel knows when a bo is referenced (for read and/or
write access by the gpu), it can implement fence stuff properly.  Yes,
you can do the fencing other ways.. but this approach doesn't have to
worry about userspace forgetting to tell the kernel about some buffer
or another.
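
For a concrete picture, the bo list in a submit looks roughly like
this from the userspace side.  The struct and the MSM_SUBMIT_BO_*
flags are the real ones from msm_drm.h, but the handles here are made
up:

-------
/* every bo the cmdstream touches gets listed, tagged with how the gpu
 * will access it.. the kernel holds a ref to each one until the
 * submit retires, and uses READ vs WRITE for the fencing (handles are
 * hypothetical): */
struct drm_msm_gem_submit_bo bos[] = {
    { .handle = cmdstream_handle,  .flags = MSM_SUBMIT_BO_READ  },
    { .handle = texture_handle,    .flags = MSM_SUBMIT_BO_READ  },
    { .handle = render_tgt_handle, .flags = MSM_SUBMIT_BO_WRITE },
};
-------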

2) The kernel can defer mapping (or possibly even allocating pages
for) a buffer until it is needed..  mapping to the IOMMU is relatively
quick[1], and not every buffer allocated needs to be mapped to every
piece of hw (i.e. if a buffer is only used for scanout, or
(hypothetically) only used w/ the 2d core, etc.).  There are certainly
places in the graphics/UI stack where buffers/textures/etc. get
allocated because they *might* be used.

[1] The slow part of mapping/unmapping appears to be the TLB flush..
with some improvement to the linux iommu interface to add an explicit
flush operation, plus iommu_{map,unmap}_unflushed() variants, we could
batch up mappings for buffers, and map all the unmapped buffers at the
time of the submit ioctl (rather than for each allocate ioctl).
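
I.e. something like this, if those (currently nonexistent) iommu calls
were added.. all the names below are made up:

-------
/* hypothetical: map everything without flushing, then do a single TLB
 * flush for the whole batch at submit time.  iommu_map_unflushed()
 * and iommu_flush() do not exist in the iommu API today, and the
 * unmapped-bo list here is imaginary: */
list_for_each_entry(bo, &submit->unmapped_bos, node)
    iommu_map_unflushed(domain, bo->iova, bo->paddr, bo->size, prot);
iommu_flush(domain);
-------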

3) I do have one device without a working IOMMU, so I use a physically
contiguous VRAM carveout.  But due to CMA vs highmem lolz (at least in
the 3.4 kernel) I end up needing the entire VRAM carveout in lowmem,
which limits it to ~384MiB.  This is not enough to, for example, run
gnome-shell and xonotic at the same time.  If you have a swap file,
and userspace is managing buffers via handle rather than gpu addr, the
kernel could in theory swap out unused buffers, and later swap them
back in at a different address, without confusing userspace.  Yeah,
swapping is going to suck for performance.  But it will be mostly
swapping gnome-shell's buffers and other window pixmaps, which are not
needed when the game is running fullscreen.

----

Managing gpu buffers by handle also enables some things that might be
useful some day.  For example, older snapdragon stuff (I don't think
I've seen this since a2xx days) potentially had fast stacked memory.
With the kernel managing buffer addresses, we could hypothetically do
things like move frequently used buffers into fast memory,
transparently to userspace.  This case is a bit similar to VRAM in a
desktop GPU.  Maybe this sort of arrangement will not come back, and I
have no idea if qcom has plans in this dept...  I do know other SoC
makers have at least kicked around the idea of non-uniform memory (it
makes a lot of sense.. GPUs and CPUs need different performance
characteristics out of memory).

BR,
-R

> Thanks,
>
> Aravind
>

