[virglrenderer-devel] coherent memory access for virgl

Fri Oct 5 03:40:10 UTC 2018

On Thu, Oct 4, 2018 at 1:57 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
>
>   Hi,
>
> > > > Yes, it stands for ARM frame buffer compression and it's widely used
> > > > on that architecture.
> > >
> > > Speaking of compression:  How does buffer size calculation and buffer
> > > allocation work in case compression is used?
> >
> > There's a header + compressed data or an auxiliary buffer.  Here are
> > some examples:
> >
> > https://chromium.googlesource.com/chromiumos/platform/minigbm/+/master/rockchip.c#33
>
> Hmm, so you need some format-specific knowledge to calculate the buffer
> size.  What does the allocation workflow look like?  Userspace (libgbm)
> calculates the size, then asks the kernel to create a bo with that size?

Correct.

> Related: how is the stride calculated?

EGL_EXT_image_dma_buf_import says stride "may have special meaning for
non-linear formats", but most of the time it's calculated in terms of
block width.

Since compressed buffers may need to be de-tiled, the stride used by
the display (in drmModeAddFB2) may differ from the stride used when
mapping (see crrev.com/c/1041369 for an example).  We need to take
this into account.

> I'm asking because I want to get the workflow right for hostmapped
> resource allocation.  Right now we have no modifiers, so it is
> simple to calculate the size needed for a given resource.  Also given we
> have no shared mapping between host and guest and copy around the data,
> so any stride mismatches can be accounted for when copying.  Both will
> not be true any more when we add hostmap and modifier support ...
>
> So the fundamental question is:  Can the guest calculate the size needed
> in advance?

Without knowing which host it's running on, the guest can't know.

> Or will the guest have to first ask the host to create a
> resource, then query how big it is, finally map the thing into the guest
> address space?

That's right -- though the "compressed size" and the "linear size" may
be different.

The proposed mechanism on the host for mapping is gbm_bo_map(..),
which does de-tiling and returns a map-time stride.  We could emulate
those semantics.  The size given to mmap on the guest would be
(box_height * map_stride).
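
A minimal sketch of that size calculation.  The helper name is
hypothetical (it is not existing mesa or virglrenderer code), and
map_stride is assumed to be the stride reported at map time, e.g. the
one gbm_bo_map(..) returns on the host:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: size to pass to mmap() in the guest for a
 * de-tiled mapping.  map_stride is the map-time stride in bytes, which
 * may differ from the stride of the compressed buffer.
 */
static size_t guest_mmap_size(uint32_t box_height, uint32_t map_stride)
{
    /* A zero height or stride would indicate a caller bug. */
    assert(box_height != 0 && map_stride != 0);

    /* Widen before multiplying so the product cannot wrap in 32 bits. */
    uint64_t size = (uint64_t)box_height * (uint64_t)map_stride;

    /* Only relevant where size_t is narrower than 64 bits. */
    assert(size <= SIZE_MAX);
    return (size_t)size;
}
```

For a 1080-row box with a 7680-byte map stride this gives 8294400
bytes, regardless of how large the compressed allocation is.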

It's possible the protocol doesn't need to be amended.  We call
TRANSFER_FROM_HOST before we map the buffer, which potentially gives
us a way to calculate the correct mmap() size:

https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/virgl/virgl_texture.c#n192

But we comment out the stride in TRANSFER_FROM_HOST / TRANSFER_TO_HOST:

https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/winsys/virgl/drm/virgl_drm_winsys.c#n288

In theory, since it's never been used, we can define the stride to be
an output of the ioctl.  We can also add more flags to
DRM_VIRTGPU_RESOURCE_INFO (e.g., TRANSFER_STRIDE_DIFFERENT): since for
most host buffers map_stride == compressed_stride, we can avoid the
vm-exits associated with TRANSFER_FROM_HOST / TRANSFER_TO_HOST when we
only need to mmap().  Or we can extend the protocol.
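
Roughly what the flag idea could look like on the guest side.  This is
only a sketch: the struct, the flag value and the function names are
all hypothetical, and nothing like this exists in the virtio-gpu uapi
today:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit, named after the idea above. */
#define HYP_TRANSFER_STRIDE_DIFFERENT (1u << 0)

/* Hypothetical subset of what an extended RESOURCE_INFO could return. */
struct hyp_resource_info {
    uint32_t flags;
    uint32_t stride; /* stride of the underlying host buffer, in bytes */
};

/* True if the guest needs a transfer just to learn the map-time stride. */
static bool need_transfer_for_stride(const struct hyp_resource_info *info)
{
    assert(info != NULL);
    return (info->flags & HYP_TRANSFER_STRIDE_DIFFERENT) != 0;
}

/*
 * Pick the stride to use for mmap().  transfer_stride is the value the
 * transfer ioctl would report if stride became an output; it is only
 * consulted when the flag says the two strides differ.
 */
static uint32_t stride_for_mmap(const struct hyp_resource_info *info,
                                uint32_t transfer_stride)
{
    return need_transfer_for_stride(info) ? transfer_stride : info->stride;
}
```

In the common case the flag is clear, so the guest can size the mapping
from RESOURCE_INFO alone and skip the transfer.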

> Related:  Who should decide which format modifiers should be used?
> guest or host?
>
> > https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/i965/intel_screen.c#n735
>
> What is in the auxiliary buffer?  The compression metadata (i.e. similar
> to what afbc has in the header)?
>
> How they are passed around?  Is the auxiliary buffer just another bo?

Yes, the auxiliary buffer is just another buffer, which is passed
around as another dma-buf.

Some tiling methods don't need headers or auxiliary buffers -- the
modifier is sufficient.

>
> > > How does mesa handle modifiers?  Does it use the drm modifiers directly?
> > > Or has mesa/gallium its own naming scheme?
> > >
> > > Right now the format specification used by virtio is based on gallium
> > > formats ...
> >
> > The modifiers are from DRM, and drm_fourcc.h is considered the source
> > of truth (I guess that's where some of the problems with v4l2
> > integration arise).
>
> Ok.
>
> > Anyways, here are the conclusions that I can discern from this discussion:
> >
> > 1) Host mapped memory should be fine, as long as it's optional for the VMM
>
> Yes, we need a virtio feature flag for this.  First, for backward
> compatibility reasons.  Second, because the VMM might want to make this
> configurable per VM.

Sounds good.

>
> > 2) Texture uploads can be improved.  BTW, what's the benefit of
> > udmabuf over an iov with one element for say a 5-level texture?
>
> Well, udmabuf can be used to get a linear mapping of a scattered
> resource.  Now qemu can map that udmabuf, then pass on the one-elem-iov
> to virglrenderer.

Oh, udmabuf is involved in both methods.  I was thinking QEMU could
allocate a one-element iovec somehow, but I guess that's not
possible.

> Or qemu can pass the udmabuf handle to virglrenderer.
>
> The latter has the advantage that virglrenderer can possibly do more with
> the dmabuf than simply mmap()ing it (specifically let the gpu driver
> import it).  The former has the advantage that it works without
> virglrenderer changes.

Remember, we can't import multi-level textures into GL.

For single-level linear textures, there's the issue of stride /
alignment -- we can't assume bytes-per-pixel * width will work.  I've
seen EGL_EXT_image_dma_buf_import fail when every plane isn't 64-byte
aligned, for example.
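
To illustrate the stride issue, here is a small sketch that rounds the
naive stride up to an alignment.  The helper name is hypothetical, and
the 64-byte figure is only what I happened to observe; the real
requirement is driver-specific and has to come from the host:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round bytes_per_pixel * width up to `align`
 * bytes.  `align` must be a power of two.
 */
static uint32_t aligned_stride(uint32_t width, uint32_t bytes_per_pixel,
                               uint32_t align)
{
    /* Power-of-two check so the mask arithmetic below is valid. */
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Compute in 64 bits so large widths cannot wrap. */
    uint64_t stride = (uint64_t)width * bytes_per_pixel;
    stride = (stride + align - 1) & ~((uint64_t)align - 1);

    assert(stride <= UINT32_MAX);
    return (uint32_t)stride;
}
```

A 100-pixel-wide, 4-bytes-per-pixel row is 400 bytes naively but 448
bytes once rounded up to 64, so a guest that assumes 400 would hand the
host a stride it may reject.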

>
> > 3) Coherent memory ioctl needed for Vulkan/gl4.5
>
> Yes.
>
> > 4) The new ioctl should take into account format modifiers
>
> Well, it isn't just the ioctl.  We also have to hash out the virtio
> protocol extension (even though it'll probably look similar to the
> ioctl interface), and the virglrenderer api extension.
>
> > 5) The DRM_VIRTGPU_RESOURCE_INFO needs to be fixed and expanded for
> > (1), (3) and (4)
>
> Not just that one, we need a new/improved resource create ioctl too.
> And while at it we might want to split the thing, into an ioctl to
> create buffers (aka gem objects), and into one (or more) ioctls to
> specify what is in there (format, width + height, modifiers, ...).

Sounds good.

>
> > 6) There's no clear path forward on how to pipe host display
> > information to the guest, such that the guest can know which modifier
> > is the best.  Hopefully, the wayland proxying Collabora is working on
> > can clear this up..
>
> Ah, so you think the guest should pick the modifier then?
>
> Guess we could use a capset to provide a list (ordered by preference)
> to the guest.
>

I assume Linux user-space won't know it's virtualized.  It'll query
KMS (virtio-kms), it'll query EGL (virglrenderer), and then pass the
union of supported modifiers to gbm_bo_create_with_modifiers (which
goes on to the virgl guest DRI interface).  For EGL, the list of
modifiers won't change, so the capset sounds good.

But for KMS, I can imagine scenarios where it can change.  For my
setup, guest kms also advertises only one primary plane (XR24 AR24
BX24 BA24 RX24 RA24 XB24 AB24) all the time.  If the virtio-kms driver
can change what it advertises, and userspace queries KMS based on
resize-events, maybe it can work.

We could in theory even support overlays inside the guest...

>
> cheers,
>   Gerd
>