[virglrenderer-devel] multiprocess model and GL
Chia-I Wu
olvaffe at gmail.com
Mon Jan 27 19:33:41 UTC 2020
I updated
https://gitlab.freedesktop.org/virgl/drm-misc-next/commits/olv/resource-v2
The highlights are:
- adds VIRTGPU_RESOURCE_NO_KICK to disable vq kicks in resource create
- a single ioctl to allocate for simple cases (e.g., when only the size is
needed to allocate the host storage)
- two ioctls to allocate for complex cases (which require a separate execbuffer)
- a second commit to satisfy Gurchetan's needs
I hope the second commit captures the difference between my version
and Gurchetan's. For cases that are neither simple nor too complex,
the host allocation args can be inlined to avoid a separate
execbuffer.
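
For the simple case, the single ioctl could look roughly like this (a
sketch only; the field names are illustrative and not necessarily what
is in the branch):

struct drm_virtgpu_resource_create_blob {
        __u64 size;       /* in */
        __u32 flags;      /* in, e.g. VIRTGPU_RESOURCE_NO_KICK */
        __u32 cmd_size;   /* in, size of inlined host alloc args, 0 if none */
        __u64 cmd;        /* in, userptr to the inlined host alloc args */
        __u32 bo_handle;  /* out */
        __u32 res_handle; /* out */
};

When cmd_size is zero and the host needs more than the size to
allocate, the allocation args go through a separate execbuffer instead.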
On Mon, Jan 27, 2020 at 10:55 AM Chia-I Wu <olvaffe at gmail.com> wrote:
>
> On Mon, Jan 27, 2020 at 3:38 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
> >
> > On Fri, Jan 24, 2020 at 10:08:39AM -0800, Chia-I Wu wrote:
> > > On Fri, Jan 24, 2020 at 1:29 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
> > > >
> > > > > Unless we have a strong feeling
> > > > > that requiring two virtio commands to create a vk resource are too
> > > > > many, we can worry about it after we have a prototype and some
> > > > > numbers.
> > > >
> > > > Well, performance-wise the number of virtio commands doesn't matter
> > > > much. The number of context switches does.
> > > Yeah, two virtio commands here imply two ioctls and two vq kicks.
> >
> > Latest kernel driver has support for enabling and disabling vq kicks.
> > Right now it's used only to batch page-flip virtio commands (i.e. submit
> > multiple virtio commands, notify only after the last one so they all are
> > processed with a single vmexit).
> >
> > Doing the same for multiple virtio commands triggered by a single ioctl
> > is trivial, i.e. a LIST ioctl could easily submit a bunch of virtio
> > commands (one for each resource) and notify the host only after
> > submitting the last one so they all are processed in one go.
> >
> > Doing it across multiple ioctls is a bit more effort but
> > should be doable too.
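> >
> > Conceptually the kernel side is something like this (just a sketch,
> > not the actual driver code):
> >
> >         /* Queue a command; notify the host only when 'kick' is set,
> >          * so several queued commands share a single vmexit. */
> >         static void queue_cmd(struct virtqueue *vq, struct scatterlist *sg,
> >                               void *cmd, bool kick)
> >         {
> >                 if (virtqueue_add_outbuf(vq, sg, 1, cmd, GFP_KERNEL))
> >                         return; /* real code would handle this error */
> >                 if (kick && virtqueue_kick_prepare(vq))
> >                         virtqueue_notify(vq);
> >         }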
> If this can be made to work, I am in favor of this approach.
>
> It is the VIRTGPU_RESOURCE_NO_KICK flag in my branch.
>
> >
> > > > So, for INIT_BLOB, we might consider using an *array*. So you can
> > > > initialize a bunch of resources with a single ioctl. Then pack all
> > > > allocation commands into a single execbuffer and run that. Voila,
> > > > you can initialize a bunch of resources with only two system calls.
> > > The userspace does not usually allocate buffers in batches though.
> > > The array version can be useful if the userspace can create a bunch of
> > > size-less resources first and provide the sizes later.
> >
> > Hmm, then we would need a separate ioctl to initialize the gem object,
> > i.e. the workflow would be:
> >
> > GETID_BLOB
> > alloc via execbuffer
> > INIT_BLOB(resource-id,size)
> >
> > One more context switch. On the other hand we don't need to know the
> > size beforehand, so we should be able to skip the extra metadata query
> > step then. Also INIT_BLOB can be done lazily. Not sure how much of a
> > win that would actually be in practice; we would avoid the ioctl only
> > when userspace creates exportable memory without actually exporting it.
>
> Yeah, the flow will be
>
> IOCTL_GETID_BLOB -> CMD_GETID_BLOB -> vq kick
> IOCTL_EXECBUFFER -> CMD_SUBMIT_3D -> vq kick
> IOCTL_INIT_BLOB -> no cmd nor vq kick, unless it needs to attach the
> guest shmem backing
>
> The cost of IOCTL_GETID_BLOB is amortized, so we end up with roughly one
> vm exit per resource allocation, unless we attach guest shmem a lot.
>
> I am in favor of the flow below if it can work
>
> IOCTL_RESOURCE_INIT_BLOB -> CMD_RESOURCE_INIT_BLOB -> no kick
> IOCTL_EXECBUFFER -> CMD_SUBMIT_3D -> vq kick
>
> There will always be one vm exit per resource allocation.
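>
> In rough userspace terms (a sketch only; the ioctl and flag names are
> illustrative):
>
>         /* queues CMD_RESOURCE_INIT_BLOB without kicking the vq */
>         init.flags = VIRTGPU_RESOURCE_NO_KICK;
>         drmIoctl(fd, DRM_IOCTL_VIRTGPU_RESOURCE_INIT_BLOB, &init);
>
>         /* the host allocation command rides in the execbuffer, whose
>          * kick flushes both commands in one vm exit */
>         drmIoctl(fd, DRM_IOCTL_VIRTGPU_EXECBUFFER, &exec);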
>
> >
> > > > > gbm resources here, and dumb resources, udmabuf resources, and classic
> > > > > resources belong to main process allocations.
> > > >
> > > > Not sure about gbm. But, yes, dumb + classic are global.
> > > Yeah, gbm objects can use the init_blob+execbuffer trick as well. The
> > > host does not necessarily need to run the gbm context in a per-context
> > > process.
> >
> > What would be in the execbuffer then for gbm?
> >
> > Right now execbuffer is defined as "commands which virglrenderer
> > understands", but that wouldn't work for gbm allocations ...
> GBM needs to be a third kind of context. The guest userspace and
> virglrenderer define the wire format for its execbuffer.
>
> >
> > > That implies yet another kind of context. I guess the kernel can
> > > provide a SET_FEATURE (or INIT_CONTEXT) ioctl with an opaque `__u32
> > > kind' parameter. The parameter decides the wire format of execbuffer,
> > > which the kernel does not care about.
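> > >
> > > Something like this, perhaps (just a sketch):
> > >
> > >         struct drm_virtgpu_context_init {
> > >                 __u32 kind; /* selects the execbuffer wire format; opaque to the kernel */
> > >                 __u32 pad;
> > >         };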
> >
> > Not fully sure about that. I had the impression that Gurchetan Singh
> > wants gbm allocations in gl/vk contexts too.
> We have execbuffer to allocate and export gl or vk objects in the
> host. The guest gbm can use them.
>
> I think he is looking for a simpler path that the guest gbm can use,
> and that the host can support using any (gl, vk, or gbm) driver. Having
> a third kind of context with a super simple wire format for execbuffer
> satisfies that need.
>
> The other options are 1) using the classic resource create, 2) adding
> args[] to init_blob, or 3) adding an alloc_gbm ioctl (but with a more
> neutral name).
>
>
> >
> > > > > - ALLOC_BLOB: allocate a blob storage in host main process
> > > > > - ALLOC_TYPED: allocate a typed storage in host main process
> > > > > - ALLOC_IOVEC: allocate a udmabuf from guest iovec in host main process
> > > >
> > > > The size is needed for INIT.
> > > > The storage type too (the kernel needs to know whether it should allocate
> > > > guest memory pages or reserve some address space in the vram bar).
> > > If the kernel can provide CREATE_BLOB_LIST, to create a list of
> > > id-only gem bos first, and INIT_BLOB, to initialize an individual gem
> > > bo, it can be very useful. CREATE_BLOB_LIST generates a virtio
> > > command, but the overhead is amortized. INIT_BLOB generates the
> > > ATTACH_BACKING command only when a shmem is requested.
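> > >
> > > For instance (just a sketch):
> > >
> > >         struct drm_virtgpu_create_blob_list {
> > >                 __u32 count;       /* in, number of id-only gem bos to create */
> > >                 __u32 pad;
> > >                 __u64 bo_handles;  /* out, userptr to count __u32 gem handles */
> > >                 __u64 res_handles; /* out, userptr to count __u32 resource ids */
> > >         };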
> > >
> > > > ALLOC will trigger *host* allocations.
> > > Exactly.
> > >
> > > If we are ok with using execbuffer for gbm objects, we might as well
> > > say that when an ALLOC_FOO is needed and requires any argument,
> > > execbuffer should be used instead. Then ALLOC_BLOB or ALLOC_IOVEC can
> > > be merged into INIT_BLOB:
> > >
> > > struct drm_virtgpu_resource_init_blob {
> > >         __u64 size;       /* in */
> > >         __u32 flags;      /* in */
> > >         __u32 host_flags; /* in, no host flag defined yet */
> > >         __u32 host_alloc; /* in, NONE, ALLOC_BLOB or ALLOC_IOVEC */
> > >         __u32 bo_handle;  /* out */
> > >         __u32 res_handle; /* out */
> > >         __u32 pad;
> > > };
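> > >
> > > Usage would be roughly (a sketch; the constant and ioctl names are
> > > made up):
> > >
> > >         struct drm_virtgpu_resource_init_blob args = {
> > >                 .size = size,
> > >                 .host_alloc = VIRTGPU_HOST_ALLOC_BLOB, /* blob storage in the host */
> > >         };
> > >         drmIoctl(fd, DRM_IOCTL_VIRTGPU_RESOURCE_INIT_BLOB, &args);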
> >
> > Yep, for the simple case we can do it in a single ioctl.
> >
> > Execbuffer allocations still need three: GETID + execbuffer + INIT,
> > where GETID can have a LIST variant and execbuffer can probably do
> > multiple allocations in one go too. INIT needs to wait for execbuffer
> > to finish to get the object size.
> Userspace must know the size before it can allocate with
> execbuffer. INIT can get the size from userspace.
>
> >
> > > > > - INIT_BLOB+shmem+ALLOC_TYPED is equivalent to the classic RESOURCE_CREATE
> > > > > - INIT_BLOB+shmem+ALLOC_IOVEC is equivalent to RESOURCE_CREATE_SHARED
> > > >
> > > > BTW: I'm no longer sure that RESOURCE_CREATE_SHARED is such a great plan.
> > > Not exactly RESOURCE_CREATE_SHARED, but the mechanism can be used to
> > > achieve something similar to DRM_IOCTL_I915_GEM_USERPTR in the future.
> >
> > Is this what it sounds like, i.e. create a gem object from (guest)
> > userspace-allocated memory? If so, this should be easy to do; virtio-gpu
> > objects are just guest memory pages after all. So we should be able
> > to pin them via get_user_pages() and run with them ...
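> >
> > Roughly (a sketch, ignoring error handling and cleanup):
> >
> >         /* pin the userspace pages so they can back the virtio-gpu object */
> >         npinned = get_user_pages_fast(userptr, npages, FOLL_WRITE, pages);
> >         /* then build the page list and attach it as backing storage */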
> Yes.
>
> >
> > > > I think it's more useful to integrate that into the new INIT/ALLOC BLOB
> > > > ioctls/virtio commands.
> > > Sure, as long as the host can distinguish them (to create a new host
> > > storage for dumb or to create a udmabuf from iovec).
> >
> > Why does the host need to know? The host does need (a) the list of
> > pages (sent via attach_backing), and (b) a bool which tells whether it
> > is ok to directly access the backing storage without explicit transfer
> > commands sent by the guest.
> (b) distinguishes the two, no?
>
> When (b) is true, the host creates a udmabuf from the iovec for direct
> access. When (b) is false, the host allocates a dumb buffer and
> requires transfers.
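>
> In host terms, roughly (a sketch; the helper names are made up):
>
>         if (direct_access) /* (b) is true */
>                 res->dmabuf_fd = create_udmabuf_from_iovec(iov, niov);
>         else
>                 res->storage = alloc_host_storage(size); /* needs transfers */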
>
> >
> > > > > Yeah, it makes more sense for the guest kernel to manage the address
> > > > > space. Thanks. I mixed this up with the shared memfd heap above, where I
> > > > > think guests need to get offsets from the host (maybe I am wrong there
> > > > > too?)
> > > >
> > > > It's the host's job to manage the heap and the offsets; the guest doesn't
> > > > need to know. The guest just asks for this (host) resource to be mapped
> > > > at that place (in the guest address space).
> > > It is for the COQOS hypervisor, which cannot remap allocations to
> > > guest-requested addresses, as I was told. The guests must get the
> > > offsets (into this dedicated heap) from the host.
> >
> > Hmm. I'm wondering how it'll manage dynamic gpu allocations at
> > all ...
> It does not support injecting gpu allocations into the guests.
>
> >
> > cheers,
> > Gerd
> >