[virglrenderer-devel] multiprocess model and GL
Chia-I Wu
olvaffe at gmail.com
Fri Feb 7 07:04:18 UTC 2020
On Thu, Feb 6, 2020 at 7:55 PM Gurchetan Singh
<gurchetansingh at chromium.org> wrote:
>
> On Thu, Feb 6, 2020 at 3:01 PM Chia-I Wu <olvaffe at gmail.com> wrote:
> >
> > On Wed, Feb 5, 2020 at 7:52 PM Gurchetan Singh
> > <gurchetansingh at chromium.org> wrote:
> > >
> > > On Tue, Feb 4, 2020 at 8:14 PM Chia-I Wu <olvaffe at gmail.com> wrote:
> > > >
> > > > On Tue, Feb 4, 2020 at 7:17 PM Gurchetan Singh
> > > > <gurchetansingh at chromium.org> wrote:
> > > > >
> > > > > On Tue, Feb 4, 2020 at 1:04 PM Chia-I Wu <olvaffe at gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Feb 3, 2020 at 7:20 PM Gurchetan Singh
> > > > > > <gurchetansingh at chromium.org> wrote:
> > > > > > >
> > > > > > > On Fri, Jan 31, 2020 at 7:04 PM Chia-I Wu <olvaffe at gmail.com> wrote:
> > > > > > > > > So I wouldn't call the resource create v2 protocol as execbuffer.
> > > > > > > > I would say virgl_hw.h is part of virgl_protocol.h. Together,
> > > > > > > > they define the virgl protocol. If you look at the various _FORMAT
> > > > > > > > fields in virgl_protocol.h, they refer to the formats defined in
> > > > > > > > virgl_hw.h.
> > > > > > > >
> > > > > > >
> > > > > > > Disagree here -- virgl_hw.h is the source of truth. It's possible to
> > > > > > > create a resource without including virgl_protocol.h, and we do it all
> > > > > > > the time.
> > > > > > >
> > > > > > > For example, without VIRGL_BIND_STAGING or
> > > > > > > virgl_supported_format_mask, the guest can't know which type of
> > > > > > > resources can be created.
> > > > > > >
> > > > > > > Without VIRGL_CAP_COMPUTE_SHADER, the guest can't know something like
> > > > > > > VIRGL_SET_SHADER_BUFFER_* is supported. All defined in virgl_hw.h
> > > > > >
> > > > > > How are VIRGL_BIND_STAGING or VIRGL_CAP_COMPUTE_SHADER applied to virv
> > > > > > resources or execbuffer? Will virgl_hw.h be extended in a way that it
> > > > > > provides a unified caps and resource model that both virgl and virv
> > > > > > share? Or will virgl_hw.h be extended in a way that virgl and virv
> > > > > > have very little in common? If two worlds were going to coexist
> > > > > > in the extended virgl_hw.h, I wouldn't call it the source of
> > > > > > truth (two sources of two truths, maybe?)
> > > > >
> > > > > Multiple sources are perfectly valid. The virglrenderer protocol is a
> > > > > superset of all capset defined protocols. capset 1 and capset 2
> > > > > (virgl_hw.h) only apply to virgl (context type 0).
> > > > >
> > > > > capset 3 version 1 can apply to context type 0
> > > > > capset 3 version 2 can apply to context type 1
> > > > > capset 4 version 1 can apply to context type 1
> > > > >
> > > > > Each capability defines its own protocol. We can drop support for
> > > > > capset 1 and capset 2 once OpenGL is deprecated.
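To make the mapping above concrete, here is a toy lookup in C; the table and function are purely illustrative, not actual virglrenderer code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: the (capset, version) -> context type mapping
 * described above, expressed as a lookup table. */
struct capset_entry {
    uint32_t capset;
    uint32_t version;
    uint32_t context_type;
};

static const struct capset_entry capset_table[] = {
    { 3, 1, 0 }, /* capset 3 version 1 applies to context type 0 */
    { 3, 2, 1 }, /* capset 3 version 2 applies to context type 1 */
    { 4, 1, 1 }, /* capset 4 version 1 applies to context type 1 */
};

/* Return the context type a (capset, version) pair applies to,
 * or -1 if the pair defines no protocol in this sketch. */
static int capset_context_type(uint32_t capset, uint32_t version)
{
    for (size_t i = 0; i < sizeof(capset_table) / sizeof(capset_table[0]); i++) {
        if (capset_table[i].capset == capset &&
            capset_table[i].version == version)
            return (int)capset_table[i].context_type;
    }
    return -1;
}
```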
> > > > It sounds like there is 1 resource creation protocol (virgl_hw.h) and
> > > > 1 execbuffer protocol (virgl_protocol.h) for virgl support. And you
> > > > want 2 resource creation protocols and 2 execbuffer protocols in total
> > > > after we add virv support, which is not what I had in mind when you
> > > > said extending....
> > >
> > > We can structure it in various ways; if you don't want capset 3,
> > > virvrenderer can define its resource creation protocol in capset 4
> > > along with the execbuffer protocol (like virgl defines both resource
> > > creation and execbuffer in capset {1, 2}).
> > >
> > > The number of protocols doesn't bother me (since they are cached) --
> > > is that your main concern?
> > No, no real concern, just confusion. You brought up that there are a
> > resource creation protocol and an execbuffer protocol, and you extended
> > them. I did not expect the end result to be two resource creation
> > protocols and two execbuffer protocols.
> >
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > > > > I think a fairer way to put this is that some of the virgl protocol
> > > > > > > > details are leaked by drm_virtgpu_resource_create. IMO, a reasonable
> > > > > > > > way to plug the leak is to make sure the virgl protocol details are
> > > > > > > > contained in EXECBUFFER in resource create v2.
> > > > > > >
> > > > > > > I would put this way: both resource creation and the execbuffer
> > > > > > > protocol are defined by the capabilities, and the virglrenderer
> > > > > > > protocol is the superset of all capability-defined protocols.
> > > > > > >
> > > > > > > Honestly, the existing interface isn't too bad. We just need to extend it ...
> > > > .... My view of the existing interface is that the virgl execbuffer
> > > > protocol (virgl_protocol.h) is good, but the resource creation
> > > > protocol (virgl_hw.h) is virgl-specific. To extend them for virv, I
> > > > introduce a new command to the resource creation protocol
> > > >
> > > > RESOURCE_CREATE_v2(ctx_id, obj_id)
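To make this concrete, a minimal sketch of the proposed command as a C struct; the layout and field names here are hypothetical, not an existing wire format:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout for RESOURCE_CREATE_v2.  The command only
 * binds a context-local object (obj_id, created earlier via
 * execbuffer) to a new virtio-gpu resource id; no allocation
 * parameters appear here, since allocation happens in the context's
 * own execbuffer protocol. */
struct resource_create_v2 {
    uint32_t ctx_id;  /* renderer context that owns the object */
    uint32_t obj_id;  /* context-local object to expose as a resource */
    uint32_t res_id;  /* guest-chosen id for the new resource */
};

/* Serialize the command into a payload buffer; returns bytes written,
 * or 0 if the buffer is too small. */
static size_t encode_resource_create_v2(const struct resource_create_v2 *cmd,
                                        uint8_t *buf, size_t buf_size)
{
    if (buf_size < sizeof(*cmd))
        return 0;
    memcpy(buf, cmd, sizeof(*cmd));
    return sizeof(*cmd);
}
```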
> > >
> > > I think we are converging with something like memory-v4 (one ioctl,
> > > one kick, opaque arguments) and it supports our use cases, so I think
> > > we should continue iterating on that model rather than going somewhere
> > > different.
> > It is the RESOURCE_CREATE_BLOB command in memory-v4, just to be clear.
> >
> > >
> > > The questions in my mind with memory-v4 where:
> > >
> > > 1) VIRTGPU_RESOURCE_FLAG_ALLOC_EXECBUFFER vs.
> > > VIRTGPU_RESOURCE_FLAG_ALLOC_{3D, ARGS} -- since in Virgl, execbuffer
> > > is used for (pipe context) rendering commands and not (pipe
> > > screen) allocations, I'd tend to support the latter. Do you
> > > agree/disagree?
> > I think virgl execbuffer can be extended to support (pipe screen)
> > allocations. virgl execbuffer is also used for (pipe screen)
> > context creation, so I don't see it as a layering violation.
>
> With alloc_param.virgl, there's no need to extend virgl execbuffer?
Allocation-wise, yes (and it has always been on my branches...). It
is modeled after v1, which is modeled after gallium. I don't see any
problem.
The problem is hostmem pixel layout (stride, etc.). Unless the
userspace can query it from elsewhere (*cough* execbuffer), it can
only be used for blobs or non-mappable images. But there are no
mappable images in GL anyway.
On the other hand, GBM has gbm_bo_map. GBM can stay zero copy until
an app tries to map a 2D GBM bo... Hmm, maybe it is not that useful
after all?
>
> >
> > > 2) SHARED_ALLOW vs SHARED_REQUIRE -- SHARED_REQUIRE is the way to go.
> > > Maybe just HOSTVMM_3D since VMM creates the dma-buf?
> > > 3) {GET_HOST_BACKING, RELEASE_HOST_BACKING} vs. {ATTACH_HOST_BACKING,
> > > DETACH_HOST_BACKING} vs. {RESOURCE_MAP, RESOURCE_UNMAP} -- something
> > > like the first two seem nice, since they mirror what we have for guest
> > > memory and we do want to batch hypercalls. Thoughts?
> > >
> > > The questions you had were:
> > >
> > > 1) Limiting the 3D create protocol -- I agree, I think it's reasonable
> > > to limit the number of bytes (of course versioning that limit will be
> > > nice)
> > I think RESOURCE_CREATE_BLOB covers all needs I can see. If I were
> > wrong, VIRTGPU_RESOURCE_FLAG_ALLOC_MASK could be extended to fix my
> > mistake. I am not too eager to give up all control to userspace by
> > providing args[] and only args[].
>
> Are you concerned for security reasons? The host is supposed to
> validate all incoming commands, and the DRM_VIRTGPU_EXECBUFFER (which
> will also allocate) command stream will be much larger and harder to
> validate.
>
> Userspace already has much control, and for v1 resource creation it's
> already defined by the capabilities.
Not security. If the kernel believes a minimal RESOURCE_CREATE_BLOB
combined with the execbuffer ioctl can provide all necessary
functionality, why give the userspace even more freedom and risk
painting itself into a corner?
Speaking for myself, I am uncertain if RESOURCE_CREATE_BLOB+execbuffer
is enough, and wonder if the kernel should give up and go with args[].
That's why there is VIRTGPU_RESOURCE_FLAG_ALLOC_MASK in case
execbuffer is not enough. If you have good use cases that are
impossible with execbuffer, you should bring them up.
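As a sketch of the split I have in mind (names and structures hypothetical; the real paths would go through ioctls and virtio commands, not a lookup table):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_OBJECTS 16

/* Toy model of the host side: execbuffer creates a context-local
 * allocation under an object id; RESOURCE_CREATE_BLOB then wraps that
 * object as a virtio-gpu resource.  A real implementation validates an
 * opaque command stream; here the "command stream" is just a size. */
struct toy_context {
    uint64_t obj_size[MAX_OBJECTS]; /* object id -> size, 0 means unused */
};

/* Step 1: context-local allocation via the (opaque) execbuffer stream. */
static int toy_execbuffer_alloc(struct toy_context *ctx,
                                uint32_t obj_id, uint64_t size)
{
    if (obj_id >= MAX_OBJECTS || size == 0 || ctx->obj_size[obj_id] != 0)
        return -1; /* invalid or already-used object id */
    ctx->obj_size[obj_id] = size;
    return 0;
}

/* Step 2: RESOURCE_CREATE_BLOB referencing the object id.  Returns the
 * size of the backing allocation, or 0 if the object does not exist. */
static uint64_t toy_resource_create_blob(const struct toy_context *ctx,
                                         uint32_t obj_id)
{
    if (obj_id >= MAX_OBJECTS)
        return 0;
    return ctx->obj_size[obj_id];
}
```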
> >
> > On the other hand, I can see that it is tempting to provide args[] and
> > leave this all to userspace. It is possible that some years later, we
> > finally see why args[] is the way to go (and we extend
> > VIRTGPU_RESOURCE_FLAG_ALLOC_MASK). But, IMHO, we are not there yet.
> >
> > > 2) How to define protocols -- I think the answer here is the capabilities set.
> > >
> > > Are there other outstanding questions you have??
> > >
> > > > This will be the core resource creation command going forward. It can
> > > > create resources from context-local objects. Once virgl_protocol.h is
> > > > extended to support local allocations, it can be used with
> > > > virgl_protocol.h to replace the v1 resource creation command.
> > >
> > > As someone (the only one?) who's looked into the virglrenderer
> > > implementation, I think it's easier to extend the current model rather
> > > than requiring object IDs for Virgl. It's just the way the code is
> > > structured, and it's structured that way based on the Gallium model.
> > > That model is completely valid, and with Zink not going away anytime
> > > soon, I think virv could require it too.
> > >
> > > >
> > > > The new command is independent of execbuffer protocols. When the virv
> > > > execbuffer protocol is added, the core resource command can be used
> > > > with it automatically. There will be 1 resource
> > > > creation protocol and 2 execbuffer protocols.
> > > >
> > > >
> > > > > > >
> > > > > > > [snip]
> > > > > > > > I think that gives a very pure resource model and resource API, but
> > > > > > > > for no practical benefit other than pureness. Keeping
> > > > > > > > VIRTGPU_RESOURCE_FLAG_ALLOC_MASK and adding
> > > > > > > > VIRTGPU_RESOURCE_FLAG_ALLOC_CLASSIC (and adding
> > > > > > > > drm_virtgpu_resource_create to the union) seem like a good compromise
> > > > > > > > to me. The guest userspace can use the classic alloc type to allocate
> > > > > > > > a "classic" host storage. The host can choose to allocate using gbm
> > > > > > > > or other apis.
> > > > > > >
> > > > > > > A pure resource model and API is a good thing! Honestly, I'd prefer
> > > > > > > we not bikeshed over style too much. We have an implementation that
> > > > > > > doesn't require something like ALLOC_CLASSIC but solves the same
> > > > > > > problem, so why not go with that?
> > > > > >
> > > > > > It has something like VIRTGPU_RESOURCE_FLAG_ALLOC_ARGS and leaves
> > > > > > everything to the userspace (to fill out args[]). In fact, the alloc
> > > > > > type can be encoded in the first byte of args[] and there will be no
> > > > > > need for VIRTGPU_RESOURCE_FLAG_ALLOC_MASK. I won't miss any
> > > > > > functionality either because I can, for example, encode
> > > > > > VIRTGPU_RESOURCE_FLAG_ALLOC_EXECBUFFER in args[] first, encode object
> > > > > > id next, and then encode the command stream.
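A sketch of that encoding (purely illustrative; the constant and layout are made up for the example):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: if args[] is fully opaque, the alloc type can live in
 * its first byte, followed by the object id and then the command
 * stream -- making a separate VIRTGPU_RESOURCE_FLAG_ALLOC_MASK
 * unnecessary. */
#define ARGS_ALLOC_EXECBUFFER 0x01

/* Pack (alloc type, object id, command stream) into args[]; returns
 * total bytes used, or 0 if args[] is too small. */
static size_t encode_args(uint8_t *args, size_t args_size, uint32_t obj_id,
                          const uint8_t *cmds, size_t cmd_len)
{
    size_t need = 1 + sizeof(obj_id) + cmd_len;
    if (args_size < need)
        return 0;
    args[0] = ARGS_ALLOC_EXECBUFFER;                  /* alloc type first */
    memcpy(args + 1, &obj_id, sizeof(obj_id));        /* object id next */
    memcpy(args + 1 + sizeof(obj_id), cmds, cmd_len); /* then the commands */
    return need;
}
```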
> > > > >
> > > > > Yeah, if you want to make the one protocol (caps 3, v2) reference
> > > > > another protocol (caps 4), that's possible (if they're both
> > > > > supported).
> > > > >
> > > > > >
> > > > > > Like I said, I am not entirely against args[] as long as it supports
> > > > > > my use cases (and it does). What I am not sure about is how wild we
> > > > > > allow args[] to be. The kernel does not care, so the userspace can
> > > > > > do whatever it wants?
> > > > >
> > > > > Complex rendering commands while creating a virtio-gpu resource are
> > > > > not desired. It's reasonable to limit the maximum size of the
> > > > > resource create protocol to something like 64 bytes.
> > > > >
> > > > > > That is one extreme. The other extreme is to
> > > > > > say that the kernel only accepts execbuffer so args[] is actually
> > > > > > cmds[]+object_id. That may seem limiting, but that could work too
> > > > > > because you could put whatever would go into args[] into cmds[].
> > > > > >
> > > > > > Why not go with args[]? There is cmds[] that the userspace can do
> > > > > > whatever it wants already, including allocations. Why add another
> > > > > > mechanism that is allocation-only?
> > > > >
> > > > > Because the new ioctl is related to resource creation. It makes sense
> > > > > to limit it to that purpose. We have the capabilities -- let's use
> > > > > them.
> > > >
> > > > I see driver allocation and resource creation (RESOURCE_CREATE_v2
> > > > above) as independent steps.
> > >
> > > Reasonable minds differ here, and so does userspace. I think the
> > > right answer is to support allocation both in RESOURCE_CREATE_{BLOB,
> > > V2} and via EXECBUFFER.
> > >
> > > > The fact that it is possible to do both
> > > > in one ioctl is a kernel trick to reduce context switches. I also
> > > > have use cases where I will do execbuffer and resource create v2
> > > > separately.
> > > >
> > > > Caps can be used to decide whether virgl or virv are supported, or
> > > > whether virgl supports local allocations, or the resource creation
> > > > protocol supports RESOURCE_CREATE_v2 above. What do you mean by not
> > > > using them?
> > >
> > > I want the capsets to version the resource create_{blob, v2} function
> > > at the host/guest userspace levels.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > > >
> > > > > > Separating args[] out from cmds[]
> > > > > > gives some more flexibility, but I am not sure if the flexibility is
> > > > > > justified.
> > > > >
> > > > > There are many workflows and use cases. Flexibility and automatic
> > > > > versioning are good to have. Using EXECBUFFER for virgl is not ideal,
> > > > > and since it's only called on glFlush() / glFinish() by Mesa, it
> > > > > currently models fenced command buffer submission:
> > > > >
> > > > > https://github.com/torvalds/linux/blob/master/include/uapi/drm/i915_drm.h#L939
> > > > > https://github.com/torvalds/linux/blob/master/include/uapi/drm/i915_drm.h#L1049
> > > > > https://github.com/torvalds/linux/blob/master/include/uapi/drm/msm_drm.h#L180
> > > > > https://github.com/torvalds/linux/blob/master/include/uapi/drm/msm_drm.h#L231
> > > > > https://github.com/torvalds/linux/blob/master/include/uapi/drm/panfrost_drm.h#L49
> > > > > https://github.com/torvalds/linux/blob/master/include/uapi/drm/vc4_drm.h#L91
> > > > > ....
> > > > >
> > > > > We're trying to emulate GEM_CREATE...
> > > > GEM_CREATE is emulated with RESOURCE_CREATE_v2 above which is pretty
> > > > simple and straightforward. EXECBUFFER is an extra step inserted
> > > > before GEM_CREATE due to the fact that the host requires vastly
> > > > different parameters to allocate depending on the driver APIs.
> > > >
> > > > For comparison, GEM_CREATE is also emulated with RESOURCE_CREATE v1.
> > > > It does not need the extra step because it is virgl-specific, with
> > > > allocation parameters that are known and fixed.
> > > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > > [snip]
> > > > > > > > The main thing that model misses is that VK renderers want to make
> > > > > > > > local allocations when they are not to be mapped nor shared. If that
> > > > > > > > model did allow local allocations, would VK renderers use objid to
> > > > > > > > identify local allocations and use resid to identify resources?
> > > > > > >
> > > > > > > Local allocations are fine. Using DRM_IOCTL_VIRTGPU_EXECBUFFER to
> > > > > > > allocate and then doing DRM_VIRTGPU_RESOURCE_CREATE_BLOB with
> > > > > > > something like VIRTGPU_STRUCTURE_TYPE_BIND_OBJECT_ID with the variadic
> > > > > > > arguments when needed should work, no?
> > > > > > Yes. I was responding to Dave about the classic model missing the
> > > > > > need for local allocations. I believe args[] gets it and all other
> > > > > > needs covered.
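A sketch of such a variadic, structure-type-chained argument (modeled loosely on Vulkan-style pNext chains; every name here is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure-type chain for variadic resource-create
 * arguments: each entry identifies itself and points at the next, so
 * new argument types can be added without breaking older layouts. */
enum structure_type {
    STRUCTURE_TYPE_NONE = 0,
    STRUCTURE_TYPE_BIND_OBJECT_ID = 1, /* hypothetical */
};

struct arg_header {
    uint32_t structure_type;
    const struct arg_header *next; /* NULL terminates the chain */
};

struct bind_object_id {
    struct arg_header hdr; /* hdr.structure_type == BIND_OBJECT_ID */
    uint32_t obj_id;       /* execbuffer-allocated object to bind */
};

/* Walk the chain looking for a BIND_OBJECT_ID entry; return its object
 * id, or 0 if none is present. */
static uint32_t find_bind_object_id(const struct arg_header *chain)
{
    for (; chain != NULL; chain = chain->next) {
        if (chain->structure_type == STRUCTURE_TYPE_BIND_OBJECT_ID)
            return ((const struct bind_object_id *)chain)->obj_id;
    }
    return 0;
}
```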
> > > > > >
> > > > > >
> > > > > > > [snip]
> > > > > > > > That model also encourages implicit exports/imports. It does not want the
> > > > > > > > guest to know that the centralized allocator might export and the
> > > > > > > > renderers might import. But when the renderers do not have enough
> > > > > > > > info to import, or when the renderers want to associate an objid with
> > > > > > > > a resid, they rely on the guest to explicitly import again.
> > > > > > >
> > > > > > > There are many ways to import/export. I pointed out a way using
> > > > > > >
> > > > > > > VIRTIO_GPU_CMD_CTX_ATTACH_RESOURCE
> > > > > > > DMA_BUF_SET_NAME/ DMA_BUF_GET_NAME
> > > > > > > MODIFY_INCOMING_IMPORT/MODIFY_INCOMING_EXPORT
> > > > > > >
> > > > > > > earlier in this thread. I think we should hold off on that discussion
> > > > > > > because the proposed model doesn't necessarily require implicit
> > > > > > > export/import, though IMO it's not a huge issue.
> > > > > > >
> > > > > > > [snip]
> > > > > > > > I think Gurchetan started from that model and fixed all the issues.
> > > > > > > > As a result, the allocation parameters are in a new opaque args[]
> > > > > > > > rather than in the existing opaque cmd[]. I like cmd[] more because
> > > > > > > >
> > > > > > > > - I use execbuffer in VK renderers to make local allocations and
> > > > > > > > import/export resources already, and
> > > > > > > > - I did not start from that model; instead I view execbuffer as the
> > > > > > > > way to pass opaque parameters
> > > > > > >
> > > > > > > It makes sense to continue building on the current model, where the
> > > > > > > execbuffer protocol is defined by DRM_IOCTL_VIRTGPU_EXECBUFFER, and
> > > > > > > not necessarily the only way to submit opaque arguments. For
> > > > > > > virglrenderer, the code paths for resource creation and command buffer
> > > > > > > submission are a bit different, so it's nice to keep it that way going
> > > > > > > forward.
> > > > > > >
> > > > > > > Note: allocating memory via DRM_IOCTL_VIRTGPU_EXECBUFFER and creating
> > > > > > > a virtio-gpu resource out of it later is fine.
> > > > > > >
> > > > > > > [snip]
> > > > > > > > It is also possible to treat the non-centralized allocator specially,
> > > > > > > > as a global entity. This allows the userspace to open
> > > > > > > > the DRI device once, but talk to its corresponding renderer as well as
> > > > > > > > the allocator at the same time. I am not sure if this is useful to
> > > > > > > > the guest. How to pass the opaque allocation parameters also remains
> > > > > > > > to be resolved.
> > > > > > >
> > > > > > > Are you concerned about protocol collision? That is, one protocol for
> > > > > > > EXECBUFFER and one protocol for resource creation?
> > > > > > >
> > > > > > > I think it's very solvable, using the capabilities. Let's say capset
> > > > > > > 3 (due to the CAPSET_QUERY_FIX) defines the proposed resource creation
> > > > > > > protocol.
> > > > > > >
> > > > > > > capset 3, version 1 defines it for context type 0
> > > > > > > capset 3, version 2 defines it for context type 1
> > > > > > > capset 3, version 3 defines it for context type 2
> > > > > > >
> > > > > > > Similarly, the EXECBUFFER for a specific context can be defined by
> > > > > > > capset 4 + i. There are many ways to do it ... the execbuffer method to
> > > > > > > allocate can indeed be the resource create method to allocate,
> > > > > > > depending on the context type.
> > > > > > >
> > > > > > > Again, I want to stress the host capabilities define any protocol, not
> > > > > > > necessarily the context type. The context type defines where the
> > > > > > > capabilities are fetched.
> _______________________________________________
> virglrenderer-devel mailing list
> virglrenderer-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/virglrenderer-devel