[RfC PATCH] Add udmabuf misc device

Mon Apr 16 08:22:06 UTC 2018

On 04/16/2018 10:43 AM, Daniel Vetter wrote:
> On Mon, Apr 16, 2018 at 10:16:31AM +0300, Oleksandr Andrushchenko wrote:
>> On 04/13/2018 06:37 PM, Daniel Vetter wrote:
>>> On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote:
>>>> On 04/10/2018 08:26 PM, Dongwon Kim wrote:
>>>>> On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
>>>>>> On 04/06/2018 09:57 PM, Dongwon Kim wrote:
>>>>>>> On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
>>>>>>>> On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
>>>>>>>>>      Hi,
>>>>>>>>>
>>>>>>>>>>> I fail to see any common ground for xen-zcopy and udmabuf ...
>>>>>>>>>> Does the above mean you can assume that xen-zcopy and udmabuf
>>>>>>>>>> can co-exist as two different solutions?
>>>>>>>>> Well, udmabuf route isn't fully clear yet, but yes.
>>>>>>>>>
>>>>>>>>> See also gvt (intel vgpu), where the hypervisor interface is abstracted
>>>>>>>>> away into a separate kernel modules even though most of the actual vgpu
>>>>>>>>> emulation code is common.
>>>>>>>> Thank you for your input, I'm just trying to figure out
>>>>>>>> which of the three z-copy solutions intersect and how much
>>>>>>>>>> And what about hyper-dmabuf?
>>>>>>> xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
>>>>>>> in terms of these core sharing feature:
>>>>>>>
>>>>>>> 1. the sharing process - import prime/dmabuf from the producer -> extract
>>>>>>> underlying pages and get those shared -> return references for shared pages
>>>>> Another thing is danvet was kind of against to the idea of importing existing
>>>>> dmabuf/prime buffer and forward it to the other domain due to synchronization
>>>>> issues. He proposed to make hyper_dmabuf only work as an exporter so that it
>>>>> can have a full control over the buffer. I think we need to talk about this
>>>>> further as well.
>>>> Yes, I saw this. But this limits the use-cases so much.
>>>> For instance, running Android as a Guest (which uses ION to allocate
>>>> buffers) means that finally HW composer will import dma-buf into
>>>> the DRM driver. Then, in case of xen-front for example, it needs to be
>>>> shared with the backend (Host side). Of course, we can change user-space
>>>> to make xen-front allocate the buffers (make it exporter), but what we try
>>>> to avoid is to change user-space which in normal world would have remain
>>>> unchanged otherwise.
>>>> So, I do think we have to support this use-case and just have to understand
>>>> the complexity.
>>> Erm, why do you need importer capability for this use-case?
>>>
>>> guest1 -> ION -> xen-front -> hypervisor -> guest 2 -> xen-zcopy exposes
>>> that dma-buf -> import to the real display hw
>>>
>>> No where in this chain do you need xen-zcopy to be able to import a
>>> dma-buf (within linux, it needs to import a bunch of pages from the
>>> hypervisor).
>>>
>>> Now if your plan is to use xen-zcopy in the guest1 instead of xen-front,
>>> then you indeed need to import.
>> This is the exact use-case I was referring to while saying
>> we need to import on Guest1 side. If hyper-dmabuf is so
>> generic that there is no xen-front in the picture, then
>> it needs to import a dma-buf, so it can be exported at Guest2 side.
>>>    But that imo doesn't make sense:
>>> - xen-front gives you clearly defined flip events you can forward to the
>>>     hypervisor. xen-zcopy would need to add that again.
>> xen-zcopy is a helper driver which doesn't handle page flips
>> and is not a KMS driver as one might think of: the DRM UAPI it uses is
>> just to export a dma-buf as a PRIME buffer, but that's it.
>> Flipping etc. is done by the backend [1], not xen-zcopy.
>>>    Same for
>>>     hyperdmabuf (and really we're not going to shuffle struct dma_fence over
>>>     the wire in a generic fashion between hypervisor guests).
>>>
>>> - xen-front already has the idea of pixel format for the buffer (and any
>>>     other metadata). Again, xen-zcopy and hyperdmabuf lack that, would need
>>>     to add it shoehorned in somehow.
>> Again, here you are talking of something which is implemented in
>> Xen display backend, not xen-zcopy, e.g. display backend can
>> implement para-virtual display w/o xen-zcopy at all, but in this case
>> there is a memory copying for each frame. With the help of xen-zcopy
>> the backend feeds xen-front's buffers directly into Guest2 DRM/KMS or
>> Weston or whatever as xen-zcopy exports remote buffers as PRIME buffers,
>> thus no buffer copying is required.
> Why do you need to copy on every frame for xen-front? In the above
> pipeline, using xen-front I see 0 architectural reasons to have a copy
> anywhere.
>
> This seems to be the core of the confusion we're having here.
Ok, so I'll try to explain:
1. xen-front - produces a display buffer to be shown at Guest2
by the backend, shares its grant references with the backend
2. xen-front sends page flip event to the backend specifying the
buffer in question
3. Backend takes the shared buffer (which is only a buffer mapped into
backend's memory, it is not a dma-buf/PRIME one) and makes memcpy from
it to a local dumb/surface
4. Backend flips that local dumb buffer/surface

If I have a xen-zcopy helper driver then I can avoid doing step 3):
1) 2) remain the same as above
3) Initially for a new display buffer, backend calls xen-zcopy to create
a local PRIME buffer from the grant references provided by the xen-front
via displif protocol [1]: we now have handle_zcopy
4) Backend exports this PRIME with HANDLE_TO_FD from xen-zcopy and imports
it into Weston-KMS/DRM or real HW DRM driver with FD_TO_HANDLE: we now have
handle_local
5) On page flip event backend flips local PRIME: uses handle_local for flips
>
>>> Ofc you won't be able to shovel sound or media stream data over to another
>>> guest like this, but that's what you have xen-v4l and xen-sound or
>>> whatever else for. Trying to make a new uapi, which means userspace must
>>> be changed for all the different use-case, instead of reusing standard
>>> linux driver uapi (which just happens to send the data to another
>>> hypervisor guest instead of real hw) imo just doesn't make much sense.
>>>
>>> Also, at least for the gpu subsystem: Any new uapi must have full
>>> userspace available for it, see:
>>>
>>> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
>>>
>>> Adding more uapi is definitely the most painful way to fix a use-case.
>>> Personally I'd go as far and also change the xen-zcopy side on the
>>> receiving guest to use some standard linux uapi. E.g. you could write an
>>> output v4l driver to receive the frames from guest1.
>> So, we now know that xen-zcopy was not meant to handle page flips,
>> but to implement new UAPI to let user-space create buffers either
>> from Guest2 grant references (so it can be exported to Guest1) or
>> other way round, e.g. create (from Guest1 grant references to export to
>> Guest 2). For that reason it adds 2 IOCTLs: create buffer from grefs
>> or produce grefs for the buffer given.
>> One additional IOCTL is to wait for the buffer to be released by
>> Guest2 user-space.
>> That being said, I don't quite see how v4l can be used here to implement
>> UAPI I need.
> Under the assumption that you can make xen-front to zerocopy for the
> kernel->hypervisor path, v4l could be made to work for the
> hypervisor->kernel side of the pipeline.
>
> But it sounds like we have a confusion already on why or why not xen-front
> can or cannot do zerocopy.
xen-front provides an array of grant references to Guest2 (backend).
It's up to backend what it does with those grant references
which at Guest2 side are not PRIME or dma-buf, but just a set of pages.
This is xen-zcopy which turns these pages into a PRIME. When this is done
backend can now tell DRM drivers to use the buffer in DRM terms.
>>>>> danvet, can you comment on this topic?
>>>>>
>>>>>>> 2. the page sharing mechanism - it uses Xen-grant-table.
>>>>>>>
>>>>>>> And to give you a quick summary of differences as far as I understand
>>>>>>> between two implementations (please correct me if I am wrong, Oleksandr.)
>>>>>>>
>>>>>>> 1. xen-zcopy is DRM specific - can import only DRM prime buffer
>>>>>>> while hyper_dmabuf can export any dmabuf regardless of originator
>>>>>> Well, this is true. And at the same time this is just a matter
>>>>>> of extending the API: xen-zcopy is a helper driver designed for
>>>>>> xen-front/back use-case, so this is why it only has DRM PRIME API
>>>>>>> 2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
>>>>>>> while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
>>>>>>> out synchronization message to the exporting VM for synchronization.
>>>>>> This is true. Again, this is because of the use-cases it covers.
>>>>>> But having synchronization for a generic solution seems to be a good idea.
>>>>> Yeah, understood xen-zcopy works ok with your use case. But I am just curious
>>>>> if it is ok not to have any inter-domain synchronization in this sharing model.
>>>> The synchronization is done with displif protocol [1]
>>>>> The buffer being shared is technically dma-buf and originator needs to be able
>>>>> to keep track of it.
>>>> As I am working in DRM terms the tracking is done by the DRM core
>>>> for me for free. (This might be one of the reasons Daniel sees DRM
>>>> based implementation fit very good from code-reuse POV).
>>> Hm, not sure what tracking you refer to here all ... I got lost in all the
>>> replies while catching up.
>>>
>> I was just referring to accounting stuff already implemented in the DRM
>> core,
>> so I don't have to worry about doing the same for buffers to understand
>> when they are released etc.
>>>>>>> 3. 1-level references - when using grant-table for sharing pages, there will
>>>>>>> be same # of refs (each 8 byte)
>>>>>> To be precise, grant ref is 4 bytes
>>>>> You are right. Thanks for correction.;)
>>>>>
>>>>>>> as # of shared pages, which is passed to
>>>>>>> the userspace to be shared with importing VM in case of xen-zcopy.
>>>>>> The reason for that is that xen-zcopy is a helper driver, e.g.
>>>>>> the grant references come from the display backend [1], which implements
>>>>>> Xen display protocol [2]. So, effectively the backend extracts references
>>>>>> from frontend's requests and passes those to xen-zcopy as an array
>>>>>> of refs.
>>>>>>>     Compared
>>>>>>> to this, hyper_dmabuf does multiple level addressing to generate only one
>>>>>>> reference id that represents all shared pages.
>>>>>> In the protocol [2] only one reference to the gref directory is passed
>>>>>> between VMs
>>>>>> (and the gref directory is a single-linked list of shared pages containing
>>>>>> all
>>>>>> of the grefs of the buffer).
>>>>> ok, good to know. I will look into its implementation in more details but is
>>>>> this gref directory (chained grefs) something that can be used for any general
>>>>> memory sharing use case or is it jsut for xen-display (in current code base)?
>>>> Not to mislead you: one grant ref is passed via displif protocol,
>>>> but the page it's referencing contains the rest of the grant refs.
>>>>
>>>> As to if this can be used for any memory: yes. It is the same for
>>>> sndif and displif Xen protocols, but defined twice as strictly speaking
>>>> sndif and displif are two separate protocols.
>>>>
>>>> While reviewing your RFC v2 one of the comments I had [2] was that if we
>>>> can start from defining such a generic protocol for hyper-dmabuf.
>>>> It can be a header file, which not only has the description part
>>>> (which then become a part of Documentation/...rst file), but also defines
>>>> all the required constants for requests, responses, defines message formats,
>>>> state diagrams etc. all at one place. Of course this protocol must not be
>>>> Xen specific, but be OS/hypervisor agnostic.
>>>> Having that will trigger a new round of discussion, so we have it all
>>>> designed
>>>> and discussed before we start implementing.
>>>>
>>>> Besides the protocol we have to design UAPI part as well and make sure
>>>> the hyper-dmabuf is not only accessible from user-space, but there will be
>>>> number
>>>> of kernel-space users as well.
>>> Again, why do you want to create new uapi for this? Given the very strict
>>> requirements we have for new uapi (see above link), it's the toughest way
>>> to get any kind of support in.
>> I do understand that adding new UAPI is not good for many reasons.
>> But here I was meaning that current hyper-dmabuf design is
>> only user-space oriented, e.g. it provides number of IOCTLs to do all
>> the work. But I need a way to access the same from the kernel, so, for
>> example,
>> some other para-virtual driver can export/import dma-buf, not only
>> user-space.
> If you need an import-export helper library, just merge it. Do not attach
> any uapi to it, just the internal helpers.
>
> Much, much, much easier to land.
This can be done, but again, I will need some entity which
backend may use to convert xen-front's grant references into
a PRIME buffer, hence there is UAPI for that. In other words,
I'll need a thiner xen-zcopy which will implement the same UAPI
and use that library for Xen related stuff.

The confusion may also come from the fact that the backend is
a user-space application, not a kernel module (we have 2 modes
of its operation as of now: DRM master or Weston client), so
it needs a way to talk to the kernel.
>>> That's why I had essentially zero big questions for xen-front (except some
>>> implementation improvements, and stuff to make sure xen-front actually
>>> implements the real uapi semantics instead of its own), and why I'm asking
>>> much more questions on this stuff here.
>>>
>>>>>>> 4. inter VM messaging (hype_dmabuf only) - hyper_dmabuf has inter-vm msg
>>>>>>> communication defined for dmabuf synchronization and private data (meta
>>>>>>> info that Matt Roper mentioned) exchange.
>>>>>> This is true, xen-zcopy has no means for inter VM sync and meta-data,
>>>>>> simply because it doesn't have any code for inter VM exchange in it,
>>>>>> e.g. the inter VM protocol is handled by the backend [1].
>>>>>>> 5. driver-to-driver notification (hyper_dmabuf only) - importing VM gets
>>>>>>> notified when newdmabuf is exported from other VM - uevent can be optionally
>>>>>>> generated when this happens.
>>>>>>>
>>>>>>> 6. structure - hyper_dmabuf is targetting to provide a generic solution for
>>>>>>> inter-domain dmabuf sharing for most hypervisors, which is why it has two
>>>>>>> layers as mattrope mentioned, front-end that contains standard API and backend
>>>>>>> that is specific to hypervisor.
>>>>>> Again, xen-zcopy is decoupled from inter VM communication
>>>>>>>>> No idea, didn't look at it in detail.
>>>>>>>>>
>>>>>>>>> Looks pretty complex from a distant view.  Maybe because it tries to
>>>>>>>>> build a communication framework using dma-bufs instead of a simple
>>>>>>>>> dma-buf passing mechanism.
>>>>>>> we started with simple dma-buf sharing but realized there are many
>>>>>>> things we need to consider in real use-case, so we added communication
>>>>>>> , notification and dma-buf synchronization then re-structured it to
>>>>>>> front-end and back-end (this made things more compicated..) since Xen
>>>>>>> was not our only target. Also, we thought passing the reference for the
>>>>>>> buffer (hyper_dmabuf_id) is not secure so added uvent mechanism later.
>>>>>>>
>>>>>>>> Yes, I am looking at it now, trying to figure out the full story
>>>>>>>> and its implementation. BTW, Intel guys were about to share some
>>>>>>>> test application for hyper-dmabuf, maybe I have missed one.
>>>>>>>> It could probably better explain the use-cases and the complexity
>>>>>>>> they have in hyper-dmabuf.
>>>>>>> One example is actually in github. If you want take a look at it, please
>>>>>>> visit:
>>>>>>>
>>>>>>> https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
>>>>>> Thank you, I'll have a look
>>>>>>>>> Like xen-zcopy it seems to depend on the idea that the hypervisor
>>>>>>>>> manages all memory it is easy for guests to share pages with the help of
>>>>>>>>> the hypervisor.
>>>>>>>> So, for xen-zcopy we were not trying to make it generic,
>>>>>>>> it just solves display (dumb) zero-copying use-cases for Xen.
>>>>>>>> We implemented it as a DRM helper driver because we can't see any
>>>>>>>> other use-cases as of now.
>>>>>>>> For example, we also have Xen para-virtualized sound driver, but
>>>>>>>> its buffer memory usage is not comparable to what display wants
>>>>>>>> and it works somewhat differently (e.g. there is no "frame done"
>>>>>>>> event, so one can't tell when the sound buffer can be "flipped").
>>>>>>>> At the same time, we do not use virtio-gpu, so this could probably
>>>>>>>> be one more candidate for shared dma-bufs some day.
>>>>>>>>>      Which simply isn't the case on kvm.
>>>>>>>>>
>>>>>>>>> hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
>>>>>>>>> on top of xen-zcopy.
>>>>>>>> Hm, I can imagine that: xen-zcopy could be a library code for hyper-dmabuf
>>>>>>>> in terms of implementing all that page sharing fun in multiple directions,
>>>>>>>> e.g. Host->Guest, Guest->Host, Guest<->Guest.
>>>>>>>> But I'll let Matt and Dongwon to comment on that.
>>>>>>> I think we can definitely collaborate. Especially, maybe we are using some
>>>>>>> outdated sharing mechanism/grant-table mechanism in our Xen backend (thanks
>>>>>>> for bringing that up Oleksandr). However, the question is once we collaborate
>>>>>>> somehow, can xen-zcopy's usecase use the standard API that hyper_dmabuf
>>>>>>> provides? I don't think we need different IOCTLs that do the same in the final
>>>>>>> solution.
>>>>>>>
>>>>>> If you think of xen-zcopy as a library (which implements Xen
>>>>>> grant references mangling) and DRM PRIME wrapper on top of that
>>>>>> library, we can probably define proper API for that library,
>>>>>> so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
>>>>>> about to start upstreaming Xen para-virtualized sound device driver soon,
>>>>>> which also uses similar code and gref passing mechanism [3].
>>>>>> (Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
>>>>>> snd/xen-front and then propose a Xen helper library for sharing big buffers,
>>>>>> so common code of the above drivers can use the same code w/o code
>>>>>> duplication)
>>>>> I think it is possible to use your functions for memory sharing part in
>>>>> hyper_dmabuf's backend (this 'backend' means the layer that does page sharing
>>>>> and inter-vm communication with xen-specific way.), so why don't we work on
>>>>> "Xen helper library for sharing big buffers" first while we continue our
>>>>> discussion on the common API layer that can cover any dmabuf sharing cases.
>>>>>
>>>> Well, I would love we reuse the code that I have, but I also
>>>> understand that it was limited by my use-cases. So, I do not
>>>> insist we have to ;)
>>>> If we start designing and discussing hyper-dmabuf protocol we of course
>>>> can work on this helper library in parallel.
>>> Imo code reuse is overrated. Adding new uapi is what freaks me out here
>>> :-)
>>>
>>> If we end up with duplicated implementations, even in upstream, meh, not
>>> great, but also ok. New uapi, and in a similar way, new hypervisor api
>>> like the dma-buf forwarding that hyperdmabuf does is the kind of thing
>>> that will lock us in for 10+ years (if we make a mistake).
>>>
>>>>>> Thank you,
>>>>>> Oleksandr
>>>>>>
>>>>>> P.S. All, is it a good idea to move this out of udmabuf thread into a
>>>>>> dedicated one?
>>>>> Either way is fine with me.
>>>> So, if you can start designing the protocol we may have a dedicated mail
>>>> thread for that. I will try to help with the protocol as much as I can
>>> Please don't start with the protocol. Instead start with the concrete
>>> use-cases, and then figure out why exactly you need new uapi. Once we have
>>> that answered, we can start thinking about fleshing out the details.
>> On my side there are only 2 use-cases, Guest2 only:
>> 1. Create a PRIME (dma-buf) from grant references
>> 2. Create grant references from PRIME (dma-buf)
> So these grant references, are those userspace visible things?
Yes, the user-space backend receives those from xen-front via [1]

> I thought
> the grant references was just the kernel/hypervisor internal magic to make
> this all work?
So, I can map the grant references from user-space, but I won't
be able to turn those into a PRIME buffer. So, the only use of those
w/o xen-zcopy is to map grant refs and copy into real HW dumb on every 
page flip.
> -Daniel
[1] 
https://elixir.bootlin.com/linux/v4.17-rc1/source/include/xen/interface/io/displif.h#L484