Sharing Framebuffers between Client / Server
Thomas Hellstrom
thomas at shipmail.org
Mon Jan 20 23:37:21 PST 2014
On 01/20/2014 04:21 PM, Rian Quinn wrote:
> Yeah, we looked into GBM. We already link into Mesa, but I was also
> concerned about having to use GL to render into the buffers, as the
> format of the buffer is (correct me if I am wrong) specific to the
> graphics card (i.e. it's not the simple ARGB format I need).
I think for sharing buffers that can be rendered into by the GPU, GBM +
EGL is the API of choice. I'm not aware of any formal docs, but Jesse
Barnes has written a small writeup:
http://virtuousgeek.org/blog/index.php/jbarnes
GBM was designed to do just this, and to hide the driver differences
(GEM vs. TTM objects) from the user.
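A minimal sketch of the GBM side (the device path, buffer size and
ARGB8888 format here are illustrative assumptions; error handling
omitted):

    /* Sketch: create a GPU-renderable, shareable buffer with GBM. */
    #include <fcntl.h>
    #include <unistd.h>
    #include <gbm.h>

    int main(void)
    {
        int fd = open("/dev/dri/card0", O_RDWR);        /* DRM device node */
        struct gbm_device *gbm = gbm_create_device(fd); /* wraps the DRM fd */

        /* Ask for a buffer the GPU can render into and the display can
         * scan out; GBM picks driver-appropriate placement and tiling
         * behind the API. */
        struct gbm_bo *bo =
            gbm_bo_create(gbm, 1024, 768, GBM_FORMAT_ARGB8888,
                          GBM_BO_USE_RENDERING | GBM_BO_USE_SCANOUT);

        /* ... bind to EGL via eglCreateImageKHR, or export it with
         * gbm_bo_get_fd() for DRM prime sharing (see below) ... */

        gbm_bo_destroy(bo);
        gbm_device_destroy(gbm);
        close(fd);
        return 0;
    }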
It sounds very much like you're after a model identical to Ubuntu's Mir,
where the server creates the shared buffers (or Wayland's DRM
compositor, for that matter, if you want the clients to create the
framebuffers and then share them with the server). At least that's my
understanding :). Perhaps a good starting point would be to look at the
low-level DRM code of one (or both) of these compositors?
>
> Could you point me to some docs or headers for DRM prime? I think
> that using simple “dumb” KMS buffers should work fine.
>
>
> In my use case, I actually have to have a memcpy because there will
> only be one scanout buffer managed by the server. Each client needs to
> directly render (ARGB) to a framebuffer, and then the server will
> memcpy the contents it wants to the scanout buffer. In the past, we
> used a CPU memcpy, but we would like to use a DMA transfer going
> forward. Our plan was to use Mesa/GL to do a BLT_SRC_COPY. Will the
> “dumb” buffer support this?
No. You can never use accelerated rendering to or from dumb buffers.
Also, reading from dumb buffers with the CPU may be painfully slow on
some architectures, where they are placed in VRAM or write-combined
memory.
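For reference, creating and mapping a dumb buffer looks roughly like
this (a sketch using the KMS dumb-buffer ioctls; error handling
omitted):

    /* Sketch: create a "dumb" KMS buffer and mmap it. CPU access only;
     * no accelerated rendering to or from this memory is possible. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <xf86drm.h>   /* libdrm; pulls in the DRM uapi headers */

    int create_and_map(void)
    {
        int fd = open("/dev/dri/card0", O_RDWR);

        struct drm_mode_create_dumb create = {
            .width = 1024, .height = 768, .bpp = 32,
        };
        drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &create);
        /* kernel fills in create.handle, create.pitch, create.size */

        struct drm_mode_map_dumb map = { .handle = create.handle };
        drmIoctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &map);
        /* kernel fills in map.offset, the fake offset to mmap */

        void *ptr = mmap(NULL, create.size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, map.offset);

        memset(ptr, 0, create.size); /* CPU writes are fine; reads may
                                        be painfully slow */
        return 0;
    }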
>
> I guess this brings up a performance question too. Is it better to
> render into system memory and then DMA from that to video memory, or
> is it better to render into video memory and DMA from video memory to
> video memory?
It's highly GPU-dependent. Many older GPUs do not support rendering into
system memory.
>
>
> - Rian
>
/Thomas
>
>> On Jan 20, 2014, at 8:41 AM, Thomas Hellstrom <thomas at shipmail.org>
>> wrote:
>>
>> On 01/20/2014 02:10 PM, Rob Clark wrote:
>>
>> On Fri, Jan 17, 2014 at 6:43 AM, Rian Quinn <rianquinn at gmail.com>
>> wrote:
>>
>> I am working on a client/server program, where the server creates
>> (and has access to) a framebuffer, and then needs to share this
>> framebuffer with a client program so that the client can draw into
>> the framebuffer directly (i.e. no memcpy).
>>
>> I am trying to figure out the “cleanest” way to do this, such that
>> I can support Intel’s proprietary driver, the open source AMD and
>> NVidia drivers, and the VMWare driver (I have no need for the
>> proprietary AMD/NVidia drivers right now). From what I can tell, GEM
>> is one way to do this. The problem is VMWare doesn’t support GEM.
>>
>> I tried (knowing it would not work) using KMS to create the
>> framebuffer, and then sending the client the information needed to
>> mmap it. This of course failed because the framebuffer is marked
>> non-sharable in the kernel.
>>
>> Dmabuf (or just plain old egl/glx, which uses dri2 under the hood)
>> would probably be what I'd suggest, *except* you mention mmap. If you
>> are doing software rendering, I guess you probably just want to suck
>> it up and do XShmPutImage.
>>
>> From what I understand, any sort of mmap access to vmwgfx buffers is
>> tricky, because they end up being backed by normal GL textures on the
>> host OS side (IIUC). So the single-copy upload path in XShmPutImage
>> might be close to the ideal path for sw-rendered content.
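For reference, the XShmPutImage path looks roughly like this (a sketch
assuming an open X display and a mapped window; extension checks, error
handling and cleanup via XShmDetach/shmdt/shmctl(IPC_RMID) omitted):

    /* Sketch: single-copy software upload with MIT-SHM. The client
     * renders into the shared segment; the server does one copy. */
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/XShm.h>

    void upload(Display *dpy, Window win, GC gc)
    {
        int scr = DefaultScreen(dpy);
        XShmSegmentInfo shminfo;

        XImage *img = XShmCreateImage(dpy, DefaultVisual(dpy, scr),
                                      DefaultDepth(dpy, scr), ZPixmap,
                                      NULL, &shminfo, 1024, 768);

        shminfo.shmid = shmget(IPC_PRIVATE,
                               img->bytes_per_line * img->height,
                               IPC_CREAT | 0600);
        shminfo.shmaddr = img->data = shmat(shminfo.shmid, NULL, 0);
        shminfo.readOnly = False;
        XShmAttach(dpy, &shminfo);  /* the X server maps the same segment */

        /* ... render ARGB pixels into img->data ..., then one copy: */
        XShmPutImage(dpy, win, gc, img, 0, 0, 0, 0, 1024, 768, False);
        XSync(dpy, False);
    }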
>>
>> To be clear, I am fine having to manually write ioctls for each
>> driver, if that's what it takes. But at this point, I am at a loss
>> on the best method to share scanout buffers (or at least in a way
>> that doesn't make someone cringe when they see my code).
>>
>> Some sort of prepare/finish access ioctls for dmabuf to bracket mmap
>> access are, I think, what vmwgfx is missing in order to implement
>> dmabuf mmap. But no one so far has needed them badly enough to come
>> up with something and send patches.
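Purely to illustrate the idea, such bracketing might look roughly like
the sketch below; the ioctl names are invented for this sketch and
exist in no kernel:

    /* HYPOTHETICAL sketch: neither ioctl below exists anywhere; the
     * names and placeholder values are made up to show prepare/finish
     * bracketing around CPU access to a dma-buf mapping. */
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    #define DMA_BUF_IOCTL_PREPARE_ACCESS 0  /* hypothetical */
    #define DMA_BUF_IOCTL_FINISH_ACCESS  1  /* hypothetical */

    void cpu_access(int dmabuf_fd, size_t size)
    {
        void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, dmabuf_fd, 0);

        /* hypothetical: driver makes the backing store CPU-coherent */
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_PREPARE_ACCESS, 0);

        /* ... CPU reads / writes through ptr ... */

        /* hypothetical: e.g. vmwgfx would copy back to the host-side
         * GL texture here */
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_FINISH_ACCESS, 0);

        munmap(ptr, size);
    }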
>>
>> IIUC, for vmwgfx there would still be a copy back to the original
>> texture on the host on finish-access, so it might not amount to
>> anything much different from XShmPutImage. Probably better to ask
>> some vmwgfx folks to clarify, since the virtual driver has some
>> unique constraints which I may not be adequately representing.
>>
>> BR,
>> -R
>>
>>
>> Rian, for sharing accelerated buffers, they'd best be created with
>> Mesa's GBM, and shared using DRM prime. Those interfaces are generic
>> and, AFAICT, Ubuntu's Mir works exactly this way. The drawback is
>> that the client would need to link against Mesa, but it could then
>> use GL / GLES to transfer software contents to the buffer.
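In code, the prime handoff looks roughly like this (a sketch;
gbm_bo_get_fd() and gbm_bo_import() availability depends on your Mesa
version, and the SCM_RIGHTS fd passing between processes is elided):

    /* Sketch: export a GBM buffer object as a dma-buf (prime) fd on
     * the server and re-import it in the client process. */
    #include <stdint.h>
    #include <gbm.h>

    /* Server side: get a dma-buf fd for an existing gbm_bo and pass it
     * to the client over a Unix domain socket with SCM_RIGHTS. */
    int export_bo(struct gbm_bo *bo)
    {
        return gbm_bo_get_fd(bo);  /* dma-buf fd, valid across processes */
    }

    /* Client side: turn the received fd back into a gbm_bo on its own
     * gbm_device. Width/height/stride/format must travel with the fd. */
    struct gbm_bo *import_bo(struct gbm_device *gbm, int fd,
                             uint32_t w, uint32_t h, uint32_t stride)
    {
        struct gbm_import_fd_data data = {
            .fd = fd, .width = w, .height = h,
            .stride = stride, .format = GBM_FORMAT_ARGB8888,
        };
        return gbm_bo_import(gbm, GBM_BO_IMPORT_FD, &data,
                             GBM_BO_USE_RENDERING);
    }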
>>
>> For pure software contents, the server would share a generic shared
>> memory buffer, together with a damage protocol, and composite / copy
>> it onto the framebuffer in the server.
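A minimal damage message for such a protocol could look like this (the
struct and field names are made up for illustration):

    /* Illustrative only: one way a client could tell the server which
     * region of a shared-memory buffer it has changed. */
    #include <stdint.h>

    struct damage_msg {
        uint32_t buffer_id;            /* which shared buffer */
        uint32_t x, y, width, height;  /* dirty rectangle, in pixels */
    };

    /* Client: write pixels into the shm mapping, then send a
     * damage_msg over the control socket. Server: composite / copy
     * only the damaged rectangle onto the real framebuffer. */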
>>
>> In principle, as Rob says, a dma-buf shared using DRM prime has an
>> mmap() method, but none of the "big" drivers (Intel, Nouveau, Radeon)
>> implements it, and for vmwgfx an implementation would be extremely
>> inefficient. Other drivers may also have issues with write-combining
>> and tiling of the mmap'ed framebuffer.
>>
>> If both the server and the client were 100% software, one could
>> create and share "dumb" KMS buffers using DRM prime. If there's
>> something in the vmwgfx driver that blocks sharing in this way, we
>> could ease that restriction. But these buffers could not be rendered
>> into.
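Sharing a dumb buffer that way would boil down to something like this
(a sketch using libdrm's prime helpers; whether the importing side can
MAP_DUMB an imported handle is driver-dependent):

    /* Sketch: export a dumb buffer's GEM handle as a dma-buf fd with
     * DRM prime and turn it back into a handle in the other process. */
    #include <stdint.h>
    #include <xf86drm.h>

    /* Server, after DRM_IOCTL_MODE_CREATE_DUMB (see earlier sketch): */
    int export_dumb(int drm_fd, uint32_t handle)
    {
        int prime_fd = -1;
        drmPrimeHandleToFD(drm_fd, handle, DRM_CLOEXEC, &prime_fd);
        /* ... pass prime_fd to the client via SCM_RIGHTS ... */
        return prime_fd;
    }

    /* Client, on its own open() of the device node: */
    uint32_t import_dumb(int drm_fd, int prime_fd)
    {
        uint32_t handle = 0;
        drmPrimeFDToHandle(drm_fd, prime_fd, &handle);
        /* CPU-only access from here: DRM_IOCTL_MODE_MAP_DUMB + mmap,
         * subject to driver support for mapping imported handles. */
        return handle;
    }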
>>
>> GEM is, BTW, purely driver-private.
>>
>> As you can see, the big restriction here is that there is no simple,
>> generic way to mmap() accelerated shared buffers from a lean client.
>> This is intentional. For vmwgfx it's because of coherency issues that
>> would make such an implementation inefficient; for other drivers I
>> can imagine there are tiling and caching issues.
>>
>> /Thomas
>>
>>
>> Thanks,
>> - Rian
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>