vaapidecode, shmsink, and allocator

kmliu liu_kim at
Thu Dec 12 18:34:46 UTC 2019

Thanks for suggesting pipewire. I was not aware of that.

In my use case, I ultimately need to send raw video from one virtual machine
(where it is decoded) to multiple virtual machines (where it is consumed).
We are building an inter-VM shared memory mechanism for that. We want to use
shmsink/shmsrc as a starting point (with code changes to use the inter-VM
shared memory) for the gstreamer pipeline in the sender VM and the receiver
VM. To test out the performance, we are also using shmsink/shmsrc within a
single VM (running both the sender and receiver pipeline) to see the impact
on CPU load and memory throughput. 
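To put the memory throughput in perspective, here is a back-of-the-envelope estimate of what one extra copy costs for the test described in the quoted message below (16 instances of a 1080p, 30 fps stream). It is only a sketch: the NV12 output format (12 bits per pixel) and the absence of stride padding are my assumptions, not something measured, and the function names are made up for illustration.

```python
# Rough memory-traffic estimate for ONE extra memcpy of decoded video.
# Assumptions (not confirmed in this thread): the decoder outputs NV12
# (12 bits per pixel) and frames are exactly 1920x1080 with no padding.
WIDTH, HEIGHT, FPS = 1920, 1080, 30
INSTANCES = 16  # number of pipelines run in parallel in the test


def frame_bytes(width, height):
    # NV12: full-resolution luma plane plus half-size interleaved chroma.
    return width * height * 3 // 2


def copy_traffic(width, height, fps, instances):
    # A memcpy reads every byte once and writes every byte once.
    per_second = frame_bytes(width, height) * fps * instances
    return {"read": per_second, "write": per_second}


if __name__ == "__main__":
    traffic = copy_traffic(WIDTH, HEIGHT, FPS, INSTANCES)
    for kind, value in traffic.items():
        print(f"{kind}: {value / 1e9:.2f} GB/s")
```

Under those assumptions each copy adds about 1.49 GB/s of reads and the same amount of writes across the 16 pipelines, so every copy we can avoid matters.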

Because we ultimately need this to work across VMs, a DMAbuf-based solution
probably won't work unless we can somehow emulate DMAbuf across VMs? Not
sure if this even makes any sense.

Going back to the pipeline I am testing, I still don't quite understand why
vaapih264dec can't use shmsink's allocator. Also, different versions of the
code seem to behave differently. In 1.14.4, the negotiated format between the
two elements is actually video/x-raw(memory:VASurface). In 1.16.1, it is
just video/x-raw. In both cases, shmsink's render function needs to map and
copy the GstMemory because it was not allocated by shmsink's own allocator.

In 1.14.4, does mapping a VASurface-backed GstMemory download the raw video
from the GPU to somewhere in system memory, so that the render function then
has to copy it again from there into the shared memory region? In 1.16.1,
since the negotiated format is video/x-raw, vaapih264dec must have already
downloaded the raw video from the GPU to system memory before passing it to
shmsink. So when shmsink's render function copies from there to its shared
memory region, that is an extra copy we hope we can live without.
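To make the copy I am talking about concrete, here is a toy model of my understanding of the render path: a buffer that already lives in the sink's shared memory segment is passed through, and anything else is copied into the segment. This is not shmsink's code. ShmArea, alloc, owns, and render are hypothetical names, and Python's multiprocessing.shared_memory only stands in for the real shared memory area.

```python
from multiprocessing import shared_memory


class ShmArea:
    """Toy stand-in for a sink that owns one shared memory segment."""

    def __init__(self, size):
        self.size = size
        self.shm = shared_memory.SharedMemory(create=True, size=size)
        self.offset = 0
        self._views = []  # slices handed out, released again in close()

    def alloc(self, nbytes):
        # Bump allocator: hand out the next slice of the segment (no free).
        if self.offset + nbytes > self.size:
            raise MemoryError("shared memory area is full")
        view = self.shm.buf[self.offset:self.offset + nbytes]
        self.offset += nbytes
        self._views.append(view)
        return view

    def owns(self, buf):
        # True only for memoryview slices backed by this segment.
        return isinstance(buf, memoryview) and buf.obj is self.shm.buf.obj

    def render(self, buf):
        if self.owns(buf):
            return "zero-copy"  # data is already inside the segment
        dst = self.alloc(len(buf))
        dst[:] = buf  # the extra copy discussed above
        return "memcpy"

    def close(self):
        # Views must be released before the segment can be closed.
        for view in self._views:
            view.release()
        self.shm.close()
        self.shm.unlink()


def demo():
    area = ShmArea(4096)
    try:
        foreign = bytes(1024)    # frame produced in ordinary system memory
        own = area.alloc(1024)   # frame written directly into the segment
        return [area.render(foreign), area.render(own)]
    finally:
        area.close()


if __name__ == "__main__":
    print(demo())
```

In this model only a producer that accepts the sink's allocator avoids the copy, which is what I was hoping vaapih264dec could do.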

Nicolas Dufresne-5 wrote
> On Thursday 12 December 2019 at 11:37 +0100, Víctor Jáquez wrote:
>> On Wed, 11 Dec 2019 at 20:30, kmliu wrote:
>> > I'm running the following pipeline that decodes H.264 video from a file and
>> > then sends the raw video to a shmsink. I have another pipeline that picks up
>> > the raw video from a corresponding shmsrc.
>> > 
>> > gst-launch-1.0 --gst-debug=vaapidecode:6,shmsink:6 filesrc
>> > location=bbb_sunflower_1080p_30fps_normal.mp4 ! qtdemux ! vaapih264dec !
>> > shmsink wait-for-connection=false socket-path=/tmp/tmpsock sync=true
>> > 
>> > It works but I see more memory reads and writes than I expected. BTW, I run
>> > 16 instances of this pipeline to get GBs per second of reads and writes. So
>> > I started looking into the ALLOCATION query between vaapih264dec and
>> > shmsink. The shmsink seems to propose a custom allocator that allocates from
>> > its shared memory buffer. But in gst_vaapi_plugin_base_decide_allocation(),
>> > it simply keeps a reference to shmsink's allocator in other_srcpad_allocator
>> > and proceeds to use its own allocator instead. I think because of this,
>> > shmsink needs to do a memcpy as indicated by the following debug message.
>> > BTW, this message is kind of misleading. I think it means the memory in the
>> > buffer was not allocated by shmsink's own allocator and it's allocated by
>> > vaapivideoallocator0 instead.
>> > 
>> > shmsink gstshmsink.c:714:gst_shm_sink_render:<shmsink0> Memory in buffer
>> > 0x7f2aa8052480 was not allocated by <vaapivideoallocator0>, will memcpy
>> > 
>> > I don't understand vaapidecode enough to tell why it doesn't just agree to
>> > use shmsink's allocator. The only place other_srcpad_allocator is used is in
>> > gst_vaapidecode_push_decoded_frame(). It doesn't look like that particular
>> > codepath is taken though.
>> I haven't tested shmsink, but vaapi needs to use its own source pad allocator,
>> because it produces VASurfaces. Those surfaces have to be "downloaded" to
>> another memory area, and that's a memcpy for many use cases.
> shmsrc/sink is not zero-copy. It creates one segment of shared memory
> and everything not writing to that segment directly (that is the case
> for VAAPI, it simply can't), will have to be copied into it.
> If I had a project with this multi-process requirement, I would use
> pipewire daemon. You can get VAAPI to export dmabuf, and pipewire is
> able to stream dmabuf and memfd across processes without copying.
>> vmjl
>> > _______________________________________________
>> > gstreamer-devel mailing list

More information about the gstreamer-devel mailing list