[gst-devel] Using cairo/pixman for raw video in GStreamer

Fri Sep 11 19:02:48 CEST 2009

Hi,

This is an idea that's been brewing in my head for a bit. After
thinking about it for a while and poking some people on IRC, I'm
pretty convinced it's the best way forward.

Here's a list of problems I'd like to see solved:

1) Correctly identify video in the GStreamer elements (stride, width,
height, size of image and components)
In the short while I recently hacked on plugins, I found bugs in lots
of places, from common to obscure formats. And those were in pretty
common elements (theoraenc/dec, videotestsrc). Using the APIs in
gstvideo pretty much solves this problem for the current set of
plugins.
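
To illustrate the kind of arithmetic that goes wrong here, this is a
minimal sketch of computing strides, plane offsets and buffer size for
I420. It assumes rows are padded to a multiple of 4 bytes and that odd
widths and heights round up for the chroma planes; as far as I know
that matches what gstvideo does, but treat it as an assumption. The
I420Layout type and i420_layout() are hypothetical names for this
example, not gstvideo API.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of a (a must be a power of two). */
#define ROUND_UP(n, a) (((n) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical helper type for this sketch, not part of gstvideo. */
typedef struct {
    size_t stride[3];   /* bytes per row for the Y, U and V planes */
    size_t offset[3];   /* byte offset of each plane in the buffer */
    size_t size;        /* total buffer size in bytes */
} I420Layout;

/* Assumption: rows are padded to 4 bytes, odd dimensions round up. */
I420Layout i420_layout(size_t width, size_t height)
{
    I420Layout l;
    size_t chroma_w, chroma_h;

    assert(width > 0 && height > 0);

    /* U and V are subsampled by 2 in both directions. */
    chroma_w = (width + 1) / 2;
    chroma_h = (height + 1) / 2;

    l.stride[0] = ROUND_UP(width, 4);
    l.stride[1] = ROUND_UP(chroma_w, 4);
    l.stride[2] = l.stride[1];

    l.offset[0] = 0;
    l.offset[1] = l.stride[0] * ROUND_UP(height, 2);
    l.offset[2] = l.offset[1] + l.stride[1] * chroma_h;
    l.size = l.offset[2] + l.stride[2] * chroma_h;
    return l;
}
```

For a 5x3 image this gives a 48 byte buffer, while the naive
width * height * 3 / 2 gives 22. That gap is where the bugs in the
obscure cases tend to come from.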

2) Allow drawing onto different video formats
There are actually multiple issues here: For a start, elements that
draw to various YUV formats often get it wrong - mostly in corner
cases. Others take shortcuts that degrade the quality of the video
(like videotestsrc not computing the average for U and V pixels for
subsampled planes).
Examples of elements doing drawing operations start with elements like
videocrop, videobox that resize the input or videotestsrc that draws
rectangles. Next step there's videomixer and textoverlay that compose
various input streams. And then there's various effect elements like
smpte or effectv or even videoscale at the top end. And almost all of
these elements support only a very limited set of colorspaces - I420
and AYUV mostly.
(Also, I always dreamed of doing an mplayer gstreamer filter that
responds to keypresses and displays the volume/brightness etc UI on
top of the video. That's really hard to do currently.)
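
The chroma averaging mentioned above for subsampled planes can be
sketched like this. It is only an illustration, not code from any
existing element: it assumes 8-bit samples, 4:2:0 subsampling and a
full-resolution chroma plane as input, and chroma_downsample_420() is
a hypothetical name. Edge blocks of odd-sized images average only the
samples that actually exist, which is one of the corner cases that is
easy to get wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Downsample one full-resolution 8-bit chroma plane to 4:2:0 by
 * averaging each 2x2 block with rounding. Hypothetical helper. */
void chroma_downsample_420(const uint8_t *src, size_t src_stride,
                           size_t width, size_t height,
                           uint8_t *dst, size_t dst_stride)
{
    size_t cw = (width + 1) / 2;   /* subsampled width, rounded up */
    size_t ch = (height + 1) / 2;  /* subsampled height, rounded up */
    size_t x, y;

    assert(src != NULL && dst != NULL);
    assert(src_stride >= width && dst_stride >= cw);

    for (y = 0; y < ch; y++) {
        for (x = 0; x < cw; x++) {
            unsigned sum = 0, n = 0;
            size_t dx, dy;

            /* Collect up to four samples; fewer at the right and
             * bottom edges of odd-sized images. */
            for (dy = 0; dy < 2 && 2 * y + dy < height; dy++) {
                for (dx = 0; dx < 2 && 2 * x + dx < width; dx++) {
                    sum += src[(2 * y + dy) * src_stride + 2 * x + dx];
                    n++;
                }
            }
            /* Rounded average of the samples that were collected. */
            dst[y * dst_stride + x] = (uint8_t)((sum + n / 2) / n);
        }
    }
}
```

The shortcut this replaces is taking only the top-left sample of each
block, which is cheaper but throws away three quarters of the chroma
information.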

3) Allow better interaction between applications consuming video and GStreamer
This is mostly related to web browsers, but applies to Flash, Clutter,
games and probably lots of other things, too: They all want to get
access to the video data and do stuff with it. Currently this often
involves a colorspace conversion to RGB and then stuffing that into a
Cairo surface. It would be much nicer if Cairo and pixman supported
YUV so the colorspace conversion could be omitted when the hardware
accepts it.
The same goes in the other direction: I'd like to capture the screen
as YUV, not as RGB, if I record it to Theora video.

4) Allow hw-acceleration in the video pipeline
Decoding an H264 stream in hardware, rendering subtitles on top of it,
scaling it to fit and displaying as fullscreen video on my computer
can in theory all be done in hardware. Unfortunately, GStreamer
currently lacks infrastructure for this, so all this stuff ends up
being done in software.

5) Figuring out the proper format to use is an art
So where do you put the conversion element? Do you even have to put
one? Newcomers trip over these problems a lot and I still hate having
to edit gst-launch lines because I forgot some converter element
somewhere and now negotiation fails. I'd like this to happen
automatically.
Of course, it doesn't mean unnecessary colorspace conversions should
happen, and I also should be able to force a certain format if I want
to (important for testing).
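
The behaviour asked for here can be sketched as a small selection
function. This is purely illustrative and not how GStreamer
negotiation is implemented: pick_format() and has_format() are
hypothetical names, formats are plain strings, and the policy (prefer
a format both sides share, convert only when none is shared, let a
forced format override the choice) is my reading of the requirements
above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if fmt appears in list. Hypothetical helper. */
static int has_format(const char *const *list, size_t n, const char *fmt)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(list[i], fmt) == 0)
            return 1;
    return 0;
}

/* Pick the format to pass between two elements. Returns NULL if
 * nothing works; sets *needs_conversion to 1 when a converter would
 * have to be inserted. Hypothetical sketch, not GStreamer API. */
const char *pick_format(const char *const *produced, size_t n_produced,
                        const char *const *accepted, size_t n_accepted,
                        const char *forced, int *needs_conversion)
{
    size_t i;

    assert(needs_conversion != NULL);
    *needs_conversion = 0;

    /* A forced format wins, as long as downstream accepts it. */
    if (forced != NULL) {
        if (!has_format(accepted, n_accepted, forced))
            return NULL;
        *needs_conversion = !has_format(produced, n_produced, forced);
        return forced;
    }

    /* Prefer any shared format: no conversion needed. */
    for (i = 0; i < n_produced; i++)
        if (has_format(accepted, n_accepted, produced[i]))
            return produced[i];

    /* Nothing shared: convert to downstream's first choice. */
    if (n_accepted == 0)
        return NULL;
    *needs_conversion = 1;
    return accepted[0];
}
```

With an upstream that produces I420 and YUY2 and a downstream that
accepts AYUV and YUY2, this picks YUY2 without a conversion. Forcing
AYUV is still possible but is reported as requiring one.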

These are the steps I'd like to propose as a solution:

1) Add extensive YUV support to pixman
The goal is to add an infrastructure so one can support at least the
formats supported by ffmpegcolorspace today. In fact the ffmpeg
infrastructure fits pretty well to pixman, but I'm not sure if a
straight port is acceptable license-wise.

2) Add support to Cairo to create surfaces from any pixman image
I'm not sure how hard this would be, as it basically circumvents
cairo_format_t - might be possible to hook it into image surfaces or
might be better to use a different surface backend. But it'd just add
a single function like cairo_pixman_surface_create (pixman_image_t
*image);

3) Add a new caps to GStreamer: video/cairo
I'm not sure yet about the specific properties required, but certainly
framerate, width and height are required. Probably an optional
pixman-format field is needed, too. Buffers passed in this format only
contain a reference to a cairo surface in the buffer's data.

4) Port elements to use this cairo API
Either add new elements (cairovideotestsrc, cairocolorspace) or add
support for the old ones. While doing this, refine and improve cairo
or pixman, so the elements can be implemented as nicely as possible. A
lot of code inside GStreamer should go away.

5) Finalize APIs for pixman, cairo and GStreamer in unison
After enough code got ported (not sure if those should be separate
branches or if it should be part of experimental releases), we sit
together and finalize the API. At this point GStreamer elements switch
to using video/cairo as the default data passing format.

6) For next major GStreamer release, remove video/x-raw-*
The old formats are not needed anymore, they can be removed. All
elements are ported to the new API.

I think these steps would solve most of the problems I outlined above.

Of course some questions have come up about this that I'd like to
answer before somebody has to ask them here:

1) "This is never gonna be fast enough"
I don't see why. Most of the operations people care about are just
memcpys and pixman is very good at detecting them and making them
fast. In fact, pixman has a huge infrastructure dedicated to speeding
up things that GStreamer cannot match. And no, the current scarce
usage of liboil doesn't count. Currently in a lot of cases unnecessary
colorspace conversions cost a lot of performance and these will go
away if every element supports every format.
In short: I wouldn't have proposed this if I thought it'd make stuff slower.
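
As an illustration of the "just memcpys" point, cropping one plane of
a planar format comes down to one memcpy per row. This is a minimal
sketch assuming 8-bit samples (one byte per pixel in the plane);
crop_plane() is a hypothetical name, not a pixman or GStreamer
function, and the caller is assumed to keep y + h within the source
height.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a w x h rectangle starting at (x, y) out of an 8-bit plane.
 * Hypothetical helper; strides are in bytes. */
void crop_plane(const uint8_t *src, size_t src_stride,
                size_t x, size_t y, size_t w, size_t h,
                uint8_t *dst, size_t dst_stride)
{
    size_t row;

    assert(src != NULL && dst != NULL);
    assert(x + w <= src_stride && w <= dst_stride);

    /* One memcpy per row; the strides take care of any padding. */
    for (row = 0; row < h; row++)
        memcpy(dst + row * dst_stride,
               src + (y + row) * src_stride + x,
               w);
}
```

For a subsampled format the same function would be called once per
plane, with the rectangle scaled down for U and V.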

2) "I will have less control over what happens"
No you won't. You'll be able to use the same formats as today and
access their data just like today. You just use pixman functions
instead of gst_video_* functions. I don't intend to move control away
from developers. The goal is to make life simpler for developers, not
harder.

3) "Adding new features to GStreamer will be a lot harder"
This is only halfway true. You will still be able to write elements
like you do today by accessing the raw data of the surface. Of course,
if you want to add a new YUV format, it will require support in
pixman, and this requires more work (or even depending on unstable
versions of pixman). On the other hand, once pixman supports that
format, all other GStreamer elements will support it automatically
and you can start rendering subtitles onto it. I also do not believe
that adding more formats is somehow a common thing that happens very
often, so it can easily wait until the next pixman or cairo release.
But yes, depending on other libraries reduces your options.

4) "Cairo/GStreamer developers will not like that"
In fact, I talked to both of the maintainers and the response in both
cases was pretty positive, but skeptical about the feasibility of such
a project, mostly fueled by preconceptions about what Cairo or
GStreamer is and how it works. I consider myself part of both the
Cairo and GStreamer communities and know the code in quite some detail
and I do think it's a very good fit.

So, opinions, questions, encouragement or anything else?

Benjamin