[gst-devel] Text rendering

Gergely Nagy gergely.nagy at neteyes.hu
Sun Feb 20 09:54:25 CET 2005


On Mon, Feb 21, 2005 at 04:08:44AM +1100, Jan Schmidt wrote:
> On Sun, 2005-02-20 at 17:34 +0100, Gergely Nagy wrote:
> 
> >On Mon, 21 Feb 2005 01:40:48 +1100, Jan Schmidt <thaytan at noraisin.net> wrote:
> >> On Sun, 2005-02-20 at 15:09 +0100, Maciej Katafiasz wrote:
> >> 
> >> >Is it really going to be expensive if most of that image will be 100%
> >> >alpha anyway?
> >> 
> >> Yes, it's more expensive to write out and then drag in 2 entire frames
> >> worth of overlay image from main memory and iterate over it ~25 times
> >> per second if you don't have to.
> >> 
> >> There's a need for an imageoverlay to do this, but I think there's also
> >> a place for a text renderer that draws straight onto the image buffer.
> >
> >I'd like to believe there is no need to have two elements that do
> >almost the same thing, but rather have one, that can efficiently do
> >both tasks.
> 
> 
> I'm thinking that it's undesirable to have a single element that does
> both text/font rendering and general image blending - an application
> doesn't want to load a pango/cairo dependency to render text if it
> wants to blend png or DVD subtitle images over the video, for example.
> 
> For that reason, it would be nicer to have a separate textoverlay as
> we currently have, that draws text directly onto an image buffer, or
> a textrender that generates output buffers in a format that makes
> blending efficient for a separate imageoverlay to handle.

Well, as I see things (and as I imagined it when starting the thread),
there would be a text renderer that renders text into a buffer usable
by imagemixer, and it would be imagemixer's job to blend that onto
the video buffer.

This way, applications that only want to blend png or DVD subtitle
images need only the imagemixer. Those that need to render text need
the text renderer element as well.
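
Roughly, in gst-launch terms (imagemixer and textrender are the
element names proposed in this thread, and the text property is made
up, so this is purely a sketch in 0.8 syntax):

  gst-launch-0.8 \
      v4lsrc ! imagemixer name=mix ! xvimagesink \
      textrender text="some subtitle" ! mix.

An application that only blends images would simply leave out the
textrender branch and feed the mixer a decoded png or subpicture
stream instead.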

Note that the text renderer element would only create a buffer large
enough to hold the text, not one as large as the whole frame it will
be rendered onto. This way, the extra overhead from separating the
renderer and the blender is insignificant.

(Here, I have a 640x48 image generated by a simple text renderer
blended over a v4lsrc at 30 fps, and it works efficiently.)

> >> A possibly-decent alternative would be to provide the rendered
> >> text-as-image overlay frames to the imageoverlay in a new frame format
> >> that isn't just a raw image frame, but has a header describing
> >> cropping information at the start of each frame.
> >
> >How about rendering the text only (the buffer would then be, e.g.,
> >174x48), setting the caps on the srcpad appropriately (so the mixer
> >notices the dimension change), and optionally telling the mixer in
> >one way or another the new position at which blending should occur?
> >(E.g., by setting the mixer's xpos/ypos properties, or somesuch)
> 
> 
> Setting the caps would work, but you don't want to specify the xpos/ypos
> as a property on the image blending element - if you do that, it becomes
> the application's responsibility to synchronise updating it with the
> processing of the stream. You really want to put the xpos/ypos in the
> stream itself.

Not necessarily. A subtitlebin or something similar could do that for you.
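
Something along these lines, say (a rough sketch; SubtitleBin and
every name in it are hypothetical):

  #include <gst/gst.h>

  /* the bin owns both the text renderer and the mixer, and keeps
   * a reference to the mixer around */
  typedef struct
  {
    GstElement *mixer;
  } SubtitleBin;

  static void
  subtitle_bin_place_text (SubtitleBin * bin, gint xpos, gint ypos)
  {
    /* called from the bin's own streaming code, right before the
     * matching text buffer is pushed, so the position update stays
     * serialised with the stream and the application never has to
     * synchronise anything itself */
    g_object_set (G_OBJECT (bin->mixer),
        "xpos", xpos, "ypos", ypos, NULL);
  }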

> You could put it in the caps on the pad along with the
> width and height: width, height, xpos, ypos. That involves (potentially)
> changing the caps on every frame, which may or may not be bad for
> performance, I'm not sure.

With 0.8, I think that means a significant performance drop (though I
haven't tried, and it would be nice if this worked fast enough).
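
For reference, such per-buffer caps could look something like this
(the xpos/ypos fields are hypothetical; the rest is ordinary 0.8
AYUV caps):

  video/x-raw-yuv, format=(fourcc)AYUV,
      width=(int)174, height=(int)48,
      xpos=(int)20, ypos=(int)400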

Or the "dparams handled as streams" stuff I heard on IRC might also
help here, later.
 
> We're still left with bad performance in the case where you have a
> subtitle to render containing 2 pieces of text in opposite corners of
> the frame. 

Why?

Sure, if the text frame is one big frame the size of the video, that
would suck, no doubt.

But if the text in the lower right corner is one small buffer and the
text in the upper left corner is another small one, there's no
significant performance loss.
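
(To put numbers on it: on a 720x576 frame, a full-frame overlay means
iterating over 720 * 576 = 414,720 pixels per frame, while two 200x48
text buffers are only 2 * 9,600 = 19,200 pixels - less than 5% of
that. The text sizes are made up, of course; the ratio is the point.)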

> In that case, you still end up with a large mostly-empty image to blend
> onto the output, but it would be handled more efficiently by a
> textoverlay that could render directly onto the frame.

Only when the text frame is the same size as the video it is blended
onto, which in my opinion is a mistake to begin with.
 
> Run-length compression, or at least some line-skipping semantics in the
> overlay image frame format handed to the videooverlay would alleviate
> this. In other words, I still think it's a good idea to hand buffers to
> the imageoverlay in a format other than raw uncompressed AYUV.

In my experience, rendering text in one element into a buffer that is
just as large as it needs to be (i.e., no empty lines and such), and
blending that onto the video, is fast enough.

(In a previous, non-GStreamer-based piece of software of mine, I did
exactly that, and could manage 30 fps blending a full-width,
48-pixel-high text image onto a 720x576 video; I'm fairly sure the
blender could have done more, I just didn't test it.

I have done something similar with GStreamer too, but haven't
measured it yet.)
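
The hot loop in such a blender is tiny, too. A rough sketch of the
idea, assuming AYUV text blended over an AYUV frame (plain C; the
function and its signature are made up):

  #include <glib.h>

  /* Blend a small AYUV text buffer onto a larger AYUV video frame
   * at (xpos, ypos).  AYUV is 4 bytes per pixel, alpha first.  Note
   * that the loops only ever touch text_width x text_height pixels,
   * never the whole frame. */
  static void
  blend_text (guint8 * video, gint video_width,
      const guint8 * text, gint text_width, gint text_height,
      gint xpos, gint ypos)
  {
    gint x, y, i;

    for (y = 0; y < text_height; y++) {
      guint8 *dst = video + ((ypos + y) * video_width + xpos) * 4;
      const guint8 *src = text + y * text_width * 4;

      for (x = 0; x < text_width; x++) {
        guint8 a = src[0];

        /* dst = src * alpha + dst * (1 - alpha), per Y/U/V component */
        for (i = 1; i < 4; i++)
          dst[i] = (src[i] * a + dst[i] * (255 - a)) / 255;

        src += 4;
        dst += 4;
      }
    }
  }

Clipping and the AYUV-over-I420 case are left out, but that really is
all there is to it.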



