[gst-devel] Text rendering

Sun Feb 20 13:59:28 CET 2005

On Sun, 20 Feb 2005 22:09:15 +0100, Maciej Katafiasz <ml at mathrick.org> wrote:
> On 20-02-2005 (Sun) at 18:17 +0100, Gergely Nagy wrote:
> 
> > > Ugh. Now, totally trivial (not at all, but it gives a very good idea about
> > > what's needed) and really basic (ie, if we don't have it, we can as
> > > well have no formatted subtitles at all) example: karaoke text. In the
> > > simplest case it looks like this:
> > >
> > > Some more or less interesting lyrics here
> > > ----------------------^
> > > [Already sung colour] | [Yet to be sung colour]
> >
> > For karaoke text, one renders the text char-by-char. Then we know the
> > size of all chars. Then, put that together to form the complete text
> > rendering. Somewhere else, we have two images, with the exact same
> > dimensions as the lyrics text. One with the already sung colour, one
> > with the yet to be sung colour. Then, merge the yet-to-be-sung image
> > onto the text, so we have our text rendered in yet-to-be-sung colour.
> > We'll use this buffer in each iteration from now on.
> >
> > Then, we take the already-sung image, and merge it onto the text
> > buffer (after marking that buffer read-only, so imagemixer will make a
> > copy) at such a position, that the colouring will end at the right
> > position (ie, we start at a very negative xpos, and end at 0). Since
> > we know the size of each char (we rendered the lyrics char-by-char
> > because we need this info), we know where the song currently is, we
> > can calculate the merge position.
> >
> > This way, one does not need an element to parse an image description
> > over and over again, nor does one need an element that understands a
> > complex rendering protocol.
> 
> Ugh. Again, this element can only render karaoke that's horizontal text,
> left to right.

Why? You can make the already-sung-color thing flow in from any direction.
I can prepare a demo (in a few days, as I only have time for this in
my spare time, which is quite limited :() to show what I mean.

This still doesn't work for twisting, curling, flashing, bouncing stuff, though.
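
To make the merge-position idea from my quoted mail concrete, here is a rough sketch in Python. It is not the actual pangotextsrc or imagemixer code; the function names, the per-character width list and the `fraction_of_current` parameter are all made up for illustration. It assumes horizontal, left-to-right text and an already-sung image exactly as wide as the rendered lyrics.

```python
def char_offsets(char_widths):
    """Return the x position where each character starts, plus the total width."""
    offsets = []
    x = 0
    for w in char_widths:
        offsets.append(x)
        x += w
    return offsets, x


def sung_overlay_xpos(char_widths, chars_sung, fraction_of_current=0.0):
    """x position at which to merge the already-sung image (hypothetical helper).

    The overlay is as wide as the text, so placing it at
    (sung_width - total_width) makes its right edge land at sung_width.
    The result starts very negative and reaches 0 when everything is sung.
    """
    offsets, total = char_offsets(char_widths)
    if chars_sung >= len(char_widths):
        return 0
    partial = int(char_widths[chars_sung] * fraction_of_current)
    sung_width = offsets[chars_sung] + partial
    return sung_width - total
```

For a flow from another direction, the same offsets would be applied along a different axis or with the sign flipped.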

> > > Is it really going to be expensive if most of that image will be 100%
> > > alpha anyway?
> >
> > Yes, unless you do some RLE, in which case you're overdoing stuff,
> > methinks. If the image you generate has empty spaces, you could just
> > skip generating those, and tell the mixer where to merge the image,
> > instead of positioning it yourself. (This way, the user can have
> > subtitles on the top of the video if he so wants, and the renderer
> > does not need to know about it at all).
> 
> Aaah, you mean generating empty space is expensive. I get it now. Oh
> well, if it really poses a problem, we can use what Jan proposed,
> regions similar to how X represents damage areas.

Generating them is not too expensive; blending a large image is, and an
image with lots of empty lines is still large.
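
A small sketch of what I mean by skipping the empty parts and telling the mixer where to merge instead. This is illustration only: the helper names are invented, planes are plain lists of rows, and an alpha value of 0 is assumed to mean fully transparent.

```python
def alpha_bounding_box(alpha):
    """Return (x, y, width, height) of the non-transparent region, or None.

    `alpha` is a list of rows; 0 is assumed to mean fully transparent.
    """
    rows = [y for y, row in enumerate(alpha) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(alpha[0])) if any(row[x] for row in alpha)]
    x0, y0 = cols[0], rows[0]
    return x0, y0, cols[-1] - x0 + 1, rows[-1] - y0 + 1


def crop(plane, box):
    """Cut the boxed region out of a plane (list of rows)."""
    x, y, w, h = box
    return [row[x:x + w] for row in plane[y:y + h]]
```

The renderer would then hand over only the cropped buffer plus the (x, y) offset, so the mixer touches width * height pixels of the box rather than the whole frame.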

> > > Honestly, I thought a little about a situation when it'd be "renderbin"
> > > rather than single element, with textrenderer as I outlined above, and
> > > additional video effects elements like ones you'd like to see. There
> > > would be something like application/x-gst-subtitles protocol parsing
> > > element which would read entire input stream, and then dispatch relevant
> > > bits of it to appropriate elements inside bin. This is a bit dodgy, and
> > > will no doubt require heavy thinking to get it sanely, but might be
> > > doable, who knows.
> >
> > I'd be much more happy with renderbin, than with renderelement.. On
> > the other hand, I'm quite clueless when it comes to multimedia, so...
> > dunno.
> >
> > After reading your mail, it seems to me, that many things mentioned in
> > this discussion can be done in various ways, and they all have their
> > pros and cons. Eg, the very simple text renderer I have is 12k lines
> > (with comments and everything), and if I strip out some stuff that is
> > obsolete, I can get it down to 9-10k I guess. With some clever
> > programming tricks, it can be used in many cases discussed here. On
> > the other hand, it really is not fit for some more fancy stuff like
> > the anime fan-subs.
> >
> > I had a few things in my mind against a generic cairo renderer, but..
> > most of that can be argued, and I don't know enough to explain them
> > anyway. I guess I'll see what comes out of this discussion, and see if
> > I can use the result :] (if I can't, then either the thing can be
> > fixed so I will be able to use it, or I can continue using my
> > pangotextsrc :)
> 
> Cairo is nice, because we get support for basically everything we want
> to do in one place, and most importantly, we can combine those ops.
> Which saves us huge, clunky pipelines as described above. Oh, did I
> mention SSA/ASS also includes rendering arbitrary shapes? :) Not that
> anyone supports or uses that, but it does.

Sounds nice! Rendering arbitrary shapes is something I might even use
(think something like that "bar" you can see on, eg, MTV, on which
they print the song title, author, and so on).

I think I'm persuaded that my original idea was inappropriate for many jobs..

So, the only thing that remains, and on which we don't seem to agree,
is how to blend the rendered stuff onto a video frame. I'd prefer
using imagemixer, as that leads to the least amount of code duplication
(not to mention that it can be pretty easily extended to be able to
mix images in various interesting formats, so it is not limited to
RGBA or AYUV; one of the things I want to do with imagemixer is to be
able to push I420+alpha buffers to it, and get I420 as output. That
way I won't have to do any colorspace conversions in my application at
all).
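
Roughly, the I420+alpha case could look like the sketch below. This is not how imagemixer works today, just an illustration under some assumptions: planes are lists of rows, alpha is 0..255 with 255 opaque, frame dimensions are even, and the chroma alpha is taken as the average of each 2x2 block. All function names are made up.

```python
def blend_plane(dst, src, alpha):
    """Blend src over dst per sample; alpha is 0..255, 255 = opaque."""
    return [
        [(s * a + d * (255 - a) + 127) // 255
         for d, s, a in zip(drow, srow, arow)]
        for drow, srow, arow in zip(dst, src, alpha)
    ]


def subsample_alpha(alpha):
    """Average each 2x2 block of alpha to match I420's half-size chroma planes."""
    return [
        [
            (alpha[y][x] + alpha[y][x + 1]
             + alpha[y + 1][x] + alpha[y + 1][x + 1] + 2) // 4
            for x in range(0, len(alpha[0]), 2)
        ]
        for y in range(0, len(alpha), 2)
    ]


def blend_i420a_over_i420(dst, src, alpha):
    """dst and src are (Y, U, V) tuples of planes; alpha is full resolution."""
    chroma_alpha = subsample_alpha(alpha)
    return (
        blend_plane(dst[0], src[0], alpha),
        blend_plane(dst[1], src[1], chroma_alpha),
        blend_plane(dst[2], src[2], chroma_alpha),
    )
```

Everything stays in planar YUV from input to output, which is the whole point: no RGBA or AYUV round trip.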

On the other hand, it seems to me, there is no easy way in 0.8 to pass
bounding box parameters from the renderer to the mixer. I hope this
will change in 0.9. So, for 0.8, it might be better to have a
cairooverlay element. (Or a cairorenderer, that outputs something
representing a cairo canvas, embedded in a GstBuffer, and a
cairooverlay that takes it, and overlays it onto a video frame. This
latter would make it easier to port the thing later to the
param-passing way :)

Anyway, this is something I have the least clue about, so feel free to
beat me if I say something stupid :)