[gst-devel] Text rendering

Sun Feb 20 13:10:09 CET 2005

On Sun, 20-02-2005 at 18:17 +0100, Gergely Nagy wrote:

> > Ugh. Now, totally trivial (not at all, but it gives very good idea about
> > what's needed) and really basic (ie, if we don't have it, we may as
> > well have no formatted subtitles at all) example: karaoke text. In the
> > simplest case it looks like this:
> > 
> > Some more or less interesting lyrics here
> > ----------------------^
> > [Already sung colour] | [Yet to be sung colour]
> 
> For karaoke text, one renders the text char-by-char. Then we know the
> size of all chars. Then, put that together to form the complete text
> rendering. Somewhere else, we have two images, with the exact same
> dimensions as the lyrics text. One with the already sung colour, one
> with the yet to be sung colour. Then, merge the yet-to-be-sung image
> onto the text, so we have our text rendered in yet-to-be-sung colour.
> We'll use this buffer in each iteration from now on.
> 
> Then, we take the already-sung image, and merge it onto the text
> buffer (after marking that buffer read-only, so imagemixer will make a
> copy) at such a position, that the colouring will end at the right
> position (ie, we start at a very negative xpos, and end at 0). Since
> we know the size of each char (we rendered the lyrics char-by-char
> because we need this info), we know where the song currently is, we
> can calculate the merge position.
> 
> This way, one does not need an element to parse an image description
> over and over again, nor does one need an element that understands a
> complex rendering protocol.

Ugh. Again, this element can only render karaoke that's horizontal text,
left to right.
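
To make the limitation concrete, here is a rough sketch of the merge-position
calculation described above. It is only an illustration under stated
assumptions: the function names, the per-char (start, end) timing format and
the pixel units are all made up for this example, not taken from any existing
element. Note that it only works when every char has one fixed width and the
text runs horizontally, left to right.

```python
def cursor_x(char_widths, char_times, now):
    """Return the cursor x position in pixels at time `now`.

    char_widths: width of each rendered char, in pixels (hypothetical input).
    char_times:  (start, end) time in seconds for each char, in ascending,
                 non-overlapping order (hypothetical input).
    """
    x = 0.0
    for width, (start, end) in zip(char_widths, char_times):
        if now >= end:
            # This char is fully sung: the cursor is past it.
            x += width
        elif now > start:
            # The cursor is inside this char: interpolate linearly.
            x += width * (now - start) / (end - start)
            break
        else:
            # This char has not started yet.
            break
    return x


def merge_xpos(char_widths, char_times, now):
    """Return the x offset at which to merge the already-sung image.

    The image is as wide as the whole line, so it starts at -total_width
    (nothing coloured) and ends at 0 (everything coloured).
    """
    total_width = sum(char_widths)
    return cursor_x(char_widths, char_times, now) - total_width
```

With widths of 10, 20 and 10 pixels and the middle char sung from t=1 to t=3,
the cursor at t=2 sits halfway through the middle char. As soon as chars move,
rotate or change size over time, there is no single width list to feed in.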

> > ^ represents "cursor" -- point where already sung and yet-to-be-sung
> > colours meet. Note it's not between chars, but in the middle of char --
> > if there's "I looooooooooooooove you" sung, you'll end with "o" in
> > "love" slowly shifting left-to-right from one colour into another.
> > That's basic case, but already fairly complicated with your design.
> 
> If you know the width of each char, and have a description of when a
> part of the lyrics starts, and when it ends, it's pretty easy to code
> a karaoke app (or element)

Umm, as I said, there is no simple "width of char": the chars will jump
around like crazy, changing all the time.

> > And now for a bit of real life: the characters will jump, twirl, shrink,
> > enlarge, flash and pounce as cursor passes by them. I'm not making this
> > up, that's just a small sample of effects you'll see in any random anime
> > fansub. I can't imagine any way to do that with text renderer "ignoring"
> > colour markup.
> 
> You render by char.. Though, you're right when you say that is doomed
> to be dog slow.

Err, I render by char and ...?  What happens then? :)

> > Umm, no. What I mean by losing info is "you get text + some formatting
> > (size, rotation, position), and now you have to transform formatting
> > info from *stream* into *pipeline*". Because coloured text needs
> > additional element, you need to replug when coloured text is introduced
> > for first time, etc. It's going to be *hard*, and (IMHO) inherently
> > limited. You can't really express (very dynamic) information from text
> > stream by (static) pipeline. You can replug pipeline, but it doesn't
> > make it dynamic, only static in discrete time spans.
> 
> Ah! Now I see your point! Thanks!

Good. That simplifies things :)

> > Is it really going to be expensive if most of that image will be 100%
> > alpha anyway?
> 
> Yes, unless you do some RLE, in which case you're overdoing stuff,
> methinks. If the image you generate has empty spaces, you could just
> skip generating those, and tell the mixer where to merge the image,
> instead of positioning it yourself. (This way, the user can have
> subtitles on the top of the video if he so wants, and the renderer
> does not need to know about it at all).

Aaah, you mean generating empty space is expensive. I get it now. Oh
well, if it really poses a problem, we can use what Jan proposed,
regions similar to how X represents damage areas.
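
For illustration, the simplest version of that idea is a single rectangle:
find the bounding box of the non-transparent pixels, hand only that part to
the mixer, and tell it where to put it. The sketch below assumes the alpha
channel is a plain list of rows with 0 meaning fully transparent; the function
name and data layout are hypothetical. A real region implementation would keep
a list of such rectangles rather than one.

```python
def opaque_bounds(alpha):
    """Return (x, y, width, height) of the non-transparent area, or None.

    alpha: list of rows, each a list of alpha values where 0 means fully
           transparent (an assumption made for this sketch).
    """
    # Rows that contain at least one non-transparent pixel.
    rows = [y for y, row in enumerate(alpha) if any(row)]
    if not rows:
        return None  # Nothing to draw at all.
    # Columns that contain at least one non-transparent pixel.
    cols = [x for x in range(len(alpha[0])) if any(row[x] for row in alpha)]
    x0, x1 = cols[0], cols[-1]
    y0, y1 = rows[0], rows[-1]
    return (x0, y0, x1 - x0 + 1, y1 - y0 + 1)
```

The renderer would then generate only the returned rectangle and pass the
(x, y) offset along, so the empty space is never produced in the first place.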

> > Honestly, I thought a little about a situation when it'd be "renderbin"
> > rather than single element, with textrenderer as I outlined above, and
> > additional video effects elements like ones you'd like to see. There
> > would be something like application/x-gst-subtitles protocol parsing
> > element which would read entire input stream, and then dispatch relevant
> > bits of it to appropriate elements inside bin. This is a bit dodgy, and
> > will no doubt require heavy thinking to get it done sanely, but might be
> > doable, who knows.
> 
> I'd be much more happy with renderbin, than with renderelement.. On
> the other hand, I'm quite clueless when it comes to multimedia, so...
> dunno.
> 
> After reading your mail, it seems to me, that many things mentioned in
> this discussion can be done in various ways, and they all have their
> pros and cons. Eg, the very simple text renderer I have is 12k lines
> (with comments and everything), and if I strip out some stuff that is
> obsolete, I can get it down to 9-10k I guess. With some clever
> programming tricks, it can be used in many cases discussed here. On
> the other hand, it really is not fit for some more fancy stuff like
> the anime fan-subs.
> 
> I had a few things in my mind against a generic cairo renderer, but..
> most of that can be argued, and I don't know enough to explain them
> anyway. I guess I'll see what comes out of this discussion, and see if
> I can use the result :] (if I can't, then either the thing can be
> fixed so I will be able to use it, or I can continue using my
> pangotextsrc :)

Cairo is nice, because we get support for basically everything we want
to do in one place, and most importantly, we can combine those ops,
which saves us from huge, clunky pipelines as described above. Oh, did I
mention SSA/ASS also includes rendering arbitrary shapes? :) Not that
anyone supports or uses that, but it does.

Cheers,
Maciej

-- 
Maciej Katafiasz <ml at mathrick.org>