[gst-devel] Text rendering

Maciej Katafiasz ml at mathrick.org
Sun Feb 20 06:10:10 CET 2005


On Sun, 20-02-2005 at 13:26 +0100, Gergely Nagy wrote:
> > > Anyway, for fancy effects (think not only in colors, but textures too),
> > > the approach outlined here will work fine, I guess.
> > 
> > In a word, no. It's not gonna work for subtitles at all (even simple
> > colouring of different lines in different colours will be difficult, and
> > individually coloured letters will be downright impossible).
> 
> For this, I imagined that the text description would include color markup
> and such. The text renderer would ignore that, and there would be another
> element, which would set up the coloring layer.

Ugh. Now, a seemingly trivial (it isn't, not at all, but it gives a very
good idea of what's needed) and really basic (ie, if we don't have it, we
might as well have no formatted subtitles at all) example: karaoke text.
In the simplest case it looks like this:

Some more or less interesting lyrics here
----------------------^
[Already sung colour] | [Yet to be sung colour]

^ represents the "cursor" -- the point where the already-sung and
yet-to-be-sung colours meet. Note it's not between chars, but in the
middle of a char -- if "I looooooooooooooove you" is sung, you'll end up
with the "o" in "love" slowly shifting left-to-right from one colour into
the other. That's the basic case, but it's already fairly complicated
with your design. And now for a bit of real life: the characters will
jump, twirl, shrink, enlarge, flash and pounce as the cursor passes by
them. I'm not making this up, that's just a small sample of the effects
you'll see in any random anime fansub. I can't imagine any way to do that
with a text renderer "ignoring" colour markup.
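
Just to illustrate what I mean by the sweep, here's roughly how it could
be drawn with cairo (a sketch only -- the colours and coordinates are made
up, the caller is assumed to have selected a font already, and this is
plain cairo, not GStreamer code):

#include <cairo.h>

/* Draw one karaoke line twice, clipped at cursor_x: left of the cursor
 * in the "already sung" colour, right of it in the "yet to be sung"
 * colour. cursor_x may fall in the middle of a glyph, which is what
 * gives the smooth per-pixel colour sweep. The caller is assumed to
 * have set up the font (cairo_select_font_face / cairo_set_font_size). */
static void
draw_karaoke_line (cairo_t *cr, const char *text,
                   double x, double y, double cursor_x)
{
  cairo_text_extents_t ext;

  cairo_text_extents (cr, text, &ext);

  /* already-sung part: clip to everything left of the cursor */
  cairo_save (cr);
  cairo_rectangle (cr, x, y + ext.y_bearing, cursor_x - x, ext.height);
  cairo_clip (cr);
  cairo_set_source_rgb (cr, 1.0, 0.2, 0.2);       /* e.g. red */
  cairo_move_to (cr, x, y);
  cairo_show_text (cr, text);
  cairo_restore (cr);

  /* yet-to-be-sung part: clip to everything right of the cursor */
  cairo_save (cr);
  cairo_rectangle (cr, cursor_x, y + ext.y_bearing,
                   x + ext.x_advance - cursor_x, ext.height);
  cairo_clip (cr);
  cairo_set_source_rgb (cr, 1.0, 1.0, 1.0);       /* e.g. white */
  cairo_move_to (cr, x, y);
  cairo_show_text (cr, text);
  cairo_restore (cr);
}

(And the jumping/twirling/etc. is exactly why I want a proper canvas with
objects rather than one-shot text blitting.)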

> Something like:
> 
>                                     +---------------+
> +-------------+ application/        | Text renderer |
> | Sub. parser |------------------+--+---------------+---+-------+
> +-------------+ x-gst-subtitles  |                      | Image |
>                                  |                      | mixer |
>                                  +--+---------------+---+---+---+
>                                     | Color-layer   |       |
>                                     | renderer      |       |
>                                     +---------------+       |
>                                                            /
>                                                           /
> +--------------+                       +-----------------+
> | Video source |-----------------------+ Image mixer     |
> +--------------+                       +-------+---------+
>                                                |
>                                        +-----------------+
>                                        | Video out       |
>                                        +-----------------+
> 
> This means a few more elements, but these elements would be simple. I like
> simple elements that do only one tiny little thing.

But you lose info. You can't make a "simple element" that deals with
colouring of text if it doesn't know exactly which parts need to be
coloured, ie. if it doesn't have detailed layout knowledge. And you can't
give it hints about layout unless you make the text renderer parse and
understand the colour markup. So you end up with two elements which are
no longer simple, and which essentially do the same thing twice, only
with different results that get composed at the end. Not really a gain,
I'd say.

> However, since calculating the size of fonts is not easy, and that info
> is required for properly creating a color-only layer, the text renderer
> might also emit something like a `bounding-box markup'..
> 
> On the other hand, in the end, we end up writing a rendering system built
> upon small components. Doing all this in one element with cairo might be
> far better to start with.

Exactly. And it's not going to be easy, see below.

> > That's because whilst presented model is rather flexible when it works,
> > it's also extremely static (you'd need to replug the pipeline *each* time
> > you wanted to have a differently coloured line added, yuck!), and makes
> > complex things even more complex, losing lots of info in the process.
> 
> Don't think so... just fiddle some properties of the coloring element,
> and you're fine.

Umm, no. What I mean by losing info is "you get text + some formatting
(size, rotation, position), and now you have to transform formatting
info from the *stream* into the *pipeline*". Because coloured text needs
an additional element, you need to replug when coloured text is
introduced for the first time, etc. It's going to be *hard*, and (IMHO)
inherently limited. You can't really express (very dynamic) information
from a text stream with a (static) pipeline. You can replug the pipeline,
but that doesn't make it dynamic, only static over discrete time spans.

> > What I want to see is *one* (in practice that's all that should be
> > needed, but we can make it subclassable for really special needs)
> > element supporting range of basic, but rich, and most importantly,
> > combinable ops from which we can build effects. IOW, we want (cairo)
> > canvas, and a protocol for manipulating its objects, which various
> > parsers and other elements that want to generate text will use.
> > 
> > Mandatory ASCII-art:
> > 
> > +----------+                 +-----------+            +-----------+
> > |          | application/    |   Text    |  video/    |           |
> > | Subtitle +-----------------+ Renderer  +------------+   Image   |
> > |  Parser  | x-gst-subtitles | (cairo    |  x-raw-rgb |   mixer   |
> > |          |                 | overlay)  |  (ARGB)    |           |
> > +----------+                 +-----------+            +-----------+
> >                                                           /
> >                                                          /
> >                                                         /
> >                                                        /
> > +----------+                                          /
> > |          |                                         /
> > |  Video   |     video/x-raw-rgb                    /
> > | renderer +---------------------------------------/
> > |          |
> > +----------+
> > 
> > 
> > It outputs into the image mixer instead of blitting directly onto the
> > video, because Gergely wants to have the output in a non-blitted form to
> > have fun with his toy elements afterwards :). This way we can keep most
> > of his proposal, that is, apply separate *video* effects, whilst having
> > robust *text* rendering.
> 
> Sounds good to me so far.
> 
> > Now, application/x-gst-subtitles is a protocol that would support:
> > 
> > - creating objects with a unique ID
> > - manipulating (move, resize, rotate, colourise, maybe arbitrary
> > transformation) objects with a given ID
> > - rendering objects (it should be an operation separate from creation)
> > - destroying objects
> 
> Now, this is something I don't completely agree with. This sounds like
> subtitle rendering would be performed onto a canvas that has the size
> of the video, while the subtitles themselves might only be a small
> fraction of the whole thing. Now, blending a 720x576 image onto another
> is much more costly than blending a 720x48 image onto a 720x576 at
> position (0,650) (for example).

Is it really going to be expensive if most of that image will be fully
transparent (alpha = 0) anyway?
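
(What I mean is that a blend loop can simply skip the fully transparent
pixels, so a full-frame overlay that's mostly empty costs little beyond
walking the buffer. A sketch only -- it assumes 8-bit, non-premultiplied
ARGB over packed RGB of the same size, and the byte order and names are
made up:)

#include <glib.h>

/* Blend an ARGB overlay of the same size onto a packed RGB video frame.
 * Sketch: 8-bit channels, non-premultiplied alpha, A-R-G-B byte order
 * assumed. Fully transparent pixels are skipped outright, so a
 * mostly-empty full-frame overlay stays cheap. */
static void
blend_overlay (guint8 *video, const guint8 *overlay, int width, int height)
{
  int i, n = width * height;

  for (i = 0; i < n; i++) {
    guint8 a = overlay[i * 4 + 0];

    if (a == 0)
      continue;                 /* nothing to draw here */

    video[i * 3 + 0] = (overlay[i * 4 + 1] * a +
                        video[i * 3 + 0] * (255 - a)) / 255;
    video[i * 3 + 1] = (overlay[i * 4 + 2] * a +
                        video[i * 3 + 1] * (255 - a)) / 255;
    video[i * 3 + 2] = (overlay[i * 4 + 3] * a +
                        video[i * 3 + 2] * (255 - a)) / 255;
  }
}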

> Some moving might have a place in the x-gst-subtitles protocol, but..
> support for scrolling should not be there, imho. What has its place
> there in my opinion, is line alignment or the like..
> 
> Hrm.. I think I'll think a bit more about this, and send another
> reply again, later today. I hope to be able to have some nice
> pipeline ideas by then, to illustrate what I have in mind.

Honestly, I thought a little about a setup where it'd be a "renderbin"
rather than a single element, with the text renderer as I outlined above,
and additional video effect elements like the ones you'd like to see.
There would be something like an application/x-gst-subtitles protocol
parsing element which would read the entire input stream and then
dispatch the relevant bits of it to the appropriate elements inside the
bin. This is a bit dodgy, and will no doubt require heavy thinking to get
it sane, but it might be doable, who knows.
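
Very roughly, the bits such a parsing element would dispatch could look
something like this (every name and field here is made up, it's just to
show the kind of create/manipulate/render/destroy ops and the dispatching
I have in mind):

#include <glib.h>

/* Hypothetical in-memory form of application/x-gst-subtitles ops. */
typedef enum {
  SUB_OP_CREATE,        /* create an object with a unique ID */
  SUB_OP_MOVE,          /* manipulate: position */
  SUB_OP_COLOURISE,     /* manipulate: colour, possibly per glyph range */
  SUB_OP_RENDER,        /* actually draw the object for this frame */
  SUB_OP_DESTROY
} SubOpType;

typedef struct {
  SubOpType type;
  guint     id;           /* object the op applies to */
  gdouble   x, y;         /* MOVE */
  guint32   argb;         /* COLOURISE */
  gint      start, end;   /* glyph range for per-character effects, -1 = all */
  gchar    *text;         /* CREATE */
} SubOp;

/* The parsing element would read the stream and hand each op to
 * whichever element inside the bin cares about it. */
static void
dispatch_op (SubOp *op)
{
  switch (op->type) {
    case SUB_OP_CREATE:
    case SUB_OP_RENDER:
    case SUB_OP_DESTROY:
      /* goes to the text renderer (cairo overlay) */
      break;
    case SUB_OP_MOVE:
    case SUB_OP_COLOURISE:
      /* could go to the renderer, or to a separate effect element */
      break;
  }
}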

Cheers,
Maciej

-- 
Maciej Katafiasz <ml at mathrick.org>