Pipeline Optimization

Nicolas Dufresne nicolas at ndufresne.ca
Wed Apr 20 13:11:33 UTC 2022


Le mercredi 20 avril 2022 à 03:38 -0500, Matt Clark a écrit :
> On Tue, Apr 19, 2022 at 9:45 AM Nicolas Dufresne <nicolas at ndufresne.ca> wrote:
> > Le mardi 19 avril 2022 à 01:20 -0500, Matt Clark a écrit :
> > > Thanks for the tip! I was wondering about those, but like I said, bit of a
> > > newbie. Here is my new graph:
> > > debug_session(sans extra converts).png
> > > Seems to be working still which is always a good thing! 
> > > I'm all for more CPU optimizations (it's still spinning up almost 40
> > > threads,
> > 
> > I count 9 streaming threads (threads induced by the pipeline design):
> > 
> > - 3 threads for the 3 appsrc
> > - 3 leaky queues
> > - 1 compositor
> > - 1 queue before hlssink (misplaced, btw; it should be right after the tee)
> > - 1 queue inside hlssink
> > 
> > GIO will of course add a couple more threads, and some stalled threads will
> > appear, since all of this uses thread pools. But threads that are never
> > woken up are not a problem; each one only uses a bit (~2M, depending on the
> > OS) of RAM. Overall, the thread situation does not seem dramatic. If the
> > compositor could be leaky, that would save you 3 threads.
> > 
> 
> 
> Thank you for the explanation! In this case what would the compositor being
> leaky look like in action? Would it basically just drop frames and thus hold
> the same image, or would segments drop out and blink or what? If it will hold
> the image through a leak that's completely fine. I just need it to be
> consistent until the appsrc needs to generate a new frame, which at the moment
> is only the initial frame, but in the future we plan to have animations as
> well as just still frames. Not sure if that changes anything in your mind?

Compositor implements GstAggregator, and the sink pads on Aggregator implement
a data queue. It could be extended to support leaky operation similar to what
queue does (queue supports both upstream and downstream leaky modes). This
would be worthwhile if too many threads becomes a bottleneck for you.
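In the meantime, the usual workaround (which your graph already shows) is a
leaky queue in front of each compositor sink pad, at the cost of one thread
per queue. A minimal, hypothetical sketch with test sources standing in for
your appsrc branches:

```shell
# Sketch only: videotestsrc stands in for your appsrc branches.
# leaky=downstream drops the oldest queued buffer when full, so the
# compositor always consumes a recent frame instead of blocking upstream.
gst-launch-1.0 \
  compositor name=comp ! videoconvert ! autovideosink \
  videotestsrc is-live=true ! queue leaky=downstream max-size-buffers=2 ! comp.sink_0 \
  videotestsrc is-live=true pattern=ball ! queue leaky=downstream max-size-buffers=2 ! comp.sink_1
```

With leaky=downstream the compositor keeps repeating the last frame it holds
for a pad, so a slow producer shows up as a held image, not a blank segment.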

>  
> > 
> > Memory-wise you can do better for sure. All the queues can be configured
> > with a smaller maximum size; most of them are set to defaults from what I
> > see. You can also work on your encoder configuration. At the moment, it
> > will gather around 32 frames for observation and compression optimization.
> > This likely gives great quality, but might be overkill. Be aware that
> > appsrc also has an internal queue, whose capacity can be configured.
> > Configuring queue capacities greatly improves memory usage.
> > 
> 
> 
> Here is the new graph with how I understand some of your tweaks (please
> correct me if I'm off somewhere!). 
> debug_session (new).png
> Unfortunately, the memory and CPU footprints seem to be the same though, even
> after taking all of the queues to 2 buffers max. I need the final image

Note that appsrc is also a queue: it has properties like max-buffers/max-bytes,
etc., and it also has leaky-type, fwiw.
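As a hypothetical fragment (the element name and values are examples, not
taken from your pipeline), capping each appsrc could look like this; note
that leaky-type requires GStreamer 1.20 or newer:

```shell
# Fragment, not a full pipeline: bound the appsrc internal queue to 2
# buffers (max-bytes=0 disables the byte limit) and drop the oldest
# buffer when the queue is full instead of growing memory.
... appsrc name=src0 max-buffers=2 max-bytes=0 leaky-type=downstream ! ...
```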

> quality to be as crisp as possible so it's easy for people with vision issues
> to read, but don't know enough about the encoder "knobs" to know which would
> be best to tweak for this. I've been reading through the plugin entries for
> the encoder, and the queues and while I feel like I have an ok grasp of the
> queue, I have very little grasp of the encoder so any pointers there would be
> SUPER helpful. 

I'd say read through the x264 wiki, and try to see what can be changed to keep
good quality while reducing the buffering. There is no single answer to that
one; tweaking an advanced encoder like x264 takes patience. You could also use
a simpler encoder, like openh264.
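As a hypothetical starting point (values are illustrative, not tuned for your
content), the frame-gathering mentioned above is mostly x264enc's lookahead
and B-frames, and constant-quantizer mode tends to keep text crisp:

```shell
# Fragment: shrink the lookahead window (default is 40 frames) and drop
# B-frames to cut buffering; pass=quant with a low quantizer trades
# bitrate for sharpness, which helps readability of rendered text.
... ! x264enc rc-lookahead=5 bframes=0 speed-preset=fast \
        pass=quant quantizer=20 key-int-max=60 ! h264parse ! ...
```

Lower quantizer values mean sharper output at higher bitrate; measure memory
after each change, since lookahead is where most of those 32 frames live.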

> 
> Thank you so, so much for all the help you are giving! While I'm not where I
> want to be yet, I feel like I'm starting to understand the system much better
> and in the long run that will probably be even more valuable!^^
>  
> > 
> > 
> > 
> > > not sure if that's a lot or normal for this, honestly), but I would also
> > > love
> > > some memory optimizations as well! After those changes each stream is
> > > taking
> > > up about 850M of RAM while running. Again this may be normal for the task,
> > > but
> > > a) seems like a lot to me and b) I have no frame of reference.
> > > Thanks again, Nicolas!
> > > 
> > > On Mon, Apr 18, 2022 at 8:14 AM Nicolas Dufresne <nicolas at ndufresne.ca>
> > > wrote:
> > > > Le dimanche 17 avril 2022 à 03:29 -0500, Matt Clark via gstreamer-devel
> > > > a
> > > > écrit :
> > > > > I've gotten my project mostly to the stable point of working how I
> > > > > expect
> > > > > it,
> > > > > however I can't help but feel that it's nowhere near optimal. I have
> > > > > made
> > > > > it
> > > > > work and now I wish to make it right. Any insight be it pointers or
> > > > > instructions would be appreciated, as this is my first
> > > > > service/application
> > > > > using gstreamer and I'm still very green with it. 
> > > > 
> > > > Just noticed one low-hanging fruit from the graph. You have 3 color
> > > > conversion points: one before freeze, one after, and one inside
> > > > compositor. The output of compositor is I420, so you can greatly
> > > > optimize your pipeline by adding a caps filter to force the conversion
> > > > before the image freeze. That way, you convert the input to I420 only
> > > > once. Other similar optimizations related to the usage of imagefreeze
> > > > are possible.
> > > > 
> > > > > The basic explanation of the system is that it queries a variable
> > > > > number
> > > > > of
> > > > > web endpoints for dynamically created pngs and then composes those
> > > > > together
> > > > > into an HLS stream that's then used by a single client. 
> > > > > Here is a PNG of the pipeline graph (I'll also attach the raw SVG as
> > > > > well
> > > > > in
> > > > > case you want to dig into it):
> > > > > debug_session.png
> > > > > 
> > > > > TL;DR: Above is my pipeline, please help me make it the best it can
> > > > > be!
> > > > > Thanks to any and all in advance!
> > > > > -Matt
> > > > 
> > 
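For reference, the caps filter trick quoted above is just a fragment like this
(sketch only, element order taken from the graph description):

```shell
# Fragment: pin the format to I420 once, before imagefreeze, so the
# compositor and encoder downstream never need another conversion.
... ! videoconvert ! video/x-raw,format=I420 ! imagefreeze ! ...
```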



More information about the gstreamer-devel mailing list