plymouth performance

Thu Jun 11 12:26:34 PDT 2009

Hi,

> For the "without" tests, I removed the rhgb kernel parameter
> and added console=tty0
Alright, so it's not actually "without plymouth" but "without plymouth
splash".  plymouth is still running, filtering and logging boot
messages.

> On the XO-1:
>
> Without plymouth: 61.53 seconds bootup time
> With plymouth: 79.82 seconds bootup time
> +18.29 seconds increase, or 30% regression
>
> On the XO-1.5:
>
> Without plymouth: 24.78 seconds bootup time
> With plymouth: 27.07 seconds bootup time
> +2.29 seconds increase, or 10% regression
>
> We have a similar problem with OLPC's boot animation code on the XO-1.
> We recently found out that it slows down boot by about 15 seconds with
> just a simple animation.

Which splash plugin are you using?  Plymouth does all pixel
manipulation in software.  There's no gpu acceleration, and we don't
use mmx or sse or anything like that, so there's no cpu acceleration
either.

A splash plugin that loads a lot of images or does a lot of full
screen updates will be slower than one that loads a few images and
just updates small parts of the screen.

For instance, I would expect fade-throbber to be a lot faster than, say, two-step.

> However, two superheroes came along and implemented an awesome new
> algorithm which only redraws the parts of the screen that have changed
> from the last frame.
We're most of the way there in plymouth.  We have a function:

ply_window_erase_area (window, x, y, width, height);

That calls into the splash plugin and tells it to clear just that area
of the screen.

So normally an animation on screen would do

/* queue future draw requests instead of going immediately to the hardware */
ply_frame_buffer_pause_updates (fb);

/* erase the old frame */
ply_window_erase_area (window, x, y, old_width, old_height);

/* draw the new frame */
ply_frame_buffer_fill_with_argb32_data (fb, {x, y, new_width,
new_height}, ... new frame data ...);

/* flush to the hardware */
ply_frame_buffer_unpause_updates (fb);

which in theory would cause only the changed parts to get drawn to the
framebuffer.

ply_frame_buffer_fill_with_argb32_data doesn't draw straight to the
frame buffer though.  Instead it draws to an intermediary shadow
framebuffer so that blending is fast, and so that you don't end up
with intermediary drawing on the screen. It doesn't flush the shadow
framebuffer to the actual hardware until
ply_frame_buffer_unpause_updates() is called.  It knows how much of
the shadow buffer to flush to the hardware by keeping track of the
area of the rectangles passed to the frame buffer fill functions while
the frame buffer was paused.
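
As a rough illustration of the shadow-buffer idea described above, here
is a toy sketch.  It is not plymouth's code: the toy_fb_t type, the
8x8 size and every toy_fb_* function are made up for the example, and
it does plain fills rather than real alpha blending.  Fills land in the
shadow copy and grow a dirty area; nothing reaches the "hardware" array
until the buffer is unpaused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy types; these are NOT plymouth's real structures. */
enum { FB_WIDTH = 8, FB_HEIGHT = 8 };

typedef struct { int x, y, width, height; } area_t;

typedef struct {
    uint32_t hardware[FB_HEIGHT][FB_WIDTH]; /* stands in for the mapped device */
    uint32_t shadow[FB_HEIGHT][FB_WIDTH];   /* off-screen copy that fills draw into */
    area_t dirty;                           /* bounding box of fills since last flush */
    bool paused;
    unsigned long pixels_flushed;           /* bookkeeping for the example only */
} toy_fb_t;

void toy_fb_init(toy_fb_t *fb)
{
    memset(fb, 0, sizeof *fb);
}

/* Grow the dirty area to the smallest rectangle covering it and `a`. */
void toy_fb_add_dirty(toy_fb_t *fb, area_t a)
{
    if (fb->dirty.width == 0) {
        fb->dirty = a;
        return;
    }
    int left   = a.x < fb->dirty.x ? a.x : fb->dirty.x;
    int top    = a.y < fb->dirty.y ? a.y : fb->dirty.y;
    int right  = a.x + a.width > fb->dirty.x + fb->dirty.width
               ? a.x + a.width : fb->dirty.x + fb->dirty.width;
    int bottom = a.y + a.height > fb->dirty.y + fb->dirty.height
               ? a.y + a.height : fb->dirty.y + fb->dirty.height;
    fb->dirty = (area_t){ left, top, right - left, bottom - top };
}

/* Copy only the dirty rows/columns from the shadow to the "hardware". */
void toy_fb_flush(toy_fb_t *fb)
{
    area_t d = fb->dirty;
    for (int y = d.y; y < d.y + d.height; y++)
        memcpy(&fb->hardware[y][d.x], &fb->shadow[y][d.x],
               (size_t)d.width * sizeof(uint32_t));
    fb->pixels_flushed += (unsigned long)d.width * (unsigned long)d.height;
    fb->dirty = (area_t){ 0, 0, 0, 0 };
}

/* Draw into the shadow; flush right away only when not paused. */
void toy_fb_fill(toy_fb_t *fb, area_t a, uint32_t pixel)
{
    assert(a.x >= 0 && a.y >= 0 && a.width > 0 && a.height > 0);
    assert(a.x + a.width <= FB_WIDTH && a.y + a.height <= FB_HEIGHT);
    for (int y = a.y; y < a.y + a.height; y++)
        for (int x = a.x; x < a.x + a.width; x++)
            fb->shadow[y][x] = pixel;
    toy_fb_add_dirty(fb, a);
    if (!fb->paused)
        toy_fb_flush(fb);
}

void toy_fb_pause(toy_fb_t *fb)
{
    fb->paused = true;
}

void toy_fb_unpause(toy_fb_t *fb)
{
    fb->paused = false;
    toy_fb_flush(fb);
}
```

Calling toy_fb_pause(), then toy_fb_fill() on a 2x2 area, leaves the
hardware array untouched until toy_fb_unpause() copies those 4 pixels
across.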

This means, ideally, we should only redraw the parts of the screen
that have changed when the frame buffer is unpaused. The
implementation is a little naive, though.  It looks at the bounding
rectangle of all the fill calls in aggregate.  That is, if there's a
drawing on the top-left corner of the screen and a drawing on the
bottom-right corner of the screen, then it will essentially do a full
screen update.
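
To put numbers on that, here is a small sketch of the bounding-box
arithmetic.  The rect_t type and both helpers are hypothetical names
for the example, not plymouth API, and the 1200x900 screen and 64x64
drawings in the note below are just illustrative figures.

```c
#include <assert.h>

/* Hypothetical rectangle type for the example. */
typedef struct { long x, y, width, height; } rect_t;

/* Smallest rectangle that covers both a and b. */
rect_t rect_union(rect_t a, rect_t b)
{
    long left   = a.x < b.x ? a.x : b.x;
    long top    = a.y < b.y ? a.y : b.y;
    long right  = a.x + a.width  > b.x + b.width  ? a.x + a.width  : b.x + b.width;
    long bottom = a.y + a.height > b.y + b.height ? a.y + a.height : b.y + b.height;
    return (rect_t){ left, top, right - left, bottom - top };
}

/* Number of pixels a flush of r would have to copy. */
long rect_pixels(rect_t r)
{
    assert(r.width >= 0 && r.height >= 0);
    return r.width * r.height;
}
```

With a 64x64 drawing at (0, 0) and another at (1136, 836) on a
1200x900 screen, rect_union() gives the whole 1200x900 screen:
1,080,000 pixels flushed for 8,192 pixels that actually changed.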

To make this more efficient we'd need to introduce a "region" data
type that's composed of a list (or sorted tree) of area rectangles.
We'd then flush each rectangle one by one, instead of flushing the one
aggregate bounding box.  I don't think it would be *that* hard; it's
just never been a priority before.
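
A minimal sketch of what such a region type could look like, using the
plain-list variant.  Everything here (region_t, region_add,
region_flush, the 16-rectangle cap, the callback) is a hypothetical
design for illustration, not something that exists in plymouth.  It
only drops rectangles that are fully inside one already queued, so
partially overlapping rectangles would still be flushed twice; a real
implementation would probably want to split or merge them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for the example. */
typedef struct { long x, y, width, height; } rect_t;

enum { REGION_MAX_RECTS = 16 };

typedef struct {
    rect_t rects[REGION_MAX_RECTS];
    size_t count;
} region_t;

typedef void (*flush_fn)(rect_t area, void *user_data);

/* True when inner lies entirely within outer. */
bool rect_contains(rect_t outer, rect_t inner)
{
    return inner.x >= outer.x && inner.y >= outer.y
        && inner.x + inner.width  <= outer.x + outer.width
        && inner.y + inner.height <= outer.y + outer.height;
}

/* Queue r for the next flush.  Returns false when the list is full,
 * in which case a caller could fall back to one bounding box. */
bool region_add(region_t *region, rect_t r)
{
    assert(r.width > 0 && r.height > 0);
    for (size_t i = 0; i < region->count; i++)
        if (rect_contains(region->rects[i], r))
            return true;            /* already covered, nothing to add */
    if (region->count == REGION_MAX_RECTS)
        return false;
    region->rects[region->count++] = r;
    return true;
}

/* Flush each queued rectangle on its own, then empty the region. */
void region_flush(region_t *region, flush_fn flush, void *user_data)
{
    for (size_t i = 0; i < region->count; i++)
        flush(region->rects[i], user_data);
    region->count = 0;
}

/* Example callback: tally how many pixels would be copied. */
void count_pixels(rect_t area, void *user_data)
{
    *(long *)user_data += area.width * area.height;
}
```

Adding the two 64x64 corner drawings from the earlier example and
flushing with count_pixels tallies 8,192 pixels, rather than the full
screen the aggregate bounding box would copy.
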

--Ray