Weston on framebuffer?

Wed Jul 20 23:30:54 UTC 2016

On Wed, 20 Jul 2016 21:49:57 +0200 Christer Weinigel <christer at weinigel.se>
said:

> Hi,
> 
> On 07/20/2016 01:04 AM, Carsten Haitzler (The Rasterman) wrote:
> >> With this I managed to get a desktop and was able to start
> >> wayland-terminal.  Redrawing of the graphics felt fairly snappy, but the
> >> lag from pressing a key on the keyboard until a character showed up in
> >> the terminal was slow, probably between a quarter to half a second.
> >>
> >> So my question is if this is the performance I should expect with weston
> >> on a 400MHz ARM9 and a dumb framebuffer?  Have I done something stupid
> >> and there are easy ways to speed it up?
> 
> > when you say redraw is snappy... that implies that output is fast. so time
> > from deciding to render and update and it appearing is very short. but you
> > seem to have serious input lag which implies to me that it has nothing to
> > do with your cpu speed and is something else deeper and more involved. time
> > to trace things and see how they go.
> 
> I put up a short video here:
> 
> http://zoo.weinigel.se/misc/2016-07-20-213549.webm

that's not snappy. :) startup takes quite a while. but after that moving the
terminal window around is maybe getting you 6-7fps or so.

> On the framebuffer I don't perceive any lag at all between a keypress 
> and the character appearing on the screen.
> 
> With weston-terminal running I can drag the window around and even 
> though it's not very fast and there's a bit of tearing it isn't too bad. 
> The response when dragging feels ok.  Keypresses feel laggy even though 
> mouse motion doesn't, but I'm not sure if that's because I don't notice 
> the lag when moving the mouse or if it is a real difference.

well they are done by different things. the move will be done directly by
weston itself. it will be asked to begin a window move by the client and then
just do it itself. render the changes. key events have a different path. they
go to client, client handles it, draws new frame, then weston has to update
screen with that new frame.

it seems weston-terminal is just slow at drawing there and thus ends up taking
a while to draw. add another 200ms or so for weston itself and that's probably
what's going on.

weston reads input, sends 1 or more key events to client.
client gets input, now does some updates/rendering (let's say takes 200ms
assuming weston terminal is slow-ish at rendering). let's now say client sends
update buffer to weston. weston now gets it, spends 200ms rendering, then reads
buffered input, sends back to client (it may have sent it before), but weston
will be either rendering a frame (takes 200ms or a bit less) or sending events.
not both. that means at least some events could take 600ms to come back to the
screen (over half a second) because weston got blocked, then client renders,
then it comes back to screen. so maybe 500ms on average. half a second.

i think rendering is slow and due to the above it just adds latency to the
point where you see it easily. you only have a single cpu. any cpu time used up
one place cannot be used elsewhere. no multiple cores. :)

that's my guess. weston is either reading input + sending, or drawing, and the
big blobs of time spent drawing mean it's not reading and sending. so that adds
UP to ~200ms THEN client gets these. client may be still drawing a previous
frame, so doesn't respond for a little bit. let's say 100ms. then client draws.
let's say 100ms, then client sends new frame over to compositor. compositor
gets frame, begins draw. now 200ms more. NOW you see what you just typed. 600ms
later. more or less. which is about what it looks like. when moving a window,
weston gets mouse events, weston redraws, repeat. so 200ms lag. speed up the
drawing or allow drawing to happen in parallel and you're good.
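
to make the arithmetic above concrete, here is a minimal sketch of the two
latency budgets. the durations are just the guesses from this mail, not
measurements, and the function and constant names are made up for
illustration.

```python
# rough latency budget for the two input paths described above.
# all numbers are guesses from this mail, not measured values.
COMPOSITOR_DRAW_MS = 200  # weston repainting the screen
CLIENT_BUSY_MS = 100      # client still finishing a previous frame
CLIENT_DRAW_MS = 100      # client drawing the new frame


def key_press_latency_ms():
    """key event: weston -> client -> weston, worst case."""
    stages = [
        COMPOSITOR_DRAW_MS,  # weston is mid-repaint, so input waits
        CLIENT_BUSY_MS,      # client has not finished its last frame yet
        CLIENT_DRAW_MS,      # client draws the new character
        COMPOSITOR_DRAW_MS,  # weston repaints with the new buffer
    ]
    return sum(stages)


def window_move_latency_ms():
    """window move: weston handles the pointer events and redraws itself."""
    return COMPOSITOR_DRAW_MS


if __name__ == "__main__":
    print("key press  :", key_press_latency_ms(), "ms")
    print("window move:", window_move_latency_ms(), "ms")
```

shrinking COMPOSITOR_DRAW_MS helps both paths, but the key path pays it twice,
which is why typing feels worse than dragging.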

remember weston is the SAMPLE compositor. it will not have been tuned to run
ultra-fast on your setup. you likely have a 16bpp display but what's actually
going on is clients are rendering in 32bpp so taking longer to render than they
would natively (like the text console), and then weston is likely rendering in
32bpp too... THEN it's down-converting to 16bpp for display. none of that is
free. :) you will likely not find much support these days that doesn't involve
down-conversion as everyone is handling alpha and thus 32bpp (yes you can do
16bpp+alpha mask for example, or pack argb 4444 into 16bpp and other
imaginative ways of getting it). dropping the whole pipeline down to something
like 16bpp+masks and a very carefully tuned pipeline would help.
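
as an illustration of the down-conversion step, here is a minimal sketch of
packing a 32bpp pixel into 16bpp. it assumes an xrgb8888 source and an rgb565
destination with plain truncation (no dithering). that is an assumption about
your setup, not a description of what weston actually does, and the function
names are hypothetical.

```python
def xrgb8888_to_rgb565(pixel):
    """pack one 32bpp xrgb8888 pixel into 16bpp rgb565 by truncation."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    # keep the top 5 bits of red, 6 of green, 5 of blue
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def convert_row(src):
    """convert one scanline. every pixel costs shifts and masks."""
    return [xrgb8888_to_rgb565(p) for p in src]


if __name__ == "__main__":
    row = [0x00FF0000, 0x0000FF00, 0x000000FF]
    print([hex(p) for p in convert_row(row)])
```

the point is that this runs for every pixel of every update, on top of the
32bpp rendering that came before it.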

(the reason i say 16bpp + masks is you can do a memcpy for the 16bpp data
direct to memory and since this doesn't convert it likely will be 2-3 times
faster - on the compositor fb side. on the client side the mask can be
pre-computed once for the window then just render 16bpp content, and with
opaque regions - since all the drawing happens inside those, the compositor can
skip blending entirely for regions inside the opaque rect and just memcpy. this
would involve defining a rgb565 + mask format for a buffer and have both sides
understand it, generate and read from it correctly - my guess is that the whole
update process would speed up dramatically if carefully hand optimized and kept
minimal like above and your latency will drop significantly as a result - down
to 1/3rd or 1/4 of what it is - we used to have a dedicated 16bpp rendering
backend that did just the above. rgb565+8bpp masks for alpha (maybe should have
used 4bpp but hey) and we did this because in those days on things like the
n770, n800, openmoko etc. devices they used soc's very much like yours or
exactly the same (the openmoko freerunner also used a samsung arm9 24xx at
about your clockrate), and having such a back end really got good speedups...
BUT it was a pain to maintain. a major pain. it was a whole parallel software
rendering pipeline just for this and we dropped it in the end to stop
maintaining it as it's just not worth it anymore)
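
a minimal sketch of the rgb565 + 8bpp mask idea, to show where the savings
come from. this is not the old backend's code. the buffer layout, the
non-premultiplied alpha and all names here are assumptions made for the
example.

```python
def blend565(src, dst, alpha):
    """blend one rgb565 pixel over another using an 8-bit alpha (0..255)."""
    def mix(s, d):
        return (s * alpha + d * (255 - alpha) + 127) // 255

    r = mix(src >> 11, dst >> 11)
    g = mix((src >> 5) & 0x3F, (dst >> 5) & 0x3F)
    b = mix(src & 0x1F, dst & 0x1F)
    return (r << 11) | (g << 5) | b


def composite_row(dst, src, mask, opaque_start, opaque_end):
    """composite src over dst in place for one scanline.

    pixels in [opaque_start, opaque_end) are known to be opaque, so they
    are copied without looking at the mask at all.
    """
    for x in range(len(src)):
        if opaque_start <= x < opaque_end:
            continue  # handled by the straight copy below
        dst[x] = blend565(src[x], dst[x], mask[x])
    # the memcpy equivalent: no conversion, no blending
    dst[opaque_start:opaque_end] = src[opaque_start:opaque_end]


if __name__ == "__main__":
    dst = [0x0000] * 4
    composite_row(dst, [0xFFFF] * 4, [0, 255, 255, 0], 1, 3)
    print([hex(p) for p in dst])
```

for a typical window almost everything is inside the opaque rect, so nearly
the whole update becomes a plain copy.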

do you want to confirm? don't just type. click and drag in the terminal to
select things. it should be laggy too. install more apps from full toolkits
(efl, gtk+, qt) and test them. scroll content around. etc. you should see
similar kind of lag.

> I tried to do a strace of weston-terminal, but it was a bit painful, it 
> reads every file it can find in /usr/share/icons/default/cursors/ when 
> it starts so strace took forever before the terminal would even show up.
> 
> And for trying to do more advanced tracing, I don't quite know where to 
> start.  Are there any knobs in the source to do things such as dump 
> timestamps for messages between the server and client?
> 
>   /Christer
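
one knob that does exist: libwayland prints every protocol message with a
timestamp to stderr when WAYLAND_DEBUG=1 is set in the environment. below is a
minimal sketch that pulls the key-event-to-commit time out of such a client
log. the exact line format assumed by the regex and the sample lines in the
demo are assumptions, so check them against what your version actually prints.

```python
import re

# assumed line shape: "[ 1234.567]  -> wl_surface@14.commit()"
# " -> " marks a request sent by the client; events have no arrow.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(-> )?(\w+)@\d+\.(\w+)\(")


def key_to_commit_ms(lines):
    """return the time from each key event to the next surface commit."""
    deltas = []
    key_time = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp = float(m.group(1))
        sent = m.group(2) is not None
        iface, msg = m.group(3), m.group(4)
        if not sent and iface == "wl_keyboard" and msg == "key":
            if key_time is None:
                key_time = stamp  # first unanswered key event
        elif sent and iface == "wl_surface" and msg == "commit":
            if key_time is not None:
                deltas.append(stamp - key_time)
                key_time = None
    return deltas


if __name__ == "__main__":
    # made-up sample lines, only to show the expected shape
    sample = [
        "[1000.000] wl_keyboard@12.key(5, 1000, 30, 1)",
        "[1003.500]  -> wl_surface@14.attach(wl_buffer@20, 0, 0)",
        "[1180.250]  -> wl_surface@14.commit()",
    ]
    print(key_to_commit_ms(sample))
```

something like `WAYLAND_DEBUG=1 weston-terminal 2> trace.log` should give you
a log to feed in. this only covers the client side of the round trip, not the
time weston spends before and after.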

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com