How to design a DRM KMS driver exposing 2D compositing?

Wed Aug 13 00:02:35 PDT 2014

On Tue, 12 Aug 2014 09:10:47 -0700
Eric Anholt <eric at anholt.net> wrote:

> Pekka Paalanen <ppaalanen at gmail.com> writes:
> 
> > On Mon, 11 Aug 2014 19:27:45 +0200
> > Daniel Vetter <daniel at ffwll.ch> wrote:
> >
> >> On Mon, Aug 11, 2014 at 10:16:24AM -0700, Eric Anholt wrote:
> >> > Daniel Vetter <daniel at ffwll.ch> writes:
> >> > 
> >> > > On Mon, Aug 11, 2014 at 01:38:55PM +0300, Pekka Paalanen wrote:
> >> > >> Hi,
> >> > >> 
> >> > >> there is some hardware that can do 2D compositing with an arbitrary
> >> > >> number of planes. I'm not sure what the absolute maximum number of
> >> > >> planes is, but for the discussion, let's say it is 100.
> >> > >> 
> >> > >> There are many complicated, dynamic constraints on how many, what size,
> >> > >> etc. planes can be used at once. A driver would be able to check those
> >> > >> before kicking the 2D compositing engine.
> >> > >> 
> >> > >> The 2D compositing engine in the best case (only a few planes used) is
> >> > >> able to composite on the fly in scanout, just like the usual overlay
> >> > >> hardware blocks in CRTCs. When the composition complexity goes up, the
> >> > >> driver can fall back to compositing into a buffer rather than on the
> >> > >> fly in scanout. This fallback needs to be completely transparent to the
> >> > >> user space, implying only additional latency if anything.
> >> > >> 
> >> > >> These 2D compositing features should be exposed to user space through a
> >> > >> standard kernel ABI, hopefully an existing ABI in the very near future
> >> > >> like atomic KMS.
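A minimal sketch of the kind of check described above, in plain C. Every name here (`struct plane_req`, `check_planes`, `MAX_PLANES`, the downscale limit and the pixel budget) is a hypothetical stand-in, not DRM or HVS API, and the cost model is deliberately crude. Violating a static constraint rejects the request, while merely exceeding the on-the-fly budget selects the transparent fallback to compositing into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of preallocated planes exposed to userspace. */
#define MAX_PLANES 16

/* Hypothetical per-plane request: source and destination sizes in pixels. */
struct plane_req {
    uint32_t src_w, src_h;
    uint32_t dst_w, dst_h;
};

/* Hypothetical hardware limits; the real constraints are more dynamic. */
struct hw_limits {
    uint32_t max_downscale;  /* maximum src/dst ratio per axis */
    uint64_t online_budget;  /* source pixels scanout can fetch per frame */
};

enum compose_path { COMPOSE_REJECT, COMPOSE_ONLINE, COMPOSE_OFFLINE };

/* Crude cost model: every source pixel is fetched once per frame. */
static uint64_t plane_cost(const struct plane_req *p)
{
    return (uint64_t)p->src_w * p->src_h;
}

static enum compose_path check_planes(const struct plane_req *planes, size_t n,
                                      const struct hw_limits *lim)
{
    assert(lim != NULL);
    if (n > MAX_PLANES)
        return COMPOSE_REJECT;              /* more planes than exist */

    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        const struct plane_req *p = &planes[i];
        if (p->dst_w == 0 || p->dst_h == 0)
            return COMPOSE_REJECT;          /* degenerate destination */
        if ((uint64_t)p->src_w > (uint64_t)p->dst_w * lim->max_downscale ||
            (uint64_t)p->src_h > (uint64_t)p->dst_h * lim->max_downscale)
            return COMPOSE_REJECT;          /* static scaling limit exceeded */
        total += plane_cost(p);
    }
    /* Over budget is not an error: fall back to compositing into a buffer. */
    return total <= lim->online_budget ? COMPOSE_ONLINE : COMPOSE_OFFLINE;
}
```

In a real driver this decision would presumably live in the atomic check phase, so userspace only ever sees success or failure and never learns which path was taken.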
> >> > >
> >> > > I presume we're talking about the video core from raspi? Or at least
> >> > > something similar?
> >> > 
> >> > Pekka wasn't sure if things were confidential here, but I can say it:
> >> > Yeah, it's the RPi.
> >> > 
> >> > While I haven't written code using the compositor interface (I just did
> >> > enough to shim in a single plane for bringup, and I'm hoping Pekka and
> >> > company can handle the rest for me :) ), my understanding is that the
> >> > way you make use of it is that you've got your previous frame loaded up
> >> > in the HVS (the plane compositor hardware), then when you're asked to
> >> > put up a new frame that's going to be too hard, you take some
> >> > complicated chunk of your scene and ask the HVS to use any spare
> >> > bandwidth it has while it's still scanning out the previous frame in
> >> > order to composite that piece of new scene into memory.  Then, when it's
> >> > done with the offline composite, you ask the HVS to do the next scanout
> >> > frame using the original scene with the pre-composited temporary buffer.
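The offline step Eric describes could be sketched as follows. The sketch assumes that planes are ordered bottom to top, that scanning out a plane directly costs its source pixel count, and that the pre-composited temporary buffer costs the area of its bounding box. The function picks the smallest run of bottom planes to collapse so that the rest of the scene fits the online budget; collapsing a contiguous bottom run keeps the stacking order intact. All names and the cost model are hypothetical. The sketch also ignores whether the offline pass itself fits into the spare bandwidth of the previous frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rect { int32_t x, y, w, h; };

/* Hypothetical plane: source size plus on-screen destination rectangle. */
struct plane {
    uint32_t src_w, src_h;
    struct rect dst;
};

static uint64_t rect_area(struct rect r)
{
    return (uint64_t)r.w * (uint64_t)r.h;
}

static struct rect rect_union(struct rect a, struct rect b)
{
    int32_t x0 = a.x < b.x ? a.x : b.x;
    int32_t y0 = a.y < b.y ? a.y : b.y;
    int32_t x1 = a.x + a.w > b.x + b.w ? a.x + a.w : b.x + b.w;
    int32_t y1 = a.y + a.h > b.y + b.h ? a.y + a.h : b.y + b.h;
    return (struct rect){ x0, y0, x1 - x0, y1 - y0 };
}

/* Online cost of scanning out a plane directly: all its source pixels. */
static uint64_t plane_cost(const struct plane *p)
{
    return (uint64_t)p->src_w * p->src_h;
}

/*
 * planes[0] is the bottom of the stack. Returns k such that pre-compositing
 * planes[0..k) into one temporary buffer (read back at the size of their
 * bounding box) makes the frame fit `budget`. 0 means no fallback is needed,
 * -1 means even collapsing every plane does not fit.
 */
static int planes_to_precomposite(const struct plane *planes, size_t n,
                                  uint64_t budget)
{
    assert(planes != NULL || n == 0);
    uint64_t rest = 0;
    for (size_t i = 0; i < n; i++)
        rest += plane_cost(&planes[i]);
    if (rest <= budget)
        return 0;

    struct rect bbox = planes[0].dst;
    for (size_t k = 1; k <= n; k++) {
        rest -= plane_cost(&planes[k - 1]);   /* now inside the temp buffer */
        bbox = rect_union(bbox, planes[k - 1].dst);
        if (rect_area(bbox) + rest <= budget)
            return (int)k;
    }
    return -1;
}
```

Under this model, collapsing a downscaled video helps most, because the temporary buffer is read back at destination size instead of source size, while small top planes such as a cursor can stay live.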
> >> > 
> >> > I'm pretty comfortable with the idea of having some large number of
> >> > planes preallocated, and deciding that "nobody could possibly need more
> >> > than 16" (or whatever).
> >> > 
> >> > My initial reaction to "we should just punt when we run out of bandwidth
> >> > and have a special driver interface for offline composite" was "that's
> >> > awful, when the kernel could just get the job done immediately, and
> >> > easily, and it would know exactly what it needed to composite to get
> >> > things to fit (unlike userspace)".  I'm trying to come up with what
> >> > benefit there would be to having a separate interface for offline
> >> > composite.  I've got 3 things:
> >> > 
> >> > - Avoids having a potentially long, interruptible wait in the modeset
> >> >   path while the offline composite happens.  But I think we have other
> >> >   interruptible waits in that path already.
> >> > 
> >> > - Userspace could potentially do something else besides use the HVS to
> >> >   get the fallback done.  Video would have to use the HVS, to get the
> >> >   same scaling filters applied as the previous frame where things *did*
> >> >   fit, but I guess you could composite some 1:1 RGBA overlays in GL,
> >> >   which would have more BW available to it than what you're borrowing
> >> >   from the previous frame's HVS capacity.
> >> > 
> >> > - Userspace could potentially use the offline composite interface for
> >> >   things besides just the running-out-of-bandwidth case.  Like, it was
> >> >   doing a nicely-filtered downscale of an overlaid video, then the user
> >> >   hit pause and walked away: you could have a timeout that noticed that
> >> >   the complicated scene hadn't changed in a while, and you'd drop from
> >> >   overlays to an HVS-composited single plane to reduce power.
> >> > 
> >> > The third one is the one I've actually found kind of compelling, and
> >> > might be switching me from wanting no userspace visibility into the
> >> > fallback.  But I don't have a good feel for how much complexity there is
> >> > to our descriptions of planes, and how much poorly-tested interface we'd
> >> > be adding to support this usecase.
> >> 
> >> The compositor should already do a rough bw guesstimate and, if stuff
> >> doesn't change any more, bake the entire scene into a single framebuffer. The exact
> >> same issue happens on more usual hw with video overlays, too.
> >> 
> >> Ofc, if it turns out that scanning out your yuv planes takes less bw,
> >> then the overlay shouldn't be stopped. But imo there's nothing special here for
> >> the rpi.
> >>  
> >> > (Because, honestly, I don't expect the fallbacks to be hit much -- my
> >> > understanding of the bandwidth equation is that you're mostly counting
> >> > the number of pixels that have to be read, and clipped-out pixels
> >> > because somebody's overlaid on top of you don't count unless they're in
> >> > the same burst read.  So unless people are going nuts with blending in
> >> > overlays, or downscaled video, it's probably not a problem, and
> >> > something that gets your pixels on the screen at all is sufficient)
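The bandwidth rule of thumb above can be written down as a per-pixel model. At each screen pixel, walk the layers from the top down, count every layer that is read, and stop at the first opaque one, since anything beneath it is clipped out. This is a sketch under stated assumptions, not the HVS's real cost function. It ignores burst-read granularity, scaling and pixel formats, and the names are hypothetical. It also shows why alpha blending matters: a blended layer does not stop the walk, so the layer below it is read too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layer: on-screen rectangle plus whether it needs blending. */
struct layer {
    int32_t x, y, w, h;
    bool opaque;          /* false: alpha-blended with what is below */
};

static bool covers(const struct layer *l, int32_t px, int32_t py)
{
    return px >= l->x && px < l->x + l->w && py >= l->y && py < l->y + l->h;
}

/*
 * Count pixels fetched for one frame. layers[0] is the bottom. For each
 * screen pixel, read every covering layer from the top down and stop at the
 * first opaque one; pixels hidden below an opaque pixel cost nothing.
 */
static uint64_t pixels_fetched(const struct layer *layers, size_t n,
                               int32_t screen_w, int32_t screen_h)
{
    assert(screen_w > 0 && screen_h > 0);
    uint64_t total = 0;
    for (int32_t py = 0; py < screen_h; py++) {
        for (int32_t px = 0; px < screen_w; px++) {
            for (size_t i = n; i-- > 0;) {
                if (!covers(&layers[i], px, py))
                    continue;
                total++;
                if (layers[i].opaque)
                    break;
            }
        }
    }
    return total;
}
```

For example, on a 1280x1024 screen with an opaque wallpaper, an opaque 640x480 window adds no extra fetches (1,310,720 pixels in total), but the same window with alpha blending adds 307,200 pixels.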
> >> 
> >> Yeah I guess we need to check reality here. If the "we've run out of bw"
> >> case just never happens then it's pointless to write special code for it.
> >> And we can always add a limit later for the case where GL is usually
> >> better and tell userspace that we can't do this many planes. Exact same
> >> thing with running out of memory bw can happen anywhere else, too.
> >
> > I had a chat with Eric last night, and our different views about the
> > on-line/real-time performance limits of the HVS seem to be due to alpha
> > blending.
> >
> > Eric has not been using alpha blending much or at all, while my
> > experiments with Weston and DispmanX pretty much always need alpha
> > blending (e.g. because DispmanX cannot say that only a sub-region of a
> > buffer needs blending). Eric says alpha blending kills the
> > performance.
> 
> Note, I wasn't saying anything about performance.  I was just talking
> about how compositing in X knows that (almost) everything is actually
> opaque, so I don't have the worries about alpha blending that you
> apparently do in Weston.

Ok, I'm confused.

Most surfaces in Weston do have non-opaque parts, usually the window
decorations, depending of course on the desktop visual style in use.
That means almost no surface is completely opaque, the wallpaper being
the obvious exception.

In Weston, we also have the opaque region, set by apps as a hint
that those regions do not need alpha blending. However, with
DispmanX, there was no way to make use of the opaque region markup
unless it covered the whole surface.

Well, I could have split every window into 5 DispmanX elements
instead of just one (4 blended, 1 opaque) to approximate the usual
case with decorations, but I never tried that. There was some concern
that the number of elements would become the dominating limit on how
much could be on screen at once, so it didn't seem worth the added
complexity, and enabling the automatic fallback to off-line compositing just worked.
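The 5-element split could be computed along these lines, assuming the window's opaque region is a single rectangle inside the window (Wayland opaque regions may be arbitrary unions of rectangles). `split_window` is a hypothetical helper and no DispmanX calls are shown. Empty strips are dropped, so a window that is blended only in its title bar needs just two elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rect { int32_t x, y, w, h; };

/*
 * Hypothetical helper: split a window into one opaque centre and up to four
 * blended border strips (top, bottom, left, right). Empty pieces are
 * skipped, so if the opaque region is non-empty it ends up in out[0].
 * Returns the number of rectangles written.
 */
static size_t split_window(struct rect win, struct rect opq, struct rect out[5])
{
    /* The opaque region must lie inside the window. */
    assert(opq.x >= win.x && opq.y >= win.y &&
           opq.x + opq.w <= win.x + win.w && opq.y + opq.h <= win.y + win.h);

    struct rect parts[5] = {
        opq,                                                  /* opaque centre */
        { win.x, win.y, win.w, opq.y - win.y },               /* top strip */
        { win.x, opq.y + opq.h, win.w,
          win.y + win.h - (opq.y + opq.h) },                  /* bottom strip */
        { win.x, opq.y, opq.x - win.x, opq.h },               /* left strip */
        { opq.x + opq.w, opq.y,
          win.x + win.w - (opq.x + opq.w), opq.h },           /* right strip */
    };
    size_t n = 0;
    for (size_t i = 0; i < 5; i++)
        if (parts[i].w > 0 && parts[i].h > 0)
            out[n++] = parts[i];
    return n;
}
```

The opaque centre then clips out whatever lies beneath it, at the price of up to five times as many elements per window.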

Alpha-blending can still be forced to a whole window by desktop
effects, though.

Does this explain why, with DispmanX, I saw the HVS on-line mode fail
to reliably drive the output with just one or two basic app windows
open, if even that? IIRC that was on a 1280x1024 monitor, not even
close to full HD.

Thanks,
pq