How to design a DRM KMS driver exposing 2D compositing?

Pekka Paalanen ppaalanen at
Mon Aug 11 05:47:22 PDT 2014

Hi Daniel,

you make perfect sense as usual. :-)
Comments below.

On Mon, 11 Aug 2014 14:06:36 +0200
Daniel Vetter <daniel at> wrote:

> On Mon, Aug 11, 2014 at 01:38:55PM +0300, Pekka Paalanen wrote:
> > Hi,
> > 
> > there is some hardware that can do 2D compositing with an arbitrary
> > number of planes. I'm not sure what the absolute maximum number of
> > planes is, but for the discussion, let's say it is 100.
> > 
> > There are many complicated, dynamic constraints on how many, what size,
> > etc. planes can be used at once. A driver would be able to check those
> > before kicking the 2D compositing engine.
> > 
> > The 2D compositing engine in the best case (only few planes used) is
> > able to composite on the fly in scanout, just like the usual overlay
> > hardware blocks in CRTCs. When the composition complexity goes up, the
> > driver can fall back to compositing into a buffer rather than on the
> > fly in scanout. This fallback needs to be completely transparent to the
> > user space, implying only additional latency if anything.
> > 
> > These 2D compositing features should be exposed to user space through a
> > standard kernel ABI, hopefully an existing ABI in the very near future
> > like the KMS atomic.
> I presume we're talking about the video core from raspi? Or at least
> something similar?


> > Assuming the DRM universal planes and atomic mode setting / page flip
> > infrastructure is in place, could the 2D compositing capabilities be
> > exposed through universal planes? We can assume that plane properties
> > are enough to describe all the compositing parameters.
> > 
> > Atomic updates are needed so that the complicated constraints can be
> > checked, and user space can try to reduce the composition complexity if
> > the kernel driver sees that it won't work.
> > 
> > Would it be feasible to generate a hundred identical non-primary planes
> > to be exposed to user space via DRM?
> > 
> > If that could be done, the kernel driver could just use the existing
> > kernel/user ABIs without having to invent something new, and programs
> > like a Wayland compositor would not need to be coded specifically for
> > this hardware.
> > 
> > What problems do you see with this plan?
> > Are any of those problems unfixable or simply prohibitive?
> > 
> > I have some concerns, which I am not sure will actually be a problem:
> > - Does allocating 100 planes eat too much kernel memory?
> >   I mean just the bookkeeping, properties, etc.
> > - Would such an amount of planes make some in-kernel algorithms slow
> >   (particularly in DRM common code)?
> > - Considering how user space discovers all DRM resources, would this
> >   make a compositor "slow" to start?
> I don't see any problem with that. We have a few plane-loops, but iirc
> those can be easily fixed to use indices and similar stuff. The atomic
> ioctl itself should scale nicely.

Very nice.
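The reduce-until-it-fits strategy implied by the atomic test-only flow can be modeled without touching real hardware. The following is a minimal sketch, not libdrm API: `HW_PLANE_LIMIT`, `atomic_test_only` and `probe_usable_planes` are hypothetical names, and the driver's dynamic constraints are stubbed as a plain plane-count limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an atomic TEST_ONLY commit: the driver's
 * atomic_check accepts a configuration only while its constraints
 * hold.  Here the constraint is modeled as a simple plane count. */
#define HW_PLANE_LIMIT 12

static bool atomic_test_only(int nplanes)
{
	return nplanes <= HW_PLANE_LIMIT;	/* driver says yes/no */
}

/* User-space strategy from the thread: ask for the full composition,
 * then reduce complexity until the kernel accepts it. */
static int probe_usable_planes(int wanted)
{
	int n = wanted;

	while (n > 0 && !atomic_test_only(n))
		n--;				/* drop one plane, retry */
	return n;
}
```

With a real driver the test step would be an atomic commit with the TEST_ONLY flag set, so nothing is programmed to hardware while probing.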

> > I suppose whether these turn out to be prohibitive or not, one just has
> > to implement it and see. It should be usable on a slowish CPU with
> > unimpressive amounts of RAM, because that is where a separate 2D
> > compositing engine gives the most kick.
> > 
> > FWIW, dynamically created/destroyed planes would probably not be the
> > answer. The kernel driver cannot decide before-hand how many planes it
> > can expose. How many planes can be used depends completely on how user
> > space decides to use them. Therefore I believe it should expose the
> > maximum number always, whether there is any real use case that could
> > actually get them all running or not.
> Yeah, dynamic planes don't sound like a nice solution, not least because
> you'll get to audit piles of code. Currently really only framebuffers (and
> to some extent connectors) can come and go freely in kms-land.

Yup, thought so.

> > What if I cannot even pick a maximum number of planes, but wanted to
> > (as the hardware allows) let the 2D compositing scale up basically
> > unlimited while becoming just slower and slower?
> > 
> > I think at that point one would be looking at a rendering API really,
> > rather than a KMS API, so it's probably out of scope. Where is the line
> > between KMS 2D compositing with planes vs. 2D composite rendering?
> I think kms should still be real-time compositing - if you have to
> internally render to a buffer and then scan that one out due to lack of
> memory bandwidth or so, that very much sounds like a rendering api. Of
> course stuff like writeback buffers blurs that a bit. But hw writeback is
> still real-time.

Agreed, that's a good and clear definition, even if it might make my
life harder.

I'm still not completely sure that using an intermediate buffer means
sacrificing real-time performance (i.e. being able to hit the next
vblank the user space is aiming for); maybe the 2D engine output rate
fluctuates so that the scanout block would have problems, but a buffer
can still be completed in time. Anyway, details.

Would using an intermediate buffer be ok if we can still maintain
real-time? That is, if a compositor kicks the atomic update e.g. 7 ms
before vblank, would we still hit that vblank even with the
intermediate buffer? Whether that is actually possible, I do not know
yet.
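To get a feel for that deadline arithmetic, here is a toy check with made-up numbers (`hits_vblank` is a hypothetical helper, nothing in KMS): the intermediate-buffer path still counts as real-time if composition into the buffer plus flip setup fits inside the lead time the compositor gives itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative only: the compositor commits lead_ms before vblank
 * (7 ms in the mail).  The fallback path is still "real-time" if
 * compositing into the intermediate buffer plus programming the
 * flip completes before the deadline. */
static bool hits_vblank(double lead_ms, double compose_ms,
			double flip_setup_ms)
{
	return compose_ms + flip_setup_ms <= lead_ms;
}
```

Whether a given 2D engine can actually bound its composition time like this is exactly the open question above.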

> > Should I really be designing a driver-specific compositing API instead,
> > similar to what the Mesa OpenGL implementations use? Then have user
> > space maybe use the user space driver part via OpenWFC perhaps?
> > And when I mention OpenWFC, you probably notice, that I am not aware of
> > any standard user space API I could be implementing here. ;-)
> Personally I'd expose a bunch of planes with kms (enough so that you can
> reap the usual benefits planes bring wrt video-playback and stuff like
> that). So perhaps something in line with what current hw does, doubled
> once or twice - 16 planes or so. Your driver would reject
> any requests that need intermediate buffers to store render results. I.e.
> everything that can't be scanned out directly in real-time at about 60fps.
> The fun with kms planes is also that right now we have 0 standards for
> z-ordering and blending. So we would need to define that first.

I do not yet know where that real-time limit is, but I'm guessing it
could be pretty low. If it is, we might start hitting software
compositing (like Pixman) very often, which is too slow to be usable.

Defining z-order and blending sounds like peanuts compared to the
problem below.
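As a sketch of what a z-order standard might mean in practice, assume each plane carried a hypothetical "zpos" property (no such standard property exists yet, as Daniel notes); programming order for the 2D engine is then just a sort, bottom of the stack first. `struct plane` and `stack_order` are illustration only:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-plane state: lower zpos is scanned out first,
 * i.e. sits at the bottom of the stack. */
struct plane {
	uint32_t id;
	uint32_t zpos;
};

static int cmp_zpos(const void *a, const void *b)
{
	const struct plane *pa = a, *pb = b;

	return (int)pa->zpos - (int)pb->zpos;
}

/* Sort planes bottom-to-top before programming the compositor. */
static void stack_order(struct plane *planes, size_t n)
{
	qsort(planes, n, sizeof(*planes), cmp_zpos);
}
```

Blending would then be a second per-plane property (blend equation, per-pixel vs. constant alpha), applied in this stacking order.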

> Then expose everything else with a separate api. I guess you'll just end
> up with per-compositor userspace drivers due to the lack of a widespread
> 2d api. OpenVG is kinda dead, and cairo might not fit.

Yeah, that is kind of the worst case, which also seems unavoidable.


More information about the dri-devel mailing list