How to design a DRM KMS driver exposing 2D compositing?

Mon Aug 11 05:06:36 PDT 2014

On Mon, Aug 11, 2014 at 01:38:55PM +0300, Pekka Paalanen wrote:
> Hi,
> 
> there is some hardware that can do 2D compositing with an arbitrary
> number of planes. I'm not sure what the absolute maximum number of
> planes is, but for the discussion, let's say it is 100.
> 
> There are many complicated, dynamic constraints on how many, what size,
> etc. planes can be used at once. A driver would be able to check those
> before kicking the 2D compositing engine.
> 
> The 2D compositing engine in the best case (only few planes used) is
> able to composite on the fly in scanout, just like the usual overlay
> hardware blocks in CRTCs. When the composition complexity goes up, the
> driver can fall back to compositing into a buffer rather than on the
> fly in scanout. This fallback needs to be completely transparent to the
> user space, implying only additional latency if anything.
> 
> These 2D compositing features should be exposed to user space through a
> standard kernel ABI, hopefully an existing ABI in the very near future
> like the KMS atomic.

I presume we're talking about the VideoCore from the Raspberry Pi? Or at
least something similar?

> Assuming the DRM universal planes and atomic mode setting / page flip
> infrastructure is in place, could the 2D compositing capabilities be
> exposed through universal planes? We can assume that plane properties
> are enough to describe all the compositing parameters.
> 
> Atomic updates are needed so that the complicated constraints can be
> checked, and user space can try to reduce the composition complexity if
> the kernel driver sees that it won't work.
> 
> Would it be feasible to generate a hundred identical non-primary planes
> to be exposed to user space via DRM?
> 
> If that could be done, the kernel driver could just use the existing
> kernel/user ABIs without having to invent something new, and programs
> like a Wayland compositor would not need to be coded specifically for
> this hardware.
> 
> What problems do you see with this plan?
> Are any of those problems unfixable or simply prohibitive?
> 
> I have some concerns, which I am not sure will actually be a problem:
> - Does allocating 100 planes eat too much kernel memory?
>   I mean just the bookkeeping, properties, etc.
> - Would such an amount of planes make some in-kernel algorithms slow
>   (particularly in DRM common code)?
> - Considering how user space discovers all DRM resources, would this
>   make a compositor "slow" to start?

I don't see any problem with that. We have a few plane-loops, but iirc
those can be easily fixed to use indices and similar stuff. The atomic
ioctl itself should scale nicely.
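
To illustrate the index idea, here is a rough userspace-style sketch. It is
not actual DRM core code: the struct names, apply_updates() and MAX_PLANES
are all made up for this example. An update carries only the planes it
touches, and each one is resolved by index, so the cost depends on the size
of the update and not on how many planes exist.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PLANES 100  /* the figure used in this thread, not a real limit */

/* Hypothetical per-plane state; a real driver tracks far more. */
struct plane_state {
    int fb_id;          /* 0 means the plane is disabled */
};

/* Hypothetical entry of an atomic-style update: one touched plane. */
struct plane_update {
    size_t index;       /* which plane to change */
    int fb_id;          /* framebuffer to attach, 0 to disable */
};

/*
 * Apply n updates. All indices are validated first so that a bad update
 * leaves the state untouched. Work is O(n) in the touched planes only.
 * Returns 0 on success, -1 if any index is out of range.
 */
int apply_updates(struct plane_state planes[MAX_PLANES],
                  const struct plane_update *updates, size_t n)
{
    assert(planes != NULL);

    for (size_t i = 0; i < n; i++)
        if (updates[i].index >= MAX_PLANES)
            return -1;

    for (size_t i = 0; i < n; i++)
        planes[updates[i].index].fb_id = updates[i].fb_id;

    return 0;
}
```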

> I suppose whether these turn out to be prohibitive or not, one just has
> to implement it and see. It should be usable on a slowish CPU with
> unimpressive amounts of RAM, because that is where a separate 2D
> compositing engine gives the most kick.
> 
> FWIW, dynamically created/destroyed planes would probably not be the
> answer. The kernel driver cannot decide before-hand how many planes it
> can expose. How many planes can be used depends completely on how user
> space decides to use them. Therefore I believe it should expose the
> maximum number always, whether there is any real use case that could
> actually get them all running or not.

Yeah, dynamic planes don't sound like a nice solution, not least because
you'll get to audit piles of code. Currently only framebuffers (and to
some extent connectors) can really come and go freely in kms-land.

> What if I cannot even pick a maximum number of planes, but wanted to
> (as the hardware allows) let the 2D compositing scale up basically
> unlimited while becoming just slower and slower?
> 
> I think at that point one would be looking at a rendering API really,
> rather than a KMS API, so it's probably out of scope. Where is the line
> between KMS 2D compositing with planes vs. 2D composite rendering?

I think kms should still be real-time compositing - if you have to
internally render to a buffer and then scan that one out due to lack of
memory bandwidth or so, that very much sounds like a rendering api. Ofc
stuff like writeback buffers blurs that a bit. But hw writeback is still
real-time.

> Should I really be designing a driver-specific compositing API instead,
> similar to what the Mesa OpenGL implementations use? Then have user
> space maybe use the user space driver part via OpenWFC perhaps?
> And when I mention OpenWFC, you probably notice, that I am not aware of
> any standard user space API I could be implementing here. ;-)

Personally I'd expose a bunch of planes with kms (enough so that you can
reap the usual benefits planes bring wrt video-playback and stuff like
that). So perhaps something in line with what current hw does in hw and
then double it once or twice - 16 planes or so. Your driver would reject
any requests that need intermediate buffers to store render results, i.e.
everything that can't be scanned out directly in real-time at about 60fps.
The fun with kms planes is also that right now we have 0 standards for
z-ordering and blending. So we would need to define that first.
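
A minimal sketch of what such a reject-if-not-real-time check could look
like, under loud assumptions: the cost model (bytes fetched per frame
against a fixed budget), struct plane_req, check_realtime() and
EXPOSED_PLANES are all invented for illustration. The real constraints of
this hardware are more complicated and dynamic, as described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EXPOSED_PLANES 16   /* "16 planes or so", an assumed cap */

/* Hypothetical description of one requested plane. */
struct plane_req {
    uint32_t width, height;     /* source size in pixels */
    uint32_t bytes_per_pixel;
    int enabled;
};

/*
 * Accept a configuration only if it can be composited on the fly during
 * scanout. The model is deliberately crude: sum the bytes each enabled
 * plane fetches per frame and compare with a per-frame budget that the
 * caller derives from memory bandwidth and refresh rate.
 * Returns 0 if acceptable, -EINVAL if it would need an intermediate
 * buffer (or asks for more planes than are exposed).
 */
int check_realtime(const struct plane_req *req, size_t n,
                   uint64_t bytes_per_frame_budget)
{
    uint64_t total = 0;

    assert(req != NULL || n == 0);

    if (n > EXPOSED_PLANES)
        return -EINVAL;

    for (size_t i = 0; i < n; i++) {
        if (!req[i].enabled)
            continue;
        total += (uint64_t)req[i].width * req[i].height *
                 req[i].bytes_per_pixel;
        if (total > bytes_per_frame_budget)
            return -EINVAL;
    }
    return 0;
}
```

With atomic, user space would get that -EINVAL from a test-only commit and
could then retry with fewer or smaller planes, compositing the rest itself.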

Then expose everything else with a separate api. I guess you'll just end
up with per-compositor userspace drivers due to the lack of a widespread
2d api. OpenVG is kinda dead, and cairo might not fit.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch