How to design a DRM KMS driver exposing 2D compositing?

Mon Aug 11 03:38:55 PDT 2014

Hi,

there is some hardware than can do 2D compositing with an arbitrary
number of planes. I'm not sure what the absolute maximum number of
planes is, but for the discussion, let's say it is 100.

There are many complicated, dynamic constraints on how many, what size,
etc. planes can be used at once. A driver would be able to check those
before kicking the 2D compositing engine.

The 2D compositing engine in the best case (only few planes used) is
able to composite on the fly in scanout, just like the usual overlay
hardware blocks in CRTCs. When the composition complexity goes up, the
driver can fall back to compositing into a buffer rather than on the
fly in scanout. This fallback needs to be completely transparent to the
user space, implying only additional latency if anything.

These 2D compositing features should be exposed to user space through a
standard kernel ABI, hopefully an existing ABI in the very near future
like the KMS atomic.

Assuming the DRM universal planes and atomic mode setting / page flip
infrastructure is in place, could the 2D compositing capabilities be
exposed through universal planes? We can assume that plane properties
are enough to describe all the compositing parameters.

Atomic updates are needed so that the complicated constraints can be
checked, and user space can try to reduce the composition complexity if
the kernel driver sees that it won't work.

Would it be feasible to generate a hundred identical non-primary planes
to be exposed to user space via DRM?

If that could be done, the kernel driver could just use the existing
kernel/user ABIs without having to invent something new, and programs
like a Wayland compositor would not need to be coded specifically for
this hardware.

What problems do you see with this plan?
Are any of those problems unfixable or simply prohibitive?

I have some concerns, which I am not sure will actually be a problem:
- Does allocating a 100 planes eat too much kernel memory?
  I mean just the bookkeeping, properties, etc.
- Would such an amount of planes make some in-kernel algorithms slow
  (particularly in DRM common code)?
- Considering how user space discovers all DRM resources, would this
  make a compositor "slow" to start?

I suppose whether these turn out to be prohibitive or not, one just has
to implement it and see. It should be usable on a slowish CPU with
unimpressive amounts of RAM, because that is where a separate 2D
compositing engine gives the most kick.

FWIW, dynamically created/destroyed planes would probably not be the
answer. The kernel driver cannot decide before-hand how many planes it
can expose. How many planes can be used depends completely on how user
space decides to use them. Therefore I believe it should expose the
maximum number always, whether there is any real use case that could
actually get them all running or not.

What if I cannot even pick a maximum number of planes, but wanted to
(as the hardware allows) let the 2D compositing scale up basically
unlimited while becoming just slower and slower?

I think at that point one would be looking at a rendering API really,
rather than a KMS API, so it's probably out of scope. Where is the line
between KMS 2D compositing with planes vs. 2D composite rendering?

Should I really be designing a driver-specific compositing API instead,
similar to what the Mesa OpenGL implementations use? Then have user
space maybe use the user space driver part via OpenWFC perhaps?
And when I mention OpenWFC, you probably notice, that I am not aware of
any standard user space API I could be implementing here. ;-)

Thanks,
pq