How to design a DRM KMS driver exposing 2D compositing?

Mon Aug 11 06:32:32 PDT 2014

On Mon, Aug 11, 2014 at 8:06 AM, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Mon, Aug 11, 2014 at 01:38:55PM +0300, Pekka Paalanen wrote:
>> Hi,
>>
>> there is some hardware that can do 2D compositing with an arbitrary
>> number of planes. I'm not sure what the absolute maximum number of
>> planes is, but for the discussion, let's say it is 100.
>>
>> There are many complicated, dynamic constraints on how many, what size,
>> etc. planes can be used at once. A driver would be able to check those
>> before kicking the 2D compositing engine.
>>
>> The 2D compositing engine in the best case (only a few planes used) is
>> able to composite on the fly in scanout, just like the usual overlay
>> hardware blocks in CRTCs. When the composition complexity goes up, the
>> driver can fall back to compositing into a buffer rather than on the
>> fly in scanout. This fallback needs to be completely transparent to the
>> user space, implying only additional latency if anything.
>>
>> These 2D compositing features should be exposed to user space through a
>> standard kernel ABI, hopefully an existing ABI in the very near future
>> like the KMS atomic.
>
> I presume we're talking about the video core from raspi? Or at least
> something similar?
>
>> Assuming the DRM universal planes and atomic mode setting / page flip
>> infrastructure is in place, could the 2D compositing capabilities be
>> exposed through universal planes? We can assume that plane properties
>> are enough to describe all the compositing parameters.
>>
>> Atomic updates are needed so that the complicated constraints can be
>> checked, and user space can try to reduce the composition complexity if
>> the kernel driver sees that it won't work.
>>
>> Would it be feasible to generate a hundred identical non-primary planes
>> to be exposed to user space via DRM?
>>
>> If that could be done, the kernel driver could just use the existing
>> kernel/user ABIs without having to invent something new, and programs
>> like a Wayland compositor would not need to be coded specifically for
>> this hardware.
>>
>> What problems do you see with this plan?
>> Are any of those problems unfixable or simply prohibitive?
>>
>> I have some concerns, which I am not sure will actually be a problem:
>> - Does allocating 100 planes eat too much kernel memory?
>>   I mean just the bookkeeping, properties, etc.
>> - Would such an amount of planes make some in-kernel algorithms slow
>>   (particularly in DRM common code)?
>> - Considering how user space discovers all DRM resources, would this
>>   make a compositor "slow" to start?
>
> I don't see any problem with that. We have a few plane-loops, but iirc
> those can be easily fixed to use indices and similar stuff. The atomic
> ioctl itself should scale nicely.
>
>> I suppose whether these turn out to be prohibitive or not, one just has
>> to implement it and see. It should be usable on a slowish CPU with
>> unimpressive amounts of RAM, because that is where a separate 2D
>> compositing engine gives the most kick.
>>
>> FWIW, dynamically created/destroyed planes would probably not be the
>> answer. The kernel driver cannot decide before-hand how many planes it
>> can expose. How many planes can be used depends completely on how user
>> space decides to use them. Therefore I believe it should expose the
>> maximum number always, whether there is any real use case that could
>> actually get them all running or not.
>
> Yeah dynamic planes don't sound like a nice solution, not least because
> you'll get to audit piles of code. Currently really only framebuffers (and
> to some extent connectors) can come and go freely in kms-land.
>
>> What if I cannot even pick a maximum number of planes, but wanted to
>> (as the hardware allows) let the 2D compositing scale up basically
>> unlimited while becoming just slower and slower?
>>
>> I think at that point one would be looking at a rendering API really,
>> rather than a KMS API, so it's probably out of scope. Where is the line
>> between KMS 2D compositing with planes vs. 2D composite rendering?
>
> I think kms should still be real-time compositing - if you have to
> internally render to a buffer and then scan that one out due to lack of
> memory bandwidth or so that very much sounds like a rendering api. Ofc
> stuff like writeback buffers blur that a bit. But hw writeback is still
> real-time.

not really sure how much of this is exposed to the cpu side, vs hidden
on coproc..

but I tend to think it would be nice for compositors (userspace) to
know explicitly what is going on..  ie. if some layers are blended via
intermediate buffer, couldn't that intermediate buffer be potentially
re-used on next frame if not damaged?
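
To make the reuse idea concrete, here is a rough sketch of the check a
compositor might run per frame, assuming it knows which layers went
into the intermediate buffer. The names (struct layer, struct
blend_cache, cache_reusable) are hypothetical and not part of any
DRM/KMS interface; this only illustrates the damage test, not how a
driver would expose the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-layer state kept by a compositor; not a DRM structure. */
struct layer {
    bool damaged;   /* content changed since the previous frame */
};

/* Hypothetical record of a run of layers pre-blended into one buffer. */
struct blend_cache {
    bool valid;     /* false until the run has been composited once */
    size_t first;   /* index of the first layer blended into the buffer */
    size_t count;   /* number of consecutive layers blended into it */
};

/*
 * Returns true if the intermediate buffer can be used again as-is,
 * i.e. it has been filled before and none of its source layers is damaged.
 */
bool cache_reusable(const struct blend_cache *c,
                    const struct layer *layers, size_t n)
{
    assert(c->first + c->count <= n);   /* cached run must lie inside the stack */

    if (!c->valid)
        return false;

    for (size_t i = c->first; i < c->first + c->count; i++) {
        if (layers[i].damaged)
            return false;
    }
    return true;
}
```

If the check passes, only the layers outside the cached run need to be
touched for the next frame; otherwise the run has to be blended again.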

>> Should I really be designing a driver-specific compositing API instead,
>> similar to what the Mesa OpenGL implementations use? Then have user
>> space maybe use the user space driver part via OpenWFC perhaps?
>> And when I mention OpenWFC, you probably notice, that I am not aware of
>> any standard user space API I could be implementing here. ;-)
>
> Personally I'd expose a bunch of planes with kms (enough so that you can
> reap the usual benefits planes bring wrt video-playback and stuff like
> that). So perhaps something in line with what current hw does in hw and
> then double it a bit or twice - 16 planes or so. Your driver would reject
> any requests that need intermediate buffers to store render results. I.e.
> everything that can't be scanned out directly in real-time at about 60fps.
> The fun with kms planes is also that right now we have 0 standards for
> z-ordering and blending. So would need to define that first.
>
> Then expose everything else with a separate api. I guess you'll just end
> up with per-compositor userspace drivers due to the lack of a widespread
> 2d api. OpenVG is kinda dead, and cairo might not fit.
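
The "reject anything that can't be scanned out directly" approach
quoted above could look roughly like the sketch below. Everything in
it is an assumption made for illustration: the fetch budget is a
made-up number, the real constraints would be hardware specific and
more complicated, and check_config / planes_that_fit are hypothetical
helpers rather than DRM functions. The second function shows the user
space side, dropping planes until the check passes and leaving the
rest to some other rendering path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PLANES 16                           /* "16 planes or so" from the thread */
#define FETCH_BUDGET (64ull * 1024 * 1024)      /* assumed bytes per frame, made up */

/* Hypothetical description of one requested plane. */
struct plane_req {
    uint32_t width, height, bytes_per_pixel;
    bool enabled;
};

/*
 * Driver-side check: 0 if the first n planes can be composited on the fly
 * during scanout, -1 (standing in for -EINVAL) if they cannot.
 */
int check_config(const struct plane_req *planes, size_t n)
{
    uint64_t total = 0;

    assert(planes != NULL || n == 0);
    if (n > MAX_PLANES)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (!planes[i].enabled)
            continue;
        total += (uint64_t)planes[i].width * planes[i].height *
                 planes[i].bytes_per_pixel;
    }
    return total <= FETCH_BUDGET ? 0 : -1;
}

/*
 * User-space side: largest k <= n such that the first k planes pass the
 * check. The remaining n - k layers would have to be composited elsewhere.
 */
size_t planes_that_fit(const struct plane_req *planes, size_t n)
{
    for (size_t k = n; k > 0; k--) {
        if (check_config(planes, k) == 0)
            return k;
    }
    return 0;
}
```

With a real driver the check would sit behind a test-only atomic
request instead of a direct function call, but the retry loop in user
space would have the same shape.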

I kind of suspect someone should really just design weston2d, an api
more explicitly for compositing.. model after OpenWFC if that fits
nicely.  Or not if it doesn't.  Or just use the existing weston
front-end/back-end split..

I expect other wayland compositors would want more or less the same
thing as weston (barring pre-existing layer-cake mess..  cough, cough,
cogl/clutter/gnome-shell..)

We could even make a gallium statetracker implementation of weston2d
to get some usage on desktop..

BR,
-R

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel