[PATCH 14/17] drm/msm: add atomic support

Tue May 27 12:26:34 PDT 2014

On Tue, May 27, 2014 at 02:48:41PM -0400, Rob Clark wrote:
> On Tue, May 27, 2014 at 1:50 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
> > On Tue, May 27, 2014 at 11:58:33AM -0400, Rob Clark wrote:
> >> On Mon, May 26, 2014 at 1:54 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
> >> > Ok, I think I should have read your msm implementation a _lot_ earlier.
> >> > Explains your desing choices neatly.
> >> >
> >> > Two observations:
> >> >
> >> > - A GO bit makes nuclear pageflips ridiculously easy to implement,
> >> >   presuming the hardware actually works. And it's the sane model, so imo a
> >> >   good one to wrap the atomic helpers around.
> >> >
> >> >   But reality is often a lot more ugly, especially if you're employed by
> >> >   Intel. Which is why I'm harping so much on this helpers-vs-core
> >> >   interface issues ... We really need the full state transition in one
> >> >   piece to do anything useful.
> >> >
> >> > - msm doesn't have any resource sharing going on for modeset operations
> >> >   (which I mean lighting up crtcs to feed pixel streams to connectors).
> >> >   Which means the simplistic "loop over all crtcs and call the old
> >> >   ->setcrtc" approach actually works.
> >>
> >> we do actually have some shared resources on mdp5 generation (the
> >> "smp" blocks, basically a shared buffer which we need to allocate fifo
> >> space from for each plane)..
> >>
> >> I'm mostly ignoring this at the moment, because we don't support
> >> enough crtc's yet to run out of smp blocks.  But hooking in at the
> >> current ->atomic_commit() would be enough for me, I think.  Tbh, it is
> >> something that should be handled at the ->atomic_check() stage so I
> >> hadn't given much though to ->atomic_commit() stage.
> >>
> >> That all said, I've no problem with adding one more hook, after
> >> ->atomic_commit() and lock magic, before loop over planes/crtcs, so
> >> drivers that need to can replace the loops and do their own thing.
> >> Current state should be reasonably good for sane hw.  I'm just waiting
> >> for patches for i915 to add whatever it needs.
> >
> > I don't think that will be enough for you. Example, slightly hypothetical:
> >
> > - Configuration A: crtc 0 displays a small screen, crtc 1 a giant screen.
> >   Together they just barely fit into your fifo space.
> >
> > - Configuration B: crtc 0 drives a giant screen, crtc 1 a tiny one.
> >   Presume different outputs here so that no implicit output stealing
> >   happens upon mode switching.
> >
> > Atomic switch should work, but don't since if you just loop over crtcs you
> > have the intermediate stage where you try to drive 2 giant screens, which
> > will run out of fifo resources. And I think it's really common to have
> > such implicit resource sharing, maybe except for the most beefy desktop
> > gpu which simply can't run out of memory bw with today's screen.
> 
> Well, this situation is a bit problematic in other similar cases..
> moving plane between crtc's which needs to wait on a vblank is another
> vaguely similar case.  I was kinda thinking of punting on that one
> (ie. -EBUSY and userspace tries again on next frame).  Maybe for
> modeset that doesn't work out so well, since frame N+1 you'll still be
> at configuration A and have the same problem.
> 
> Would be kinda nice if helpers could order things according to what
> decreases resource requirements vs what increases.  Although we could
> probably get a lot of mileage out of just making the 'atomic loop over
> things helper' apply config for crtcs/planes which will be disabled
> first, before the ones which will be enabled at the end of the commit.
>  Hmm.

Yeah, that one should go a long way, but only for modeset changes. If we
do this for plane updates it will look _really_ bad ;-)

So for the "move plane from crtc to crtc" I think we need a separate step.
Presuming we don't have any modeset operations we should be able to group
all the plane updates per crtc. This ofc presumes a sane hw with GO bit.
Then the helper could figure out which crtc it needs to nuclear flip first
to be able to move the plane.

At least with the current atomic ioctl proposal we can't reject this with
-EBUSY since with a full modeset (which takes longer than one vblank
anyway) it's possible. Otoh we _should_ reject it when userspace expects a
vblank synced update.

Which is another reason for why I think we really should have a flag
somewhere for vblank synced atomic updates vs. atomic updates which take
longer than one vblank and might even involved a few msec of blank screens
for switching.

> Either way, if you have to replace 'atomic loop over things helper'
> with your own i915 specific thing, it shouldn't make much difference
> to you.  And I don't really think we need to solve all the worlds
> problems in the first version.  But seems like there could be some
> value in making helpers a bit more aware of shared resource
> constraints.

This isn't about i915, but about all the drivers using crtc helpers.
Correct ordering of the crtc helper hooks should pretty much solve this,
without the need to track global resources (i.e. disable everything before
we start enabling). At least for modeset-like atomic updates.

i915 will roll their own, but not because atomic updates aren't possible
with the crtc helpers, but because the crtc helpers are inadequate for our
hw. For modeset updates atomic or not doesn't factor in here.

And imo if we can make the crtc helpers work, we should do that. Otherwise
there won't be a whole lot of use behind the atomic modeset updates imo.

> > So afaics you really need to push a bit of smarts into the crtc helpers to
> > make atomic possible, which then leaves you with interaction issues
> > between the atomic stuff for GO bit capable hw for plane updates and the
> > modeset ordering hell.
> >
> >> >   The problem I see here is that if you have hardware with more elaborate
> >> >   needs (e.g. shared dplls), but otherwise sanity for plane updates (i.e.
> >> >   a GO bit) then your current atomic helpers will make it rather hard to
> >> >   mix this. So I think we should pimp the crtc helpers a bit to be atomic
> >> >   compliant (i.e. kill all outputs first before enabling anything new) and
> >> >   try to integrate this with the atomic helpers used for GO bit style
> >> >   updates.
> >>
> >> Not really, I don't think.  You can still do whatever shared resource
> >> setup in ->atomic_commit().  For drivers which want to defer commit
> >> (ie. doing commit after fb's ready from cpu, rather than from gpu), it
> >> would probably be more convenient to split up atomic_commit() so
> >> drivers have a post lock hook.  I think it is just a few line patch,
> >> and I think that it probably isn't really needed until i915 is ready.
> >> I'm pretty sure we can change this to do what i915 needs, while at the
> >> same time keeping it simple for 'GO bit' drivers.
> >>
> >> The bit about crtc helpers, I'll have to think about.  I'm not sure
> >> that we want to require that 'atomic' (modeset or pageflip) completely
> >> *replaces* current state.  So disabling unrelated crtcs doesn't seem
> >> like the right thing.  But we have some time to decide about the
> >> semantics of an atomic modeset before we expose to userspace.
> >
> > I'm not talking about replacing unrelated crtcs. It's also not about
> > underspecified state or about enabling/changing global resources.
> >
> > It is _only_ about ordering of the various operations: If both the
> > current and the desired new configuration are at the limit of what the hw
> > can do you can't switch to the new configuration by looping over all
> > crtcs. The fact that this doesn't work is the entire point of atomic
> > modesets. And if we have helpers which aren't cut out for the task at hand
> > for any hw where we need to have atomic modesets to make such changes
> > possible without userspace going nuts, then the helpers are imo simply not
> > useful
> >
> >> >   i915 has dpll sharing on ivb/hsw, but doesn't use the the crtc helpers
> >> >   anymore. But the radone eyefinity (or whatever the damn thing is called)
> >> >   I have lying around here fits the bill: It has 5 DP+ outputs but only 2
> >> >   dplls. So you can drive e.g. 3 DP screens and then switch to 2 HDMI
> >> >   screens atomically and it should all pan out.
> >> >
> >> >   But since your current helpers just loop over all crtcs without any
> >> >   regard to ordering constraints this will fall over if the 2 HDMI outputs
> >> >   get enabled before the 3 DP outputs get disabled all disabled.
> >>
> >> the driver should have rejected the config before it gets to commit
> >> stage, if it were an impossible config.
> >
> > The configuration _is_ possible. We simply have to be somewhat careful
> > with transitioning to it, since not _all_ intermediate states are
> > possible. Your current helpers presume that's the case, which afaik isn't
> > the case on any hw where we have global limits. For modesets.
> >
> > Nuclear pageflips are a completely different thing, as long as your hw has
> > a GO bit.
> >
> >> > So with all that in mind I have 3 sanity checks for the atomic interfaces
> >> > and helpers:
> >> >
> >> > 1. The atomic helpers should make enabling sane hw with a GO bit easy for
> >> > nuclear pageflips. Benchmark would be sane hw like msm or omap.
> >> >
> >> > 2. It should cooperate with the crtc helpers such that hw with some shared
> >> > resources (like dplls) works for atomic modesets. That pretty much means
> >> > "disable everything (including crtc/connetor pipelines that only changed
> >> > some parts) before enabling anything". Benchmark would be a platform with
> >> > shared dplls.
> >>
> >> btw, not something I'd thought about before, but shared dplls seem
> >> common enough, that it is something core could help with.  (Assuming
> >> they are all related to pixel clk and not some derived thing that core
> >> wouldn't know about.)
> >
> > Yup, that's what I'm trying to say ;-) But it was just an example, I
> > believe that atm your helpers don't help for any shared modeset resources
> > at all.
> 
> no, not at all (other than the ww_mutex stuff which should be useful
> for shared resources and more fine grained locking within driver).  It
> wasn't really a goal.
> 
> But having some knowledge about shared resources seems like it could
> make core helpers more useful when I get closer to exploiting the
> limits of the hw I have..  I suspect i915 is just further down that
> path than the other drivers.

I'm repeating myself, but simply ordering updates correctly should already
solve it. At least if the driver provides checks to make sure the new
config doesn't go over limits (e.g. by counting plls or required fifo
space). If we don't have that, the helpers are imo not sufficiently
validated as generally useful. And I have seen _way_ too much single use
code in the drm core from the old ums/dri1 days.

> >> I think we do need to decide what partial state updates me in the
> >> context of modeset or pageflip.  I kinda think the right thing is
> >> different for modeset vs pageflip.  Maybe we want to define an atomic
> >> flag to mean "disable/discard everything else"..  at any rate, we only
> >> need to sort that before exposing the API to userspace.
> >
> > Yeah, I still strongly support this split in the api itself. For i915 my
> > plan is to have separate configuration structures for modeset state and
> > pageflip/plane config state. When we do an atomic modeset we compute both
> > (maybe with some shortcut if we know that the pipe config didn't change at
> > all). Then comes the big switch:
> >
> > - If we have the same pipe config, we simply to a vblank synce nuclear
> >   flip to the new config.
> >
> > - If pipe configs change it will be much more elaborate:
> >
> >   1. Do a vblank synced nuclear flip to black on all pipes that need to go
> >   off (whether they get disabled or reconfigured doesn't matter for now).
> >
> >   2. Disable pipes.
> >
> >   3. Commit new state on the sw side.
> >
> >   4. Enable all pipes with the new config, but only scanning out black.
> >
> >   5. Do a vblank-synced nuclear flip to the new config. This would also be
> >   the one that would signale the drm events that the atomic update
> >   completed.
> >
> >   For fastboot we might need some hacks to fuse this all together, e.g for
> >   some panel fitter changes we don't need to disable the pipe completely.
> >   But that's the advanced stuff really.
> >
> > I think modelling the crtc helpers after this model could work. But that
> > means that the crtc helpers and the nuclear flip atomic helpers for GO
> > bit capable hw need to be rather tightly integrated, while still allowing
> > drivers to override the nuclear flip parts.
> >
> >> > 3. The core->driver interface should be powerful enough to support
> >> > insanity like i915, but no more. Which means all the code that's share
> >> > (i.e. the set_prop code I've been harping all over the place about) should
> >> > be done in the core, without any means for drivers to override. Currently
> >> > the drm core also takes care of a bunch of things like basic locking and
> >> > refcounting, and that's kinda the spirit for this. i915 is the obvious
> >> > benchmark here.
> >>
> >> The more I think about it, the more I think we should leave set_prop
> >> as it is (although possibly with drm_atomic_state ->
> >> drm_{plane,crtc,etc}_state).
> >>
> >> In the past, before primary plane, I really needed this.  And I expect
> >> having a convenient way to 'sniff' changing properties as they go by
> >> will be useful in some cases.
> >
> > I actually really like the addition of the state object to ->set_prop. At
> > least for i915 (which already has a fair pile of additional properties)
> > that looks like the perfect way to prep the stage.
> >
> > But at least for the transition we should keep the impact minimal. So no
> > manadatory callbacks and don't enforce the usage of the state object until
> > the drm core is fully converted to always follow a set_prop with a
> > ->commit. Since currently we have internal mode_set calls in all i915
> > set_prop functions and we need to convert them all. But we can't do that
> > until all the core stuff uses the atomic interface for all legacy ioctl
> > ops.
> >
> 
> fwiw, at least all set_prop ioctl stuff from core follows up with a
> atomic_commit now.  There are one or two more places doing mode_set
> un-atomically.  And I'm not sure if you call set_prop anywhere
> internally from i915.

We do modesets internally, but as long as those aren't externally visible
(which they shouldn't be if we grab locks before checking the state) it
should be all fine. Also from my pov the ->set_prop stuff is just
interface to construct the in-kernel representation of the desired config.
The real magic happens in the check/commit hooks (which is the same level
i915-internal modeset changes happens at).

I think one excellent use-case we get for free (almost) without the ioctl
would be fbcon. It very much wants to do an atomic update, so converting
that over to the atomic interface would be good imo.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch