[PATCH 14/17] drm/msm: add atomic support

Tue May 27 13:06:28 PDT 2014

On Tue, May 27, 2014 at 3:26 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Tue, May 27, 2014 at 02:48:41PM -0400, Rob Clark wrote:
>> On Tue, May 27, 2014 at 1:50 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
>> > On Tue, May 27, 2014 at 11:58:33AM -0400, Rob Clark wrote:
>> >> On Mon, May 26, 2014 at 1:54 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
>> >> > Ok, I think I should have read your msm implementation a _lot_ earlier.
>> >> > Explains your desing choices neatly.
>> >> >
>> >> > Two observations:
>> >> >
>> >> > - A GO bit makes nuclear pageflips ridiculously easy to implement,
>> >> >   presuming the hardware actually works. And it's the sane model, so imo a
>> >> >   good one to wrap the atomic helpers around.
>> >> >
>> >> >   But reality is often a lot more ugly, especially if you're employed by
>> >> >   Intel. Which is why I'm harping so much on this helpers-vs-core
>> >> >   interface issues ... We really need the full state transition in one
>> >> >   piece to do anything useful.
>> >> >
>> >> > - msm doesn't have any resource sharing going on for modeset operations
>> >> >   (which I mean lighting up crtcs to feed pixel streams to connectors).
>> >> >   Which means the simplistic "loop over all crtcs and call the old
>> >> >   ->setcrtc" approach actually works.
>> >>
>> >> we do actually have some shared resources on mdp5 generation (the
>> >> "smp" blocks, basically a shared buffer which we need to allocate fifo
>> >> space from for each plane)..
>> >>
>> >> I'm mostly ignoring this at the moment, because we don't support
>> >> enough crtc's yet to run out of smp blocks.  But hooking in at the
>> >> current ->atomic_commit() would be enough for me, I think.  Tbh, it is
>> >> something that should be handled at the ->atomic_check() stage so I
>> >> hadn't given much though to ->atomic_commit() stage.
>> >>
>> >> That all said, I've no problem with adding one more hook, after
>> >> ->atomic_commit() and lock magic, before loop over planes/crtcs, so
>> >> drivers that need to can replace the loops and do their own thing.
>> >> Current state should be reasonably good for sane hw.  I'm just waiting
>> >> for patches for i915 to add whatever it needs.
>> >
>> > I don't think that will be enough for you. Example, slightly hypothetical:
>> >
>> > - Configuration A: crtc 0 displays a small screen, crtc 1 a giant screen.
>> >   Together they just barely fit into your fifo space.
>> >
>> > - Configuration B: crtc 0 drives a giant screen, crtc 1 a tiny one.
>> >   Presume different outputs here so that no implicit output stealing
>> >   happens upon mode switching.
>> >
>> > Atomic switch should work, but don't since if you just loop over crtcs you
>> > have the intermediate stage where you try to drive 2 giant screens, which
>> > will run out of fifo resources. And I think it's really common to have
>> > such implicit resource sharing, maybe except for the most beefy desktop
>> > gpu which simply can't run out of memory bw with today's screen.
>>
>> Well, this situation is a bit problematic in other similar cases..
>> moving plane between crtc's which needs to wait on a vblank is another
>> vaguely similar case.  I was kinda thinking of punting on that one
>> (ie. -EBUSY and userspace tries again on next frame).  Maybe for
>> modeset that doesn't work out so well, since frame N+1 you'll still be
>> at configuration A and have the same problem.
>>
>> Would be kinda nice if helpers could order things according to what
>> decreases resource requirements vs what increases.  Although we could
>> probably get a lot of mileage out of just making the 'atomic loop over
>> things helper' apply config for crtcs/planes which will be disabled
>> first, before the ones which will be enabled at the end of the commit.
>>  Hmm.
>
> Yeah, that one should go a long way, but only for modeset changes. If we
> do this for plane updates it will look _really_ bad ;-)
>
> So for the "move plane from crtc to crtc" I think we need a separate step.
> Presuming we don't have any modeset operations we should be able to group
> all the plane updates per crtc. This ofc presumes a sane hw with GO bit.
> Then the helper could figure out which crtc it needs to nuclear flip first
> to be able to move the plane.
>
> At least with the current atomic ioctl proposal we can't reject this with
> -EBUSY since with a full modeset (which takes longer than one vblank
> anyway) it's possible. Otoh we _should_ reject it when userspace expects a
> vblank synced update.
>
> Which is another reason for why I think we really should have a flag
> somewhere for vblank synced atomic updates vs. atomic updates which take
> longer than one vblank and might even involved a few msec of blank screens
> for switching.

Well, there was the NONBLOCK atomic flag.. I'm not entirely sure if we
should hang so much off of that one flag.

>> Either way, if you have to replace 'atomic loop over things helper'
>> with your own i915 specific thing, it shouldn't make much difference
>> to you.  And I don't really think we need to solve all the worlds
>> problems in the first version.  But seems like there could be some
>> value in making helpers a bit more aware of shared resource
>> constraints.
>
> This isn't about i915, but about all the drivers using crtc helpers.
> Correct ordering of the crtc helper hooks should pretty much solve this,
> without the need to track global resources (i.e. disable everything before
> we start enabling). At least for modeset-like atomic updates.
>
> i915 will roll their own, but not because atomic updates aren't possible
> with the crtc helpers, but because the crtc helpers are inadequate for our
> hw. For modeset updates atomic or not doesn't factor in here.
>
> And imo if we can make the crtc helpers work, we should do that. Otherwise
> there won't be a whole lot of use behind the atomic modeset updates imo.
>
>> > So afaics you really need to push a bit of smarts into the crtc helpers to
>> > make atomic possible, which then leaves you with interaction issues
>> > between the atomic stuff for GO bit capable hw for plane updates and the
>> > modeset ordering hell.
>> >
>> >> >   The problem I see here is that if you have hardware with more elaborate
>> >> >   needs (e.g. shared dplls), but otherwise sanity for plane updates (i.e.
>> >> >   a GO bit) then your current atomic helpers will make it rather hard to
>> >> >   mix this. So I think we should pimp the crtc helpers a bit to be atomic
>> >> >   compliant (i.e. kill all outputs first before enabling anything new) and
>> >> >   try to integrate this with the atomic helpers used for GO bit style
>> >> >   updates.
>> >>
>> >> Not really, I don't think.  You can still do whatever shared resource
>> >> setup in ->atomic_commit().  For drivers which want to defer commit
>> >> (ie. doing commit after fb's ready from cpu, rather than from gpu), it
>> >> would probably be more convenient to split up atomic_commit() so
>> >> drivers have a post lock hook.  I think it is just a few line patch,
>> >> and I think that it probably isn't really needed until i915 is ready.
>> >> I'm pretty sure we can change this to do what i915 needs, while at the
>> >> same time keeping it simple for 'GO bit' drivers.
>> >>
>> >> The bit about crtc helpers, I'll have to think about.  I'm not sure
>> >> that we want to require that 'atomic' (modeset or pageflip) completely
>> >> *replaces* current state.  So disabling unrelated crtcs doesn't seem
>> >> like the right thing.  But we have some time to decide about the
>> >> semantics of an atomic modeset before we expose to userspace.
>> >
>> > I'm not talking about replacing unrelated crtcs. It's also not about
>> > underspecified state or about enabling/changing global resources.
>> >
>> > It is _only_ about ordering of the various operations: If both the
>> > current and the desired new configuration are at the limit of what the hw
>> > can do you can't switch to the new configuration by looping over all
>> > crtcs. The fact that this doesn't work is the entire point of atomic
>> > modesets. And if we have helpers which aren't cut out for the task at hand
>> > for any hw where we need to have atomic modesets to make such changes
>> > possible without userspace going nuts, then the helpers are imo simply not
>> > useful
>> >
>> >> >   i915 has dpll sharing on ivb/hsw, but doesn't use the the crtc helpers
>> >> >   anymore. But the radone eyefinity (or whatever the damn thing is called)
>> >> >   I have lying around here fits the bill: It has 5 DP+ outputs but only 2
>> >> >   dplls. So you can drive e.g. 3 DP screens and then switch to 2 HDMI
>> >> >   screens atomically and it should all pan out.
>> >> >
>> >> >   But since your current helpers just loop over all crtcs without any
>> >> >   regard to ordering constraints this will fall over if the 2 HDMI outputs
>> >> >   get enabled before the 3 DP outputs get disabled all disabled.
>> >>
>> >> the driver should have rejected the config before it gets to commit
>> >> stage, if it were an impossible config.
>> >
>> > The configuration _is_ possible. We simply have to be somewhat careful
>> > with transitioning to it, since not _all_ intermediate states are
>> > possible. Your current helpers presume that's the case, which afaik isn't
>> > the case on any hw where we have global limits. For modesets.
>> >
>> > Nuclear pageflips are a completely different thing, as long as your hw has
>> > a GO bit.
>> >
>> >> > So with all that in mind I have 3 sanity checks for the atomic interfaces
>> >> > and helpers:
>> >> >
>> >> > 1. The atomic helpers should make enabling sane hw with a GO bit easy for
>> >> > nuclear pageflips. Benchmark would be sane hw like msm or omap.
>> >> >
>> >> > 2. It should cooperate with the crtc helpers such that hw with some shared
>> >> > resources (like dplls) works for atomic modesets. That pretty much means
>> >> > "disable everything (including crtc/connetor pipelines that only changed
>> >> > some parts) before enabling anything". Benchmark would be a platform with
>> >> > shared dplls.
>> >>
>> >> btw, not something I'd thought about before, but shared dplls seem
>> >> common enough, that it is something core could help with.  (Assuming
>> >> they are all related to pixel clk and not some derived thing that core
>> >> wouldn't know about.)
>> >
>> > Yup, that's what I'm trying to say ;-) But it was just an example, I
>> > believe that atm your helpers don't help for any shared modeset resources
>> > at all.
>>
>> no, not at all (other than the ww_mutex stuff which should be useful
>> for shared resources and more fine grained locking within driver).  It
>> wasn't really a goal.
>>
>> But having some knowledge about shared resources seems like it could
>> make core helpers more useful when I get closer to exploiting the
>> limits of the hw I have..  I suspect i915 is just further down that
>> path than the other drivers.
>
> I'm repeating myself, but simply ordering updates correctly should already
> solve it. At least if the driver provides checks to make sure the new
> config doesn't go over limits (e.g. by counting plls or required fifo
> space). If we don't have that, the helpers are imo not sufficiently
> validated as generally useful. And I have seen _way_ too much single use
> code in the drm core from the old ums/dri1 days.
>
>> >> I think we do need to decide what partial state updates me in the
>> >> context of modeset or pageflip.  I kinda think the right thing is
>> >> different for modeset vs pageflip.  Maybe we want to define an atomic
>> >> flag to mean "disable/discard everything else"..  at any rate, we only
>> >> need to sort that before exposing the API to userspace.
>> >
>> > Yeah, I still strongly support this split in the api itself. For i915 my
>> > plan is to have separate configuration structures for modeset state and
>> > pageflip/plane config state. When we do an atomic modeset we compute both
>> > (maybe with some shortcut if we know that the pipe config didn't change at
>> > all). Then comes the big switch:
>> >
>> > - If we have the same pipe config, we simply to a vblank synce nuclear
>> >   flip to the new config.
>> >
>> > - If pipe configs change it will be much more elaborate:
>> >
>> >   1. Do a vblank synced nuclear flip to black on all pipes that need to go
>> >   off (whether they get disabled or reconfigured doesn't matter for now).
>> >
>> >   2. Disable pipes.
>> >
>> >   3. Commit new state on the sw side.
>> >
>> >   4. Enable all pipes with the new config, but only scanning out black.
>> >
>> >   5. Do a vblank-synced nuclear flip to the new config. This would also be
>> >   the one that would signale the drm events that the atomic update
>> >   completed.
>> >
>> >   For fastboot we might need some hacks to fuse this all together, e.g for
>> >   some panel fitter changes we don't need to disable the pipe completely.
>> >   But that's the advanced stuff really.
>> >
>> > I think modelling the crtc helpers after this model could work. But that
>> > means that the crtc helpers and the nuclear flip atomic helpers for GO
>> > bit capable hw need to be rather tightly integrated, while still allowing
>> > drivers to override the nuclear flip parts.
>> >
>> >> > 3. The core->driver interface should be powerful enough to support
>> >> > insanity like i915, but no more. Which means all the code that's share
>> >> > (i.e. the set_prop code I've been harping all over the place about) should
>> >> > be done in the core, without any means for drivers to override. Currently
>> >> > the drm core also takes care of a bunch of things like basic locking and
>> >> > refcounting, and that's kinda the spirit for this. i915 is the obvious
>> >> > benchmark here.
>> >>
>> >> The more I think about it, the more I think we should leave set_prop
>> >> as it is (although possibly with drm_atomic_state ->
>> >> drm_{plane,crtc,etc}_state).
>> >>
>> >> In the past, before primary plane, I really needed this.  And I expect
>> >> having a convenient way to 'sniff' changing properties as they go by
>> >> will be useful in some cases.
>> >
>> > I actually really like the addition of the state object to ->set_prop. At
>> > least for i915 (which already has a fair pile of additional properties)
>> > that looks like the perfect way to prep the stage.
>> >
>> > But at least for the transition we should keep the impact minimal. So no
>> > manadatory callbacks and don't enforce the usage of the state object until
>> > the drm core is fully converted to always follow a set_prop with a
>> > ->commit. Since currently we have internal mode_set calls in all i915
>> > set_prop functions and we need to convert them all. But we can't do that
>> > until all the core stuff uses the atomic interface for all legacy ioctl
>> > ops.
>> >
>>
>> fwiw, at least all set_prop ioctl stuff from core follows up with a
>> atomic_commit now.  There are one or two more places doing mode_set
>> un-atomically.  And I'm not sure if you call set_prop anywhere
>> internally from i915.
>
> We do modesets internally, but as long as those aren't externally visible
> (which they shouldn't be if we grab locks before checking the state) it
> should be all fine. Also from my pov the ->set_prop stuff is just
> interface to construct the in-kernel representation of the desired config.
> The real magic happens in the check/commit hooks (which is the same level
> i915-internal modeset changes happens at).
>
> I think one excellent use-case we get for free (almost) without the ioctl
> would be fbcon. It very much wants to do an atomic update, so converting
> that over to the atomic interface would be good imo.

Yes, iirc the remaining non-atomic paths are fbcon related.  In
principle it should be a simple matter to increment the refcnt on
fbcon state object and re-apply it.  Although at the moment we keep
track of *how* to apply the state (ie. page_flip vs set_config, etc)
as the state object is built up.. which isn't very conducive to
re-committing an existing state object.  Which is part of the reason I
wanted to deprecate the various existing
->page_flip/->update_plane/->set_config/etc and introduce per object
->commit()'s.  (Which could either be called by helpers, or called
internally by driver or completely ignored by driver)

I've been a bit reluctant so far to do too much additional refactoring
on top of atomic, since I'm about at the limit of what I have time to
repeatedly rebase each kernel version.  This is why I'm a bit anxious
to start merging some of atomic, even if it doesn't do absolutely
everything yet.

BR,
-R

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch