[RFC PATCH 00/37] Modesetting for atomic modesetting

Wed Mar 25 01:37:30 PDT 2015

On Tue, Mar 24, 2015 at 10:49:22PM +0000, Daniel Stone wrote:
> Hi,
> Yikes, I think we're talking past each other a bit. So I thought a v2
> might help.
> 
> On 24 March 2015 at 08:55, Daniel Vetter <daniel at ffwll.ch> wrote:
> > On Mon, Mar 23, 2015 at 04:58:47PM +0000, Daniel Stone wrote:
> >> On 23 March 2015 at 08:20, Daniel Vetter <daniel at ffwll.ch> wrote:
> >> > Ok this is quite a bit a different beast than what I expected. I think
> >> > it's way too intrusive for drivers to land quickly, and there's a big
> >> > depency chain linking everything. I think we need something much simpler.
> >>
> >> Yeah, I'm uneasy with how invasive it's got: hence the proposal to
> >> drop crtc->mode back into being inlined. I don't think we lose
> >> anything from doing that (as crtc->state->mode still changes), and
> >> that makes the diff much less scary. The constness changes no longer
> >> become required, but I think are still pretty useful from the point of
> >> view of setting (and enforcing) our existing expectations. I'd still
> >> look to push those anyway, but at a much more sedate pace.
> 
> As promised, have pulled this out. I've split the series into two
> parallel series, which barely intersect (i.e. the merge is trivial):
> http://cgit.collabora.com/git/user/daniels/linux.git/log/?h=wip/drm-next/drm-constness
> is the change to constify all crtc->mode users, without turning
> crtc->mode itself const, or into a pointer. I think these are all
> worthwhile cleanups in their own right, but as they're quite invasive,
> we can push them separately and much more slowly.
> 
> http://cgit.collabora.com/git/user/daniels/linux.git/log/?h=wip/drm-next/drm-modes-ref
> is the much less invasive[0] work to refcount modes, expose
> CRTC/connector modes to userspace through properties, and allow
> user-supplied blob properties. I think this is really quite safe, and
> the interactions with drivers are pretty minimal.
> 
> For now, I don't intend to really talk much about drm-constness, since
> it's neither interesting nor relevant. So this is just about
> drm-modes-ref.
> 
> >> >>   - as far as possible, modes should be relateable to their source, e.g. if
> >> >>     Plymouth pulls a particular mode from the encoder, and you pick up on that
> >> >>     mode as part of current configuration during handover, you should be able
> >> >>     to work backwards to where Plymouth sourced it, i.e. the encoder list
> >> >
> >> > With legacy setcrtc we already lose this information and thus far no one
> >> > seems to have cared. And I don't see the use-case since simply comparing
> >> > it to sources works well enough, in case you want to know where a mode is
> >> > from.
> >>
> >> On the other hand, I wouldn't suggest SetCrtc to really be a model to
> >> follow. I really don't mind SetCrtc breaking all these expectations
> >> above, mind.
> >
> > Well setcrtc is bound to stay around, and userspace using it will also
> > stick around for quite a bit I think. If we add new semantic guarantees
> > and then userspace can't rely on them we bear all the cost without much
> > benefit.
> 
> Right, if you're using SetCrtc then you can expect to lose, and that's
> fine. But I don't think we should hobble new userspace forever to the
> terrible interfaces we made previously.
> 
> >> Do we make any guarantee on connector->modes lifetimes? If it can't
> >> change without a hotplug event, then we know the ID is valid until the
> >> connector goes dark, at which point we have to drop any local cache
> >> relating to it.
> >>
> >> Saying that userspace must read every single mode property back every
> >> single time is pretty horrible from the point of view of requiring
> >> more userspace->kernel->userspace->kernel->[...] trips to do discovery
> >> every time, just because we decided not to work out a sensible
> >> lifetime strategy to expose to userspace.
> >
> > Current kernels should send you a hotplug event, so with those that
> > strategy works - you can cache until the next hotplug. But on older
> > kernels (and iirc we've only fixed this very recently) sometimes hotplug
> > events get lost. So there if someone asks for a full reprobe, you have to
> > do the full dance.
> 
> Right, but this isn't in any kernels yet, so we don't exactly need to
> worry about bad interactions with other parts of old kernels.
> 
> >> > This blew up with edid blob properties where SNA had one clever trick too
> >> > many and thought that matching edid blob prop id means the edid is
> >> > unchanged. But since we remove the old blob before we add the new one
> >> > you're pretty much guranteed to reuse the same slot.
> >>
> >> Sure, if you're not paying attention to the defined lifetime, then
> >> don't do any caching. But this is about _defining_ such a usable
> >> lifetime that userspace can.
> >
> > Hm let's backtrack a bit: What's the upside of this defined lifetime? Atm
> > I see a lot of cost on the kernel side (getting this series in will take a
> > few releases probably because there's so many interlocked subsystem-wide
> > changes). But I don't see a benefit.
> 
> I think the current drm-modes-ref branch should take care of the cost
> argument. The upside is that it gets much easier to debug and trace
> command streams (i.e. it's pretty easy to remember mode ID 34; much
> more tedious to dump out the entire modeline every time), and that
> it's much more difficult to screw up in incredibly subtle ways (two
> independent mode properties which have to be updated together at all
> times, including in error paths? what could possibly go wrong ...).

That's the price we pay for smooth upgrades - everything in atomic atm
keeps old and new things in sync somehow, and yep it's fragile. And I
don't see a real problem with keeping them in sync, there's not many users
for it. And the tricky bits can be extracted into two helpers (one where
you have a mode blob id for atomic, one where you have an already decoded
mode struct for legacy/internal paths) like I lined out somewhere below.

> > The only place we currently read out the current configuration is when
> > the compositor starts up so that it can perfectly take over whatever's
> > been set up by the firmware/boot splash for fastboot. Afterwards everyone
> > just bashes in their own config, even when switching between compositors
> > and all that.
> 
> They bash in their own config, because right now it's about all you
> can do. As we're giving userspace an interface that allows sensible
> handovers and incremental updates, why not use it ... ? That we do
> full mode reprogramming when moving from splash screen to compositor
> is a bug, not a great piece of design. If we can (even subtly)
> encourage userspace to be better and smarter, then I don't see why we
> wouldn't.

We don't have full programming when moving from splash to compositor, at
least when you're running SNA. It does read out the entire kernel state to
reconstruct matching xrandr state. And it works already, no additional
guarantees needed.

Maybe splash2compositor isn't the only place we want to do this smooth
operation, but even if you do it on all vt switches its still a relatively
rare thing.

> It also makes it a nightmare to debug the userspace side of things,
> because when you just have to smash the entire config in all the time,
> it becomes much more difficult to separate the intended changes out
> from the unintended ones. Which means people are more likely to throw
> their hands up and just blame the kernel.
> 
> > Without a clear use-case that justifies the work I don't think we should
> > do this - someone will come up with clever abuse for it and then we have
> > another hard-to-work-around regression at hand. So from my side the only
> > requirement is that atomic clients should be able to somehow get at the
> > current blob properties like mode, gamma ...
> 
> Which of the requirements are you worried about people abusing?

If you really want to allow userspace to be able to compare mode ids
everywhere then sooner or later we'll have userspace that fully relies on
that. And then we need to add hacks in the kernel to make that work by
e.g. switching ids so that there's only ever one mode id per mode. All so
that userspace can avoid 1 more getblob ioctl plus a memcmp.

Forcing userspace to do the ioctl+full memcmp really seems like the more
solid approach. Especially since kernel state readout will only happen on
takeover (boot-up, vt-switch) which is a really rare thing. On top of that
libdrm doesn't give you any way currently to read out the kernel state
without forcing a full reprobe and edid read. SNA has fixed this since
years, Chris even floated a libdrm patch but adoption is minimal. Compare
to an ioctl blowing through edid reads unecessarily on takeover is an
entirely different beast. Given that I don't think anyone will ever
measure the few usec the additional ioctl costs.

> >> > I think we need much less: If your driver supports atomic (and hence
> >> > userspace might be asking for the mode blob prop id) then that blob should
> >> > survive as long as the mode is in use.
> >>
> >> Which not only requires the most invasive of the changes anyway
> >> (modulo those to crtc->mode, which can regardless be dropped), but
> >> also a bunch of 'every time the state changes, go consult some
> >> auxiliary property and potentially expire its lifetime'.
> >
> > Well yeah, but for the mode property the only place it changes (besides
> > the atomic ioctl) is in drm_atomic_helper_set_config.
> 
> You forgot intel_crtc_set_config. ;)

Will go away (mostly at least) once we've converted to atomic. Atomic
helpers match i915 modeset design for a reason ;-)

> > We have a lot more
> > compat glue already than the few lines you'd need to add in there. The
> > crucial bit is that we only need to make this work for atomic drivers,
> > since non-atomic drivers won't ever expose the MODE_ID property. And
> > non-atomic userspace doesn't care anyway.
> 
> That's actually a fair point, and why I don't so much mind rowing back
> on the crtc->mode changes. I'm happy to leave never-will-be-atomic
> drivers in their own little ghetto. But it does make it harder and
> more fiddly for people to implement atomic drivers: it's one more
> subtle thing to screw up and get wrong, especially in error paths.
> Whereas crtc_state->mode = drm_mode_reference(foo) is pretty hard to
> get wrong.

Drivers don't care about how the state gets filled in. Or well shouldn't
ever, that's all done in helpers. We can easily provide a helper like

drm_atomic_helper_fill_mode(crtc_state, mode_id)
{
	if (crtc_state->mode_blob)
		drm_blob_unref(crtc_state->mode_blob);

	mode_blob = lookup(mode_id); /* assume this grabs a ref */
	crtc_state->mode_blob = mode_blob;

	drm_umode_to_mode(mode_blob->data, &crtc_state->mode);
}

We already have similar helpers to set other things where tricky rules
apply. And if you want we can do a similar legacy helper which creates an
ad-hoc mode blob (similar to how the cursor ioctls gets remapped to plane
functions).

> > Also the important part here isn't so much the mode (userspace can get at
> > that), but other new blob properties like per-plane gamma or other color
> > corrections tables.
> 
> Sure.
> 
> >> > Somewhat unordered, but here's what I think we need:
> >> > - Subtyping blob properties is not needed, at least I can't think of a
> >> >   use-case. It will result though in lots of duplicated code for
> >> >   duplicated ref/unref functions and atomic prop handling.
> >>
> >> When you say 'subtyping', do you mean what I've done with getblob?
> >
> > Not just but yes. The other result is that we'd need to duplicate the
> > property decoding code for each type in the atomic ioctl/set_property
> > functions. And you need per-type reference/unreference functions.
> 
> At least as far as gamma and other LUTs are concerned, I agree.
> 
> >> > - Since we already have a getblob ioctl I think we should just extend the
> >> >   existing drm_property_blob:
> >>
> >> That was actually my initial implementation, but didn't like the
> >> resulting reference tie-up between drm_display_mode and
> >> drm_property_blob (aka drm_mode_modeinfo). I couldn't figure out a
> >> really clean way to do it that didn't provably fall down at some
> >> point, so abandoned it and went for referencing drm_display_mode
> >> instead.
> >
> > Why refcount drm_display_mode? That's the drm-internal representation, we
> > can just keep that as-is everywhere. I think that's the main point of why
> > you opted for this much more invasive refactoring, and I don't see why
> > we'd need it. The added kref for drm_blob_property is imo really all the
> > krefs I think we need.
> 
> I think it's now much less invasive. The reason I opted for doing it
> like that is because it avoided duplicating the mode structure in such
> a way that resulted in non-obvious breakage. I just don't like the
> potential for people to break the synchronisation between the two.
> (Which was part of the reason I ended up doing the crtc->mode
> constness dance in the first place ...)

Imo we need to justify each and every kref because refcounting tends to be
fragile and buggy imo. For blob properties we need the refcount as soon as
we allow userspace to create them and the kernel to use them
independently. We've tried to do that without refcounts for framebuffers
way back, it doesn't work well.

For the internal mode representation which is copied around all over the
place in modeset code I don't see any reasons why we need a kref. It's not
the userspace representation of a mode, so even if we decide to make some
guarantees about uniqueness we can still happily copy the modes around
internally. And as long as copying is ok that's much simpler than
refcounting.

> >> >   Plus ofc changing drm_property_create/destroy_blob into ref/unref
> >> >   functions. And doing the same weak reference trick for idrs as we're
> >> >   using for framebuffers now.
> 
> Indeed.
> 
> >> >   For mode properties the data contained would be struct drm_mode_modeinfo
> >> >   (i.e. the ABI struct we already use for the setcrtc/getconnector ioctls,
> >> >   not the internal one).
> >>
> >> Right, this is what's there: 25/37 is explicitly always returning
> >> modeinfo, and the creation does as well. I don't see any benefit to
> >> exposing the internal struct.
> >
> > Hm, maybe I need to explain my motiviation for the blob prop stuff: I want
> > to use it for gamma tables (not just the per-crtc one we have, but
> > per-plane tables too) and other bigger properties. Using it for the mode
> > by packing the existing drm_mode_modeinfo into a blob too was just a bit
> > an afterthought - for the mode we could as well just add properties for
> > all the individual members, like we've done with all the other structures.
> >
> > But loading gamma tables with gamma-0 up to gamma-1024 is way too
> > unwielding, so that's why we imo really need blobs.
> 
> Right, that makes total sense. I don't think anything in here
> precludes using this, except the precedent of refcounting the mode
> rather than the blob property. The reason I did that is because we
> already _have_ a drm_mode_object type for modes, and having two
> properties tied together in a mutual deathgrip seemed like an
> invitation for failure.
> 
> So for gamma, either we could take the same approach and generate
> wrapper objects around the blobs - which is useful when you have other
> representations which are internally widely used (e.g.
> drm_display_mode vs. drm_mode_modeinfo), or just refcount the blobs
> and let drivers deal with any conversion. My guess is that the latter
> would be most useful, as there's nothing we can really standardise on
> - and at least currently, gamma _only_ tends to come from userspace.
> Not only do modes have an alternate representation which is already
> useful enough to have its own object type and ID set, but they
> basically all come from the kernel, with the drm_mode_modeinfo being a
> conversion, rather than the other way around. Hence why I chose to
> keep the current internal representation as the primary.

I think modes are special because we have both uapi and internal versions
of it. So there's necessarily some conversion going on. And because atomic
state structures are meant to be const once committed it's simpler to
just keep both around in parallel.

For gamma otoh we just have the raw data and conversion (if there's any)
will happen in the driver when uploading. There it's good enough to carry
around just the undecoded blob with it's raw data. Even more so for
driver/hw specific properties which might directly represent some register
settings.

> So I think a lot of the similarities on there are misleading when you
> scratch the surface of it; again, I had started with that exact
> approach when I came into this, but later revised it.

Hm I think you've lost me - which similarities are we talking about now?

> >> >   Getting the refcounting right for the atomic ioctl should be simple. For
> >> >   the legacy ->set_config entrypoint we need to fix things up in the
> >> >   helper when we update the mode, by manually releasing the old mode blob
> >> >   and creating a new one (owned by the kernel to avoid leaks because old
> >> >   userspace won't clean them up - we've had this bug with compat cursor
> >> >   fbs just recently).
> >>
> >> To be honest I think the fb refcnt bug is pretty apropos here - by
> >> keeping these awkwardly split out, independently reference-counted,
> >> not sensibly linked to each other, and not even verifying the
> >> correctness of the users (const), you're inviting another disaster
> >> along the same lines. Especially as drivers start to move away from
> >> the helpers and are expected to manage more state themselves.
> >
> > Which fb refcount bug? We've had piles of them in the past, simply because
> > refcounting is apparently hard.
> 
> Yeah. So if we're introducing refcounting, why not make it as easy and
> as clear as possible, rather than splitting your objects between one
> that's refcounted and used for half of what you're doing, and another
> that isn't refcounted and is used for the other half? We already know
> that people screw this up, so why not make it as clear and as
> difficult to get wrong as possible?

Because they're not the same, at least to me. The refcounted blob prop
part is just the data transport between userspace and the kernel for all
kinds of bigger-ish modeset data. The internal representation is something
entirely different and imo doesn't need to be tied to the public one imo.

Well shouldn't even, since drivers exclusively deal with the internal
representation, so if we allow them to keep on copying those we'll have no
refcounting issues in drivers at all. Becuase there won't be any
refcounting going on.

> >> I agree the initial series was too invasive and there's value in
> >> having a much more minimal series, so how about I split out the
> >> crtc->mode constness (but not pointer - I don't think I have any
> >> desire to ever push that through tbqh) changes into another series we
> >> can track separately, and then see if we can get a much more minimal
> >> patchset that doesn't make me shudder and think of previous
> >> refcnt/state-consistency horrors, nor you of atomic conversions past,
> >> together. I'm happy to pull in the other guys doing atomic conversions
> >> and get some testing out of them to see if it'll work for them and how
> >> well, but having it run on Tegra was about as painless as I could've
> >> hoped for.
> >
> > Hm, how does a const crtc->mode as a non-pointer work? We do need to
> > update this one in set_config. Or would you do a forced cast in these
> > cases?
> >
> > But even for that I don't see why we'd need to change crtc->mode really to
> > make the atomic mode a blob property.
> 
> I agree; I'd initially thought that was necessary, and it was only
> when I'd got a reasonable way down the road that I'd realised it
> wasn't necessary. By the time I'd done that, it seemed prudent to
> finish it off and fix the drivers, if only to convince myself that it
> wouldn't be breaking drivers or introducing bugs by doing all this.
> 
> I'm not sure we're going to convince each other at this point, so I'll
> just skim over the core arguments and leave it at this:
>   - this is no longer really invasive at all
>   - modes aren't as similar to other blobs as they first seem (else
> we'd just be using drm_mode_modeinfo with no mode-object type
> everywhere)
>   - refcounting modes provides for much easier debuggability and
> traceability for both kernel and userspace, and especially across the
> boundary where you may not be familiar with the other side
>   - giving people another opportunity to screw things up is never a
> good thing, especially when (both in refcounting and specifically in
> modes) they've repeatedly proven that they do screw it up horrendously

Yeah I think I've understood your argument. I wanted to summarize my own
but then I suddenly noticed that you're mentioning that drm_mode_modeinfo
is a mode-object. That's faithful but useless copypasta from how xrandr
works, see

commit c55b6b3da25aa3af36ec51a13a4ed15fef0d7a73
Author: Ville Syrjälä <ville.syrjala at linux.intel.com>
Date:   Fri Apr 26 17:40:28 2013 +0300

    drm: Kill user_modes list and the associated ioctls

    There is no way to use modes added to the user_modes list. We never
    look at the contents of said list in the kernel, and the only operations
    userspace can do are attach and detach. So the only "benefit" of this
    interface is wasting kernel memory.

    Fortunately it seems no real user space application ever used these
    ioctls. So just kill them.

    Also remove the prototypes for the non-existing drm_mode_addmode_ioctl()
    and drm_mode_rmmode_ioctl() functions.

    v2: Use drm_noop instead of completely removing the ioctls

    Signed-off-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
    Signed-off-by: Dave Airlie <airlied at redhat.com>

Iow that mode id and mode list that still exists can probably be ripped
out without side-effects - the only way we transport modes back&forth
between the kernel and userspace is through the raw uapi structure.

Might be useful to do an audit of all userspace to check whether we could
remove more crap or not.

> If you have a good-faith look at the patchset (git URL is in with
> cgit) and still want it re-done to add the blob properties on the side
> instead, then I'll do that. But hopefully the much smaller series
> (I've also squashed some of the more aggressively-split patches, on
> the grounds that they're not that crucial for bisect) will convince
> you ... ?

Yeah I'll take a look.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch