[Intel-gfx] [PATCH v7 09/11] drm: uevent for connector status change

Daniel Vetter daniel at ffwll.ch
Tue May 21 07:52:50 UTC 2019


On Tue, May 21, 2019 at 8:55 AM Pekka Paalanen <ppaalanen at gmail.com> wrote:
>
> On Mon, 20 May 2019 18:11:07 +0200
> Daniel Vetter <daniel at ffwll.ch> wrote:
>
> > On Fri, May 17, 2019 at 01:08:24PM +0300, Pekka Paalanen wrote:
> > > On Thu, 16 May 2019 14:24:55 +0200
> > > Daniel Vetter <daniel at ffwll.ch> wrote:
> > >
> > > > On Thu, May 16, 2019 at 11:22:11AM +0300, Pekka Paalanen wrote:
> > > > > On Wed, 15 May 2019 10:24:49 +0200
> > > > > Daniel Vetter <daniel at ffwll.ch> wrote:
> > > > >
> > > > > > On Wed, May 15, 2019 at 10:37:31AM +0300, Pekka Paalanen wrote:
> > > > > > > On Tue, 14 May 2019 16:34:01 +0200
> > > > > > > Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> > > > > > >
> > > > > > > > On Tue, May 14, 2019 at 3:36 PM Pekka Paalanen <ppaalanen at gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 14 May 2019 13:02:09 +0200
> > > > > > > > > Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> > > > > > > > >
> > > > > > > > > > On Tue, May 14, 2019 at 10:18 AM Ser, Simon <simon.ser at intel.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 2019-05-14 at 11:02 +0300, Pekka Paalanen wrote:
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > > > > > > Hi Daniel,
> > > > > > > > > > > >
> > > > > > > > > > > > just to clarify the first case, specific to one very particular
> > > > > > > > > > > > property:
> > > > > > > > > > > >
> > > > > > > > > > > > With HDCP, there is a property that may change dynamically at runtime
> > > > > > > > > > > > (the undesired/desired/enabled tristate). Userspace must be notified
> > > > > > > > > > > > when it changes, I do not want userspace have to poll that property
> > > > > > > > > > > > with a timer.
> > > > > > > > > > > >
> > > > > > > > > > > > When that property alone changes, and userspace is prepared to handle
> > > > > > > > > > > > that property changing alone, it must not trigger a reprobe of the
> > > > > > > > > > > > connector. There is no reason to reprobe at that point AFAIU.
> > > > > > > > > > > >
> > > > > > > > > > > > How do you ensure that userspace can avoid triggering a reprobe with the
> > > > > > > > > > > > epoch approach or with any alternate uevent design?
> > > > > > > > > > > >
> > > > > > > > > > > > We need an event to userspace that indicates that re-reading the
> > > > > > > > > > > > properties is enough and reprobe of the connector is not necessary.
> > > > > > > > > > > > This is complementary to indicating to userspace that only some
> > > > > > > > > > > > connectors need to be reprobed instead of everything.
> > > > > > > > > > >
> > > > > > > > > > > Can't you use the PROPERTY hint? If PROPERTY is the HDCP one, skip the
> > > > > > > > > > > reprobing. Would that work?
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > yes, that would work, if it was acceptable to DRM upstream. The replies
> > > > > > > > > to Paul seemed to be going south so fast that I thought we wouldn't get
> > > > > > > > > any new uevent fields in favour of "epoch counters".
> > > > > > > > >
> > > > > > > > > > Yes that's the idea, depending upon which property you get you know
> > > > > > > > > > it's a sink change (needs full reprobe) or something else like hdcp
> > > > > > > > > > state machinery update.
> > > > > > > > >
> > > > > > > > > Right.
> > > > > > > > >
> > > > > > > > > > Wrt avoiding the full reprobe for sink changes: I think we should
> > > > > > > > > > indeed decouple that from the per-connector event for sink changes.
> > > > > > > > > > That along is a good win already, since you know for which connector
> > > > > > > > > > you need to call drmGetConnector (which forces the reprobe). It would
> > > > > > > > > > be nice to only call drmGetConnectorCurrent (avoids the reprobe), but
> > > > > > > > > > historically speaking every time we tried to rely on this we ended up
> > > > > > > > > > regretting things.
> > > > > > > > >
> > > > > > > > > What changed? This sounds very much what Paul suggested. Looking at it
> > > > > > > > > from userspace side:
> > > > > > > >
> > > > > > > > This sounds solid, some refinements below:
> > > > > > > >
> > > > > > > > > HOTPLUG=1 CONNECTOR=xx PROPERTY=yy
> > > > > > > > >
> > > > > > > > > - If yy is "Content Protection", no need to drmModeGetConnector(), just
> > > > > > > > >   re-get the connector properties.
> > > > > > > > >
> > > > > > > > > - Kernel probably shouldn't bother sending this for properties where
> > > > > > > > >   re-probe could be necessary, and send the below instead.
> > > > > > > >
> > > > > > > >
> > > > > > > > I think we should assert that the kernel can get the new property
> > > > > > > > values using drmModeGetConnectorCurrent for this case, i.e. the kernel
> > > > > > > > does not expect a full reprobe. I.e. upgrade your idea from "should"
> > > > > > > > to "must"
> > > > > > >
> > > > > > > Hi Daniel,
> > > > > > >
> > > > > > > ok, that's good.
> > > > > > >
> > > > > > > > Furthermore different property can indicate different kind of updates,
> > > > > > > > e.g. hdcp vs general sink change vs. whatever else might come in the
> > > > > > > > future.
> > > > > > >
> > > > > > > What do you mean by different kinds of updates?
> > > > > >
> > > > > > Atm we're discussing two:
> > > > > >
> > > > > > - "Content Protection"
> > > > > > - "sink changed, but you don't need to reprobe" this would be quite a bit
> > > > > >   a catch all from the output detection. Paul thinks differently, but I'm
> > > > > >   not sold on splitting this up more, at least not right now. This would
> > > > > >   include connector status (and related things returned by drmGetConnector
> > > > > >   which currently aren't a property), EDID, the mst path id, that kind of
> > > > > >   stuff.
> > > > > >
> > > > > > Ime once we have 2, there's more bound to come :-)
> > > > >
> > > > > Hi Daniel,
> > > > >
> > > > > I don't understand what the "sink changed" thing could be, but sure,
> > > > > there can be more.
> > > >
> > > > So if you have a repeater (hdmi or dp) and you change the thing you plug
> > > > into that, then on the computer you don't get a full hotplug, because the
> > > > repeater was always connected. Instead you get a short pulse hotplug,
> > > > indicating that something with the sink has changed. This could be a
> > > > slightly adjusted EDID (e.g. different eld in the audio section), or
> > > > something else. That's what I mean with "sink changed". Similar thing can
> > > > happen if you unplug and then plug in something else real quick (usually
> > > > over suspend/resume), where connector->status stays the same, but the
> > > > actual thing plugged in is different. I think for hdmi this is just the
> > > > EDID, but we do parse a bunch of things out of the EDID that have further
> > > > effects (color space, max clock). With DP there's also dp aux stuff, e.g.
> > > > if you switch from a 2 lane to a 4 lane cable then with same screen more
> > > > modes can work.
> > > >
> > > > Clearer?
> > > >
> > > > I guess for the documentation for this new uapi we need to make an
> > > > exhaustive list of all the things that might have changed for a "sink
> > > > changed" event, whatever we actually agree on what that should look like.
> > > > Or the PROPERTY=0 fallback you mention below as a fallback idea.
> > >
> > > Hi Daniel,
> > >
> > > to me all that sounds like userspace would better do a probe and start
> > > from scratch with that one connector. Therefore it would fall into the
> > > 'HOTPLUG=1 CONNECTOR=xx' case, no PROPERTY.
> > >
> > > I suppose I'm missing something?
> >
> > Doing a full probe is hella expensive. Atm you always have to do this, but
> > we're talking about the brave new future where the kernel sucks less, and
> > the kernel would have done the expensive probing for you already.
> >
> > > But also I don't mind, I have always expected there might be more
> > > properties whose change does not require a probe.
> > >
> > > So, the kernel does sometimes do the probe on its own as well, right?
> > > Is that completely invisible to userspace, or could it stall some
> > > userspace operations that are not a probe of the same connector?
> >
> > Major stall because the locking design isn't pretty. If the kernel is
> > probing right now even your GetConnectorCurrent or GetProperties (on a
> > connector) will stall for whatever long it takes to read the EDID. Or
> > whatever else the kernel is doing in the probe paths right now. We could
> > probably improve this, and make sure that at least GetConnectorCurrent and
> > GetProperties stop sucking. But needs some serious locking-fu to make that
> > work.
>
> Hi Daniel,
>
> ok. I was assuming the kernel locking was unfixable, due to e.g.
> hardware interfaces that are, well, hardware.
>
> It seems the locking would be the first thing needing to be fixed
> before anyone goes adding more automatic probing into the kernel,
> because otherwise you risk introducing random timing hickups for
> userspace, leading to someone screaming "kernel regression".

There's is fairly fine-grained locking all over kms, except for the
probe path. That has one lock over all connnectors. Only thing we need
to uncouple that from GetConnectorCurrent is to arrange for some
read-only cache of the last probe state (which might not be the
current stuff).

> > Aside: Just realized this is another important reason why we need to batch
> > up property updates. If we don't, then userspace will simply get held up
> > until the kernel is done anyway.
> >
> > > I really think the design of the uAPI must start with how userspace is
> > > expected to react to the events. For that there are three cases:
> > > re-deiscover and probe everyting, re-probe one connector, re-read
> > > properties of one connector without probe. Userspace can then discover
> > > what exactly changed, just like it already does.
> >
> > Yup, I think on this we all agree.
> >
>
> ...
>
> > > > > > > > > --------
> > > > > > > > >
> > > > > > > > > When discussing this in IRC, I had the concern about how uevents are
> > > > > > > > > delivered in userspace. Is there a possibility that they might be
> > > > > > > > > overwritten, contain stale attributes, or get squashed together?
> > > > > > > > >
> > > > > > > > > Particularly if a display server is current on the VT and active and
> > > > > > > > > monitoring udev, but stuck doing something and cannot service uevents
> > > > > > > > > very fast, and the kernel sends more than one event before the process
> > > > > > > > > gets back to dispatching. The terminology in libudev API confused me as
> > > > > > > > > an event is a device. Squashing together would make sense if the
> > > > > > > > > uevent were just updating a device attribute list. Previously when we
> > > > > > > > > had just a single kind of uevent, that would not have made a
> > > > > > > > > difference, but if we gain different kinds of uevents like here, it
> > > > > > > > > starts to matter.
> > > > > > > > >
> > > > > > > > > However, Paul came to the conclusion that we will be ok as long as the
> > > > > > > > > events come via netlink.
> > > > > > > >
> > > > > > > > Yeah netlink shouldn't drop events on the floor I think. It might
> > > > > > > > still happen, but then I think you should get an indication of that
> > > > > > > > error, and you just treat it as a general hotplug event like on older
> > > > > > > > kernels.
> > > > > > >
> > > > > > > Alright, although reading Paul it sounds like there is another
> > > > > > > (fallback?) method as well that wouldn't work. Should userspace worry
> > > > > > > about that?
> > > > > > >
> > > > > > > Hmm, get an indication of an error... I don't know how that would be
> > > > > > > presented in libudev API and I can't point to any code in Weston that
> > > > > > > would deal with it. Does anyone have a clue about that?
> > > > > > >
> > > > > > > Userspace cannot really start taking advantage of any new fine-grained
> > > > > > > hotplug events until it can rely on the event delivery. Granted, this
> > > > > > > seems purely a userspace issue, but I bet it could be formulated as a
> > > > > > > kernel regression: things stop working after upgrading the kernel while
> > > > > > > having always used new userspace which was ready for detailed hotplug
> > > > > > > events but didn't ensure the delivery in userspace.
> > > > > >
> > > > > > You have this already (if it's really an issue with netlink reliability,
> > > > > > tbh no idea), you can already miss a global uevent. It's easier to catch
> > > > > > up if you do miss it, since you're forcing a reprobe on everything. That's
> > > > > > why I think the EPOCH thing would be good, userspace could be defensive
> > > > > > and always call GetConnectorCurrent on all connectors if it gets any
> > > > > > hotplug uevent, and if it gets an EPOCH change, force a reprobe. But I'm
> > > > > > not sure that's really required (aside from VT switching).
> > > > >
> > > > > No, my concern is not an issue with netlink reliability. It is a
> > > > > potential issue when userspace chooses to not use netlink, and uses
> > > > > something else instead. I'm not sure what that else is but Paul says
> > > > > there is code in libudev and that is completely outside the control of
> > > > > KMS apps like display servers.
> > > >
> > > > afaik this other path only exists because it's the older one, for uapi
> > > > backwards compatibility with older userspace. Shouldn't be used for
> > > > anything.
> > >
> > > "Shouldn't be used" and someone screaming "kernel regression"... are you
> > > sure that path won't matter?
> > >
> > > Like some home-brewn distribution that happens to configure their
> > > libudev and kernel to use the old method, uses already new userspace,
> > > and then upgrades the kernel that starts sending fine-grained hotplug
> > > events, resulting the display server randomly handling hotplug wrong.
> > >
> > > Reading Airlie's recent rant about kernel regression handling make this
> > > a scary scenario where you would have no other choice than to rip all
> > > the fine-grained uevents out again.
> > >
> > > Is there any difference in the kernel code between the old method and
> > > the netlink method? Would it be possible to send fine-grained hotplug
> > > events only through netlink, and fall back to the old 'HOTPLUG=1' for
> > > the old method?
> >
> > There's a lot of grey in kernel regressions, and for fringe setups used by
> > few people I wouldn't worry about this. If they expect their shit to keep
> > working when using new stuff and crappy old interfaces, they get to keep
> > all the pieces.
>
> It didn't sound gray at all, reading Dave Airlie's email about it. If
> someone updates the kernel, and something works worse after that, then
> it is by definition a kernel regression. Period. And the earliest
> regression wins, i.e. if a revert breaks other things, the revert will
> be done regardless.
>
> > Dave's recent rant was a bit special, since userspace is clearly smoking
> > some strong stuff (-modesetting's atomic is seriously not using atomic
> > correctly), but it was also affecting too many people, and changing the
> > boot setup meant you'd get a black screen on boot-up already. Instead of
> > just on the first modeset with more than 1 screen.
>
> Then I think I missed the context of Dave's email. Reading it again, I
> still do not see that context.
>
> Btw. how do you determine "not using atomic correctly"? Has some uAPI
> specification for atomic appeared? I wasn't aware there was any uAPI
> specs, so there is no "incorrect use" if it happened to work once.

-modesetting atomic blows up with more than one screen (even if you
just move that screen between crtc). The breakage was that with the
fastboot changes we've put the one screen onto crtc 1, but by default
modesetting wants it on crtc 0, and it couldn't do that switch
anymore.

All current atomic in -modesetting can do is pageflip and dpms off/on
on the first screen on the first crtc. Anything more fancy goes boom,
like where you change the connector/crtc links.

It's _really_ broken :-)

> I don't personally really like these rules, but if these are the rules,
> then so be it. In my opinion it would be a huge step forward to get and
> require uAPI specifications, that people could verify both kernel and
> userspace against. Verifying against kernel code with no spec is what
> leads to the -modesetting issue by the sounds of it.
>
> Documenting kernel internal interfaces is not it. People reading
> DRM internal interface docs would need to know how DRM works internally
> before they could map that information into uAPI, which makes it less
> useful if not even useless for userspace developers.

vgem is the idea here for validation, but if people ship atomic code
that was never tested except for "boots on my laptop", then nothing is
going to help.

And yes we have a huge gap with uapi documentation. btw for properties
those section are meant to be useful for userspace people too:

https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#standard-connector-properties

and all subsequent chapters. I guess it's a bit burried, but this part
is meant to be the uapi spec for properties. Is that also failing your
expectations?

> > There's also a fairly easy fix for that -modesetting issue: We don't
> > expose atomic if the compositor has a process name of "Xserver". Brutal,
> > but gets the job done. Once X is fixed, we can give a new "I'm not totally
> > broken anymore" interface to get back at atomic.
>
> You mean "Xorg". Or maybe "X". Or maybe the setuid helper? Wait, do you
> check against the process issuing ioctl by ioctl, or the process that
> opened the device? Which would be logind? What about DRM leasing? ...

In the Get/SetCaps ioctl we can do the check, which is called from X,
not logind. We just need some way to tell -modesetting apart from
everything else, and luckily there's not any other atomic X drivers.

> > tldr; I'm not worried at all, at least not more than with anything uapi.
> > Very rarely we'll have regrets.
>
> If you say so, ok. I'm a pessimist. I will certainly be happy if people
> can make progress with fine-grained uevents.

Maybe I've been suffering this for too long :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the Intel-gfx mailing list