[PATCH libinput 0/1] Initialization race against udev
Matt Hoosier
matt.hoosier at gmail.com
Fri Apr 6 14:13:56 UTC 2018
On Thu, Apr 5, 2018 at 9:28 PM, Peter Hutterer <peter.hutterer at who-t.net>
wrote:
> On Thu, Apr 05, 2018 at 01:20:51PM +0000, Friedrich, Eugen (ADITG/ESB)
> wrote:
> > Hi Peter
> >
> > Best regards
> >
> > Eugen Friedrich
> > Engineering Software Base (ADITG/ESB)
> >
> > Tel. +49 5121 49 6921
> >
> > > -----Original Message-----
> > > From: wayland-devel [mailto:wayland-devel-
> > > bounces at lists.freedesktop.org] On Behalf Of Peter Hutterer
> > > Sent: Donnerstag, 5. April 2018 06:27
> > > To: Matt Hoosier
> > > Cc: Pekka Paalanen; wayland mailing list
> > > Subject: Re: [PATCH libinput 0/1] Initialization race against udev
> > >
> > > On Wed, Apr 04, 2018 at 09:04:32AM -0500, Matt Hoosier wrote:
> > > > I encounter another "bad" behavior related to multiple firings of
> the UDev
> > > > 'add' event for a given input node.
> > > >
> > > > Typically on modest embedded systems, you do not want to run the
> > > exhaustive
> > > > 'udevadm trigger' during the main booting sequence. That causes
> hundreds
> > > or
> > > > thousands of UDev nodes to be crawled and processed by udevd all
> during
> > > the
> > > > first moments of userspace. These overall bulk of these nodes are
> pretty
> > > > much irrelevant to the use-case of the device. The time spent
> processing
> > > > these non-essential /sys devices can easily slow down the device's
> > > progress
> > > > toward starting a UI with basic touch support and loading drivers
> for the
> > > > handful of essential peripherals by 10 or 15 seconds.
> > > >
> > > > Instead, it works better to manually command the same UDev
> coldplugging
> > > > that would have been done by 'udevadm trigger', but for a very small
> > > > hand-picked set of devices rather than everything in /sys/devices.
> The time
> > > > savings are large. Of course, for completeness you do eventually
> have to
> > > > run 'udevadm trigger' so that the full set of hardware and kernel
> software
> > > > features are activated, but that can wait until after the main KPIs
> are
> > > > achieved.
> > > >
> > > > This scheme generally works just fine. Manually stimulating udevd to
> > > > coldplug the specific devices you need keeps everything general:
> > > > applications that find their hardware with UDev (such as with
> libinput or
> > > > Weston's DRM backend) all get to rely on their usual well-tested
> > > codepaths.
> > > >
> > > > But there is a snag: if a device like /dev/input/event0 has been
> > > > coldplugged once with the hands-on technique, then all the daemons
> that
> > > > care about it have already seen one UDev ACTION=add event for it.
> When
> > > the
> > > > late-running 'udevadm trigger' does its exhaustive sweep across
> > > > /sys/devices, this will cause a second ACTION=add event to be
> triggered
> > > for
> > > > /dev/input/event0. Currently (well, with libinput 1.1.1) this causes
> > > > libinput -- and consequently Weston -- to open a second
> filedescriptor
> > > > against /dev/input/event0, so that all input events are received in
> > > > duplicate. That confuses the compositor's and applications' input
> event
> > > > handling.
> > > >
> > > > Would it be tolerable to put a patch into either libinput or Weston
> to
> > > > guard against double-opening the same input device? I realize that
> the
> > > > scheme outlined above is probably technically in violation of the
> expected
> > > > UDev initialization scheme, but I'm not aware of any way to suppress
> > > udevd
> > > > from broadcasting the 'add' action to all udev clients even though
> the
> > > > device has already been coldplugged once. It seems to me at least
> plausible
> > > > that the Weston stack would benefit from guarding against getting
> into this
> > > > bad state from receiving unexpected UDev events.
> > >
> > >
> > > I don't think a weston patch is likely to help you here since
> libinput's
> > > udev handling is independent of weston. but tbh, I'd rather not do
> this in
> > > libinput either. as you write, you're not really behaving like udev is
> > > supposed to and once we start having to deal with that the floodgates
> are
> > > open for weird patches. The code for this should be relatively
> > > self-contained and easy-enough to maintain, please maintain that as a
> > > distribution-specific patch.
> >
> > [EF] but as I can see there is a race condition which can always occur in
> > any system if device will be added to the system after installation of
> the
> > udev monitor and before initializing already coldplugged devices.
> > Can we really just accept this and hope it will never happen? Or I'm
> missing something?
>
> yeah, that race condition could occur but the chances are ... very slim.
> By the time libinput starts up, all the built-in devices are already
> connected and what are the chances that you plug in a USB device at that
> precise ms?
>
> Now, if you feel like writing patches for this you're welcome if they're
> not
> too intrusive but one problem with those patches is likely that they're
> almost impossible to test for the udev backend (other than by manually
> pausing libinput at the right time and hoping). So unless you can come up
> with a solution for that too, I'd rather take my chances with the race
> condition than maintaining untestable code that may not work correctly in
> the one case where we need it :)
>
>
> > > especially if you're really still on 1.1.1 which came out in 2015...
> > >
> > > As for how it would work: device_added in src/udev-seat.c, you can get
> to
> > > the libinput_seat which has the device list (they're all struct
> > > evdev_device). You can get the fd from that and stat() it for the
> st_rdev
> > > or, simpler, add a field and strdup the device node into that struct
> and
> > > compare that.
> > >
> > > On second thought: I'm not sure how exactly you're handling the
> > > cold-plugging but is there a way you can add a udev rule that assigns
> > > LIBINPUT_IGNORE_DEVICE=1 to all input devices already handled?
> >
> > [EF] Could not get the exact idea! Should we set the attribute
> > LIBINPUT_IGNORE_DEVICE from the libinput code?
>
> Above you wrote:
> > Manually stimulating udevd to
> > coldplug the specific devices you need keeps everything general:
>
I'm not certain whether this question was addressed to me or Eugen's
commentary after my message.
In any case, when I say "manually stimulate udevd to send an 'add' event",
I mean exactly what Pekka interpreted it as. 'udevadm trigger' is nothing
fancy; it essentially just exhaustively crawls across /sys and writes the
string "add" into each device directory in which it finds a file named
/sys/.../uevent. There is no stateful interaction with udevd or anything
like that.
So in my case, I'm just manually doing "echo add >
/sys/devices/my-device-path/uevent" to shortcut the big expensive iteration
that would be imposed if you tried to use 'udevadm trigger
--lots-of-narrow-match-expressions'. Same effect from the perspective of
interacting with the kernel or udevd though.
>
> I don't actually know what you mean by this, so I'm not sure what the udev
> behaviour is in this case. But if you're holding back the
> trigger anyway is that you can do whatever you do for the devices, then
> drop
> a udev rule in that sets LIBINPUT_IGNORE_DEVICE, then you run the udevadm
> trigger. Any device that is now handled by udev will have that property
> assigned and will thus be ignored, including all your duplicates.
>
> > Additional idea would be to use USEC_INITIALIZED time as unique
> identifier
> > in one boot cycle.
>
> Can you expand on how you think this would work?
>
> Cheers,
> Peter
>
> > > > On Wed, Apr 4, 2018 at 6:51 AM, Pekka Paalanen <ppaalanen at gmail.com>
> > > wrote:
> > > >
> > > > > From: Pekka Paalanen <pekka.paalanen at collabora.co.uk>
> > > > >
> > > > > Hi,
> > > > >
> > > > > here is a patch for a race that was troubling a distribution for an
> > > > > embedded device. The original form of this patch has been in
> developer
> > > > > use for months now, and it seems to work there. For this version I
> have
> > > > > only changed the logging and fixed a leak.
> > > > >
> > > > > I originally discussed this issue on #wayland with Peter Hutterer
> on
> > > > > September 22, 2017. The discussion is quoted below to refresh our
> > > > > memories.
> > > > >
> > > > > In the distribution, Weston is started by systemd, the system is
> not
> > > > > very CPU-powerful, and lots of things are happening during the
> boot,
> > > > > which may all contribute to making this race possible to lose.
> > > > >
> > > > > I've only tested briefly on my work desktop to see that Weston
> still
> > > > > appears to find the input devices I expect. Obviously my desktop
> would
> > > > > never lose the race, because there are no input devices being
> hotplugged
> > > > > at the same time as Weston is starting up.
> > > > >
> > > > > The timer/idle callback idea is not implemented here, and neither
> is the
> > > > > double-add filtering. Let me know if you require those.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > pq
> > > > >
> > > > >
> > > > > < pq> whot, btw. we've looking into some fun libinput vs. udev
> > > > > device initialization race here. Apparently the device enumeration
> > > > > on libinput start-up may race with udev preparing devices at the
> > > > > same time, so the initial device enumeration for weston may see
> > > > > devices that have not had all their udev properties set yet. Do you
> > > > > recall ever fighting such issues?
> > > > >
> > > > > < pq> whot, another issue is that we may see a double-add of a
> > > > > device, first from the initial enumeration (with possibly missing
> > > > > properties) and then a second time as a hotplug event because
> > > > > libinput (correctly) listens for events before it enume existing
> > > > > devices.
> > > > >
> > > > > < whot> pq: yes, I've seen this a lot with the test suite, but
> never
> > > > > in real life
> > > > >
> > > > > < whot> I'm pretty sure in my case it's always a lingering udev
> > > > > device or some lingering properties
> > > > >
> > > > > < whot> e.g. a tablet has some keyboard properties set because the
> > > > > kernel re-uses the event node
> > > > >
> > > > > < whot> but that's triggered by the test suite using the path
> > > > > interface and there's bound to be a window where we can race. it
> > > > > shouldn't happen with the udev device, I think
> > > > >
> > > > > < pq> specifically, in the target device/system with touchscreens
> > > > > and weston, there are udev rules to set WL_OUTPUT on the device,
> but
> > > > > weston does not always see it.
> > > > >
> > > > > < pq> well, we use libinput via weston, and I don't think weston
> > > > > uses the path-based function. It's really the initial enumeration
> of
> > > > > input devices in libinput that has the issue.
> > > > >
> > > > > < whot> pq: a double-add is a bug, I can picture that happening
> with
> > > > > the current code. should be fixable
> > > > >
> > > > > * whot mumples something about syspath comparisons to filter that
> > > > > out
> > > > >
> > > > > < whot> mumbles, even. but it's late nough that I'm also open for
> > > > > mumpling
> > > > >
> > > > > < pq> whot, nice. Yeah, let's see if we have a patch to submit or
> > > > > just a bug report, it might take a while.
> > > > >
> > > > > < whot> pq: for the missing properties I have no idea, but that
> > > > > would, if anything, be a systemd bug
> > > > >
> > > > > < pq> whot, really? What's udev_device_get_is_initialised() for
> > > > > then? Does it not apply to the initial device enumeration as done
> by
> > > > > libinput?
> > > > >
> > > > > < pq> whot, how does initial device enumeration work? Does it go
> the
> > > > > udev daemon or just look in sysfs on its own?
> > > > >
> > > > > < whot> udev_device_get_is_initialized may help but it didn't with
> > > > > the test suite
> > > > >
> > > > > < whot> but for your case, it might just because you're fighting a
> > > > > different race
> > > > >
> > > > > < whot> should be easy enough to printf that to the log in your
> case
> > > > > and check if it's false when it fails
> > > > >
> > > > > < pq> whot, oh yeah, this didn't feel like a lingering device
> issue.
> > > > >
> > > > > < whot> initial enumeration: see udev_input_add_devices in
> > > > > src/udev-seat.c
> > > > >
> > > > > < whot> basically: "get list of devices maching subsystem input,
> > > > > create udev device for each syspath in that list"
> > > > >
> > > > > < pq> whot, our current idea is to have the initial enumeration to
> > > > > ignore devices where udev_device_get_is_initialized() returns
> false,
> > > > > because presumably we will get the hotplug event later anyway.
> > > > >
> > > > > < pq> that would avoid any busy-loop waiting
> > > > >
> > > > > < whot> you could schedule a callback for that device's syspath
> > > > >
> > > > > < whot> takes the whole "presumably" out of the question :)
> > > > >
> > > > > < pq> you mean like an idle task to re-check it?
> > > > >
> > > > > < whot> yeah, that or a timer
> > > > >
> > > > > < whot> not sure we have idle sources in libinput right now, so a
> > > > > timer is your better option here
> > > > >
> > > > > < pq> whot, ok, sure. Thanks for the tip. :-)
> > > > >
> > > > > Nandor Han (1):
> > > > > udev: validate input devices during cold-plug
> > > > >
> > > > > src/udev-seat.c | 11 +++++++++++
> > > > > 1 file changed, 11 insertions(+)
> > > > >
> > > > > --
> > > > > 2.16.1
> > > > >
> > > > > _______________________________________________
> > > > > wayland-devel mailing list
> > > > > wayland-devel at lists.freedesktop.org
> > > > > https://lists.freedesktop.org/mailman/listinfo/wayland-devel
> > > > >
> > >
> > > > _______________________________________________
> > > > wayland-devel mailing list
> > > > wayland-devel at lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/wayland-devel
> > >
> > > _______________________________________________
> > > wayland-devel mailing list
> > > wayland-devel at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/wayland-devel
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/wayland-devel/attachments/20180406/2a930aeb/attachment-0001.html>
More information about the wayland-devel
mailing list