[PATCH libinput 0/1] Initialization race against udev
Matt Hoosier
matt.hoosier at gmail.com
Tue Apr 24 15:29:46 UTC 2018
On Sun, Apr 22, 2018 at 11:01 PM, Peter Hutterer <peter.hutterer at who-t.net>
wrote:
> On Sun, Apr 22, 2018 at 06:07:36PM +0000, Friedrich, Eugen (ADITG/ESB)
> wrote:
> > Hi, sorry for late replay
> >
> > Best regards
> >
> > Eugen Friedrich
> > Engineering Software Base (ADITG/ESB)
> >
> > Tel. +49 5121 49 6921
> > > -----Original Message-----
> > > From: Peter Hutterer [mailto:peter.hutterer at who-t.net]
> > > Sent: Freitag, 6. April 2018 04:29
> > > To: Friedrich, Eugen (ADITG/ESB)
> > > Cc: Matt Hoosier; Pekka Paalanen; wayland mailing list
> > > Subject: Re: [PATCH libinput 0/1] Initialization race against udev
> > >
> > > On Thu, Apr 05, 2018 at 01:20:51PM +0000, Friedrich, Eugen (ADITG/ESB)
> wrote:
> > > > Hi Peter
> > > >
> > > > Best regards
> > > >
> > > > Eugen Friedrich
> > > > Engineering Software Base (ADITG/ESB)
> > > >
> > > > Tel. +49 5121 49 6921
> > > >
> > > > > -----Original Message-----
> > > > > From: wayland-devel [mailto:wayland-devel-
> > > > > bounces at lists.freedesktop.org] On Behalf Of Peter Hutterer
> > > > > Sent: Donnerstag, 5. April 2018 06:27
> > > > > To: Matt Hoosier
> > > > > Cc: Pekka Paalanen; wayland mailing list
> > > > > Subject: Re: [PATCH libinput 0/1] Initialization race against udev
> > > > >
> > > > > On Wed, Apr 04, 2018 at 09:04:32AM -0500, Matt Hoosier wrote:
> > > > > > I encounter another "bad" behavior related to multiple firings
> of the UDev
> > > > > > 'add' event for a given input node.
> > > > > >
> > > > > > Typically on modest embedded systems, you do not want to run the
> > > > > exhaustive
> > > > > > 'udevadm trigger' during the main booting sequence. That causes
> hundreds
> > > > > or
> > > > > > thousands of UDev nodes to be crawled and processed by udevd all
> during
> > > > > the
> > > > > > first moments of userspace. These overall bulk of these nodes
> are pretty
> > > > > > much irrelevant to the use-case of the device. The time spent
> processing
> > > > > > these non-essential /sys devices can easily slow down the
> device's
> > > > > progress
> > > > > > toward starting a UI with basic touch support and loading
> drivers for the
> > > > > > handful of essential peripherals by 10 or 15 seconds.
> > > > > >
> > > > > > Instead, it works better to manually command the same UDev
> coldplugging
> > > > > > that would have been done by 'udevadm trigger', but for a very
> small
> > > > > > hand-picked set of devices rather than everything in
> /sys/devices. The time
> > > > > > savings are large. Of course, for completeness you do eventually
> have to
> > > > > > run 'udevadm trigger' so that the full set of hardware and
> kernel software
> > > > > > features are activated, but that can wait until after the main
> KPIs are
> > > > > > achieved.
> > > > > >
> > > > > > This scheme generally works just fine. Manually stimulating
> udevd to
> > > > > > coldplug the specific devices you need keeps everything general:
> > > > > > applications that find their hardware with UDev (such as with
> libinput or
> > > > > > Weston's DRM backend) all get to rely on their usual well-tested
> > > > > codepaths.
> > > > > >
> > > > > > But there is a snag: if a device like /dev/input/event0 has been
> > > > > > coldplugged once with the hands-on technique, then all the
> daemons that
> > > > > > care about it have already seen one UDev ACTION=add event for
> it. When
> > > > > the
> > > > > > late-running 'udevadm trigger' does its exhaustive sweep across
> > > > > > /sys/devices, this will cause a second ACTION=add event to be
> triggered
> > > > > for
> > > > > > /dev/input/event0. Currently (well, with libinput 1.1.1) this
> causes
> > > > > > libinput -- and consequently Weston -- to open a second
> filedescriptor
> > > > > > against /dev/input/event0, so that all input events are received
> in
> > > > > > duplicate. That confuses the compositor's and applications'
> input event
> > > > > > handling.
> > > > > >
> > > > > > Would it be tolerable to put a patch into either libinput or
> Weston to
> > > > > > guard against double-opening the same input device? I realize
> that the
> > > > > > scheme outlined above is probably technically in violation of
> the expected
> > > > > > UDev initialization scheme, but I'm not aware of any way to
> suppress
> > > > > udevd
> > > > > > from broadcasting the 'add' action to all udev clients even
> though the
> > > > > > device has already been coldplugged once. It seems to me at
> least plausible
> > > > > > that the Weston stack would benefit from guarding against
> getting into this
> > > > > > bad state from receiving unexpected UDev events.
> > > > >
> > > > >
> > > > > I don't think a weston patch is likely to help you here since
> libinput's
> > > > > udev handling is independent of weston. but tbh, I'd rather not do
> this in
> > > > > libinput either. as you write, you're not really behaving like
> udev is
> > > > > supposed to and once we start having to deal with that the
> floodgates are
> > > > > open for weird patches. The code for this should be relatively
> > > > > self-contained and easy-enough to maintain, please maintain that
> as a
> > > > > distribution-specific patch.
> > > >
> > > > [EF] but as I can see there is a race condition which can always
> occur in
> > > > any system if device will be added to the system after installation
> of the
> > > > udev monitor and before initializing already coldplugged devices.
> > > > Can we really just accept this and hope it will never happen? Or I'm
> missing
> > > something?
> > >
> > > yeah, that race condition could occur but the chances are ... very
> slim.
> > > By the time libinput starts up, all the built-in devices are already
> > > connected and what are the chances that you plug in a USB device at
> that
> > > precise ms?
> > >
> > > Now, if you feel like writing patches for this you're welcome if
> they're not
> > > too intrusive but one problem with those patches is likely that they're
> > > almost impossible to test for the udev backend (other than by manually
> > > pausing libinput at the right time and hoping). So unless you can come
> up
> > > with a solution for that too, I'd rather take my chances with the race
> > > condition than maintaining untestable code that may not work correctly
> in
> > > the one case where we need it :)
> >
> > [ef] I agree about the point of not testable code.
> > My plan was to modify the evdev_udev_handler function and add check
> > in the "add" action to verify if the device is already added.
> > For testing we can just trigger the "add" action manually to execute new
> > code path, should be exactly same as problem I would like to solve and
> > it should also solve the original point mentioned by Matt.
> > Can you agree on this?
>
> you'll need to figure out a way how to automate this with the test suite.
> Should be possible running udevadm commands within the test but anything
> that requires a real "manually" isn't going to survive for long. Humans are
> lazy, machine testing is the way to go :)
>
If your question was aimed at me: I don't have a very strong opinion about
tactics for this. My only goal was to make Weston resilient against
duplicate UDev "device added" announcements. Whatever layer works best to
short-circuit the construction of a second instance of touch-processing
machinery is fine for me.
>
> > > > > especially if you're really still on 1.1.1 which came out in
> 2015...
> > > > >
> > > > > As for how it would work: device_added in src/udev-seat.c, you can
> get to
> > > > > the libinput_seat which has the device list (they're all struct
> > > > > evdev_device). You can get the fd from that and stat() it for the
> st_rdev
> > > > > or, simpler, add a field and strdup the device node into that
> struct and
> > > > > compare that.
> > > > >
> > > > > On second thought: I'm not sure how exactly you're handling the
> > > > > cold-plugging but is there a way you can add a udev rule that
> assigns
> > > > > LIBINPUT_IGNORE_DEVICE=1 to all input devices already handled?
> > > >
> > > > [EF] Could not get the exact idea! Should we set the attribute
> > > > LIBINPUT_IGNORE_DEVICE from the libinput code?
> > >
> > > Above you wrote:
> > > > Manually stimulating udevd to
> > > > coldplug the specific devices you need keeps everything general:
> > >
> > > I don't actually know what you mean by this, so I'm not sure what the
> udev
> > > behaviour is in this case. But if you're holding back the
> > > trigger anyway is that you can do whatever you do for the devices,
> then drop
> > > a udev rule in that sets LIBINPUT_IGNORE_DEVICE, then you run the
> udevadm
> > > trigger. Any device that is now handled by udev will have that property
> > > assigned and will thus be ignored, including all your duplicates.
> > >
> > > > Additional idea would be to use USEC_INITIALIZED time as unique
> identifier
> > > > in one boot cycle.
> > >
> > > Can you expand on how you think this would work?
> >
> > [ef] to find out if the device is already added or not I need some
> unique identifier
> > of the device, I thought to add the USEC_INITIALIZED member to each
> device
> > and if there is a device with the same USEC_INITIALIZED we will skip it.
>
> oh right. Nice idea, seems a bit roundabout though :)
> The syspath of a device should be unique since the kernel keeps
> incrementing
> the inputXXXX number. We already use that for some conflict resolution
> between devices and it's readily accessible, so that should work well
> enough.
>
> Cheers,
> Peter
>
>
> >
> > >
> > > Cheers,
> > > Peter
> > >
> > > > > > On Wed, Apr 4, 2018 at 6:51 AM, Pekka Paalanen <
> ppaalanen at gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > From: Pekka Paalanen <pekka.paalanen at collabora.co.uk>
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > here is a patch for a race that was troubling a distribution
> for an
> > > > > > > embedded device. The original form of this patch has been in
> developer
> > > > > > > use for months now, and it seems to work there. For this
> version I have
> > > > > > > only changed the logging and fixed a leak.
> > > > > > >
> > > > > > > I originally discussed this issue on #wayland with Peter
> Hutterer on
> > > > > > > September 22, 2017. The discussion is quoted below to refresh
> our
> > > > > > > memories.
> > > > > > >
> > > > > > > In the distribution, Weston is started by systemd, the system
> is not
> > > > > > > very CPU-powerful, and lots of things are happening during the
> boot,
> > > > > > > which may all contribute to making this race possible to lose.
> > > > > > >
> > > > > > > I've only tested briefly on my work desktop to see that Weston
> still
> > > > > > > appears to find the input devices I expect. Obviously my
> desktop would
> > > > > > > never lose the race, because there are no input devices being
> hotplugged
> > > > > > > at the same time as Weston is starting up.
> > > > > > >
> > > > > > > The timer/idle callback idea is not implemented here, and
> neither is the
> > > > > > > double-add filtering. Let me know if you require those.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > pq
> > > > > > >
> > > > > > >
> > > > > > > < pq> whot, btw. we've looking into some fun libinput vs. udev
> > > > > > > device initialization race here. Apparently the device
> enumeration
> > > > > > > on libinput start-up may race with udev preparing devices at
> the
> > > > > > > same time, so the initial device enumeration for weston may see
> > > > > > > devices that have not had all their udev properties set yet.
> Do you
> > > > > > > recall ever fighting such issues?
> > > > > > >
> > > > > > > < pq> whot, another issue is that we may see a double-add of a
> > > > > > > device, first from the initial enumeration (with possibly
> missing
> > > > > > > properties) and then a second time as a hotplug event because
> > > > > > > libinput (correctly) listens for events before it enume
> existing
> > > > > > > devices.
> > > > > > >
> > > > > > > < whot> pq: yes, I've seen this a lot with the test suite, but
> never
> > > > > > > in real life
> > > > > > >
> > > > > > > < whot> I'm pretty sure in my case it's always a lingering udev
> > > > > > > device or some lingering properties
> > > > > > >
> > > > > > > < whot> e.g. a tablet has some keyboard properties set because
> the
> > > > > > > kernel re-uses the event node
> > > > > > >
> > > > > > > < whot> but that's triggered by the test suite using the path
> > > > > > > interface and there's bound to be a window where we can race.
> it
> > > > > > > shouldn't happen with the udev device, I think
> > > > > > >
> > > > > > > < pq> specifically, in the target device/system with
> touchscreens
> > > > > > > and weston, there are udev rules to set WL_OUTPUT on the
> device, but
> > > > > > > weston does not always see it.
> > > > > > >
> > > > > > > < pq> well, we use libinput via weston, and I don't think
> weston
> > > > > > > uses the path-based function. It's really the initial
> enumeration of
> > > > > > > input devices in libinput that has the issue.
> > > > > > >
> > > > > > > < whot> pq: a double-add is a bug, I can picture that
> happening with
> > > > > > > the current code. should be fixable
> > > > > > >
> > > > > > > * whot mumples something about syspath comparisons to filter
> that
> > > > > > > out
> > > > > > >
> > > > > > > < whot> mumbles, even. but it's late nough that I'm also open
> for
> > > > > > > mumpling
> > > > > > >
> > > > > > > < pq> whot, nice. Yeah, let's see if we have a patch to submit
> or
> > > > > > > just a bug report, it might take a while.
> > > > > > >
> > > > > > > < whot> pq: for the missing properties I have no idea, but that
> > > > > > > would, if anything, be a systemd bug
> > > > > > >
> > > > > > > < pq> whot, really? What's udev_device_get_is_initialised()
> for
> > > > > > > then? Does it not apply to the initial device enumeration as
> done by
> > > > > > > libinput?
> > > > > > >
> > > > > > > < pq> whot, how does initial device enumeration work? Does it
> go the
> > > > > > > udev daemon or just look in sysfs on its own?
> > > > > > >
> > > > > > > < whot> udev_device_get_is_initialized may help but it didn't
> with
> > > > > > > the test suite
> > > > > > >
> > > > > > > < whot> but for your case, it might just because you're
> fighting a
> > > > > > > different race
> > > > > > >
> > > > > > > < whot> should be easy enough to printf that to the log in
> your case
> > > > > > > and check if it's false when it fails
> > > > > > >
> > > > > > > < pq> whot, oh yeah, this didn't feel like a lingering device
> issue.
> > > > > > >
> > > > > > > < whot> initial enumeration: see udev_input_add_devices in
> > > > > > > src/udev-seat.c
> > > > > > >
> > > > > > > < whot> basically: "get list of devices maching subsystem
> input,
> > > > > > > create udev device for each syspath in that list"
> > > > > > >
> > > > > > > < pq> whot, our current idea is to have the initial
> enumeration to
> > > > > > > ignore devices where udev_device_get_is_initialized() returns
> false,
> > > > > > > because presumably we will get the hotplug event later anyway.
> > > > > > >
> > > > > > > < pq> that would avoid any busy-loop waiting
> > > > > > >
> > > > > > > < whot> you could schedule a callback for that device's syspath
> > > > > > >
> > > > > > > < whot> takes the whole "presumably" out of the question :)
> > > > > > >
> > > > > > > < pq> you mean like an idle task to re-check it?
> > > > > > >
> > > > > > > < whot> yeah, that or a timer
> > > > > > >
> > > > > > > < whot> not sure we have idle sources in libinput right now,
> so a
> > > > > > > timer is your better option here
> > > > > > >
> > > > > > > < pq> whot, ok, sure. Thanks for the tip. :-)
> > > > > > >
> > > > > > > Nandor Han (1):
> > > > > > > udev: validate input devices during cold-plug
> > > > > > >
> > > > > > > src/udev-seat.c | 11 +++++++++++
> > > > > > > 1 file changed, 11 insertions(+)
> > > > > > >
> > > > > > > --
> > > > > > > 2.16.1
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > wayland-devel mailing list
> > > > > > > wayland-devel at lists.freedesktop.org
> > > > > > > https://lists.freedesktop.org/mailman/listinfo/wayland-devel
> > > > > > >
> > > > >
> > > > > > _______________________________________________
> > > > > > wayland-devel mailing list
> > > > > > wayland-devel at lists.freedesktop.org
> > > > > > https://lists.freedesktop.org/mailman/listinfo/wayland-devel
> > > > >
> > > > > _______________________________________________
> > > > > wayland-devel mailing list
> > > > > wayland-devel at lists.freedesktop.org
> > > > > https://lists.freedesktop.org/mailman/listinfo/wayland-devel
> > > >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/wayland-devel/attachments/20180424/69fcd6c3/attachment-0001.html>
More information about the wayland-devel
mailing list