[systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

Lennart Poettering lennart at poettering.net
Wed Jul 12 08:23:38 UTC 2017


On Wed, 12.07.17 09:51, Florian Weimer (fw at deneb.enyo.de) wrote:

> * Lennart Poettering:
> 
> > On Tue, 11.07.17 21:26, Florian Weimer (fw at deneb.enyo.de) wrote:
> >
> >> * Lennart Poettering:
> >> 
> >> > Apparently, this regressed between this version and
> >> > glibc-2.24-9.fc25.x86_64 hence.
> >> 
> >> Yes, I backported the fork cache removal to Fedora 25.  There is no
> >> longer a good way to maintain such a cache in userspace because glibc
> >> cannot intercept anymore all the ways that can change the PID of the
> >> current process because the kernel interfaces for process management
> >> are incredibly rich these days.
> >
> > Please be more specific here. What is this all about?
> 
> We got many bug reports over the years, involving sandboxes and other
> heavy users of namespaces and clone(), that the glibc PID cache got out
> of sync, both in child and parent (!) processes.

Do you have any links?

> > What triggered this specifically? Is this about Docker? Docker is
> > written in Go, iirc, which doesn't bother with linking to libc
> > anyway?
> 
> It needs glibc for access to the host and user databases.

Can you elaborate? I fail to see any relationship between
unshare()/fork()/getpid() and NSS.

> > Is this a glibc upstream choice primarily? Were the regressions this
> > causes considered?
> 
> I raised the problem of applications calling getpid frequently and
> named OpenSSL as an example.

Link?

> > I mean, the getpid() checking code is not only in use in systemd, but
> > in various other bits, in particular PulseAudio, where I started
> > adding these checks for a good reason. It sounds pretty strange to me
> > to just regress all that...
> 
> Fork detection using getpid is not reliable.  It gives false negatives
> in the case of double-forks, where the process can be different but
> the PID is the same due to reuse.  Considering that this use case is
> broken, I don't think it's worthwhile to jump through hoops to support
> code which is fundamentally broken anyway.
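A minimal sketch of the getpid()-based guard being debated, in the spirit of the -ECHILD change named in the subject line. All names here (`struct ctx`, `ctx_new()`, `ctx_use()`, `ctx_use_in_child()`) are hypothetical and are not systemd's actual sd-journal code: the object records the PID of the process that created it, and every call compares that against getpid(). A double fork that happens to land on a reused PID would slip through this check, which is the false negative described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical context object; names are illustrative only. */
struct ctx {
        pid_t owner_pid;   /* PID of the process that created the object */
        int counter;       /* stand-in for real per-object state */
};

struct ctx *ctx_new(void) {
        struct ctx *c = calloc(1, sizeof(*c));
        if (!c)
                return NULL;
        c->owner_pid = getpid();
        return c;
}

/* Returns 0 on success, or -ECHILD if called from a process other than
 * the one that created the object (e.g. in the child after fork()).
 * Without a PID cache in libc, every call costs a getpid() syscall. */
int ctx_use(struct ctx *c) {
        assert(c);
        if (c->owner_pid != getpid())
                return -ECHILD;
        c->counter++;
        return 0;
}

/* Forks, calls ctx_use() in the child, and returns the child's result
 * (0 or -ECHILD), or -1 if fork() or waitpid() fails. */
int ctx_use_in_child(struct ctx *c) {
        pid_t pid = fork();
        if (pid < 0)
                return -1;
        if (pid == 0)
                _exit(ctx_use(c) == -ECHILD ? 1 : 0);
        int status;
        if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
                return -1;
        return WEXITSTATUS(status) == 1 ? -ECHILD : 0;
}
```

Since glibc 2.25 (and the Fedora 25 backport mentioned above), getpid() is no longer cached, so a guard like this turns every API call into an extra system call; that per-call cost is the kind of slowdown the subject line refers to.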

Uh, that's a bit nonchalant, no? Yes, the UNIX PID concept is awfully
designed, but if you argue on that level, you'd have to remove kill()
and half of the other syscalls that take a PID from glibc too...

The primary intention of checking the PID in our calls is to filter
out cases where people assume they can use our context objects across
fork()s: a clean, early error code is a ton better than memory
corruption. And I am pretty sure the use case is very valid... And yes,
even if checking getpid() misses some theoretical corner cases,
pthread_atfork() or whatever else you propose will miss others too,
and is much uglier codewise, introduces deps, yadda yadda...
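For comparison, a sketch of the pthread_atfork() approach mentioned above, assuming a process-wide generation counter that a child fork handler bumps after every fork(). The names (`fork_generation`, `struct ctx`, `ctx_new()`, `ctx_use()`, `ctx_use_in_child()`) are hypothetical. Because the counter never repeats, it is immune to PID reuse, but fork handlers only run for fork() itself, not for a direct clone() call, and the registration plumbing is the extra code and dependency objected to here. On older glibc, build with `-pthread`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical process-wide counter, bumped in the child after every
 * fork(). It is only ever written by a freshly forked (single-threaded)
 * child, so the parent never sees it change. */
static unsigned long fork_generation;
static pthread_once_t atfork_once = PTHREAD_ONCE_INIT;
static int atfork_err;

static void on_fork_child(void) {
        fork_generation++;
}

static void register_atfork(void) {
        atfork_err = pthread_atfork(NULL, NULL, on_fork_child);
}

struct ctx {
        unsigned long generation; /* generation the object was created in */
};

struct ctx *ctx_new(void) {
        pthread_once(&atfork_once, register_atfork);
        if (atfork_err != 0)
                return NULL;
        struct ctx *c = calloc(1, sizeof(*c));
        if (!c)
                return NULL;
        c->generation = fork_generation;
        return c;
}

/* Returns 0, or -ECHILD if a fork() happened since the object was created.
 * No syscall is needed on this path. */
int ctx_use(struct ctx *c) {
        assert(c);
        return c->generation == fork_generation ? 0 : -ECHILD;
}

/* Forks, calls ctx_use() in the child, and returns the child's result
 * (0 or -ECHILD), or -1 if fork() or waitpid() fails. */
int ctx_use_in_child(struct ctx *c) {
        pid_t pid = fork();
        if (pid < 0)
                return -1;
        if (pid == 0)
                _exit(ctx_use(c) == -ECHILD ? 1 : 0);
        int status;
        if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
                return -1;
        return WEXITSTATUS(status) == 1 ? -ECHILD : 0;
}
```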

Gah, this is all so ugly. I understand systemd is not a program you
are particularly interested in, but making us chase around your
regressions is just mean...

Lennart

-- 
Lennart Poettering, Red Hat

