[systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

Florian Weimer fw at deneb.enyo.de
Wed Jul 12 07:51:06 UTC 2017


* Lennart Poettering:

> On Tue, 11.07.17 21:26, Florian Weimer (fw at deneb.enyo.de) wrote:
>
>> * Lennart Poettering:
>> 
>> > Apparently, this regressed between this version and
>> > glibc-2.24-9.fc25.x86_64 hence.
>> 
>> Yes, I backported the fork cache removal to Fedora 25.  There is no
>> longer a good way to maintain such a cache in userspace because glibc
>> can no longer intercept all the ways that can change the PID of the
>> current process because the kernel interfaces for process management
>> are incredibly rich these days.
>
> Please be more specific here. What is this all about?

Over the years, we got many bug reports from sandboxes and other heavy
users of namespaces and clone saying that the glibc PID cache got out
of sync, both in child and parent (!) processes.

> What triggered this specifically? is this about docker? docker is
> written in golang anyway, iirc, which doesn't bother with linking to
> libc anyway?

It needs glibc for access to the host and user databases.

> Is this a glibc upstream choice primarily? Were the regressions this
> causes considered?

I raised the problem of applications calling getpid frequently and
named OpenSSL as an example.

> I mean, the getpid() checking code is not only in use in systemd, but
> in various other bits, in particular PulseAudio, where I started
> adding these checks for a good reason. It sounds pretty strange to me
> to just regress all that...

Fork detection using getpid is not reliable.  It gives false negatives
in the case of double-forks, where the process can be different but
the PID is the same due to reuse.  Considering that this use case is
broken, I don't think it's worthwhile to jump through hoops to support
code which is fundamentally broken anyway.
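
To illustrate, here is a minimal C sketch of the getpid-based check
next to a counter bumped from a pthread_atfork child handler.  All
function and variable names are hypothetical and not taken from
systemd, PulseAudio, or glibc.  The counter is only one possible
alternative and has its own gap: atfork handlers run for fork, but not
for processes created through direct clone system calls.

```c
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state for the sketch. */
pid_t saved_pid;                    /* PID recorded at init time */
volatile unsigned fork_generation;  /* incremented in each forked child */

/* Child-side atfork handler: runs in the new process after fork(). */
void bump_generation_in_child(void)
{
    fork_generation++;
}

/* Record the current PID and register the child handler.
   Returns 0 on success, an error number otherwise. */
int fork_detect_init(void)
{
    saved_pid = getpid();
    return pthread_atfork(NULL, NULL, bump_generation_in_child);
}

/* getpid-based check.  Without the glibc PID cache, each call is a
   system call.  A later descendant (for example after a double fork)
   whose PID was recycled to equal saved_pid is reported as "not
   forked": the false negative described above. */
int forked_by_pid(void)
{
    return getpid() != saved_pid;
}

/* Counter-based check: compares against a generation value the caller
   sampled earlier.  No system call, and unaffected by PID reuse, but
   it only sees process creation that runs the atfork handlers. */
int forked_by_generation(unsigned seen)
{
    return fork_generation != seen;
}
```

A caller would invoke fork_detect_init() once, remember the value of
fork_generation, and later pass that value to forked_by_generation().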

