[systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

vcaputo at pengaru.com
Fri Jul 7 20:14:39 UTC 2017


In doing some casual journalctl profiling and stracing, it became apparent
that when `journalctl -b --no-pager` runs across a significant quantity of
logs, ~10% of the time is thrown away on getpid() calls due to commit a65f06b.

As-is:
 # time ./journalctl -b --no-pager > /dev/null

 real    0m11.033s
 user    0m10.084s
 sys     0m0.943s


After changing journal_pid_changed() to simply return 1:
 # time ./journalctl -b --no-pager > /dev/null

 real    0m9.641s
 user    0m9.449s
 sys     0m0.191s


More system time is being spent in repeated getpid() calls than in write();
see the strace:

 12:51:56.939287 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57 <0.001276>
 12:51:56.940633 getpid()                = 10713 <0.000032>
 12:51:56.940732 getpid()                = 10713 <0.000012>
 12:51:56.940801 getpid()                = 10713 <0.000032>
 12:51:56.940867 getpid()                = 10713 <0.000041>
 12:51:56.940942 getpid()                = 10713 <0.000041>
 12:51:56.941047 getpid()                = 10713 <0.000012>
 12:51:56.941117 getpid()                = 10713 <0.000012>
 12:51:56.941185 getpid()                = 10713 <0.000011>
 12:51:56.941253 getpid()                = 10713 <0.000011>
 12:51:56.941320 getpid()                = 10713 <0.000039>
 12:51:56.941395 getpid()                = 10713 <0.000041>
 12:51:56.941494 getpid()                = 10713 <0.000011>
 12:51:56.941561 getpid()                = 10713 <0.000012>
 12:51:56.941629 getpid()                = 10713 <0.000039>
 12:51:56.942942 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57 <0.000058>
 12:51:56.943052 getpid()                = 10713 <0.000039>
 12:51:56.943156 getpid()                = 10713 <0.000017>
 12:51:56.943230 getpid()                = 10713 <0.000018>
 12:51:56.943305 getpid()                = 10713 <0.000012>
 12:51:56.943374 getpid()                = 10713 <0.000017>
 12:51:56.943449 getpid()                = 10713 <0.000011>
 12:51:56.943517 getpid()                = 10713 <0.000011>
 12:51:56.943585 getpid()                = 10713 <0.000011>
 12:51:56.943652 getpid()                = 10713 <0.000011>
 12:51:56.943721 getpid()                = 10713 <0.000030>
 12:51:56.943796 getpid()                = 10713 <0.000041>
 12:51:56.943870 getpid()                = 10713 <0.000041>
 12:51:56.943944 getpid()                = 10713 <0.000041>
 12:51:56.944061 getpid()                = 10713 <0.001334>
 12:51:56.945459 write(1, "Jul 07 09:25:23 x61s unknown"..., 56) = 56 <0.000018>
 12:51:56.945544 getpid()                = 10713 <0.000017>
 12:51:56.945620 getpid()                = 10713 <0.000017>
 12:51:56.945694 getpid()                = 10713 <0.000012>
 12:51:56.945763 getpid()                = 10713 <0.000011>
 12:51:56.945832 getpid()                = 10713 <0.000012>
 12:51:56.945901 getpid()                = 10713 <0.000011>
 12:51:56.945969 getpid()                = 10713 <0.000011>
 12:51:56.946048 getpid()                = 10713 <0.000013>
 12:51:56.946118 getpid()                = 10713 <0.000024>
 12:51:56.946188 getpid()                = 10713 <0.000047>
 12:51:56.946277 getpid()                = 10713 <0.000041>
 12:51:56.946353 getpid()                = 10713 <0.000041>
 12:51:56.946428 getpid()                = 10713 <0.000040>
 12:51:56.946539 getpid()                = 10713 <0.001363>
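
To get a rough feel for what each of those calls costs on a given machine,
something like the following could be used. This is only a minimal sketch,
assuming Linux and a C99 compiler; now_ns() and time_getpid_syscalls() are
hypothetical helper names, not anything from the systemd tree. It goes through
syscall(SYS_getpid) so that every iteration is a real system call regardless
of what the libc wrapper does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void) {
        struct timespec ts;
        int r = clock_gettime(CLOCK_MONOTONIC, &ts);

        assert(r == 0);
        (void) r; /* keep NDEBUG builds free of unused-variable warnings */

        return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}

/* Hypothetical helper: issue n real getpid system calls, store the last
 * returned PID in *last, and return the elapsed time in nanoseconds. */
static uint64_t time_getpid_syscalls(unsigned long n, pid_t *last) {
        uint64_t start;
        pid_t p = 0;

        assert(last);

        start = now_ns();
        for (unsigned long i = 0; i < n; i++)
                p = (pid_t) syscall(SYS_getpid);
        *last = p;

        return now_ns() - start;
}
```

Dividing the returned value by n gives an approximate per-call cost, which can
then be multiplied by the number of getpid() calls seen between two write()
calls in a trace like the one above. Timings taken under strace itself are
inflated by the tracing overhead, so this is best run untraced.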

As this is public sd-journal API, the behavior is somewhat set in stone.
However, nothing prevents the systemd-internal tooling from linking against
a less defensive/conformant underlying implementation, shared with the public
API implementation, where these kinds of overheads can be avoided.
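
One way such a shared implementation could look is sketched below. This is
purely illustrative and not the actual sd-journal code: struct reader,
reader_init(), reader_pid_changed() and reader_next() are made-up names, and
the check_fork flag is an assumption about how an internal caller might opt
out of the fork check while public callers keep the -ECHILD behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle; this is NOT the real sd_journal structure. */
struct reader {
        pid_t original_pid;     /* PID recorded when the handle was created */
        bool check_fork;        /* true for public API users, false for trusted internal callers */
        unsigned long position; /* stand-in for real iteration state */
};

static void reader_init(struct reader *r, bool check_fork) {
        assert(r);

        r->original_pid = getpid();
        r->check_fork = check_fork;
        r->position = 0;
}

/* Returns true if the handle is used from a process other than the one that
 * created it. When check_fork is false, the && short-circuits and getpid()
 * is never called. */
static bool reader_pid_changed(const struct reader *r) {
        return r->check_fork && r->original_pid != getpid();
}

/* Stand-in for a per-entry API call: guard first, then advance. */
static int reader_next(struct reader *r) {
        assert(r);

        if (reader_pid_changed(r))
                return -ECHILD;

        r->position++;
        return 1;
}
```

With this shape the public entry points would create handles with check_fork
set to true and behave as they do today, while a tool that knows it never
forks with an open handle could pass false and skip the per-call getpid().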

For the curious: the logs being processed for this boot are 48 * 8MiB on SSD,
1.8GHz Core2 Duo, 4.12 kernel.

Regards,
Vito Caputo

