[systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
Robert P. J. Day
rpjday at crashcourse.ca
Thu Feb 18 19:26:40 UTC 2021
On Thu, 18 Feb 2021, Lennart Poettering wrote:
> On Do, 18.02.21 11:48, Robert P. J. Day (rpjday at crashcourse.ca) wrote:
>
> > A colleague has reported the following apparent issue in a fairly
> > old (v230) version of systemd -- this is in a Yocto Project Wind River
> > Linux 9 build, hence the age of the package.
> >
> > As reported to me (and I'm gathering more info), the system was
> > being put through some "longevity testing" by repeatedly adding,
> > removing, activating and de-activating network interfaces. According
> > to the report, the result was heap space slowly but inexorably being
> > consumed.
> >
> > While waiting for more info, I'm going to examine the commit log for
> > systemd from v230 moving forward to collect any commits that address
> > memory leaks, then peruse them more carefully to see if they might
> > resolve the problem.
> >
> > I realize it's asking a bit for folks here to remember that far
> > back, but does this issue sound at all familiar? Any pointers that
> > might save me some time? Thanks.
>
> Note that our hash tables operate with an allocation cache: when
> adding entries to them and then removing them again the memory
> required for that is not returned to the OS but added to a local
> cache. When the next entry is then added again, we recycle the cached
> entry instead of asking for new memory again. This allocation cache is
> a bit quicker than going to malloc() all the time, but it means that if
> you just watch the heap you'll assume there's a leak even though there
> isn't one: the memory is not lost, and will be reused eventually if we
> need it.
>
> You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off, but
> I'm not sure v230 already knew that env var.
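
For illustration, here is a minimal Python sketch of the allocation-cache
behaviour described above. It is a toy model and not systemd's actual
mempool code; the names EntryPool and churn are hypothetical. The point
it tries to show is that, with such a cache, the number of real
allocations is bounded by the peak number of live entries, no matter how
many add/remove cycles are run.

```python
class EntryPool:
    """Toy free-list cache: released entries are kept for reuse, not freed."""

    def __init__(self):
        self._free = []      # cached entries waiting to be recycled
        self.allocated = 0   # how many times we went to the "real" allocator

    def acquire(self):
        if self._free:
            return self._free.pop()  # recycle a cached entry
        self.allocated += 1          # stand-in for a malloc() call
        return {}

    def release(self, entry):
        entry.clear()
        self._free.append(entry)     # keep it in the cache instead of freeing


def churn(pool, live, rounds):
    """Add `live` entries, then remove them all, repeated `rounds` times."""
    for _ in range(rounds):
        entries = [pool.acquire() for _ in range(live)]
        for entry in entries:
            pool.release(entry)
    return pool.allocated
```

Under this model, heap usage would rise to a plateau and stay there; growth
that never levels off would point to something other than the cache.
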
i don't think that's it, as i was told that, eventually, the system
crashes due to lack of memory. here's a snippet from "top" from about
an hour ago:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  ... snip ...
    1 root      20   0 1807772 1.699g   3572 S   0.0   2.7   3:05.78 systemd

as you can see, systemd is apparently already using 1.7G resident, and i
was also told that this eventually grows until top reports it in units of
terabytes before the system falls over. so the allocation cache doesn't
sound like the explanation.
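
One way to tell a cache plateau from unbounded growth is to sample the RES
column over time and check whether it levels off. The following is a rough
Python sketch of that idea; the function names, the window size and the 1%
tolerance are all assumptions chosen for illustration, and it assumes top's
default of reporting RES in KiB with an optional m/g/t suffix for larger
values.

```python
# Multipliers to convert a suffixed top memory field to KiB.
_UNITS = {"k": 1, "m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}


def parse_res_kib(field):
    """Convert a top RES field such as '3572' or '1.699g' to KiB."""
    field = field.strip().lower()
    if field and field[-1] in _UNITS:
        return float(field[:-1]) * _UNITS[field[-1]]
    return float(field)


def has_plateaued(samples_kib, window=5, tolerance=0.01):
    """True if the last `window` samples differ by at most `tolerance`
    (as a fraction of the largest of them)."""
    if len(samples_kib) < window:
        return False
    recent = samples_kib[-window:]
    return max(recent) - min(recent) <= tolerance * max(recent)
```

Feeding it periodic readings for PID 1 during the stress test should show
fairly quickly which of the two patterns applies.
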
i'm just about to start perusing the commit log since v230 to see if
anything looks appropriate.
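
A possible way to shortlist candidate commits, sketched in Python around
"git log". This assumes a local clone of the systemd repository that has
the v230 tag; the helper names and the default search pattern "leak" are
only illustrative, and a plain keyword search will miss fixes whose commit
messages don't mention the word.

```python
import subprocess


def leak_log_command(since_tag="v230", patterns=("leak",)):
    """Build a git log command listing commits after `since_tag` whose
    message matches any of `patterns` (case-insensitive)."""
    cmd = ["git", "log", "--oneline", "-i"]
    for pattern in patterns:
        cmd.append("--grep=" + pattern)
    cmd.append(since_tag + "..HEAD")
    return cmd


def candidate_commits(repo_dir, since_tag="v230", patterns=("leak",)):
    """Run the command in `repo_dir` and return one line per matching commit."""
    result = subprocess.run(
        leak_log_command(since_tag, patterns),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```
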
rday