[systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces

Robert P. J. Day rpjday at crashcourse.ca
Mon Feb 22 11:42:26 UTC 2021


On Mon, 22 Feb 2021, Greg KH wrote:

> On Mon, Feb 22, 2021 at 06:22:44AM -0500, Robert P. J. Day wrote:
> > On Thu, 18 Feb 2021, Lennart Poettering wrote:
> >
> > > On Do, 18.02.21 11:48, Robert P. J. Day (rpjday at crashcourse.ca) wrote:
> > >
> > > >   A colleague has reported the following apparent issue in a fairly
> > > > old (v230) version of systemd -- this is in a Yocto Project Wind River
> > > > Linux 9 build, hence the age of the package.
> > > >
> > > >   As reported to me (and I'm gathering more info), the system was
> > > > being put through some "longevity testing" by repeatedly adding,
> > > > removing, activating and de-activating network interfaces. According
> > > > to the report, the result was heap space slowly but inexorably being
> > > > consumed.
> > > >
> > > >   While waiting for more info, I'm going to examine the commit log for
> > > > systemd from v230 moving forward to collect any commits that address
> > > > memory leaks, then peruse them more carefully to see if they might
> > > > resolve the problem.
> > > >
> > > >   I realize it's asking a bit for folks here to remember that far
> > > > back, but does this issue sound at all familiar? Any pointers that
> > > > might save me some time? Thanks.
> > >
> > > Note that our hash tables operate with an allocation cache: when
> > > adding entries to them and then removing them again the memory
> > > required for that is not returned to the OS but added to a local
> > > cache. When the next entry is then added again, we recycle the
> > > cached entry instead of asking for new memory again. This allocation
> > > cache is a bit quicker than going to malloc() all the time, but
> > > means if you just watch the heap you'll assume there's a leak even
> > > though there isn't really: the memory is not lost after all, and
> > > will be reused eventually if we need it.
> > >
> > > You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off,
> > > but not sure v230 already knew that env var.
> >
> >   well, we seem to have isolated the issue, here it is in a nutshell
> > based on a condensed note i got from someone who tracked it down this
> > weekend. the memory leak is triggered by:
> >
> >   $ ssh root@<target> -p 830 -s netconf   [830 = netconf over SSH]
> >
> > long story short, according to jemalloc profiling, there is a massive
> > memory leak in DBUS code, to the tune of about 500M/day on a running
> > system. i'm perusing the profiling output now, but does any of this
> > sound even remotely familiar to anyone? i realize that's just a
> > summary, but does anyone remember seeing something related to this
> > once upon a time? [heavily-patched systemd_230 from wind river linux
> > 9].
>
> Given that this is a heavily patched system, please get support from
> the vendor that provided this as you are paying for this.  Don't ask
> the community to try to remember what happened with an old obsolete
> version of software, that's crazy...

  that's already in the pipeline, i was simply asking if anyone had
ever *seen* this before, just so we might be able to say, "hey, we're
not the first this has happened to."

  also, on the off-chance that anyone else is using a similarly-dated
version of systemd, they might say, "hmmmmm, that sounds suspiciously
like what's happening with *us*."

  just trying to be helpful.

rday
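
For anyone reading this thread later: the allocation-cache behaviour
described in the quoted reply above (removed hash table entries are kept
in a local cache and recycled instead of being handed back) can be
illustrated with a minimal free-list sketch. This is NOT systemd's
actual mempool code; every name here (entry, entry_cache, entry_get,
entry_put, entry_cache_drain) is hypothetical, and the real
implementation may well differ. It only shows why a heap watcher sees
memory that is never "freed" even though nothing is lost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; an illustration, not systemd code. */
struct entry {
    struct entry *next_free;   /* link used only while cached for reuse */
    int key;
    int value;
};

struct entry_cache {
    struct entry *free_list;   /* released entries waiting to be recycled */
    size_t malloc_calls;       /* how often malloc() was really called */
};

/* Hand out an entry: recycle a cached one if possible, else malloc(). */
static struct entry *entry_get(struct entry_cache *c) {
    assert(c != NULL);
    if (c->free_list != NULL) {
        struct entry *e = c->free_list;
        c->free_list = e->next_free;
        return e;
    }
    c->malloc_calls++;
    return malloc(sizeof(struct entry));
}

/* "Remove" an entry: push it onto the cache instead of calling free().
 * From the outside the heap does not shrink, which can look like a leak. */
static void entry_put(struct entry_cache *c, struct entry *e) {
    assert(c != NULL);
    if (e == NULL)
        return;
    e->next_free = c->free_list;
    c->free_list = e;
}

/* Actually return all cached entries to the allocator. */
static void entry_cache_drain(struct entry_cache *c) {
    assert(c != NULL);
    while (c->free_list != NULL) {
        struct entry *e = c->free_list;
        c->free_list = e->next_free;
        free(e);
    }
}
```

With a cache like this, an add/remove/add cycle calls malloc() once and
reuses the same block the second time, so heap usage plateaus rather
than growing. Growth that never levels off (such as the roughly 500M/day
reported above) would not be explained by this kind of caching alone.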

