[systemd-devel] Looking for known memory leaks triggered by stress testing add/remove/up/down interfaces
Greg KH
gregkh at linuxfoundation.org
Mon Feb 22 11:30:32 UTC 2021
On Mon, Feb 22, 2021 at 06:22:44AM -0500, Robert P. J. Day wrote:
> On Thu, 18 Feb 2021, Lennart Poettering wrote:
>
> > On Do, 18.02.21 11:48, Robert P. J. Day (rpjday at crashcourse.ca) wrote:
> >
> > > A colleague has reported the following apparent issue in a fairly
> > > old (v230) version of systemd -- this is in a Yocto Project Wind River
> > > Linux 9 build, hence the age of the package.
> > >
> > > As reported to me (and I'm gathering more info), the system was
> > > being put through some "longevity testing" by repeatedly adding,
> > > removing, activating and de-activating network interfaces. According
> > > to the report, the result was heap space slowly but inexorably being
> > > consumed.
> > >
> > > While waiting for more info, I'm going to examine the commit log for
> > > systemd from v230 moving forward to collect any commits that address
> > > memory leaks, then peruse them more carefully to see if they might
> > > resolve the problem.
> > >
> > > I realize it's asking a bit for folks here to remember that far
> > > back, but does this issue sound at all familiar? Any pointers that
> > > might save me some time? Thanks.
> >
> > Note that our hash tables operate with an allocation cache: when
> > adding entries to them and then removing them again the memory
> > required for that is not returned to the OS but added to a local
> > cache. When the next entry is then added again, we recycle the
> > cached entry instead of asking for new memory again. This allocation
> > cache is a bit quicker than going to malloc() all the time, but
> > means if you just watch the heap you'll assume there's a leak even
> > though there isn't really; the memory is not lost after all, and
> > will be reused eventually if we need it.
> >
> > You may use the env var SYSTEMD_MEMPOOL=0 to turn this logic off,
> > but I'm not sure v230 already knew that env var.
>
> well, we seem to have isolated the issue, here it is in a nutshell
> based on a condensed note i got from someone who tracked it down this
> weekend. the memory leak is triggered by:
>
> $ ssh root@<target> -p 830 -s netconf [830 = netconf over SSH]
>
> long story short, according to jemalloc profiling, there is a massive
> memory leak in DBUS code, to the tune of about 500M/day on a running
> system. i'm perusing the profiling output now, but does any of this
> sound even remotely familiar to anyone? i realize that's just a
> summary, but does anyone remember seeing something related to this
> once upon a time? [heavily-patched systemd_230 from wind river linux
> 9].
Given that this is a heavily patched system, please get support from the
vendor that provided it, as that is what you are paying them for. Don't
ask the community to try to remember what happened with an old, obsolete
version of software, that's crazy...
good luck!
greg k-h
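
For readers following the quoted explanation of the hash table allocation
cache, here is a minimal sketch of the general idea in Python. It is not
systemd's code: the names EntryPool and churn, the 64-byte entry size and
the counters are all hypothetical, chosen only to show why a heap that is
watched during add/remove cycles can look like it never shrinks even when
nothing is lost.

```python
class EntryPool:
    """Toy free-list cache (hypothetical, not systemd's implementation).

    Released entries are kept for reuse instead of being handed back.
    """

    def __init__(self):
        self._free = []             # released entries waiting to be recycled
        self.fresh_allocations = 0  # stands in for calls to malloc()

    def acquire(self):
        # Recycle a cached entry when one is available.
        if self._free:
            return self._free.pop()
        # Otherwise ask for new memory.
        self.fresh_allocations += 1
        return bytearray(64)        # assumed fixed entry size

    def release(self, entry):
        # The entry goes back to the local cache, not to the OS.
        self._free.append(entry)

    @property
    def cached(self):
        return len(self._free)


def churn(pool, interfaces, rounds):
    """Add and remove the same number of entries over and over."""
    for _ in range(rounds):
        live = [pool.acquire() for _ in range(interfaces)]
        for entry in live:
            pool.release(entry)


if __name__ == "__main__":
    pool = EntryPool()
    churn(pool, interfaces=8, rounds=1000)
    print(pool.fresh_allocations, pool.cached)  # prints: 8 8
```

In this sketch the number of fresh allocations stops at the peak number of
live entries, however many rounds are run, and the cached entries stay
resident between rounds. A footprint that rises to a plateau is what such
a cache would produce; usage that keeps climbing under a constant workload
is not explained by it.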