[systemd-devel] [PATCH] cgroups: Cache controller masks and optimize queues.

David Timothy Strauss david at davidstrauss.net
Fri Nov 22 13:53:34 PST 2013


I'm disappointed in your response here. It's not up to your normally
high standards of technical discourse and tact on this mailing list. I
realize I've stepped on some feet, but there's quite a bit more to
this than me getting trigger-happy with a core commit.

On Fri, Nov 22, 2013 at 11:43 PM, Lennart Poettering
<lennart at poettering.net> wrote:
> "make distcheck" is broken now.

This is clearly not the key issue here, given the following:

 (a) "make distcheck" was already failing in the test-unit-file suite
before my commit (and continues to be failing in master). So, it was
already broken.

 (b) My oversight of excluding the Makefile.am lines to bundle the
test units the proper way for "make distcheck" was pretty minor in its
effect. The test still passed "make check", and I didn't block general
development or use of the project.

 (c) In the last couple weeks, you've broken builds yourself -- not
just tests -- for 12 hours [1][2]. We all make mistakes like that, but
I feel like you're using mine as a way to pile on criticism here.
There's a reason I emailed you privately when you broke the build that
day.

In any case, I fixed "make distcheck" [3] shortly after hearing about
the issue. I'll be sure to verify lack of any regression in "make
distcheck" before pushing in the future. I'll also add it to CI once
test-unit-file is working; I don't want to break CI builds in a
misleading way.

> I am fine if commiters commit without review if it's in their "own"
> submodule, or if it's man pages, or really obvious things. But this
> commit does not qualify. It touches the core, and it is far from obvious
> to me.

First, I'm not intentionally trying to skip process.

At the same time, please understand my frustration with a performance
issue introduced in v205 that added hours or days to the boot times of
our previously working servers. My attempts to discuss options and
approaches with you at LinuxCon, on the mailing list, and IRC have
only been marginally productive. Before my commit, we only had a TODO
logged.

Requesting reviews (by you and others) of proposed changes didn't get
much response. Zbigniew was the only person I was able to flag down in
any way, but he also said he has minimal familiarity with the current
cgroups implementations.

In an effort to improve the transparency of my changes and reduce the
chance of introducing instability, I coupled my changes with improved
functional testing in the key area I optimized. I then posted that
patch to the mailing list including some specific (but fairly
edge-case) follow-up questions, let it sit several days, and heard
back nothing. I hadn't even gotten a reply suggesting that a review
would be forthcoming.

> It includes lines like "TODO" which are a good indication that
> this isn't even thought out to the end...

I also think it's unfair to characterize my work here as not "thought out."

 (a) My changes went through testing by my ops crew to verify that
scalability issues were fixed for both stopped and starting units and
that cgroups still received proper configuration values and
controllers, including for slices.

 (b) I bundled a test suite covering functionality that lacked coverage before.

 (c) The TODO relates to a tiny optimization. The potential
improvement is so minor that the optimization is more an issue of
removing redundancy for tidiness. I used a similar approach in gutting
(but leaving in place for now) functions that probably aren't
necessary anymore given how they just return struct values.

 (d) The nature of the TODO isn't massively different from other items
in the TODO file. It just happens to be proximal to the related code.

 (e) My implementation replaced one running O(n^3) on startup for
enabled units, slightly better for disabled ones. There's no way the
previous implementation qualified as long-term and thought-out in the
first place. We're still waiting for some of those machines we started
nearly a week ago to finish applying initial cgroups configuration and
allow login. :-/

> Please, for stuff like this get a review from Kay, Zbigniew, Michal
> Schmidt, or me, before you commit. Thanks!

Indeed, I'd prefer for that to be the case. Let's talk about ways
contributors can have more confidence going forward that fixes to
regressions like this will get timely reviews. I know you're really
busy with the D-Bus and network work, but maintaining a stable, usable
master is increasingly important as this project matures.

[1] http://cgit.freedesktop.org/systemd/systemd/commit/?id=2b5c5383e48137d748681645ad7176f02b50ba30
[2] http://cgit.freedesktop.org/systemd/systemd/commit/?id=3db0e46b0de5e46922ebbbd77de4d3d1214bcc9a
[3] http://cgit.freedesktop.org/systemd/systemd/commit/?id=21acf11d407ea93d1b0c99088f9f64adad6cff0e


More information about the systemd-devel mailing list