[systemd-devel] [Patch 0/2] logind: make sure that closed sessions will be cleaned

Djalal Harouni tixxdz at opendz.org
Thu Jan 23 08:44:55 PST 2014


On Wed, Jan 22, 2014 at 09:51:05AM +0000, Colin Guthrie wrote:
> 'Twas brillig, and Djalal Harouni at 22/01/14 09:24 did gyre and gimble:
> > On Thu, Jan 16, 2014 at 11:19:08AM +0100, Djalal Harouni wrote:
> >> On Thu, Jan 16, 2014 at 06:01:58AM +0100, Zbigniew Jędrzejewski-Szmek wrote:
> >>> Indeed, in a container (without your patches), sessions remain in
> >>> "closing" state. But with your patches, systemd --user instance is
> >>> started and killed immediately during login. Not good either :)
> >> Ok, ouch!
> > It seems that the systemd --user instance is killed without my patches!
> > 
> > Systemd git will confirm it.
> > 
> > Ok, after some thoughts and source code analysis, I came to the
> > conclusion that in order to correctly clean session and user state
> > files, we need:
> > 
> > a) Improve session_check_gc() and user_check_gc() by using the session
> > and user states. This implies b)
> > 
> > b) Make sure that the state of the session and the user is always
> > correctly set.
> 
> FWIW, I have some backported patches in Mageia that are not (yet) in the
> fedora package. As you mentioned Fedora earlier I figure that's what you
> are using:
Yes I'm doing tests with Fedora 20 systemd and systemd from git.

> Namely a partial backport of cc377381
> 
> Patch is here:
> 
> http://svnweb.mageia.org/packages/cauldron/systemd/current/SOURCES/0511-logind-Partial-backport-of-cc377381.patch?revision=566087&view=markup&pathrev=566087
> 
> It might be useful to apply to Fedora too. EDIT: Perhaps not, see below.
Ok, yes that patch is needed in case the gc is being run while the
service and slice jobs are in the starting or stoping process:

do not garbage-collect while we are in the middel of:

user_start() => user_start_slice() && user_start_service()

and

user_stop() => user_stop_slice() && user_stop_service()

Before and after the gc should run, this for sure prevents races!

Note: after the jobs are finished they will put the user in the gc queue
which will trigger the gc again.

Ok (currently I'm only reporting to systemd upstream).



Ohh I just remembered that Lennart added some logic to wait for the
service to finish startup before notifying the client that he can make
logins, this is perhaps not related but it should interest you:

commit dd9b67aa3e94

http://cgit.freedesktop.org/systemd/systemd/commit/?id=dd9b67aa3e9476b3a4b3e231006eea6d108c841f

> > I'll discuss some notes about a) and follow with patches that will try
> > to honore b).
> > 
> > I'm sending this to have some feedback before spending more time on it,
> > ahh yes debugging this beast is hard!
> > 
> > Please note that I did not take a look at seats logic, only sessions and
> > users using getty and ssh logins.
> > 
> > Please feel free to correct me!
> > 
> > 
> > a) Improve session_check_gc() and user_check_gc()
> > 
> > * user_check_gc() has an  "if (u->sessions)" check! this should be
> >   updated to check if there are sessions that are not in the "closing"
> >   state.
> >   
> >   If *all* the sessions are in the "closing" state then check each
> >   session to see if there are still running processes in the background:
> >   screen,...
> 
> If the session is closing, it has to, by definition, still have
> processes. If not, then the session wouldn't be closing, but actually
Of course.

> fully closed and disappeared...
Yes that's precisely where the bug is, currently fully closed sessions
are not removed they will not disappear, all processes have been closed
but sessions are still there.

Testing a systemd git will show that getty or ssh sessions will persist
after logouts unless you issue a D-Bus TerminateSession() or
TerminateUser() which is probably the case of display managers

or you will have to manuall do:$ loginctl terminate-{session|user} ...


I'm trying to trigger the gc in a sane manner in pam_systemd, but first
the patches I sent will close various session and user states races.

And yeh I got the concept of how to deal with sessions that still have
processes running in a background. Perhaps add the logind.conf
"KillUserProcesses" check in the gc, if not set then we prevent the gc
from collecting sessions.

Care must be taken to honor "KillUserProcesses" definition of the
logind.conf man page.

> That said, it's perhaps this check here that actually *prevents* this
> final state from being reached due to GC not being run! So this could
> fix the chicken+egg problem (I don't know the code, so can only really
> talk in generalities!)
Yes, you are speaking about the "if (u->sessions)" in the
user_check_gc(), yes sure it prevents some cases, just think about
sessions that are in the "closing" state without any process in the
background they should be put in the gc, but this check will prevent
this *and* this is also related to sessions that are not put in the gc
cause their scope will always be running and will always be around:

src/login/logind-session.c:session_check_gc()

	if (s->scope && manager_unit_is_active(s->manager, s->scope))


So in order to remove this check we need to make the session state more
stable, fix what I've described in my patches and count on it to perform
the checks in the session_check_gc() function.

And we can't remove the "if (u->sessions)" of the user_check_gc(), it
can be updated to check if all the sessions are in the closing states.

My conclusion is that we need first to make the state of sessions/users
more stable, I've sent patches to try to achieve that.


> 
> 
> >   If so then abort the user_check_gc() or depend on a per-session
> >   function that will perhaps do the "kill-session-processes".
> 
> > * session_check_gc() should check if the session is in the "closing"
> >   state and if there are still running processes in the background, then
> >   decide to abort or force the "kill"session-processes"
> > 
> >   But it should not do the following check:
> >   "if (s->scope && manager_unit_is_active(s->manager, s->scope))"
> > 
> >   Since the scope will always be around, the same was applied in the
> >   commit 63966da86 for user_check_gc(), do not check the user manager!
> 
> 
> Interesting. It seems that 63966da86 suggests my patch above shouldn't
> be needed and in older code, the old "if (u->slice_job ||
> u->service_job)" check should just be dropped.

Hmm I think your patch is needed here since it will check only during
*creation* and *closing* of the slice and service to prevent races during
that time (create service and slice and close slice and service) with the gc!

After the creation and users logs in, that check in your patch will always
be false, it will be a nop, since service_job will be NULL.

Please check:
http://cgit.freedesktop.org/systemd/systemd/tree/src/login/logind-dbus.c#n1966

service_job and slice_job are set to NULL after the jobs have finished.

> 
> > b) Make sure that the state of the session and the user is always
> > correctly set. Not to mention that some logind clients depend on it.
> >  
> > So to make a) we must honor that b) is stable, and after an extensive
> > source code analysis, it seems that the state of users and sessions is
> > not always correct. I'll list some notes/bugs here:
> > 
> > * It seems that in systemd v204 after a logout from a session where
> >   processes are still running like "screen" the state files will show:
> > 
> >   systemd 204
> >   session state file:			user state file:
> >   Active = 1				State = active
> >   State = closing			ACTIVE_SESSIONS = X
> > 
> >   Where for systemd git:
> >   session state file:			user state file:
> >   Active = 0				State = closing
> >   State = closing			ACTIVE_SESSIONS =
> > 
> >   Not to mention that for systemd git, the session and user state files
> >   will show the same even if there are no background processes and they
> >   will not be cleaned  (using getty or ssh logins) which is is the object
> >   of this thread!
> > 
> >   This is a bug, not fixed yet.
> 
> So basically the problem is that the gc is simply not triggered often
> enough to tidy up these old "closing" sessions with no processes left?
Yes!

And if I try to trigger the gc, I always hit situations where sessions
are still active, or during startups which makes logind remove their
state files!

So I started by:
1) Fix session and user state
2) Improve the gc checks to use session and user states
3) Try to push sessions and users in the gc so it can be triggered!


> > * To track the state of sessions the fifo_fd should be used since the
> >   event manager will call session_dispatch_fifo() to remove it on eof.
> > 
> >   Using the session_is_active() *before* the fifo_fd check is not stable
> >   since it will return true before the fifo_fd was even created! and
> >   also before the session scope and user serivce are created, the
> >   session->seat->active will be set earlier before the fifo_fd creation
> >   and will be the last one to be unset after fifo_fd removal.
> > 
> >   And currently this affects src/login/logind-user.c:user_get_state()
> >   logic since it checks session_is_active() before calling
> >   session_get_state() which performs the fifo_fd check before the
> >   session_is_active()
> >  
> >   So user_get_state() and session_get_state() results might differ!
> > 
> >   This is a bug, not fixed yet, user_get_state() should count on
> >   session_get_state().
> > 
> > 
> > * It seems there are other problems with the session and user states,
> >   different variables are used to check the state and later these
> >   variables are updated at different places!
> > 
> >   The following patches will describe the problem and try to make the
> >   state more stable.
> > 
> >   The design was do not add extra flags to 'User' and 'Session' structs
> >   unless this is the only way. The logic is already there, and it needs
> >   a littel synchronization.
> 
> 
> I'll let someone else comment on these last points. They seem sensible,
> but I'm not familiar with the code here so can't really make educated
> comments without a bit more research and am sadly a bit too busy right now.

Yes it took me time to debug this, hope Lennart will have some time and
check it!

> Cheers!
> 
> Col
Thanks Colin!

-- 
Djalal Harouni
http://opendz.org


More information about the systemd-devel mailing list