<div dir="ltr"><div dir="ltr"></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 15, 2022 at 5:24 PM Michal Koutný <<a href="mailto:mkoutny@suse.com">mkoutny@suse.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Tue, Mar 15, 2022 at 04:35:12PM +0100, Felip Moll <<a href="mailto:felip@schedmd.com" target="_blank">felip@schedmd.com</a>> wrote:<br> > Meaning that it would be great to have a delegated cgroup subtree without<br> > the need of a service or scope.<br> > Just an empty subtree.<br> <br> It looks appealing to add Delegate= directive to slice units.<br> Firstly, that'd prevent the use of the slice by anything systemd.<br> Then some notion of owner of that subtree would have to be defined (if<br> only for cleanup).<br> That owner would be a process -- bang, you created a service with<br> delegation or a scope with "keepalive" process.<br> <br></blockquote><div><br></div><div>Correct, this is how the current systemd design works.</div><div>But... what if the concept of an owner were irrelevant? What if we could just tell systemd: hey, give me /sys/fs/cgroup/mysubdir and never ever touch it, or do anything to it or to the PIDs residing in it.<br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> (The above is slightly misleading) there could be an alternative of<br> something like RemainAfterExit=yes for scopes, i.e. 
such scopes would<br> not be stopped after last process exiting (but systemd would still be in<br> charge of cleaning the cgroup after explicit stop request and that'd<br> also mark the scope as truly stopped).<br> Such a recycled scope would only be useful via<br> org.freedesktop.systemd1.Manager.AttachProcessesToUnit().<br> <br></blockquote><div><br></div><div>This is also a good idea.<br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> BTW I'm also wondering how do you detect a job finishing in the case<br> original parent is gone (due to main service restart) and job's main<br> process reparented?<br> <br></blockquote><div><br></div><div>slurmstepd connects to slurmd through a socket and sends an RPC.</div><div>If slurmd is gone, slurmstepd (the child) keeps retrying the RPC and stays alive until slurmd comes back and responds.<br></div><div><br></div><div>The main process doesn't wait for its child; instead we do a double fork so that the child is reparented to the init process (PID 1).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> BTW 2 You didn't like having a scope for each job. Is it because of the<br> setup time (IOW jobs are short-lived) or persistent scopes overhead (too<br> many units, PID1 scalability)?<br></blockquote><div><br></div><div>It is not that I didn't like it. I observed a delay in step creation (forking slurmstepd) because, after sending an async D-Bus message, the stepd had to wait for the systemd job to be executed, and that can take time; computationally far more than a plain mkdir in the cgroup subtree. As an example, an 'srun hostname' command starts a job that just runs hostname. The response is instantaneous with mkdir, but it takes almost 1 second with a call to systemd through D-Bus. 
Slurm is used for HPC, but also for HTC (High Throughput Computing), which means hundreds of jobs can be started in a short period of time. So yes, this delay is critical, not only because jobs can be short-lived, but also because a massive number of jobs can finish and start at the same time. I ran just one test from our regression suite and the responsiveness of 'systemctl list-unit-files' was already compromised. From a sysadmin's point of view this was not ideal either, so, as you say, the scalability of PID 1 is also a concern.</div><div><br></div><div>This is the reason I will not be using one scope per job, and why I prefer the other solution: a single scope with Delegate=yes.</div><div><br></div><div>Does it make sense?<br></div></div></div>