[systemd-devel] Systemd, cgroupsv2, cgrulesengd, and nftables

Andrei Borzenkov arvidjaar at gmail.com
Mon Jun 17 17:30:12 UTC 2024


On 17.06.2024 18:20, Michal Koutný wrote:
> Hello.
> 
> On Sat, Jun 15, 2024 at 04:49:33PM GMT, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>> ...
>> Which does not really solve the problem. So, once again:
>>
>> - nftables allow filtering based on cgroupv2 path
>> - cgroupv2 path is resolved at the time rule is processed. It is impossible
>> to configure rule for a future cgroup
> 
> Can nftables accept non-leaf cgroup? (Of a .slice unit)
> 

To the best of my knowledge, it does not care. A cgroup is a cgroup,
leaf or not.
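
To illustrate the point quoted above about resolution time: as far as I
know, nft turns the path into a numeric cgroup ID when the rule is loaded,
and on cgroup v2 that ID is the inode number of the cgroup directory. The
sketch below assumes exactly that. resolve_cgroup_id and try_resolve are
hypothetical helper names, not part of nftables or systemd.

```python
import os


def resolve_cgroup_id(path):
    """Return the inode number of a cgroup directory.

    Assumption: on cgroup v2 the directory's inode number is the cgroup ID
    that a path-based rule ends up matching on.
    """
    return os.stat(path).st_ino


def try_resolve(path):
    """Return the ID for an existing cgroup, or None if it does not exist yet."""
    try:
        return resolve_cgroup_id(path)
    except FileNotFoundError:
        # A cgroup that has not been created has no ID to put into a rule.
        return None
```

For a transient cgroup that systemd has not created yet, try_resolve()
returns None, which is the reason a rule cannot be written in advance.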

>> So, no mantra about one ring to rule them all is going to help here as long
>> as none of the following is possible
>>
>> - systemd (which puts processes in cgroups) will also add corresponding
>> nftables rule that refers to this new transient cgroup
> 
> I think systemd comes with its own filtering based on BPF (see
> systemd.resource-control(5), "Network Accounting and Control") or see
> NFTSet= in the same section, does that solve the issue?
> 

Yes, NFTSet= effectively solves this for system services: instead of
matching a literal cgroup path, the rule matches against a set, and
systemd dynamically adds elements to that set. But it requires root and
is not available for user services.

> 
>> - or-
>>
>> - systemd allows pre-creation of cgroups and *atomic* placement of processes
>> in them
> 
> systemd places process either via clone-migrate-exec or
> clone(CLONE_INTO_CGROUP) idioms, so the newly exec'd process starts in
> the desired cgroup.
> 
> This is utilized with the .slice unit above (but it must be "pinned"
> into existence with some sibling unit).
> 

This may be a workaround, at least for some use cases. I am not sure
that a "slice" really fits here, though. A slice is about partitioning
resources, and there is no obvious reason why the same slice cannot
contain both programs that are allowed to access the network and
programs that are blocked from it.
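
For reference, the clone-migrate-exec idiom mentioned in the quoted reply
can be sketched roughly as follows. This is a minimal illustration, not
what systemd actually does internally. spawn_in_cgroup is a hypothetical
name, and the caller is assumed to have write access to the target
cgroup's cgroup.procs file.

```python
import os


def spawn_in_cgroup(cgroup_dir, argv):
    """Fork, move the child into cgroup_dir, then exec argv in the child.

    Returns the child's PID to the parent.
    """
    pid = os.fork()
    if pid == 0:
        # Child: still running our own code, so the target program has not
        # executed a single instruction outside the desired cgroup.
        try:
            procs = os.path.join(cgroup_dir, "cgroup.procs")
            with open(procs, "w") as f:
                f.write(str(os.getpid()))
            os.execvp(argv[0], argv)
        except BaseException:
            # Migration or exec failed: never fall back into the parent's code.
            os._exit(127)
    return pid
```

The pre-existing cgroup is what makes this usable with nftables: the rule
can refer to it before any process is placed there.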

One more consideration (compared with solutions like cgrulesengd) is
who enforces the restrictions. The cgrulesengd configuration is managed
by the administrator and the user has no control over it, whereas here
the user is free to place any program in any slice under their own
hierarchy, but cannot place a program in any slice outside of it.

> (Migrating already running processes with their runtime state is nothing
> I'd recommend.)
> 

I did not mean that.

