[Intel-gfx] [RFC PATCH 0/5] cgroup support for GPU devices

Johannes Weiner hannes at cmpxchg.org
Mon May 6 15:16:13 UTC 2019


On Wed, May 01, 2019 at 10:04:33AM -0400, Brian Welty wrote:
> In containerized or virtualized environments, there is desire to have
> controls in place for resources that can be consumed by users of a GPU
> device.  This RFC patch series proposes a framework for integrating 
> use of existing cgroup controllers into device drivers.
> The i915 driver is updated in this series as our primary use case to
> leverage this framework and to serve as an example for discussion.
> 
> The patch series enables device drivers to use cgroups to control the
> following resources within a GPU (or other accelerator device):
> *  control allocation of device memory (reuse of memcg)
> and with future work, we could extend to:
> *  track and control share of GPU time (reuse of cpu/cpuacct)
> *  apply mask of allowed execution engines (reuse of cpusets)

Please create a separate controller for your purposes.

The memory controller is for traditional RAM. I don't see it having
much in common with what you're trying to do, and it's barely reusing
any of the memcg code. You can use the page_counter API directly.

> Instead of introducing a new cgroup subsystem for GPU devices, a new
> framework is proposed to allow devices to register with existing cgroup
> controllers, which creates per-device cgroup_subsys_state within the
> cgroup.  This gives device drivers their own private cgroup controls
> (such as memory limits or other parameters) to be applied to device
> resources instead of host system resources.
> Device drivers (GPU or other) are then able to reuse the existing cgroup
> controls, instead of inventing similar ones.
> 
> Per-device controls would be exposed in cgroup filesystem as:
>     mount/<cgroup_name>/<subsys_name>.devices/<dev_name>/<subsys_files>
> such as (for example):
>     mount/<cgroup_name>/memory.devices/<dev_name>/memory.max
>     mount/<cgroup_name>/memory.devices/<dev_name>/memory.current
>     mount/<cgroup_name>/cpu.devices/<dev_name>/cpu.stat
>     mount/<cgroup_name>/cpu.devices/<dev_name>/cpu.weight

Subdirectories for anything other than actual cgroups are a no-go. If
you need a hierarchy, use dotted filenames:

gpu.memory.max
gpu.cycles.max

etc. and look at Documentation/admin-guide/cgroup-v2.rst's 'Format'
and 'Conventions', as well as how the io controller works, to see how
multi-key / multi-device control files are implemented in cgroup2.


More information about the Intel-gfx mailing list