[RFC PATCH 0/9] cgroup support for GPU devices

Fri Jan 29 03:00:56 UTC 2021

On 2021/1/27 上午5:46, Brian Welty wrote:

> We'd like to revisit the proposal of a GPU cgroup controller for managing
> GPU devices but with just a basic set of controls.  This series is based on 
> the prior patch series from Kenny Ho [1].  We take Kenny's base patches
> which implement the basic framework for the controller, but we propose an
> alternate set of control files.  Here we've taken a subset of the controls
> proposed in earlier discussion on ML here [2]. 
>
> This series proposes a set of device memory controls (gpu.memory.current,
> gpu.memory.max, and gpu.memory.total) and accounting of GPU time usage
> (gpu.sched.runtime).  GPU time sharing controls are left as future work.
> These are implemented within the GPU controller along with integration/usage
> of the device memory controls by the i915 device driver.
>
> As an accelerator or GPU device is similar in many respects to a CPU with
> (or without) attached system memory, the basic principle here is try to
> copy the semantics of existing controls from other controllers when possible
> and where these controls serve the same underlying purpose.
> For example, the memory.max and memory.current controls are based on
> same controls from MEMCG controller.

It seems not to be DRM specific, or even GPU specific. Would we have an universal
control group for any accelerator, GPGPU device etc, that hold sharable resources
like device memory, compute utility, bandwidth, with extra control file to select
between devices(or vendors)?

e.g. /cgname.device that stores PCI BDF， or enum(intel, amdgpu, nvidia, ...),
defaults to none, means not enabled.