[RFC PATCH 0/9] cgroup support for GPU devices
brian.welty at intel.com
Mon Feb 1 23:21:35 UTC 2021
On 1/28/2021 7:00 PM, Xingyou Chen wrote:
> On 2021/1/27 5:46 AM, Brian Welty wrote:
>> We'd like to revisit the proposal of a GPU cgroup controller for managing
>> GPU devices but with just a basic set of controls. This series is based on
>> the prior patch series from Kenny Ho. We take Kenny's base patches
>> which implement the basic framework for the controller, but we propose an
>> alternate set of control files. Here we've taken a subset of the controls
>> proposed in earlier discussion on ML here.
>> This series proposes a set of device memory controls (gpu.memory.current,
>> gpu.memory.max, and gpu.memory.total) and accounting of GPU time usage
>> (gpu.sched.runtime). GPU time sharing controls are left as future work.
>> These are implemented within the GPU controller along with integration/usage
>> of the device memory controls by the i915 device driver.
>> As an accelerator or GPU device is similar in many respects to a CPU with
>> (or without) attached system memory, the basic principle here is to try to
>> copy the semantics of existing controls from other controllers when possible
>> and where these controls serve the same underlying purpose.
>> For example, the memory.max and memory.current controls are based on
>> the same controls from the MEMCG controller.
> This seems not to be DRM-specific, or even GPU-specific. Could we have a universal
> control group for any accelerator, GPGPU device, etc. that holds shareable resources
> like device memory, compute utility, and bandwidth, with an extra control file to select
> between devices (or vendors)?
> e.g. /cgname.device that stores a PCI BDF, or an enum (intel, amdgpu, nvidia, ...),
> and defaults to none, meaning not enabled.
Hi, thanks for the feedback. Yes, I tend to agree. I've asked about this in
earlier work; my suggestion is to name the controller something like 'XPU' to
make clear that these controls could apply to more than GPUs.
But at least for now, based on Tejun's reply, the feedback is to try to keep
this controller as small and focused as possible on just GPUs, at least until
we get some consensus on a set of controls for GPUs. For that, we need more
active input from the community.