[PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

Koenig, Christian Christian.Koenig at amd.com
Tue Sep 3 08:24:30 UTC 2019


Am 03.09.19 um 10:02 schrieb Daniel Vetter:
> On Thu, Aug 29, 2019 at 02:05:17AM -0400, Kenny Ho wrote:
>> This is a follow up to the RFC I made previously to introduce a cgroup
>> controller for the GPU/DRM subsystem [v1,v2,v3].  The goal is to be able to
>> provide resource management to GPU resources using things like container.
>>
>> With this RFC v4, I am hoping to have some consensus on a merge plan.  I believe
>> the GEM related resources (drm.buffer.*) introduced in previous RFC and,
>> hopefully, the logical GPU concept (drm.lgpu.*) introduced in this RFC are
>> uncontroversial and ready to move out of RFC and into a more formal review.  I
>> will continue to work on the memory backend resources (drm.memory.*).
>>
>> The cover letter from v1 is copied below for reference.
>>
>> [v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
>> [v2]: https://www.spinics.net/lists/cgroups/msg22074.html
>> [v3]: https://lists.freedesktop.org/archives/amd-gfx/2019-June/036026.html
> So looking at all this, it doesn't seem to have changed much, and the old
> discussion didn't really conclude anywhere (aside from some details).
>
> One more open thought that crossed my mind, having read a ton of ttm again
> recently: how does this all interact with the ttm global limits? I'd say the
> ttm global limits are the ur-cgroup we have in drm, and not looking at
> them seems kinda bad.

At least my hope was to completely replace the ttm globals with the 
limitations introduced here once this work is ready.

Christian.
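
For readers following along: the overlap Daniel points at is that TTM today
enforces a single system-wide memory budget, while the controller proposed here
charges each allocation against the caller's cgroup hierarchy.  A minimal sketch
of the latter, with purely hypothetical names (locking omitted, and none of this
is code from the series):

#include <linux/types.h>
#include <linux/errno.h>

/* Hypothetical per-cgroup memory pool; not the structures from the series. */
struct drmcg_pool {
        struct drmcg_pool *parent;      /* link up the cgroup hierarchy */
        u64 used;                       /* bytes currently charged */
        u64 limit;                      /* bytes this cgroup may use */
};

/* Charge @size against @pool and all of its ancestors, or fail cleanly. */
static int drmcg_try_charge(struct drmcg_pool *pool, u64 size)
{
        struct drmcg_pool *p;

        /* first pass: make sure every level of the hierarchy has room */
        for (p = pool; p; p = p->parent)
                if (p->used + size > p->limit)
                        return -ENOMEM;

        /* second pass: commit the charge at every level */
        for (p = pool; p; p = p->parent)
                p->used += size;

        return 0;
}

A TTM-style global limit is then just the degenerate case of a single root
pool, which is why replacing the ttm globals with the cgroup limits looks
feasible.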

> -Daniel
>
>> v4:
>> Unchanged (no review needed)
>> * drm.memory.*/ttm resources (Patches 9-13; I am still working on memory bandwidth
>> and the shrinker)
>> Based on feedback on v3:
>> * updated nomenclature to drmcg
>> * embedded per-device drmcg properties into drm_device (see the sketch after this list)
>> * split GEM buffer related commits into stats and limits
>> * renamed functions to align with convention
>> * combined buffer accounting and limit checking into a try_charge function
>> * added support for buffer stats without limit enforcement
>> * removed the GEM buffer sharing limitation
>> * updated documentation
>> New features:
>> * introduced the logical GPU (lgpu) concept
>> * example implementation with AMD KFD
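
As a rough illustration of the "embedded per-device drmcg properties into
drm_device" item above (struct and field names here are invented, not taken
from the patches), the point of the change is that device-level defaults and
limits live inside drm_device itself rather than in a separate table keyed by
minor:

#include <linux/types.h>

/* Hypothetical per-device cgroup properties; names are illustrative only. */
struct drmcg_device_props {
        bool limit_enforced;            /* whether limits apply to this device */
        u64 bo_total_max_default;       /* default total GEM buffer limit */
        u64 bo_peak_max_default;        /* default per-allocation peak limit */
};

/* Sketch of the embedding; the real struct drm_device has many more members. */
struct drm_device_sketch {
        int minor;                              /* used for major:minor addressing */
        struct drmcg_device_props drmcg_props;  /* embedded drmcg properties */
};

Embedding ties the lifetime of the cgroup properties to the device they
describe, which avoids a separate lookup on every charge.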
>>
>> v3:
>> Based on feedback on v2:
>> * removed the .help type file from v2
>> * conformed to cgroup conventions for default and max handling
>> * conformed to cgroup conventions for addressing device-specific limits (with major:minor)
>> New functionality:
>> * adopted memparse for memory-size-related attributes (see the sketch after this list)
>> * added a macro to marshal drmcgrp cftype private data (DRMCG_CTF_PRIV, etc.)
>> * added ttm buffer usage stats (per cgroup, for system, tt and vram)
>> * added ttm buffer usage limits (per cgroup, for vram)
>> * added per-cgroup bandwidth stats and limiting (burst and average bandwidth)
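
To make the "max" handling, major:minor addressing and memparse items above
concrete, here is a rough sketch of how a write to such a limit file is
typically parsed.  Only memparse() and the cgroup file conventions are real;
the function and its signature are hypothetical:

#include <linux/kernel.h>       /* sscanf(), memparse(), U64_MAX */
#include <linux/string.h>       /* strchr(), strcmp(), skip_spaces() */
#include <linux/errno.h>

/* Parse one "major:minor value" line, e.g. "226:0 256m" or "226:0 max".
 * The caller is assumed to have stripped the trailing newline. */
static int drmcg_parse_limit_line(char *line, unsigned int *major,
                                  unsigned int *minor, u64 *limit)
{
        char *val;

        if (sscanf(line, "%u:%u", major, minor) != 2)
                return -EINVAL;

        val = strchr(line, ' ');
        if (!val)
                return -EINVAL;
        val = skip_spaces(val);

        if (!strcmp(val, "max"))
                *limit = U64_MAX;               /* cgroup convention: no limit */
        else
                *limit = memparse(val, NULL);   /* accepts k/m/g suffixes */

        return 0;
}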
>>
>> v2:
>> * Removed the vendoring concepts
>> * Add limit to total buffer allocation
>> * Add limit to the maximum size of a buffer allocation
>>
>> v1: cover letter
>>
>> The purpose of this patch series is to start a discussion for a generic cgroup
>> controller for the drm subsystem.  The design proposed here is a very early one.
>> We are hoping to engage the community as we develop the idea.
>>
>>
>> Backgrounds
>> ==========
>> Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
>> tasks, and all their future children, into hierarchical groups with specialized
>> behaviour, such as accounting/limiting the resources which processes in a cgroup
>> can access[1].  Weights, limits, protections, allocations are the main resource
>> distribution models.  Existing cgroup controllers include cpu, memory, io,
>> rdma, and more.  cgroup is one of the foundational technologies that enable
>> the popular container-based application deployment and management method.
>>
>> Direct Rendering Manager/drm contains code intended to support the needs of
>> complex graphics devices. Graphics drivers in the kernel may make use of DRM
>> functions to make tasks like memory management, interrupt handling and DMA
>> easier, and provide a uniform interface to applications.  DRM has also
>> developed beyond traditional graphics to support compute/GPGPU applications.
>>
>>
>> Motivations
>> =========
>> As GPUs grow beyond the realm of desktop/workstation graphics into areas like
>> data center clusters and IoT, there is an increasing need to monitor and
>> regulate GPUs as a resource, just like cpu, memory and io.
>>
>> Matt Roper from Intel began working on a similar idea in early 2018 [2] for the
>> purpose of managing GPU priority using the cgroup hierarchy.  While that
>> particular use case may not warrant a standalone drm cgroup controller, there
>> are other use cases where having one can be useful [3].  Monitoring GPU
>> resources such as VRAM and buffers, CUs (compute units, AMD's nomenclature)/EUs
>> (execution units, Intel's nomenclature), and GPU job scheduling [4] can help
>> sysadmins get a better understanding of an application's usage profile.  Further
>> regulation of the aforementioned resources can also help sysadmins optimize
>> workload deployment on limited GPU resources.
>>
>> With the increased importance of machine learning, data science and other
>> cloud-based applications, GPUs are already in production use in data centers
>> today [5,6,7].  Existing GPU resource management is very coarse-grained, however,
>> as sysadmins are only able to distribute workloads on a per-GPU basis [8].  An
>> alternative is to use GPU virtualization (with or without SR-IOV), but it
>> generally acts on the entire GPU instead of specific resources within a GPU.
>> With a drm cgroup controller, we can enable alternative, fine-grained, sub-GPU
>> resource management (in addition to what may be available via GPU
>> virtualization).
>>
>> In addition to production use, the DRM cgroup can also help with testing
>> graphics application robustness by providing a means to artificially limit the
>> DRM resources available to applications.
>>
>>
>> Challenges
>> ========
>> While there is common infrastructure in DRM that is shared across many vendors
>> (the scheduler [4], for example), there are also aspects of DRM that are
>> vendor-specific.  To accommodate this, we borrowed the mechanism cgroup itself
>> uses to handle different kinds of cgroup controllers.
>>
>> Resources for DRM are also often device (GPU) specific rather than system-wide,
>> and a system may contain more than one GPU.  For this, we borrowed some of the
>> ideas from the RDMA cgroup controller.
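
The idea borrowed from the RDMA cgroup controller is, roughly, that each cgroup
keeps per-device resource state rather than one system-wide counter.  A
hypothetical sketch (types and names invented for illustration; allocation and
locking omitted):

#include <linux/types.h>
#include <linux/list.h>

/* Hypothetical per-(cgroup, device) resource record, in the spirit of the
 * RDMA controller's per-device pools; not the series' actual types. */
struct drmcg_device_resource {
        struct list_head node;          /* linked into the cgroup's device list */
        int minor;                      /* which DRM device this record covers */
        u64 bo_total;                   /* bytes of GEM buffers charged */
        u64 bo_limit;                   /* this cgroup's limit on this device */
};

struct drmcg_state_sketch {
        struct list_head dev_resources; /* one drmcg_device_resource per GPU */
};

/* Look up the record for a given device minor; a real implementation would
 * allocate one on first use. */
static struct drmcg_device_resource *
drmcg_get_dev_resource(struct drmcg_state_sketch *cg, int minor)
{
        struct drmcg_device_resource *res;

        list_for_each_entry(res, &cg->dev_resources, node)
                if (res->minor == minor)
                        return res;

        return NULL;
}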
>>
>> Approach
>> =======
>> To experiment with the idea of a DRM cgroup, we would like to start with basic
>> accounting and statistics, then continue to iterate and add regulating
>> mechanisms into the driver.
>>
>> [1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
>> [2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
>> [3] https://www.spinics.net/lists/cgroups/msg20720.html
>> [4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
>> [5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
>> [6] https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
>> [7] https://github.com/RadeonOpenCompute/k8s-device-plugin
>> [8] https://github.com/kubernetes/kubernetes/issues/52757
>>
>> Kenny Ho (16):
>>    drm: Add drm_minor_for_each
>>    cgroup: Introduce cgroup for drm subsystem
>>    drm, cgroup: Initialize drmcg properties
>>    drm, cgroup: Add total GEM buffer allocation stats
>>    drm, cgroup: Add peak GEM buffer allocation stats
>>    drm, cgroup: Add GEM buffer allocation count stats
>>    drm, cgroup: Add total GEM buffer allocation limit
>>    drm, cgroup: Add peak GEM buffer allocation limit
>>    drm, cgroup: Add TTM buffer allocation stats
>>    drm, cgroup: Add TTM buffer peak usage stats
>>    drm, cgroup: Add per cgroup bw measure and control
>>    drm, cgroup: Add soft VRAM limit
>>    drm, cgroup: Allow more aggressive memory reclaim
>>    drm, cgroup: Introduce lgpu as DRM cgroup resource
>>    drm, cgroup: add update trigger after limit change
>>    drm/amdgpu: Integrate with DRM cgroup
>>
>>   Documentation/admin-guide/cgroup-v2.rst       |  163 +-
>>   Documentation/cgroup-v1/drm.rst               |    1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |    4 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   29 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |    6 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |    3 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |    6 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |    3 +
>>   .../amd/amdkfd/kfd_process_queue_manager.c    |  140 ++
>>   drivers/gpu/drm/drm_drv.c                     |   26 +
>>   drivers/gpu/drm/drm_gem.c                     |   16 +-
>>   drivers/gpu/drm/drm_internal.h                |    4 -
>>   drivers/gpu/drm/ttm/ttm_bo.c                  |   93 ++
>>   drivers/gpu/drm/ttm/ttm_bo_util.c             |    4 +
>>   include/drm/drm_cgroup.h                      |  122 ++
>>   include/drm/drm_device.h                      |    7 +
>>   include/drm/drm_drv.h                         |   23 +
>>   include/drm/drm_gem.h                         |   13 +-
>>   include/drm/ttm/ttm_bo_api.h                  |    2 +
>>   include/drm/ttm/ttm_bo_driver.h               |   10 +
>>   include/linux/cgroup_drm.h                    |  151 ++
>>   include/linux/cgroup_subsys.h                 |    4 +
>>   init/Kconfig                                  |    5 +
>>   kernel/cgroup/Makefile                        |    1 +
>>   kernel/cgroup/drm.c                           | 1367 +++++++++++++++++
>>   25 files changed, 2193 insertions(+), 10 deletions(-)
>>   create mode 100644 Documentation/cgroup-v1/drm.rst
>>   create mode 100644 include/drm/drm_cgroup.h
>>   create mode 100644 include/linux/cgroup_drm.h
>>   create mode 100644 kernel/cgroup/drm.c
>>
>> -- 
>> 2.22.0
>>


