[PATCH RFC v4 14/16] drm, cgroup: Introduce lgpu as DRM cgroup resource
Kenny Ho
y2kenny at gmail.com
Wed Oct 9 15:08:45 UTC 2019
Hi Daniel,
Can you elaborate what you mean in more details? The goal of lgpu is
to provide the ability to subdivide a GPU device and give those slices
to different users as needed. I don't think there is anything
controversial or vendor specific here as requests for this are well
documented. The underlying representation is just a bitmap, which is
neither unprecedented nor vendor specific (bitmap is used in cpuset
for instance.)
An implementation of this abstraction is not hardware specific either.
For example, one can associate a virtual function in SRIOV as a lgpu.
Alternatively, a device can also declare to have 100 lgpus and treat
the lgpu quantity as a percentage representation of GPU subdivision.
The fact that an abstraction works well with a vendor implementation
does not make it a "prettification" of a vendor feature (by this
logic, I hope you are not implying an abstraction is only valid if it
does not work with amd CU masking because that seems fairly partisan.)
Did I misread your characterization of this patch?
Regards,
Kenny
On Wed, Oct 9, 2019 at 6:31 AM Daniel Vetter <daniel at ffwll.ch> wrote:
>
> On Tue, Oct 08, 2019 at 06:53:18PM +0000, Kuehling, Felix wrote:
> > On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> > > drm.lgpu
> > > A read-write nested-keyed file which exists on all cgroups.
> > > Each entry is keyed by the DRM device's major:minor.
> > >
> > > lgpu stands for logical GPU, it is an abstraction used to
> > > subdivide a physical DRM device for the purpose of resource
> > > management.
> > >
> > > The lgpu is a discrete quantity that is device specific (i.e.
> > > some DRM devices may have 64 lgpus while others may have 100
> > > lgpus.) The lgpu is a single quantity with two representations
> > > denoted by the following nested keys.
> > >
> > > ===== ========================================
> > > count Representing lgpu as anonymous resource
> > > list Representing lgpu as named resource
> > > ===== ========================================
> > >
> > > For example:
> > > 226:0 count=256 list=0-255
> > > 226:1 count=4 list=0,2,4,6
> > > 226:2 count=32 list=32-63
> > >
> > > lgpu is represented by a bitmap and uses the bitmap_parselist
> > > kernel function so the list key input format is a
> > > comma-separated list of decimal numbers and ranges.
> > >
> > > Consecutively set bits are shown as two hyphen-separated decimal
> > > numbers, the smallest and largest bit numbers set in the range.
> > > Optionally each range can be postfixed to denote that only parts
> > > of it should be set. The range will divided to groups of
> > > specific size.
> > > Syntax: range:used_size/group_size
> > > Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> > >
> > > The count key is the hamming weight / hweight of the bitmap.
> > >
> > > Both count and list accept the max and default keywords.
> > >
> > > Some DRM devices may only support lgpu as anonymous resources.
> > > In such case, the significance of the position of the set bits
> > > in list will be ignored.
> > >
> > > This lgpu resource supports the 'allocation' resource
> > > distribution model.
> > >
> > > Change-Id: I1afcacf356770930c7f925df043e51ad06ceb98e
> > > Signed-off-by: Kenny Ho <Kenny.Ho at amd.com>
> >
> > The description sounds reasonable to me and maps well to the CU masking
> > feature in our GPUs.
> >
> > It would also allow us to do more coarse-grained masking for example to
> > guarantee balanced allocation of CUs across shader engines or
> > partitioning of memory bandwidth or CP pipes (if that is supported by
> > the hardware/firmware).
>
> Hm, so this sounds like the definition for how this cgroup is supposed to
> work is "amd CU masking" (whatever that exactly is). And the abstract
> description is just prettification on top, but not actually the real
> definition you guys want.
>
> I think adding a cgroup which is that much depending upon the hw
> implementation of the first driver supporting it is not a good idea.
> -Daniel
>
> >
> > I can't comment on the code as I'm unfamiliar with the details of the
> > cgroup code.
> >
> > Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>
> >
> >
> > > ---
> > > Documentation/admin-guide/cgroup-v2.rst | 46 ++++++++
> > > include/drm/drm_cgroup.h | 4 +
> > > include/linux/cgroup_drm.h | 6 ++
> > > kernel/cgroup/drm.c | 135 ++++++++++++++++++++++++
> > > 4 files changed, 191 insertions(+)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index 87a195133eaa..57f18469bd76 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1958,6 +1958,52 @@ DRM Interface Files
> > > Set largest allocation for /dev/dri/card1 to 4MB
> > > echo "226:1 4m" > drm.buffer.peak.max
> > >
> > > + drm.lgpu
> > > + A read-write nested-keyed file which exists on all cgroups.
> > > + Each entry is keyed by the DRM device's major:minor.
> > > +
> > > + lgpu stands for logical GPU, it is an abstraction used to
> > > + subdivide a physical DRM device for the purpose of resource
> > > + management.
> > > +
> > > + The lgpu is a discrete quantity that is device specific (i.e.
> > > + some DRM devices may have 64 lgpus while others may have 100
> > > + lgpus.) The lgpu is a single quantity with two representations
> > > + denoted by the following nested keys.
> > > +
> > > + ===== ========================================
> > > + count Representing lgpu as anonymous resource
> > > + list Representing lgpu as named resource
> > > + ===== ========================================
> > > +
> > > + For example:
> > > + 226:0 count=256 list=0-255
> > > + 226:1 count=4 list=0,2,4,6
> > > + 226:2 count=32 list=32-63
> > > +
> > > + lgpu is represented by a bitmap and uses the bitmap_parselist
> > > + kernel function so the list key input format is a
> > > + comma-separated list of decimal numbers and ranges.
> > > +
> > > + Consecutively set bits are shown as two hyphen-separated decimal
> > > + numbers, the smallest and largest bit numbers set in the range.
> > > + Optionally each range can be postfixed to denote that only parts
> > > + of it should be set. The range will divided to groups of
> > > + specific size.
> > > + Syntax: range:used_size/group_size
> > > + Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> > > +
> > > + The count key is the hamming weight / hweight of the bitmap.
> > > +
> > > + Both count and list accept the max and default keywords.
> > > +
> > > + Some DRM devices may only support lgpu as anonymous resources.
> > > + In such case, the significance of the position of the set bits
> > > + in list will be ignored.
> > > +
> > > + This lgpu resource supports the 'allocation' resource
> > > + distribution model.
> > > +
> > > GEM Buffer Ownership
> > > ~~~~~~~~~~~~~~~~~~~~
> > >
> > > diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> > > index 6d9707e1eb72..a8d6be0b075b 100644
> > > --- a/include/drm/drm_cgroup.h
> > > +++ b/include/drm/drm_cgroup.h
> > > @@ -6,6 +6,7 @@
> > >
> > > #include <linux/cgroup_drm.h>
> > > #include <linux/workqueue.h>
> > > +#include <linux/types.h>
> > > #include <drm/ttm/ttm_bo_api.h>
> > > #include <drm/ttm/ttm_bo_driver.h>
> > >
> > > @@ -28,6 +29,9 @@ struct drmcg_props {
> > > s64 mem_highs_default[TTM_PL_PRIV+1];
> > >
> > > struct work_struct *mem_reclaim_wq[TTM_PL_PRIV];
> > > +
> > > + int lgpu_capacity;
> > > + DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> > > };
> > >
> > > #ifdef CONFIG_CGROUP_DRM
> > > diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> > > index c56cfe74d1a6..7b1cfc4ce4c3 100644
> > > --- a/include/linux/cgroup_drm.h
> > > +++ b/include/linux/cgroup_drm.h
> > > @@ -14,6 +14,8 @@
> > > /* limit defined per the way drm_minor_alloc operates */
> > > #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> > >
> > > +#define MAX_DRMCG_LGPU_CAPACITY 256
> > > +
> > > enum drmcg_mem_bw_attr {
> > > DRMCG_MEM_BW_ATTR_BYTE_MOVED, /* for calulating 'instantaneous' bw */
> > > DRMCG_MEM_BW_ATTR_ACCUM_US, /* for calulating 'instantaneous' bw */
> > > @@ -32,6 +34,7 @@ enum drmcg_res_type {
> > > DRMCG_TYPE_MEM_PEAK,
> > > DRMCG_TYPE_BANDWIDTH,
> > > DRMCG_TYPE_BANDWIDTH_PERIOD_BURST,
> > > + DRMCG_TYPE_LGPU,
> > > __DRMCG_TYPE_LAST,
> > > };
> > >
> > > @@ -58,6 +61,9 @@ struct drmcg_device_resource {
> > > s64 mem_bw_stats[__DRMCG_MEM_BW_ATTR_LAST];
> > > s64 mem_bw_limits_bytes_in_period;
> > > s64 mem_bw_limits_avg_bytes_per_us;
> > > +
> > > + s64 lgpu_used;
> > > + DECLARE_BITMAP(lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> > > };
> > >
> > > /**
> > > diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> > > index 0ea7f0619e25..18c4368e2c29 100644
> > > --- a/kernel/cgroup/drm.c
> > > +++ b/kernel/cgroup/drm.c
> > > @@ -9,6 +9,7 @@
> > > #include <linux/cgroup_drm.h>
> > > #include <linux/ktime.h>
> > > #include <linux/kernel.h>
> > > +#include <linux/bitmap.h>
> > > #include <drm/drm_file.h>
> > > #include <drm/drm_drv.h>
> > > #include <drm/ttm/ttm_bo_api.h>
> > > @@ -52,6 +53,9 @@ static char const *mem_bw_attr_names[] = {
> > > #define MEM_BW_LIMITS_NAME_AVG "avg_bytes_per_us"
> > > #define MEM_BW_LIMITS_NAME_BURST "bytes_in_period"
> > >
> > > +#define LGPU_LIMITS_NAME_LIST "list"
> > > +#define LGPU_LIMITS_NAME_COUNT "count"
> > > +
> > > static struct drmcg *root_drmcg __read_mostly;
> > >
> > > static int drmcg_css_free_fn(int id, void *ptr, void *data)
> > > @@ -115,6 +119,10 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> > > for (i = 0; i <= TTM_PL_PRIV; i++)
> > > ddr->mem_highs[i] = dev->drmcg_props.mem_highs_default[i];
> > >
> > > + bitmap_copy(ddr->lgpu_allocated, dev->drmcg_props.lgpu_slots,
> > > + MAX_DRMCG_LGPU_CAPACITY);
> > > + ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> > > +
> > > mutex_unlock(&dev->drmcg_mutex);
> > > return 0;
> > > }
> > > @@ -280,6 +288,14 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> > > MEM_BW_LIMITS_NAME_AVG,
> > > ddr->mem_bw_limits_avg_bytes_per_us);
> > > break;
> > > + case DRMCG_TYPE_LGPU:
> > > + seq_printf(sf, "%s=%lld %s=%*pbl\n",
> > > + LGPU_LIMITS_NAME_COUNT,
> > > + ddr->lgpu_used,
> > > + LGPU_LIMITS_NAME_LIST,
> > > + dev->drmcg_props.lgpu_capacity,
> > > + ddr->lgpu_allocated);
> > > + break;
> > > default:
> > > seq_puts(sf, "\n");
> > > break;
> > > @@ -314,6 +330,15 @@ static void drmcg_print_default(struct drmcg_props *props,
> > > MEM_BW_LIMITS_NAME_AVG,
> > > props->mem_bw_avg_bytes_per_us_default);
> > > break;
> > > + case DRMCG_TYPE_LGPU:
> > > + seq_printf(sf, "%s=%d %s=%*pbl\n",
> > > + LGPU_LIMITS_NAME_COUNT,
> > > + bitmap_weight(props->lgpu_slots,
> > > + props->lgpu_capacity),
> > > + LGPU_LIMITS_NAME_LIST,
> > > + props->lgpu_capacity,
> > > + props->lgpu_slots);
> > > + break;
> > > default:
> > > seq_puts(sf, "\n");
> > > break;
> > > @@ -407,9 +432,21 @@ static void drmcg_value_apply(struct drm_device *dev, s64 *dst, s64 val)
> > > mutex_unlock(&dev->drmcg_mutex);
> > > }
> > >
> > > +static void drmcg_lgpu_values_apply(struct drm_device *dev,
> > > + struct drmcg_device_resource *ddr, unsigned long *val)
> > > +{
> > > +
> > > + mutex_lock(&dev->drmcg_mutex);
> > > + bitmap_copy(ddr->lgpu_allocated, val, MAX_DRMCG_LGPU_CAPACITY);
> > > + ddr->lgpu_used = bitmap_weight(ddr->lgpu_allocated, MAX_DRMCG_LGPU_CAPACITY);
> > > + mutex_unlock(&dev->drmcg_mutex);
> > > +}
> > > +
> > > static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> > > struct drm_device *dev, char *attrs)
> > > {
> > > + DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> > > + DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> > > enum drmcg_res_type type =
> > > DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> > > struct drmcg *drmcg = css_to_drmcg(of_css(of));
> > > @@ -501,6 +538,83 @@ static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> > > continue;
> > > }
> > > break; /* DRMCG_TYPE_MEM */
> > > + case DRMCG_TYPE_LGPU:
> > > + if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> > > + strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) )
> > > + continue;
> > > +
> > > + if (!strcmp("max", sval) ||
> > > + !strcmp("default", sval)) {
> > > + if (parent != NULL)
> > > + drmcg_lgpu_values_apply(dev, ddr,
> > > + parent->dev_resources[minor]->
> > > + lgpu_allocated);
> > > + else
> > > + drmcg_lgpu_values_apply(dev, ddr,
> > > + props->lgpu_slots);
> > > +
> > > + continue;
> > > + }
> > > +
> > > + if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> > > + p_max = parent == NULL ? props->lgpu_capacity:
> > > + bitmap_weight(
> > > + parent->dev_resources[minor]->
> > > + lgpu_allocated, props->lgpu_capacity);
> > > +
> > > + rc = drmcg_process_limit_s64_val(sval,
> > > + false, p_max, p_max, &val);
> > > +
> > > + if (rc || val < 0) {
> > > + drmcg_pr_cft_err(drmcg, rc, cft_name,
> > > + minor);
> > > + continue;
> > > + }
> > > +
> > > + bitmap_zero(tmp_bitmap,
> > > + MAX_DRMCG_LGPU_CAPACITY);
> > > + bitmap_set(tmp_bitmap, 0, val);
> > > + }
> > > +
> > > + if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> > > + rc = bitmap_parselist(sval, tmp_bitmap,
> > > + MAX_DRMCG_LGPU_CAPACITY);
> > > +
> > > + if (rc) {
> > > + drmcg_pr_cft_err(drmcg, rc, cft_name,
> > > + minor);
> > > + continue;
> > > + }
> > > +
> > > + bitmap_andnot(chk_bitmap, tmp_bitmap,
> > > + props->lgpu_slots,
> > > + MAX_DRMCG_LGPU_CAPACITY);
> > > +
> > > + if (!bitmap_empty(chk_bitmap,
> > > + MAX_DRMCG_LGPU_CAPACITY)) {
> > > + drmcg_pr_cft_err(drmcg, 0, cft_name,
> > > + minor);
> > > + continue;
> > > + }
> > > + }
> > > +
> > > +
> > > + if (parent != NULL) {
> > > + bitmap_and(chk_bitmap, tmp_bitmap,
> > > + parent->dev_resources[minor]->lgpu_allocated,
> > > + props->lgpu_capacity);
> > > +
> > > + if (bitmap_empty(chk_bitmap,
> > > + props->lgpu_capacity)) {
> > > + drmcg_pr_cft_err(drmcg, 0,
> > > + cft_name, minor);
> > > + continue;
> > > + }
> > > + }
> > > +
> > > + drmcg_lgpu_values_apply(dev, ddr, tmp_bitmap);
> > > +
> > > + break; /* DRMCG_TYPE_LGPU */
> > > default:
> > > break;
> > > } /* switch (type) */
> > > @@ -606,6 +720,7 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> > > break;
> > > case DRMCG_TYPE_BANDWIDTH:
> > > case DRMCG_TYPE_MEM:
> > > + case DRMCG_TYPE_LGPU:
> > > drmcg_nested_limit_parse(of, dm->dev, sattr);
> > > break;
> > > default:
> > > @@ -731,6 +846,20 @@ struct cftype files[] = {
> > > .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BANDWIDTH,
> > > DRMCG_FTYPE_DEFAULT),
> > > },
> > > + {
> > > + .name = "lgpu",
> > > + .seq_show = drmcg_seq_show,
> > > + .write = drmcg_limit_write,
> > > + .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> > > + DRMCG_FTYPE_LIMIT),
> > > + },
> > > + {
> > > + .name = "lgpu.default",
> > > + .seq_show = drmcg_seq_show,
> > > + .flags = CFTYPE_ONLY_ON_ROOT,
> > > + .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> > > + DRMCG_FTYPE_DEFAULT),
> > > + },
> > > { } /* terminate */
> > > };
> > >
> > > @@ -744,6 +873,10 @@ struct cgroup_subsys drm_cgrp_subsys = {
> > >
> > > static inline void drmcg_update_cg_tree(struct drm_device *dev)
> > > {
> > > + bitmap_zero(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> > > + bitmap_fill(dev->drmcg_props.lgpu_slots,
> > > + dev->drmcg_props.lgpu_capacity);
> > > +
> > > /* init cgroups created before registration (i.e. root cgroup) */
> > > if (root_drmcg != NULL) {
> > > struct cgroup_subsys_state *pos;
> > > @@ -800,6 +933,8 @@ void drmcg_device_early_init(struct drm_device *dev)
> > > for (i = 0; i <= TTM_PL_PRIV; i++)
> > > dev->drmcg_props.mem_highs_default[i] = S64_MAX;
> > >
> > > + dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> > > +
> > > drmcg_update_cg_tree(dev);
> > > }
> > > EXPORT_SYMBOL(drmcg_device_early_init);
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
More information about the dri-devel
mailing list