[PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

Kenny Ho y2kenny at gmail.com
Tue Sep 3 19:23:17 UTC 2019

Hi Tejun,

Thanks for looking into this.  I can definitely help where I can and I
am sure other experts will jump in if I start misrepresenting the
reality :) (as Daniel already have done.)

Regarding your points, my understanding is that there isn't really a
TTM vs GEM situation anymore (there is an lwn.net article about that,
but it is more than a decade old.)  I believe GEM is the common
interface at this point and more and more features are being
refactored into it.  For example, AMD's driver uses TTM internally but
things are exposed via the GEM interface.

This GEM resource is actually the single number resource you just
referred to.  A GEM buffer (the drm.buffer.* resources) can be backed
by VRAM, or system memory or other type of memory.  The more fine
grain control is the drm.memory.* resources which still need more
discussion.  (As some of the functionalities in TTM are being
refactored into the GEM level.  I have seen some patches that make TTM
a subclass of GEM.)

This RFC can be grouped into 3 areas and they are fairly independent
so they can be reviewed separately: high level device memory control
(buffer.*), fine grain memory control and bandwidth (memory.*) and
compute resources (lgpu.*)  I think the memory.* resources are the
most controversial part but I think it's still needed.

Perhaps an analogy may help.  For a system, we have CPUs and memory.
And within memory, it can be backed by RAM or swap.  For GPU, each
device can have LGPUs and buffers.  And within the buffers, it can be
backed by VRAM, or system RAM or even swap.

As for setting the right amount, I think that's where the profiling
aspect of the *.stats comes in.  And while one can't necessary buy
more VRAM, it is still a useful knob to adjust if the intention is to
pack more work into a GPU device with predictable performance.  This
research on various GPU workload may be of interest:

A Taxonomy of GPGPU Performance Scaling

(summary: GPU workload can be memory bound or compute bound.  So it's
possible to pack different workload together to improve utilization.)


On Tue, Sep 3, 2019 at 2:50 PM Tejun Heo <tj at kernel.org> wrote:
> Hello, Daniel.
> On Tue, Sep 03, 2019 at 09:55:50AM +0200, Daniel Vetter wrote:
> > > * While breaking up and applying control to different types of
> > >   internal objects may seem attractive to folks who work day in and
> > >   day out with the subsystem, they aren't all that useful to users and
> > >   the siloed controls are likely to make the whole mechanism a lot
> > >   less useful.  We had the same problem with cgroup1 memcg - putting
> > >   control of different uses of memory under separate knobs.  It made
> > >   the whole thing pretty useless.  e.g. if you constrain all knobs
> > >   tight enough to control the overall usage, overall utilization
> > >   suffers, but if you don't, you really don't have control over actual
> > >   usage.  For memcg, what has to be allocated and controlled is
> > >   physical memory, no matter how they're used.  It's not like you can
> > >   go buy more "socket" memory.  At least from the looks of it, I'm
> > >   afraid gpu controller is repeating the same mistakes.
> >
> > We do have quite a pile of different memories and ranges, so I don't
> > thinkt we're doing the same mistake here. But it is maybe a bit too
> I see.  One thing which caught my eyes was the system memory control.
> Shouldn't that be controlled by memcg?  Is there something special
> about system memory used by gpus?
> > complicated, and exposes stuff that most users really don't care about.
> Could be from me not knowing much about gpus but definitely looks too
> complex to me.  I don't see how users would be able to alloate, vram,
> system memory and GART with reasonable accuracy.  memcg on cgroup2
> deals with just single number and that's already plenty challenging.
> Thanks.
> --
> tejun

More information about the dri-devel mailing list