[PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

Tue Sep 3 07:55:50 UTC 2019

On Fri, Aug 30, 2019 at 09:28:57PM -0700, Tejun Heo wrote:
> Hello,
> 
> I just glanced through the interface and don't have enough context to
> give any kind of detailed review yet.  I'll try to read up and
> understand more and would greatly appreciate if you can give me some
> pointers to read up on the resources being controlled and how the
> actual use cases would look like.  That said, I have some basic
> concerns.
> 
> * TTM vs. GEM distinction seems to be internal implementation detail
>   rather than anything relating to underlying physical resources.
>   Provided that's the case, I'm afraid these internal constructs being
>   used as primary resource control objects likely isn't the right
>   approach.  Whether a given driver uses one or the other internal
>   abstraction layer shouldn't determine how resources are represented
>   at the userland interface layer.

Yeah there's another RFC series from Brian Welty to abstract this away as
a memory region concept for gpus.

> * While breaking up and applying control to different types of
>   internal objects may seem attractive to folks who work day in and
>   day out with the subsystem, they aren't all that useful to users and
>   the siloed controls are likely to make the whole mechanism a lot
>   less useful.  We had the same problem with cgroup1 memcg - putting
>   control of different uses of memory under separate knobs.  It made
>   the whole thing pretty useless.  e.g. if you constrain all knobs
>   tight enough to control the overall usage, overall utilization
>   suffers, but if you don't, you really don't have control over actual
>   usage.  For memcg, what has to be allocated and controlled is
>   physical memory, no matter how they're used.  It's not like you can
>   go buy more "socket" memory.  At least from the looks of it, I'm
>   afraid gpu controller is repeating the same mistakes.

We do have quite a pile of different memories and ranges, so I don't
thinkt we're doing the same mistake here. But it is maybe a bit too
complicated, and exposes stuff that most users really don't care about.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch