[PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem
daniel at ffwll.ch
Tue Sep 17 12:21:40 UTC 2019
On Tue, Sep 10, 2019 at 01:54:48PM +0200, Michal Hocko wrote:
> On Fri 06-09-19 08:45:39, Tejun Heo wrote:
> > Hello, Daniel.
> > On Fri, Sep 06, 2019 at 05:34:16PM +0200, Daniel Vetter wrote:
> > > > Hmm... what'd be the fundamental difference from slab or socket memory
> > > > which are handled through memcg? Is system memory used by GPUs have
> > > > further global restrictions in addition to the amount of physical
> > > > memory used?
> > >
> > > Sometimes, but that would be specific resources (kinda like vram),
> > > e.g. CMA regions used by a gpu. But probably not something you'll run
> > > in a datacenter and want cgroups for ...
> > >
> > > I guess we could try to integrate with the memcg group controller. One
> > > trouble is that aside from i915 most gpu drivers do not really have a
> > > full shrinker, so not sure how that would all integrate.
> > So, while it'd great to have shrinkers in the longer term, it's not a
> > strict requirement to be accounted in memcg. It already accounts a
> > lot of memory which isn't reclaimable (a lot of slabs and socket
> > buffer).
> Yeah, having a shrinker is preferred but the memory should better be
> reclaimable in some form. If not by any other means then at least bound
> to a user process context so that it goes away with a task being killed
> by the OOM killer. If that is not the case then we cannot really charge
> it because then the memcg controller is of no use. We can tolerate it to
> some degree if the amount of memory charged like that is negligible to
> the overall size. But from the discussion it seems that these buffers
> are really large.
I think we can just make "must have a shrinker" as a requirement for
system memory cgroup thing for gpu buffers. There's mild locking inversion
fun to be had when typing one, but I think the problem is well-understood
enough that this isn't a huge hurdle to climb over. And should give admins
an easier to mange system, since it works more like what they know
Software Engineer, Intel Corporation
More information about the dri-devel