[RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

Mon May 13 15:10:22 UTC 2019

On Fri, May 10, 2019 at 02:50:39PM -0400, Kenny Ho wrote:
> On Fri, May 10, 2019 at 1:48 PM Koenig, Christian
> <Christian.Koenig at amd.com> wrote:
> > Well another question is why do we want to prevent that in the first place?
> >
> > I mean the worst thing that can happen is that we account a BO multiple
> > times.
> That's one of the problems.  The other one is the BO outliving the
> lifetime of a cgroup and there's no good way to un-charge the usage
> when the BO is freed, so the count won't be accurate.
> 
> I have looked into two possible solutions.  One is to prevent cgroup
> from being removed when there are BOs owned by the cgroup still alive
> (similar to how cgroup removal will fail if it still has processes
> attached to it.)  My concern here is the possibility of never being
> able to remove a cgroup due to the lifetime of a BO (continuously
> being shared and reused and never dying.)  Perhaps you can shed some
> light on this possibility.
> 
> The other one is to keep track of all the buffers and migrate them to
> the parent if a cgroup is closed.  My concern here is the performance
> overhead on tracking all the buffers.

My understanding is that other cgroup controllers already use reference
counting to make sure the data structure in the kernel doesn't disappear
too early. So you can delete the cgroup, but it might not get freed
completely until all the BOs allocated from that cgroup are released.
There's a recent LWN article on how that's not all that awesome for the
memory cgroup controller, and what to do about it:

https://lwn.net/Articles/787614/

We probably want to align with whatever the mem cgroup folks come up with
(so _not_ prevent deletion of the cgroup, since that's different
behaviour).
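
To make the refcounting idea concrete, here is a rough userspace sketch,
not kernel code. All names in it (drmcg_state, bo_charge, bo_release,
drmcg_rmdir, drmcg_freed_count) are made up for illustration and do not
correspond to any existing cgroup or drm interface. It only shows the
lifetime rule: removing the cgroup drops one reference, and the state is
freed when the last BO charged to it goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-cgroup accounting state (illustration only). */
struct drmcg_state {
	int refcount;        /* 1 for the cgroup itself, +1 per charged BO */
	bool offline;        /* set once the cgroup has been removed */
	long charged_bytes;  /* bytes currently charged to this cgroup */
};

/* Hypothetical buffer object that remembers which cgroup it was charged to. */
struct bo {
	struct drmcg_state *owner;
	long size;
};

/* Counts how many states were actually freed, so callers can observe it. */
static int drmcg_freed_count;

static struct drmcg_state *drmcg_create(void)
{
	struct drmcg_state *cg = calloc(1, sizeof(*cg));

	if (cg)
		cg->refcount = 1; /* reference held by the cgroup itself */
	return cg;
}

static void drmcg_put(struct drmcg_state *cg)
{
	assert(cg->refcount > 0);
	if (--cg->refcount == 0) {
		drmcg_freed_count++;
		free(cg);
	}
}

/* Charging a BO pins the cgroup state until the BO is released. */
static void bo_charge(struct bo *bo, struct drmcg_state *cg, long size)
{
	bo->owner = cg;
	bo->size = size;
	cg->refcount++;
	cg->charged_bytes += size;
}

/* Releasing a BO un-charges it and drops its reference. */
static void bo_release(struct bo *bo)
{
	bo->owner->charged_bytes -= bo->size;
	drmcg_put(bo->owner);
	bo->owner = NULL;
}

/* Removal always succeeds; the state lingers while BOs still pin it. */
static void drmcg_rmdir(struct drmcg_state *cg)
{
	cg->offline = true;
	drmcg_put(cg);
}
```

With this scheme un-charging stays accurate because the state is still
there when the BO is released. The cost is the one the LWN article
describes: a removed cgroup can linger for as long as a BO pins it.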

> > And going in the same direction, where is the code to handle an open
> > device file descriptor that is sent from one cgroup to another?
> I looked into this before but I forgot what I found.  Perhaps folks
> familiar with device cgroup can chime in.
> 
> Actually, just did another quick search right now.  Looks like the
> access is enforced at the inode level (__devcgroup_check_permission)
> so the fd sent to another cgroup that does not have access to the
> device should still not have access.

That's the device cgroup, not the memory accounting stuff.

Imo for memory allocations we should look at what happens when you pass a
tmpfs file around to another cgroup and then extend it there. I think
those allocations are charged against the cgroup which actually allocates
stuff.

So for drm, if you pass around a device fd, then we should always charge
ioctl calls that create a BO against the process doing the ioctl call, not
against the process which originally opened the device fd. For DRI3, for
example, that's actually the only reasonable thing to do, since otherwise
we'd charge everything against the Xserver.
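
As a rough illustration of that charging rule (again a userspace sketch
with made-up names: struct cg, struct task, struct drm_fd and
bo_create_charge are hypothetical, not existing drm or cgroup code), the
fd remembers who opened it, but the charge goes to whoever makes the
call:

```c
#include <stddef.h>

/* Hypothetical cgroup with a running total of charged bytes. */
struct cg {
	const char *name;
	long charged_bytes;
};

/* Hypothetical task that belongs to exactly one cgroup. */
struct task {
	struct cg *cgroup;
};

/* Hypothetical open device fd; remembers who opened it. */
struct drm_fd {
	struct task *opener;
};

/*
 * Charge a BO allocation to the cgroup of the task issuing the call.
 * The opener recorded in the fd is deliberately ignored.
 */
static struct cg *bo_create_charge(const struct drm_fd *fd,
				   struct task *caller, long size)
{
	(void)fd; /* the fd's origin plays no role in accounting */
	caller->cgroup->charged_bytes += size;
	return caller->cgroup;
}
```

If the Xserver opens the fd and hands it to a client in a different
cgroup, an allocation made by the client through that fd lands on the
client's cgroup and the Xserver's total stays untouched.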
-Daniel

> 
> Regards,
> Kenny
> 
> 
> > Regards,
> > Christian.
> >
> > >
> > > Regards,
> > > Kenny
> > >
> > >>> On the other hand, if there are expectations for resource management
> > >>> between containers, I would like to know who is the expected manager
> > >>> and how does it fit into the concept of container (which enforce some
> > >>> level of isolation.)  One possible manager may be the display server.
> > >>> But as long as the display server is in a parent cgroup of the apps'
> > >>> cgroup, the apps can still import handles from the display server
> > >>> under the current implementation.  My understanding is that this is
> > >>> most likely the case, with the display server simply sitting at the
> > >>> default/root cgroup.  But I certainly want to hear more about other
> > >>> use cases (for example, is running multiple display servers on a
> > >>> single host a realistic possibility?  Are there people running
> > >>> multiple display servers inside peer containers?  If so, how do they
> > >>> coordinate resources?)
> > >> We definitely have situations with multiple display servers running
> > >> (just think of VR).
> > >>
> > >> I just can't say if they currently use cgroups in any way.
> > >>
> > >> Thanks,
> > >> Christian.
> > >>
> > >>> I should probably summarize some of these into the commit message.
> > >>>
> > >>> Regards,
> > >>> Kenny
> > >>>
> > >>>
> > >>>
> > >>>> Christian.
> > >>>>
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch