[RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

Fri May 10 18:50:39 UTC 2019

On Fri, May 10, 2019 at 1:48 PM Koenig, Christian
<Christian.Koenig at amd.com> wrote:
> Well another question is why do we want to prevent that in the first place?
>
> I mean the worst thing that can happen is that we account a BO multiple
> times.
That's one of the problems.  The other one is the BO outliving the
lifetime of a cgroup and there's no good way to un-charge the usage
when the BO is free so the count won't be accurate.

I have looked into two possible solutions.  One is to prevent cgroup
from being removed when there are BOs owned by the cgroup still alive
(similar to how cgroup removal will fail if it still has processes
attached to it.)  My concern here is the possibility of not able to
remove a cgroup forever due to the lifetime of a BO (continuously
being shared and reuse and never die.)  Perhaps you can shed some
light on this possibility.

The other one is to keep track of all the buffers and migrate them to
the parent if a cgroup is closed.  My concern here is the performance
overhead on tracking all the buffers.

> And going into the same direction where is the code to handle an open
> device file descriptor which is send from one cgroup to another?
I looked into this before but I forgot what I found.  Perhaps folks
familiar with device cgroup can chime in.

Actually, just did another quick search right now.  Looks like the
access is enforced at the inode level (__devcgroup_check_permission)
so the fd sent to another cgroup that does not have access to the
device should still not have access.

Regards,
Kenny

> Regards,
> Christian.
>
> >
> > Regards,
> > Kenny
> >
> >>> On the other hand, if there are expectations for resource management
> >>> between containers, I would like to know who is the expected manager
> >>> and how does it fit into the concept of container (which enforce some
> >>> level of isolation.)  One possible manager may be the display server.
> >>> But as long as the display server is in a parent cgroup of the apps'
> >>> cgroup, the apps can still import handles from the display server
> >>> under the current implementation.  My understanding is that this is
> >>> most likely the case, with the display server simply sitting at the
> >>> default/root cgroup.  But I certainly want to hear more about other
> >>> use cases (for example, is running multiple display servers on a
> >>> single host a realistic possibility?  Are there people running
> >>> multiple display servers inside peer containers?  If so, how do they
> >>> coordinate resources?)
> >> We definitely have situations with multiple display servers running
> >> (just think of VR).
> >>
> >> I just can't say if they currently use cgroups in any way.
> >>
> >> Thanks,
> >> Christian.
> >>
> >>> I should probably summarize some of these into the commit message.
> >>>
> >>> Regards,
> >>> Kenny
> >>>
> >>>
> >>>
> >>>> Christian.
> >>>>
>