[RFC PATCH v3 04/11] drm, cgroup: Add total GEM buffer allocation limit

Wed Jun 26 21:41:13 UTC 2019

On Wed, Jun 26, 2019 at 05:27:48PM -0400, Kenny Ho wrote:
> On Wed, Jun 26, 2019 at 12:05 PM Daniel Vetter <daniel at ffwll.ch> wrote:
> >
> > > drm.buffer.default
> > >         A read-only flat-keyed file which exists on the root cgroup.
> > >         Each entry is keyed by the drm device's major:minor.
> > >
> > >         Default limits on the total GEM buffer allocation in bytes.
> >
> > Don't we need a "0 means no limit" semantics here?
> 
> I believe the convention is to use the 'max' keyword.
> 
> >
> > I think we need a new drm-cgroup.rst which contains all this
> > documentation.
> 
> Yes I planned to do that when things are more finalized.  I am
> actually writing the commit message following the current doc format
> so I can reuse it in the rst.

Awesome.

> > With multiple GPUs, do we need an overall GEM bo limit, across all gpus?
> > For other stuff later on like vram/tt/... and all that it needs to be
> > per-device, but I think one overall limit could be useful.
> 
> This one I am not sure but should be fairly straightforward to add.
> I'd love to hear more feedbacks on this as well.
> 
> > >       if (!amdgpu_bo_validate_size(adev, size, bp->domain))
> > >               return -ENOMEM;
> > >
> > > +     if (!drmcgrp_bo_can_allocate(current, adev->ddev, size))
> > > +             return -ENOMEM;
> >
> > So what happens when you start a lot of threads all at the same time,
> > allocating gem bo? Also would be nice if we could roll out at least the
> > accounting part of this cgroup to all GEM drivers.
> 
> When there is a large number of allocation, the allocation will be
> checked in sequence within a device (since I used a per device mutex
> in the check.)  Are you suggesting the overhead here is significant
> enough to be a bottleneck?  The accounting part should be available to
> all GEM drivers (unless I missed something) since the chg and unchg
> function is called via the generic drm_gem_private_object_init and
> drm_gem_object_release.

thread 1: checks limits, still under the total

thread 2: checks limits, still under the total

thread 1: allocates, still under

thread 2: allocates, now over the limit

I think the check and chg need to be one step, or this wont work. Or I'm
missing something somewhere.

Wrt rolling out the accounting for all drivers: Since you also roll out
enforcement in this patch I'm not sure whether the accounting part is
fully stand-alone. And as discussed a bit on an earlier patch, I think for
DRIVER_GEM we should set up the accounting cgroup automatically.

> > > +     /* only allow bo from the same cgroup or its ancestor to be imported */
> > > +     if (drmcgrp != NULL &&
> >
> > Quite a serious limitation here ...
> >
> > > +                     !drmcgrp_is_self_or_ancestor(drmcgrp, obj->drmcgrp)) {
> >
> > Also what happens if you actually share across devices? Then importing in
> > the 2nd group is suddenly possible, and I think will be double-counted.
> >
> > What's the underlying technical reason for not allowing sharing across
> > cgroups?
> 
> With the current implementation, there shouldn't be double counting as
> the counting is done during the buffer init.

If you share across devices there will be two drm_gem_obect structures, on
on each device. But only one underlying bo.

Now the bo limit is per-device too, so that's all fine, but for a global
bo limit we'd need to make sure we count these only once.

> To be clear, sharing across cgroup is allowed, the buffer just needs
> to be allocated by a process that is parent to the cgroup.  So in the
> case of xorg allocating buffer for client, the xorg would be in the
> root cgroup and the buffer can be passed around by different clients
> (in root or other cgroup.)  The idea here is to establish some form of
> ownership, otherwise there wouldn't be a way to account for or limit
> the usage.

But why? What's the problem if I allocate something and then hand it to
someone else. E.g. one popular use of cgroups is to isolate clients, so
maybe you'd do a cgroup + namespace for each X11 client (ok wayland, with
X11 this is probably pointless).

But with your current limitation those clients can't pass buffers to the
compositor anymore, making cgroups useless. Your example here only works
if Xorg is in the root and allocates all the buffers. That's not even true
for DRI3 anymore.

So pretty serious limitation on cgroups, and I'm not really understanding
why we need this. I think if we want to prevent buffer sharing, what we
need are some selinux hooks and stuff so you can prevent an import/access
by someone who's not allowed to touch a buffer. But that kind of access
right management should be separate from resource control imo.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch