DRM cgroups integration (Was: Re: [PATCH v4 0/8] cgroup private data and DRM/i915 integration)

Thu Apr 5 15:48:15 UTC 2018

Just realized Joonas had some other comments down at the bottom that I
missed originally...

On Thu, Apr 05, 2018 at 08:06:23AM -0700, Matt Roper wrote:
> On Thu, Apr 05, 2018 at 07:49:44AM -0700, Matt Roper wrote:
> > On Thu, Apr 05, 2018 at 05:15:13PM +0300, Joonas Lahtinen wrote:
> > > + Some more Cc's based on IRC discussion
> > > 
> > > Quoting Joonas Lahtinen (2018-04-05 16:46:51)
> > > > + Dave for commenting from DRM subsystem perspective. I strongly believe
> > > > there would be benefit from agreeing on some foundation of DRM subsystem
> > > > level program GPU niceness [-20,19] and memory limit [0,N] pages.
> > 
> > +Chris Wilson
> 
> +more Cc's based on IRC discussion.
> 
> 
> Matt
> 
> > 
> > If we ignore backward compatibility and ABI issues for now and assume
> > all drivers can move to [-20, 19] for application-accessible priority
> > ranges (i.e., self opt-in via existing context parameter ioctls and
> > such), I think we still need a much larger range for the priority offset
> > assigned via cgroup.  One of the goals with the cgroup work is to give
> > the system integrator the ability to define classes of applications with
> > non-overlapping ranges (i.e., in some systems an app in the "very
> > important" cgroup that self-lowers itself as far as possible should
> > still be prioritized higher than an app in the "best effort" cgroup that
> > self-boosts itself as far as possible).  Chris Wilson and I discussed
> > that a bit on this thread if you want to see the context:
> > 
> > https://lists.freedesktop.org/archives/intel-gfx/2018-March/158204.html
> > 
> > That discussion was based on i915's current application-accessible range
> > of [-1023,1023].  If we shift to a much smaller [-20,19] range for
> > applications to use directly, then we might want to allow cgroup offset
> > to be something like [-1000,1000] to ensure the system integrator has
> > enough flexibility?
> > 
> > Also note that "display boost" (i.e., a priority boost for contexts that
> > are blocking a flip) may vary depending on system setup, so setting
> > being able to define the display boost's effective priority via cgroup
> > is important as well.
> > 
> > 
> > Matt
> > 
> > 
> > 
> > > > 
> > > > Quoting Matt Roper (2018-03-30 03:43:13)
> > > > > On Mon, Mar 26, 2018 at 10:30:23AM +0300, Joonas Lahtinen wrote:
> > > > > > Quoting Matt Roper (2018-03-23 17:46:16)
> > > > > > > On Fri, Mar 23, 2018 at 02:15:38PM +0200, Joonas Lahtinen wrote:
> > > > > > > > Quoting Matt Roper (2018-03-17 02:08:57)
> > > > > > > > > This is the fourth iteration of the work previously posted here:
> > > > > > > > >   (v1) https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
> > > > > > > > >   (v2) https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg208170.html
> > > > > > > > >   (v3) https://lists.freedesktop.org/archives/intel-gfx/2018-March/157928.html
> > > > > > > > > 
> > > > > > > > > The high level goal of this work is to allow non-cgroup-controller parts
> > > > > > > > > of the kernel (e.g., device drivers) to register their own private
> > > > > > > > > policy data for specific cgroups.  That mechanism is then made use of in
> > > > > > > > > the i915 graphics driver to allow GPU priority to be assigned according
> > > > > > > > > to the cgroup membership of the owning process.  Please see the v1 cover
> > > > > > > > > letter linked above for a more in-depth explanation and justification.
> > > > > > > > 
> > > > > > > > Hi Matt,
> > > > > > > > 
> > > > > > > > For cross-subsystem changes such as this, it makes sense to Cc all
> > > > > > > > relevant maintainers, especially if there have been previous comments to
> > > > > > > > earlier revisions.
> > > > > > > > 
> > > > > > > > Please, do include and keep a reference to the userspace portion of the
> > > > > > > > changes when you suggest new uAPI to be added. At least I have some trouble
> > > > > > > > trying to track down the relevant interface consumer here.
> > > > > > > > 
> > > > > > > > I'm unsure how much sense it makes to commence with detailed i915 review
> > > > > > > > if we will be blocked by lack of userspace after that? I'm assuming
> > > > > > > > you've read through [1] already.
> > > > > > > 
> > > > > > > Hi Joonas,
> > > > > > > 
> > > > > > > I've sent the userspace code out a few times, but it looks like I forgot
> > > > > > > to include a copy with v4/v4.5.  Here's the version I provided with v3:
> > > > > > >   https://lists.freedesktop.org/archives/intel-gfx/2018-March/157935.html
> > > > > > 
> > > > > > Thanks. Keeping that in the relevant commit message of the patch that
> > > > > > introduces the new uAPI will make it harder to forget and easiest for
> > > > > > git blame, too.
> > > > > > 
> > > > > > > 
> > > > > > > Usually we don't consider things like i-g-t to be sufficient userspace
> > > > > > > consumers because we need a real-world consumer rather than a "toy"
> > > > > > > userspace.  However in this case, the i-g-t tool, although very simple,
> > > > > > > is really the only userspace consumer I expect there to ever be.
> > > > > > > Ultimately the true consumer of this cgroups work are bash scripts, sysv
> > > > > > > init scripts, systemd recipes, etc.  that just need a very simple tool
> > > > > > > to assign the specific values that make sense on a given system.
> > > > > > > There's no expectation that graphics clients or display servers would
> > > > > > > ever need to make use of these interfaces.
> > > > > > 
> > > > > > I was under the impression that a bit more generic GPU cgroups support
> > > > > > was receiving a lot of support in the early discussion? A dedicated
> > > > > > intel_cgroup sounds underwhelming, when comparing to idea of "gpu_nice",
> > > > > > for user adoption :)
> > > > > 
> > > > > I'm open to moving the cgroup_priv registration/lookup to the DRM core
> > > > > if other drivers are interested in using this mechanism and if we can
> > > > > come to an agreement on a standard priority offset range to support, how
> > > > > display boost should work for all drivers, etc.  There might be some
> > > > > challenges mapping a DRM-defined priority range down to a different
> > > > > range that makes sense for individual driver schedulers, especially
> > > > > since some drivers already expose a different priority scheme to
> > > > > userspace via other interfaces like i915 does with GEM context priority.
> > > > > 
> > > > > So far I haven't really heard any interest outside the Intel camp, but
> > > > > hopefully other driver teams can speak up if they're for/against this.
> > > > > I don't want to try to artificially standardize this if other drivers
> > > > > want to go a different direction with priority/scheduling that's too
> > > > > different from the current Intel-driven design.
> > > > 
> > > > I don't think there are that many directions to go about GPU context
> > > > priority, considering we have the EGL_IMG_context_priority extension, so
> > > > it'll only be about granularity of the scale.
> > > > 
> > > > I would suggest to go with the nice like scale for easy user adoption,
> > > > then just apply that as the N most significant bits.
> > > > 
> > > > The contexts could then of course further adjust their priority from what
> > > > is set by the "gpu_nice" application with the remaining bits.
> > > > 
> > > > I'm strongly feeling this should be a DRM level "gpu_nice". And the
> > > > binding to cgroups should come through DRM core. If it doesn't, limiting
> > > > the amount of memory used becomes awkward as the allocation is
> > > > centralized to DRM core.

My earliest iterations of this work did add the interfaces to set
cgroup-based GPU priority at the DRM core.  I didn't really get the
sense at the time that anyone outside of Intel was interested in the
cgroups work, so I moved the interfaces back into i915 for subequent
updates, but I'm happy to move it back.  Note that even in that case
this will still end up as a DRM core ioctl rather than a true cgroups
controller; see below.

> > > > 
> > > > > > Also, I might not be up-to-date about all things cgroups, but the way
> > > > > > intel_cgroup works, feels bit forced. We create a userspace context just
> > > > > > to communicate with the driver and the IOCTL will still have global
> > > > > > effects. I can't but think that i915 reading from the cgroups subsystem
> > > > > > for the current process would feel more intuitive to me.
> > > > > 
> > > > > I think you're referring to the earlier discussion about exposing
> > > > > priority directly via the cgroups filesystem?  That would certainly be
> > > > > simpler from a userspace perspective, but it's not the direction that
> > > > > the cgroups maintainer wants to see things go.  Adding files directly to
> > > > > the cgroups filesystem is supposed to be something that's reserved for
> > > > > official cgroups controllers.  The GPU priority concept we're trying to
> > > > > add here doesn't align with the requirements for creating a controller,
> > > > > so the preferred approach is to create a custom interface (syscall or
> > > > > ioctl) that simply takes a cgroup as a parameter.  There's precendent
> > > > > with similar interfaces in areas like BPF (where the bpf() system call
> > > > > can accept a cgroup as a parameter and then perform its own private
> > > > > policy changes as it sees fit).
> > > > > 
> > > > > Using a true cgroups controller and exposing settings via the filesystem
> > > > > is likely still the way we'll want to go for some other types of
> > > > > cgroups-based policy in the future (e.g., limiting GPU memory usage); it
> > > > > just isn't the appropriate direction for priority.
> > > > 
> > > > Might be just me but feels bit crazy to be setting GPU memory usage
> > > > through another interface and then doing i915 specific IOCTLs to control
> > > > the priority of that same cgroup.
> > > > 
> > > > I don't feel comfortable adding custom cgroups dependent IOCTLs to i915
> > > > where cgroups is only working as the variable carrier in background. We
> > > > should really just be consuming a variable from cgroups and it should be
> > > > set outside of of the i915 IOCTL interface.
> > > > 
> > > > I'm still seeing that we should have a DRM cgroups controller and a DRM
> > > > subsystem wide application to control the priority and memory usage
> > > > to be fed to the drivers.

A cgroups controller could be used for GPU memory management, but not
for priority adjustment the way we're doing it right now.  Although
there's some preexisting legacy stuff that doesn't quite follow the
rules, the cgroups subsystem will only accept new cgroup controllers if
they take the entire structure of the cgroups hierarchy into account
(rather than just looking at values at a leaf node) and are based on one
of the four resource distribution models described in the
Documentation/cgroup-v2.txt (weights, limits, protections, allocations).
GPU priority is somewhat close to a "weight" but not quite --- a true
weight wouldn't allow unfair scheduling or intentional starvation which
are very important to the use cases I'm focusing on.

The cgroups maintainer has suggested that a separate interface (syscall,
ioctl, etc.) be used for items like this that don't meet the criteria
for a full cgroups controller, and there are other examples of this kind
of design elsewhere in the kernel too.

> > > > 
> > > > If we end up just supporting i915 apps, we could as well use LD_PRELOAD
> > > > wrapper and alter the context priority at creation time for exactly the
> > > > same effect and no extra interfaces to maintain.

I think this is missing the point of using cgroups.  cgroups allow
policy to be set independently by a privileged system integrator or
system administrator, in a way that applications themselves don't have
access to.  LD_PRELOAD would only have access to the interfaces
available to the app itself and the app could simply undo whatever
setting the LD_PRELOAD tried to use.

The goal with cgroups isn't to provide a "gpu nice" where you
self-classify yourself, but rather the ability for the person defining
how the overall system works to classify and control the various types
of applications running in a transparent and global way.

> > > > 
> > > > > > Does the implementation mimic some existing cgroups tool or de-facto way
> > > > > > of doing things in cgroups world?
> > > > > 
> > > > > The ioctl approach I took is similar to syscall approach that the BPF
> > > > > guys use to attach BPF programs to a cgroup.  I'm not very familiar with
> > > > > BPF or how it gets used from userspace, so I'm not sure whether the
> > > > > interface is intended for one specific tool (like ours is), or whether
> > > > > there's more variety for userspace consumers.
> > > > 
> > > > Is the proposal to set the memory usage from similar interface, or is
> > > > that still not implemented?

GPU memory management is a very different resource model, so I'd expect
memory to be managed by a real cgroups controller. I.e., the interface
would be different.  I haven't personally done any work on cgroups-based
memory management; I agree that would be nice to have, but none of the
use cases my team is focused on have that as a requirement.

> > > > 
> > > > I'm seeing a very close relation between time-slicing GPU time and
> > > > allowed GPU buffer allocations, so having two completely different
> > > > interfaces does just feel very hackish way of implementing this.

If we were time slicing GPU time and giving out fractional shares of
time to each client, that would be a very different situation than we
have today; in that case I agree that a cgroups controller would be the
way to go.  While there probably are a lot of cases where time slicing
and fair scheduling would be beneficial (especially stuff like desktop
or mobile settings), I'm mostly focused on embedded use cases that
depend on today's unfair, strict priority scheduling.

Matt

> > > > 
> > > > Regards, Joonas
> > > > 
> > > > > 
> > > > > 
> > > > > Matt
> > > > > 
> > > > > > 
> > > > > > Regards, Joonas
> > > > > > --
> > > > > > To unsubscribe from this list: send the line "unsubscribe cgroups" in
> > > > > > the body of a message to majordomo at vger.kernel.org
> > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > 
> > > > > -- 
> > > > > Matt Roper
> > > > > Graphics Software Engineer
> > > > > IoTG Platform Enabling & Development
> > > > > Intel Corporation
> > > > > (916) 356-2795
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe cgroups" in
> > > the body of a message to majordomo at vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > -- 
> > Matt Roper
> > Graphics Software Engineer
> > IoTG Platform Enabling & Development
> > Intel Corporation
> > (916) 356-2795
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> IoTG Platform Enabling & Development
> Intel Corporation
> (916) 356-2795

-- 
Matt Roper
Graphics Software Engineer
IoTG Platform Enabling & Development
Intel Corporation
(916) 356-2795