[Intel-gfx] [PATCH v3 5/6] drm/i915: Introduce 'priority offset' for GPU contexts (v2)

Thu Mar 8 18:48:51 UTC 2018

Quoting Matt Roper (2018-03-08 18:22:06)
> On Thu, Mar 08, 2018 at 01:11:33PM +0000, Chris Wilson wrote:
> > Quoting Chris Wilson (2018-03-08 11:32:04)
> > > Quoting Matt Roper (2018-03-06 23:46:59)
> > > > There are cases where a system integrator may wish to raise/lower the
> > > > priority of GPU workloads being submitted by specific OS process(es),
> > > > independently of how the software self-classifies its own priority.
> > > > Exposing "priority offset" as an i915-specific cgroup parameter will
> > > > enable such system-level configuration.
> > > > 
> > > > Normally GPU contexts start with a priority value of 0
> > > > (I915_CONTEXT_DEFAULT_PRIORITY) and then may be adjusted up/down from
> > > > there via other mechanisms.  We'd like to provide a system-level input
> > > > to the priority decision that will be taken into consideration, even
> > > > when userspace later attempts to set an absolute priority value via
> > > > I915_CONTEXT_PARAM_PRIORITY.  The priority offset introduced here
> > > > provides a base value that will always be added to (or subtracted from)
> > > > the software's self-assigned priority value.
> > > > 
> > > > This patch makes priority offset a cgroup-specific value; contexts will
> > > > be created with a priority offset based on the cgroup membership of the
> > > > process creating the context at the time the context is created.  Note
> > > > that priority offset is assigned at context creation time; migrating a
> > > > process to a different cgroup or changing the offset associated with a
> > > > cgroup will only affect new context creation and will not alter the
> > > > behavior of existing contexts previously created by the process.
> > > > 
> > > > v2:
> > > >  - Rebase onto new cgroup_priv API
> > > >  - Use current instead of drm_file->pid to determine which process to
> > > >    lookup priority for. (Chris)
> > > >  - Don't forget to subtract priority offset in context_getparam ioctl to
> > > >    make it match setparam behavior. (Chris)
> > > > 
> > > > Signed-off-by: Matt Roper <matthew.d.roper at intel.com>
> > > 
> > > For ctx->priority/ctx->priority_offset
> > > Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
> > 
> > As a reminder, we do have the question of how to bound ctx->priority +
> > ctx->priority_offset. Currently, outside of the user range is privileged
> > space reserved for the kernel (both above and below). Now we can move
> > those even further to accommodate multiple non-overlapping cgroups.
> 
> Yep, I mentioned this as an open question in the series coverletter.
> 
> My understanding is that today we limit userspace-assigned priorities to
> [1023,1023] and that the kernel can use the userspace range plus -1024
> (for the kernel idle context), 1024 (for boosting contexts the display
> is waiting on), and INT_MAX (for the preempt context).  Are there any
> other special values we use today that I'm overlooking?
> 
> I think we have the following options with how to proceed:
> 
>  * Option 1:  Silently limit (priority+priority offset) to the existing
>    [-1023,1023] range.  That means that if, for example, a user context
>    had a priority offset of 600 and tried to manually boost its context
>    priority to 600, it would simply be treated as a 1023 for scheduling
>    purposes.  This approach is simple, but doesn't allow us to force
>    contexts in different cgroups into non-overlapping priority ranges
>    (i.e., lowest possible priority in one cgroup is still higher than
>    the highest possible priority in another cgroup).  It also isn't
>    possible to define a class of applications as "more important than
>    display" via cgroup, which I think might be useful in some cases.

Agreed, non-overlapping seems to be a useful property (apply the usual
disclaimer for the rudimentary scheduler).

>  * Option 2:  Allow priority offset to be set in a much larger range
>    (e.g., [SHRT_MIN+1023,SHRT_MAX-1023]).  This allows cgroups to have
>    effective priority ranges that don't overlap.  cgroup ranges can also
>    be intentionally set above I915_PRIORITY_DISPLAY to allow us to
>    define classes of applications whose work is more important than
>    maintaining a stable framerate on the display.  We'd also probably
>    need to shift the kernel idle context down to something like INT_MIN
>    instead of using -1024.

INT_MIN+1 just to be awkward. (INT_MIN is I915_PRIORITY_INVALID.)

Something to bear in mind is that I want to reserve the low 8 bits of
ctx->priority for internal use (heuristic adjustments by the scheduler).
Can shrink that to say 3 bits in the current scheme.

3bits for internal adjustment
20bits for user priority.
8bits for cgroup offset.
signbit.

Nothing prohibits us from moving to 64b if required. But 32b should be
enough range for plenty of clients and cgroups, imo. Reducing cgroup
offset to 6 bits would be nice :)

>  * Option 3: No limits, leave behavior as it stands now in this patch
>    series (unrestricted range).  If you're privileged enough to define
>    the cgroup settings for a system and you decide to shoot yourself in
>    the foot by setting them to nonsense values, that's your own fault.

Overflow have a habit of catching us out, so I'd like to rule that out.
And once we define a suitable range, it's hard to reduce it without
userspace noticing and screaming regression.

> Thoughts?  Any other options to consider?

Another option would be to say keep a plist for each cgroup and
round-robin between cgroups (or somesuch, or just a plist of cgroups to
keep things priority based). Down that road lies CFS with fairness inside
cgroups and between cgroups.

> > We also have the presumption that only privileged processes can raise a
> > priority, but I presume the cgroup property itself is protected.
> 
> The way the code is written write now, anyone who has write access to
> the cgroup itself (i.e., can migrate processes into/out of the cgroup)
> is allowed to adjust the i915-specific cgroup settings.  So this depends
> on how the cgroups-v2 filesystem was initially mounted and/or how the
> filesystem permissions were adjusted after the fact; the details may
> vary from system to system depending on the policy decisions of the
> system integrator / system administrator.  But I think an off-the-shelf
> Linux distro will only give the system administrator sufficient
> permissions to adjust cgroups (if it even mounts the cgroups-v2
> filesystem in the first place), so anyone else that winds up with those
> privileges has had them intentionally delegated.

Ok. I just worry about how easy it is to starve the system and
accidentally DoS oneself. :)
-Chris