[Intel-gfx] [PATCH v9 7/7] drm/i915: add a sysfs entry to let users set sseu configs

Tue Jun 12 10:37:32 UTC 2018

Quoting Lionel Landwerlin (2018-06-12 11:33:34)
> On 12/06/18 10:20, Joonas Lahtinen wrote:
> > Quoting Chris Wilson (2018-06-11 18:02:37)
> >> Quoting Lionel Landwerlin (2018-06-11 14:46:07)
> >>> On 11/06/18 13:10, Tvrtko Ursulin wrote:
> >>>> On 30/05/2018 15:33, Lionel Landwerlin wrote:
> >>>>> There are concerns about denial of service around the per-context SSEU
> >>>>> configuration capability. In a previous commit introducing the
> >>>>> capability, we allowed it only for capable users. This change adds a
> >>>>> new sysfs entry to let any user configure its own context's
> >>>>> powergating setup.
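As a rough illustration of the gating this commit message describes, the sketch below checks a hypothetical global flag (standing in for the sysfs entry) together with the caller's privilege. The names `allow_dynamic_sseu`, `struct ctx_sseu` and `ctx_set_sseu()` are invented for this example and are not the i915 API. In the kernel, the privilege test would be a capability check rather than a boolean argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sysfs toggle; off by default. */
static bool allow_dynamic_sseu = false;

/* Hypothetical per-context state: the slice mask this context requested. */
struct ctx_sseu {
	uint8_t slice_mask;
};

/*
 * Privileged callers may always change their context's slice mask;
 * unprivileged callers only once the administrator has enabled the
 * toggle. On -EPERM the context is left untouched.
 */
static int ctx_set_sseu(struct ctx_sseu *ctx, uint8_t slice_mask,
			bool caller_is_privileged)
{
	if (!caller_is_privileged && !allow_dynamic_sseu)
		return -EPERM;

	ctx->slice_mask = slice_mask;
	return 0;
}
```

With the toggle off, an unprivileged request returns `-EPERM` and leaves the context unchanged; once the administrator enables it, the same request succeeds. The decision therefore rests with the box administrator rather than with each container.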
> >>>> As far as I understood it, Joonas' concerns here are:
> >>>>
> >>>> 1) That in the containers use case individual containers wouldn't be
> >>>> able to turn on the sysfs toggle for them.
> >>>>
> >>>> 2) That, also in the containers use case, if the box admin turns on the
> >>>> feature, some containers would potentially start negatively affecting
> >>>> the others (via the accumulated cost of slice re-configuration on
> >>>> context switching).
> >>>>
> >>>> I am not familiar with typical container setups to be authoritative
> >>>> here, but intuitively I find it reasonable that a low-level hardware
> >>>> switch like this would be under the control of a master domain
> >>>> administrator. ("If you are installing our product in the container
> >>>> environment, make sure your system administrator enables this hardware
> >>>> feature.", "Note to system administrators: Enabling this feature may
> >>>> negatively affect the performance of other containers.")
> >>>>
> >>>> The alternative proposal is for i915 to apply an "or" filter on all
> >>>> requested masks, and in that way ensure dynamic re-configuration
> >>>> doesn't happen on context switches but is instead driven from
> >>>> userspace via ioctls.
> >>>>
> >>>> In other words, should _all_ userspace agree between themselves that
> >>>> they want to turn off a slice, they would then need to send out a
> >>>> concerted ioctl storm, where the number of needed ioctls equals the number
> >>>> of currently active contexts. (This may have its own performance
> >>>> consequences caused by the barriers needed to modify all context images.)
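A minimal sketch of the "or" filter, assuming each active context simply stores a requested slice mask (the array-of-masks representation and the name `effective_slice_mask()` are illustrative, not i915 code). The mask actually programmed is the bitwise OR of all requests, so a slice can only be powered down once every context has stopped asking for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * "Or" policy: the globally programmed slice mask is the union of every
 * active context's requested mask. A bit clears only when no context
 * sets it. With no contexts at all this returns 0; a real driver would
 * have to choose a default for that case.
 */
static uint8_t effective_slice_mask(const uint8_t *requested, size_t n_ctx)
{
	uint8_t mask = 0;

	for (size_t i = 0; i < n_ctx; i++)
		mask |= requested[i];
	return mask;
}
```

For requests `{0x1, 0x1, 0x3}` the result is `0x3`: the single context still asking for two slices keeps the second slice powered. Only after that context also drops to `0x1` does the result become `0x1`, which is why turning a slice off means updating every active context.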
> >>>>
> >>>> This was deemed acceptable for the media use case, but my concern is
> >>>> that the approach is not elegant and will tie us to the "or" policy in
> >>>> the ABI. (I haven't evaluated the performance concerns yet, but they
> >>>> may also be significant.)
> >>>>
> >>>> If we go back thinking about the containers use case, then it
> >>>> transpires that even though the "or" policy does prevent one container
> >>>> from affecting the other from one angle, it also prevents one
> >>>> container from exercising the feature unless all containers co-operate.
> >>>>
> >>>> As such, we can view the original problem statement, where we have an
> >>>> issue if not everyone co-operates, as conceptually the same problem seen
> >>>> from the opposite angle. (Rather than one container imposing the increased
> >>>> cost of context switches on the rest, we would have one container
> >>>> denying the optimized slice configuration to the others.)
> >>>>
> >>>> From this it follows that both proposals require complete co-operation
> >>>> from all running userspace to keep any single party from taking complete
> >>>> control of the feature.
> >>>>
> >>>> Since the balance between the benefit of optimized slice configuration
> >>>> (or penalty of suboptimal one), versus the penalty of increased
> >>>> context switch times, cannot be known by the driver (barring venturing
> >>>> into the heuristics territory), that is another reason why I find the
> >>>> "or" policy in the driver questionable.
> >>>>
> >>>> We can also ask: if we go with the "or" policy, why
> >>>> require N per-context ioctls to modify the global GPU configuration
> >>>> and not instead add a global driver ioctl to modify the state?
> >>>>
> >>>> If a future hardware requires, or enables, the per-context behaviour
> >>>> in a more efficient way, we could then revisit the problem space.
> >>>>
> >>>> In the meantime, I see the "or" policy solution as adding some ABI
> >>>> which doesn't do anything for many use cases, without any way for the
> >>>> sysadmin to enable it. At the same time, a master sysfs knob at least
> >>>> enables the sysadmin to make a decision. Here I am thinking about a
> >>>> random client environment where not all userspace co-operates, for
> >>>> instance where the user is running the feature-aware media stack and a
> >>>> non-feature-aware OpenCL/3D stack.
> >>>>
> >>>> I guess the complete story boils down to: is the master sysfs knob
> >>>> really a problem in container use cases?
> >>>>
> >>>> Regards,
> >>>>
> >>>> Tvrtko
> >>> Hey Tvrtko,
> >>>
> >>> Thanks for summarizing a bunch of discussions.
> >>> Essentially I agree with everything you wrote above.
> >>>
> >>> If we have a global setting (determined by the OR policy), what's the
> >>> point of per context settings?
> >>>
> >>> In Dmitry's scenario, all userspace applications will work together to
> >>> reach the consensus, so it sounds like we're reimplementing the policy
> >>> that already exists in userspace.
> >>>
> >>> Anyway, I'm implementing Joonas' suggestion. Hopefully somebody other
> >>> than me picks one or the other :)
> >> I'll just mention the voting/consensus approach to see if anyone else
> >> likes it.
> >>
> >> Each context has a CONTEXT_PARAM_HINT_SSEU { small, dontcare, large }
> >> (or some other abstract names).
> > Yeah, the param name should have the word _HINT_ in it when it's not a
> > definitive set.
> >
> > There's no global setter across containers, only a scenario when
> > everyone agrees or not. Tallying up the votes and going with a majority
> > vote might be an option, too.
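One way to read the voting idea, sketched under assumptions: each context carries a hint, contexts that never set it count as "dontcare" and abstain, and the driver picks whichever of "small" and "large" has more votes. The enum names, the `tally_sseu_hints()` helper and the tie-break toward the full ("large") configuration are all hypothetical choices for illustration, not a proposed uAPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hint values mirroring { small, dontcare, large }. */
enum sseu_hint {
	SSEU_HINT_DONTCARE = 0,	/* default for contexts that never set the param */
	SSEU_HINT_SMALL,
	SSEU_HINT_LARGE,
};

/*
 * Majority vote: "dontcare" contexts abstain. If nobody expresses a
 * preference, or the vote ties, fall back to the large (full)
 * configuration (an assumed tie-break).
 */
static enum sseu_hint tally_sseu_hints(const enum sseu_hint *hints,
				       size_t n_ctx)
{
	size_t small = 0, large = 0;

	for (size_t i = 0; i < n_ctx; i++) {
		if (hints[i] == SSEU_HINT_SMALL)
			small++;
		else if (hints[i] == SSEU_HINT_LARGE)
			large++;
	}
	return small > large ? SSEU_HINT_SMALL : SSEU_HINT_LARGE;
}
```

With hints `{dontcare, dontcare, small}` the result is small, since contexts that never expressed a preference do not block the one that did; `{small, large}` ties and falls back to large.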
> >
> > Regards, Joonas
> 
> Trying to test the "everyone agrees" approach here.

It's not "everyone agrees", but the greater good.

> There are a number of processes that can hold onto a GEM context and
> therefore prevent agreement.
> On my system, plymouthd & systemd-logind have a number of contexts open.
But they should be dontcare?

There should only be a few processes that insist on a particular
configuration, afaiu.
-Chris