[Intel-gfx] [PATCH v9 7/7] drm/i915: add a sysfs entry to let users set sseu configs

Mon Jun 11 15:02:37 UTC 2018

Quoting Lionel Landwerlin (2018-06-11 14:46:07)
> On 11/06/18 13:10, Tvrtko Ursulin wrote:
> >
> > On 30/05/2018 15:33, Lionel Landwerlin wrote:
> >> There are concerns about denial of service around the per context sseu
> >> configuration capability. In a previous commit introducing the
> >> capability we allowed it only for capable users. This change adds a
> >> new sysfs entry to let any user configure its own context
> >> powergating setup.
> >
> > As far as I understood it, Joonas' concerns here are:
> >
> > 1) That in the containers use case individual containers wouldn't be 
> > able to turn on the sysfs toggle for them.
> >
> > 2) That also in the containers use case if box admin turns on the 
> > feature, some containers would potentially start negatively affecting 
> > the others (via the accumulated cost of slice re-configuration on 
> > context switching).
> >
> > I am not familiar with typical container setups to be authoritative 
> > here, but intuitively I find it reasonable that a low-level hardware 
> > switch like this would be under the control of a master domain 
> > administrator. ("If you are installing our product in the container 
> > environment, make sure your system administrator enables this hardware 
> > feature.", "Note to system administrators: Enabling this feature may 
> > negatively affect the performance of other containers.")
> >
> > Alternative proposal is for the i915 to apply an "or" filter on all 
> > requested masks and in that way ensure dynamic re-configuration 
> > doesn't happen on context switches, but driven from userspace via ioctls.
> >
> > In other words, should _all_ userspace agree between themselves that 
> > they want to turn off a slice, they would then need to send out a 
> > concerted ioctl storm, where number of needed ioctls equals the number 
> > of currently active contexts. (This may have its own performance 
> > consequences caused by the barriers needed to modify all context images.)
> >
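
To make the "or" filter concrete, here is a minimal sketch of what such a
policy could compute. This is not i915 code: sseu_or_policy(), the 8-bit
slice mask and the fallback for "no contexts" are all assumptions made
for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of i915: combine the slice masks
 * requested by every active context into a single global mask.
 * A slice stays powered if at least one context asked for it.
 */
static uint8_t sseu_or_policy(const uint8_t *requested, size_t count,
			      uint8_t all_slices)
{
	uint8_t combined = 0;
	size_t i;

	assert(requested != NULL || count == 0);

	/* Assumption: with no contexts, keep the full configuration. */
	if (count == 0)
		return all_slices;

	for (i = 0; i < count; i++)
		combined |= requested[i];

	/* Never report slices the hardware does not have. */
	return combined & all_slices;
}
```

With requests { 0x1, 0x1, 0x7 } on a three-slice part the result is 0x7,
so a slice is only turned off once every context has dropped it from its
mask.
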
> > This was deemed acceptable for the media use case, but my concern is 
> > the approach is not elegant and will tie us with the "or" policy in 
> > the ABI. (Performance concerns I haven't evaluated yet, but they also 
> > may be significant.)
> >
> > If we go back thinking about the containers use case, then it 
> > transpires that even though the "or" policy does prevent one container 
> > from affecting the other from one angle, it also prevents one 
> > container from exercising the feature unless all containers co-operate.
> >
> > As such, we can view the original problem statement where we have an 
> > issue if not everyone co-operates, as conceptually the same just from 
> > an opposite angle. (Rather than one container incurring the increased 
> > cost of context switches to the rest, we would have one container 
> > preventing the optimized slice configuration to the other.)
> >
> > From this follows that both proposals require complete co-operation 
> > from all running userspace to avoid complete control of the feature.
> >
> > Since the balance between the benefit of optimized slice configuration 
> > (or penalty of suboptimal one), versus the penalty of increased 
> > context switch times, cannot be known by the driver (barring venturing 
> > into the heuristics territory), that is another reason why I find the 
> > "or" policy in the driver questionable.
> >
> > We can also ask a question of - If we go with the "or" policy, why 
> > require N per-context ioctls to modify the global GPU configuration 
> > and not instead add a global driver ioctl to modify the state?
> >
> > If a future hardware requires, or enables, the per-context behaviour 
> > in a more efficient way, we could then revisit the problem space.
> >
> > In the meantime I see the "or" policy solution as adding some ABI 
> > which doesn't do anything for many use cases without any way for the 
> > sysadmin to enable it. At the same time master sysfs knob at least 
> > enables the sysadmin to make a decision. Here I am thinking about a 
> > random client environment where not all userspace co-operates, but for 
> > instance user is running the feature aware media stack, and 
> > non-feature aware OpenCL/3d stack.
> >
> > I guess the complete story boils down to - is the master sysfs knob 
> > really a problem in container use cases.
> >
> > Regards,
> >
> > Tvrtko 
> 
> Hey Tvrtko,
> 
> Thanks for summarizing a bunch of discussions.
> Essentially I agree with everything you wrote above.
> 
> If we have a global setting (determined by the OR policy), what's the 
> point of per context settings?
> 
> In Dmitry's scenario, all userspace applications will work together to 
> reach the consensus so it sounds like we're reimplementing the policy 
> that is already existing in userspace.
> 
> Anyway, I'm implementing Joonas' suggestion. Hopefully somebody other 
> than me picks one or the other :)

I'll just mention the voting/consensus approach to see if anyone else
likes it.

Each context has a CONTEXT_PARAM_HINT_SSEU { small, dontcare, large }
(or some other abstract names).

Then whenever the host cares, they can evaluate the set of hints
provided and make a choice on sseu config. One presumes a simple greater
good method (but you could extend that to include batch
frequency/duration to try and determine system impact on one setting or
another). Keeping it a hint helps reduce the effect of policy, though it
may still be policy and merit a switch for different implementations (or
BPF!).
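
As a rough illustration of the simple greater-good method, something
along these lines would do. Everything here is hypothetical: the enum
names, sseu_vote() and the tie-breaking rule are placeholders for this
sketch, not proposed ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hint values, following the abstract names above. */
enum sseu_hint {
	SSEU_HINT_SMALL,
	SSEU_HINT_DONTCARE,
	SSEU_HINT_LARGE,
};

/*
 * Simple majority vote over the per-context hints: whichever of
 * small/large has more supporters wins, dontcare abstains.
 * Ties (including "no votes") keep the large configuration, which
 * is an arbitrary choice made for this sketch.
 */
static enum sseu_hint sseu_vote(const enum sseu_hint *hints, size_t count)
{
	size_t n_small = 0, n_large = 0, i;

	assert(hints != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (hints[i] == SSEU_HINT_SMALL)
			n_small++;
		else if (hints[i] == SSEU_HINT_LARGE)
			n_large++;
	}

	return n_small > n_large ? SSEU_HINT_SMALL : SSEU_HINT_LARGE;
}
```

Weighting each vote by batch frequency/duration would replace the plain
counters with per-context weights; the shape of the decision stays the
same.
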
-Chris