[Intel-gfx] [PATCH v9 7/7] drm/i915: add a sysfs entry to let users set sseu configs

Tue Jun 12 10:33:34 UTC 2018

On 12/06/18 10:20, Joonas Lahtinen wrote:
> Quoting Chris Wilson (2018-06-11 18:02:37)
>> Quoting Lionel Landwerlin (2018-06-11 14:46:07)
>>> On 11/06/18 13:10, Tvrtko Ursulin wrote:
>>>> On 30/05/2018 15:33, Lionel Landwerlin wrote:
>>>>> There are concerns about denial of service around the per context sseu
>>>>> configuration capability. In a previous commit introducing the
>>>>> capability we allowed it only for capable users. This changes adds a
>>>>> new debugfs entry to let any user configure its own context
>>>>> powergating setup.
>>>> As far as I understood it, Joonas' concerns here are:
>>>>
>>>> 1) That in the containers use case individual containers wouldn't be
>>>> able to turn on the sysfs toggle for them.
>>>>
>>>> 2) That also in the containers use case if box admin turns on the
>>>> feature, some containers would potentially start negatively affecting
>>>> the others (via the accumulated cost of slice re-configuration on
>>>> context switching).
>>>>
>>>> I am not familiar with typical container setups to be authoritative
>>>> here, but intuitively I find it reasonable that a low-level hardware
>>>> switch like this would be under the control of a master domain
>>>> administrator. ("If you are installing our product in the container
>>>> environment, make sure your system administrator enables this hardware
>>>> feature.", "Note to system administrators: Enabling this features may
>>>> negatively affect the performance of other containers.")
>>>>
>>>> Alternative proposal is for the i915 to apply an "or" filter on all
>>>> requested masks and in that way ensure dynamic re-configuration
>>>> doesn't happen on context switches, but driven from userspace via ioctls.
>>>>
>>>> In other words, should _all_ userspace agree between themselves that
>>>> they want to turn off a slice, they would then need to send out a
>>>> concerted ioctl storm, where number of needed ioctls equals the number
>>>> of currently active contexts. (This may have its own performance
>>>> consequences caused by the barriers needed to modify all context images.)
>>>>
>>>> This was deemed acceptable the the media use case, but my concern is
>>>> the approach is not elegant and will tie us with the "or" policy in
>>>> the ABI. (Performance concerns I haven't evaluated yet, but they also
>>>> may be significant.)
>>>>
>>>> If we go back thinking about the containers use case, then it
>>>> transpires that even though the "or" policy does prevent one container
>>>> from affecting the other from one angle, it also prevents one
>>>> container from exercising the feature unless all containers co-operate.
>>>>
>>>> As such, we can view the original problem statement where we have an
>>>> issue if not everyone co-operates, as conceptually the same just from
>>>> an opposite angle. (Rather than one container incurring the increased
>>>> cost of context switches to the rest, we would have one container
>>>> preventing the optimized slice configuration to the other.)
>>>>
>>>>  From this follows that both proposals require complete co-operation
>>>> from all running userspace to avoid complete control of the feature.
>>>>
>>>> Since the balance between the benefit of optimized slice configuration
>>>> (or penalty of suboptimal one), versus the penalty of increased
>>>> context switch times, cannot be know by the driver (barring venturing
>>>> into the heuristics territory), that is another reason why I find the
>>>> "or" policy in the driver questionable.
>>>>
>>>> We can also ask a question of - If we go with the "or" policy, why
>>>> require N per-context ioctls to modify the global GPU configuration
>>>> and not instead add a global driver ioctl to modify the state?
>>>>
>>>> If a future hardware requires, or enables, the per-context behaviour
>>>> in a more efficient way, we could then revisit the problem space.
>>>>
>>>> In the mean time I see the "or" policy solution as adding some ABI
>>>> which doesn't do anything for many use cases without any way for the
>>>> sysadmin to enable it. At the same time master sysfs knob at least
>>>> enables the sysadmin to make a decision. Here I am thinking about a
>>>> random client environment where not all userspace co-operates, but for
>>>> instance user is running the feature aware media stack, and
>>>> non-feature aware OpenCL/3d stack.
>>>>
>>>> I guess the complete story boils down to - is the master sysfs knob
>>>> really a problem in container use cases.
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>> Hey Tvrtko,
>>>
>>> Thanks for summarizing a bunch of discussions.
>>> Essentially I agree with every you wrote above.
>>>
>>> If we have a global setting (determined by the OR policy), what's the
>>> point of per context settings?
>>>
>>> In Dmitry's scenario, all userspace applications will work together to
>>> reach the consensus so it sounds like we're reimplementing the policy
>>> that is already existing in userspace.
>>>
>>> Anyway, I'm implementing Joonas' suggestion. Hopefully somebody else
>>> than me pick one or the other :)
>> I'll just mention the voting/consensus approach to see if anyone else
>> likes it.
>>
>> Each context has a CONTEXT_PARAM_HINT_SSEU { small, dontcare, large }
>> (or some other abstract names).
> Yeah, the param name should have the word _HINT_ in it when it's not a
> definitive set.
>
> There's no global setter across containers, only a scenario when
> everyone agrees or not. Tallying up the votes and going with a majority
> vote might be an option, too.
>
> Regards, Joonas

Trying to test the "everyone agrees" approach here.
There are a number of processes that can hold onto a gem context and 
therefore prevent agreement.
On my system plymouthd & systemd-login have a number of contexts opened.

A process simply opening&closing a render node could flip the system 
back and forth between 2 configurations.

Does that change people's mind about how we should go about this?

-
Lionel

>
>> Then whenever the host cares, they can evaluate the set of hints
>> provided and make a choice on sseu config. One presumes a simple greater
>> good method (but you could extends that to include batch
>> frequency/duration to try and determine system impact on one setting or
>> another). Keeping it a hint helps reduce the effect of policy, though it
>> may still be policy and merit a switch for different implementations (or
>> BPF!).
>> -Chris
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx