[Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Feb 20 16:51:42 UTC 2023


On 20/02/2023 16:44, Tvrtko Ursulin wrote:
> 
> On 20/02/2023 15:52, Rob Clark wrote:
>> On Mon, Feb 20, 2023 at 3:33 AM Tvrtko Ursulin
>> <tvrtko.ursulin at linux.intel.com> wrote:
>>>
>>>
>>> On 17/02/2023 20:45, Rodrigo Vivi wrote:
> 
> [snip]
> 
>>> Yeah I agree. And as not all media use cases are the same, nor are all
>>> compute contexts, someone somewhere will need to run a series of
>>> workloads for power and performance numbers. Ideally that someone would
>>> be the entity for which it makes sense to look at all use cases, from
>>> server room to client, 3d, media and compute for both. If we could get
>>> the capability to run this in some automated fashion, akin to CI, we
>>> would even have a chance to keep making good decisions in the future.
>>>
>>> Or we do some one off testing for this instance, but we still need a
>>> range of workloads and parts to do it properly..
>>>
>>>>> I also think the "arms race" scenario isn't really as much of a
>>>>> problem as you think.  There aren't _that_ many things using the GPU
>>>>> at the same time (compared to # of things using CPU).   And a lot of
>>>>> mobile games throttle framerate to avoid draining your battery too
>>>>> quickly (after all, if your battery is dead you can't keep buying loot
>>>>> boxes or whatever).
>>>>
>>>> Very good point.
>>>
>>> On this one I still disagree: it does not make for good uapi if we
>>> allow everyone to select themselves for priority handling (one
>>> flavour or the other).
>>
>> There is plenty of precedent for userspace giving hints to the kernel
>> about scheduling and freq mgmt.  Like schedutil uclamp stuff.
>> Although I think that is all based on cgroups.
> 
> I knew about SCHED_DEADLINE and that it requires CAP_SYS_NICE, but I did 
> not know about uclamp. Quick experiment with uclampset suggests it 
> indeed does not require elevated privilege. If that is indeed so, it is 
> good enough for me as a precedent.
> 
> It appears to work via sched_setattr, so maybe we could define 
> something similar in i915/xe, per context or per client, not sure.
> 
> Maybe it would start as a primitive implementation but the uapi would 
> not preclude making it smart(er) afterwards. Or passing along to GuC to 
> do its thing with it.

Hmmm, having said that, how would we fix clvk performance using that? We 
would either need the library to do a new step when creating contexts, 
or allow external control so an outside entity can do it. And then the 
question is: based on what would that entity decide? Is it possible to 
know which, for instance, Chrome tab will be (or is) using clvk so that 
the tab management code can do it?

Regards,

Tvrtko

>> In the fence/syncobj case, I think we need per-wait hints.. because
>> for a single process the driver will be doing both housekeeping waits
>> and potentially urgent waits.  There may also be some room for some
>> cgroup or similar knobs to control things like what max priority an
>> app can ask for, and whether or how aggressively the kernel responds
>> to the "deadline" hints.  So as far as "arms race", I don't think I'd
> 
> Per-wait hints are okay I guess, even with "I am important" in their 
> name, if sched_setattr allows raising uclamp.min just like that. In 
> which case cgroup limits to mimic cpu uclamp also make sense.
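For the record, cgroup v2's cpu controller already exposes exactly this kind 
of admin-side limit -- a sketch, with illustrative group name and values, 
requiring root and CONFIG_UCLAMP_TASK_GROUP:

```shell
# Restrict the uclamp range tasks in a group may request (cgroup v2).
echo "+cpu" > /sys/fs/cgroup/cgroup.subtree_control
mkdir -p /sys/fs/cgroup/gfx
echo 20 > /sys/fs/cgroup/gfx/cpu.uclamp.min   # utilisation protection floor, in percent
echo 80 > /sys/fs/cgroup/gfx/cpu.uclamp.max   # ceiling on requested boost, in percent
echo $$ > /sys/fs/cgroup/gfx/cgroup.procs     # move this shell into the group
```

A GPU equivalent could presumably bound what "deadline" or priority hints an 
app's waits are allowed to request, in the same spirit.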
> 
>> change anything about my "fence deadline" proposal.. but that it might
>> just be one piece of the overall puzzle.
> 
> That SCHED_DEADLINE requires CAP_SYS_NICE does not worry you?
> 
> Regards,
> 
> Tvrtko

