[Intel-gfx] [PATCH 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero

John Harrison john.c.harrison at intel.com
Wed Aug 17 00:06:53 UTC 2022


On 8/16/2022 16:55, Teres Alexis, Alan Previn wrote:
> On Tue, 2022-08-16 at 15:45 -0700, Harrison, John C wrote:
>> On 8/15/2022 19:17, Alan Previn wrote:
>>> From: Matthew Brost <matthew.brost at intel.com>
>>>
>>> Add a delay, configurable via debugfs (default 34ms), to disable
>>> (default is 3/4) of the guc_ids.
>> are in use.
>>
> my bad - got clipped - will fix.
>
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -65,7 +65,13 @@
>>>     * corresponding G2H returns indicating the scheduling disable operation has
>>>     * completed it is safe to unpin the context. While a disable is in flight it
>>>     * isn't safe to resubmit the context so a fence is used to stall all future
>>> - * requests of that context until the G2H is returned.
>>> + * requests of that context until the G2H is returned. Because this interaction
>>> + * with the GuC takes a non-zero amount of time, we delay the disabling of
>>> + * scheduling after the pin count goes to zero by a configurable period of time
>>> + * (see SCHED_DISABLE_DELAY_MS). The thought is this gives the user a window of
>>> + * time to resubmit something on the context before doing this costly operation.
>>> + * This delay is only done if the context isn't closed and the guc_id usage is
>>> + * less than a threshold (default 3/4, see NUM_SCHED_DISABLE_GUCIDS_DEFAULT_THRESHOLD).
>> The delay text has dropped the 'default 34ms' in preference for just
>> referencing the define, but the threshold still mentions both. Is that
>> intentional? Dropping the values seems sensible - no need to update the
>> comment if the numeric value is changed at some later point.
>>
> Yes intentional - and yes, I should be consistent for both. Will fix.
>
>>>     *
>>> @@ -4207,6 +4288,20 @@ static bool __guc_submission_selected(struct intel_guc *guc)
>>> +int intel_guc_sched_disable_gucid_threshold_max(struct intel_guc *guc)
>>> +{
>>> +	return guc->submission_state.num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc);
>>> +}
>>> +
>>> +#define SCHED_DISABLE_DELAY_MS	34
>>> +/*
>>> + * This default value of 33 millisecs (+1 millisec buffer) ensures 33fps workloads are
>>> + * able to enjoy the latency reduction when delaying the schedule-disable operation
>>> + */
>>> +#define NUM_SCHED_DISABLE_GUCIDS_DEFAULT_THRESHOLD(__guc) \
>>> +	(((intel_guc_sched_disable_gucid_threshold_max(__guc)) * 3) / 4)
>>> +/* Default threshold to bypass delay-schedule-disable when under guc-id pressure */
>>> +
>> Comments always go in front of the thing they are describing, not after.
> Will fix.
>
>> Also, the description reads as just a more verbose version of the
>> #define. As in, more words but no extra information. Not sure what more
>> can be said about the threshold. I'm not aware of any empirical or
>> theoretical evidence as to why 75% is a good number beyond 'it just
>> seems sensible'.
> Yes - this was merely a sensible choice that wasn't tied to any known performance issue with real-world workloads
> (considering we have thousands of guc-ids at our disposal). Will fix (remove the comment since it's self-descriptive
> anyway).
It should have some comment. Even if it is just "75% seems like a 
reasonable starting point, real world apps generally don't get anywhere 
near this".

>
>> The 'make it work for 33fps' seems quite arbitrary and
>> magical, though. What is so special about 33fps? I feel like it should
>> at least mention that a real-world use case requires 33fps (was it
>> really 33, not 30?)
> The patch comment already includes the workload description: game-rendering + encoding at 30fps (not as fast as
> possible but with latency deadlines at that fps). But I can include it again in this #define if you prefer.
> Would this work?
>       /*
>        * This default value of 33 millisecs (+1 millisec buffer) ensures 33fps or higher
>        * workloads are able to enjoy the latency reduction when delaying the schedule-disable
>        * operation. This matches the 30fps game-render + encode (real world) workload this
>        * knob was tested against.
>        */
>       #define SCHED_DISABLE_DELAY_MS	34
Sounds plausible. Although technically, 34 is not a +1 buffer; it is 
rounding up the 1/3ms that we can't fit in an integer variable. And it 
should be "30fps or higher" given that the actual workload is 30fps. 
Supporting only >= 33fps would be insufficient.
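
For illustration, the value is really just the 30fps frame period 
rounded up to a whole number of milliseconds:

	1000ms / 30fps = 33.333ms per frame  ->  34ms

If you wanted that derivation to be self-documenting in the code 
(purely a suggestion - DIV_ROUND_UP is the standard helper from 
linux/math.h):

/* 30fps frame period, rounded up to a whole millisecond */
#define SCHED_DISABLE_DELAY_MS	DIV_ROUND_UP(1000, 30)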

>
>> and that anything faster (e.g. 60fps) will automatically be covered.
> Is this really necessary to state? I think that is an obvious inference if we know anything about fps and periods.
Fair enough.

>> And that in the other direction, ~30ms is not
>> so slow that it leads to large numbers of idle contexts floating around.
>> Did we have any specific reasons against going larger? I recall the
>> original experiments were with 100ms. Was there a specific reason for
>> rejecting that value?
> In some initial internal conversations, there was some concern about keeping contexts pinned and a possible impact on
> GuC subsystem power residency - but that was later discounted. So at this point we really could pick any number we want,
> but since we already did the testing for both 100ms and 34ms, I thought it would be better to stick with 34ms
> simply because of its clear relationship to the rate of context submission for the actual workload that was tested (i.e.
> 30fps workloads whose submissions arrive 33.333 millisecs apart). That said, would the above comment mod work?
Okay. Sounds good.
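
So presumably the respin ends up with something like this (sketch 
only, merging the points above):

/*
 * This default value of 33 millisecs (rounded up from the 33.333ms
 * frame period) ensures 30fps or higher workloads are able to enjoy
 * the latency reduction when delaying the schedule-disable operation.
 * This matches the 30fps game-render + encode (real world) workload
 * this knob was tested against.
 */
#define SCHED_DISABLE_DELAY_MS	34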

John.



