[Intel-gfx] [PATCH 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero

Teres Alexis, Alan Previn alan.previn.teres.alexis at intel.com
Tue Aug 16 23:55:59 UTC 2022


On Tue, 2022-08-16 at 15:45 -0700, Harrison, John C wrote:
> On 8/15/2022 19:17, Alan Previn wrote:
> > From: Matthew Brost <matthew.brost at intel.com>
> > 
> > Add a delay, configurable via debugfs (default 34ms), to disable
> > (default is 3/4) of the guc_ids.
> are in use.
> 
My bad - got clipped - will fix.

> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -65,7 +65,13 @@
> >    * corresponding G2H returns indicating the scheduling disable operation has
> >    * completed it is safe to unpin the context. While a disable is in flight it
> >    * isn't safe to resubmit the context so a fence is used to stall all future
> > - * requests of that context until the G2H is returned.
> > + * requests of that context until the G2H is returned. Because this interaction
> > + * with the GuC takes a non-zero amount of time we delay the disabling of
> > + * scheduling after the pin count goes to zero by a configurable period of time
> > + * (see SCHED_DISABLE_DELAY_MS). The thought is this gives the user a window of
> > + * time to resubmit something on the context before doing this costly operation.
> > + * This delay is only done if the context isn't closed and the guc_id usage is
> > + * less than a threshold (default 3/4, see NUM_SCHED_DISABLE_GUC_IDS_THRESHOLD)
> The delay text has dropped the 'default 34ms' in preference for just 
> referencing the define but the threshold still mentions both. Is that 
> intentional? Dropping the values seems sensible - no need to update the 
> comment if the numeric value is changed at some later point.
> 
Yes, intentional - and yes, I should be consistent for both. Will fix.
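
For reference, the decision being described boils down to something like the
sketch below - a standalone C model of the policy, not the actual i915 code
(guc_model, ctx_is_closed and the threshold value are illustrative stand-ins):

     #include <stdbool.h>
     #include <stdio.h>

     #define SCHED_DISABLE_DELAY_MS 34  /* default, configurable via debugfs */

     /* Hypothetical stand-in for the relevant driver state. */
     struct guc_model {
             unsigned int guc_ids_in_use;
             unsigned int guc_ids_threshold;  /* default: 3/4 of usable ids */
     };

     /*
      * When a context's pin count drops to zero, delay the costly
      * schedule-disable round trip with the GuC unless the context is
      * closed or guc_ids are under pressure.
      */
     static unsigned int sched_disable_delay_ms(const struct guc_model *guc,
                                                bool ctx_is_closed)
     {
             if (ctx_is_closed)
                     return 0;  /* no reuse possible, disable immediately */
             if (guc->guc_ids_in_use >= guc->guc_ids_threshold)
                     return 0;  /* under guc_id pressure, free the id now */
             return SCHED_DISABLE_DELAY_MS;  /* leave a resubmit window */
     }

     int main(void)
     {
             struct guc_model guc = {
                     .guc_ids_in_use = 100,
                     .guc_ids_threshold = 46080,  /* illustrative value */
             };

             printf("%u ms\n", sched_disable_delay_ms(&guc, false)); /* 34 */
             printf("%u ms\n", sched_disable_delay_ms(&guc, true));  /* 0 */
             return 0;
     }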

> >    *
> > @@ -4207,6 +4288,20 @@ static bool __guc_submission_selected(struct intel_guc *guc)
> > +int intel_guc_sched_disable_gucid_threshold_max(struct intel_guc *guc)
> > +{
> > +	return guc->submission_state.num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc);
> > +}
> > +
> > +#define SCHED_DISABLE_DELAY_MS	34
> > +/*
> > + * This default value of 33 milisecs (+1 milisec buffer) ensures 33fps workloads are
> > + * able to enjoy the latency reduction when delaying the schedule-disable operation
> > + */
> > +#define NUM_SCHED_DISABLE_GUCIDS_DEFAULT_THRESHOLD(__guc) \
> > +	(((intel_guc_sched_disable_gucid_threshold_max(guc)) * 3) / 4)
> > +/* Default threshold to bypass delay-schedule-disable when under guc-id pressure */
> > +
> Comments always go in front of the thing they are describing, not after.
Will fix.

> 
> Also, the description reads as just a more verbose version of the 
> #define. As in, more words but no extra information. Not sure what more 
> can be said about the threshold. I'm not aware of any empirical or 
> theoretical evidence as to why 75% is a good number beyond 'it just 
> seems sensible'.
Yes - this was merely a sensible choice that wasn't tied to any known performance issue with real-world workloads
(considering we have thousands of guc-ids at our disposal). Will fix (remove the comment, since the #define is
self-descriptive anyway).
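
As a rough worked example of what those defines yield (the pool size below is
an assumption for illustration, not a value quoted from the patch):

     #include <stdio.h>

     int main(void)
     {
             /* Illustrative: assume a 65535-entry guc_id pool with 1/16
              * of it reserved for multi-LRC submission. */
             unsigned int num_guc_ids = 65535;
             unsigned int multi_lrc_ids = num_guc_ids / 16;            /* 4095 */
             unsigned int threshold_max = num_guc_ids - multi_lrc_ids; /* 61440 */
             unsigned int threshold = threshold_max * 3 / 4;           /* 46080 */

             printf("delay bypassed once %u of %u guc_ids are in use\n",
                    threshold, threshold_max);
             return 0;
     }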


> The 'make it work for 33fps' seems quite arbitrary and 
> magical, though. What is so special about 33fps? I feel like it should 
> at least mention that a real-world use case requires 33fps (was it 
> really 33 not 30?)
The patch comment already includes the workload description: game-rendering + encoding at 30 fps (not as fast as
possible, but with latency deadlines at that fps). But I can include it again in this #define if you prefer:
(would this work?)
     /*
      * This default value of 33 milliseconds (+1 millisecond buffer) ensures 33fps or
      * higher fps workloads are able to enjoy the latency reduction when delaying the
      * schedule-disable operation. This matches the 30fps game-render + encode (real
      * world) workload this knob was tested against.
      */
     #define SCHED_DISABLE_DELAY_MS	34


> and that anything faster (e.g. 60fps) will automatically be covered.
Is this really necessary to state? I think that is an obvious inference if we know anything about fps and periods.

> And that in the other direction, ~30ms is not 
> so slow that it leads to large numbers of idle contexts floating around. 
> Did we have any specific reasons against going larger? I recall the 
> original experiments were with 100ms. Was there a specific reason for 
> rejecting that value?
In some initial internal conversations, there was some concern about keeping contexts pinned and possibly impacting the
GuC subsystem's power residency - but that was later discounted. So at this point we really can pick any number we want,
but since we already did the work of testing both 100 ms and 34 ms, I thought it would be better to stick with 34 ms
simply because of its clear relationship to the rate of context submission for the actual workload that was tested (i.e.
30 fps workloads whose submissions arrive 33.333 milliseconds apart). That said, would the above comment mod work?
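
For completeness, the frame-period arithmetic behind the 34 ms choice, as a
standalone sketch (the fps values are just the examples from this thread):

     #include <stdio.h>

     int main(void)
     {
             const double delay_ms = 34.0;  /* SCHED_DISABLE_DELAY_MS */
             const double fps[] = { 30.0, 33.0, 60.0 };

             for (size_t i = 0; i < sizeof(fps) / sizeof(fps[0]); i++) {
                     /* the next frame's submission lands inside the delay
                      * window iff the frame period fits within the delay */
                     double period_ms = 1000.0 / fps[i];
                     printf("%5.1f fps -> %6.3f ms period -> %s\n",
                            fps[i], period_ms,
                            period_ms <= delay_ms ? "covered" : "missed");
             }
             return 0;
     }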


