[Intel-gfx] [PATCH] drm/i915/slpc: Optimize waitboost for SLPC

Wed Oct 19 07:40:54 UTC 2022

On 18/10/2022 23:15, Vinay Belgaumkar wrote:
> Waitboost (when SLPC is enabled) results in a H2G message. This can result
> in thousands of messages during a stress test and fill up an already full
> CTB. There is no need to request for RP0 if GuC is already requesting the
> same.
> 
> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar at intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_rps.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> index fc23c562d9b2..a20ae4fceac8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -1005,13 +1005,20 @@ void intel_rps_dec_waiters(struct intel_rps *rps)
>   void intel_rps_boost(struct i915_request *rq)
>   {
>   	struct intel_guc_slpc *slpc;
> +	struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>   
>   	if (i915_request_signaled(rq) || i915_request_has_waitboost(rq))
>   		return;
>   
> +	/* If GuC is already requesting RP0, skip */
> +	if (rps_uses_slpc(rps)) {
> +		slpc = rps_to_slpc(rps);
> +		if (intel_rps_get_requested_frequency(rps) == slpc->rp0_freq)
> +			return;
> +	}
> +

Feels a little bit like a layering violation. Wait boost reference
counts and request markings will change based on asynchronous state - an
mmio read.

Also, a little below we have this:

"""
	/* Serializes with i915_request_retire() */
	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
		struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;

		if (rps_uses_slpc(rps)) {
			slpc = rps_to_slpc(rps);

			/* Return if old value is non zero */
			if (!atomic_fetch_inc(&slpc->num_waiters))

***>>>> Wouldn't it skip doing anything here already? <<<<***

				schedule_work(&slpc->boost_work);

			return;
		}

		if (atomic_fetch_inc(&rps->num_waiters))
			return;
"""

But I wonder if this is not a layering violation already. Looks like one
to me at the moment. And as it happens there is an ongoing debug of
clvk slowness where I was a bit puzzled by the lack of "boost fence" in
trace_printk logs - but now I see how that happens. Does not feel right
to me that we lose that tracing with SLPC.

So in general - why wouldn't the correct approach be to solve this in
the worker - which perhaps should fork to an SLPC-specific branch and do
the consolidations/skips based on mmio reads in there?
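
To illustrate the shape of that suggestion, here is a minimal userspace
sketch, not the actual i915 code. Every name in it (model_slpc,
model_request, model_boost, model_slpc_boost_work) is hypothetical, the
requested_freq field merely stands in for the mmio read, and the
assumption that GuC requests RP0 once the H2G has been sent is part of
the model only. The point is that request marking and waiter counting
stay independent of hardware state, while the skip decision lives in the
worker:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are the real i915 API. */
struct model_slpc {
	unsigned int rp0_freq;       /* maximum frequency (RP0), in MHz */
	unsigned int requested_freq; /* stands in for the mmio read */
	int num_waiters;             /* stands in for the atomic waiter count */
	bool work_pending;           /* stands in for schedule_work() */
	unsigned int h2g_sent;       /* number of H2G messages sent so far */
};

struct model_request {
	bool boosted;                /* stands in for I915_FENCE_FLAG_BOOST */
};

/* Front end: pure bookkeeping, never looks at hardware state. */
static void model_boost(struct model_slpc *slpc, struct model_request *rq)
{
	if (rq->boosted)
		return;
	rq->boosted = true;

	/* Only the first waiter schedules the worker. */
	if (slpc->num_waiters++ == 0)
		slpc->work_pending = true;
}

/* Worker: the only place that consults the requested frequency. */
static void model_slpc_boost_work(struct model_slpc *slpc)
{
	assert(slpc->rp0_freq > 0);

	if (!slpc->work_pending)
		return;
	slpc->work_pending = false;

	/* Skip the H2G if RP0 is already being requested. */
	if (slpc->requested_freq == slpc->rp0_freq)
		return;

	slpc->h2g_sent++;
	/* Model assumption: after the H2G, RP0 becomes the requested frequency. */
	slpc->requested_freq = slpc->rp0_freq;
}
```

In this model a second boost issued while RP0 is already requested still
marks the request and bumps the waiter count, so any tracing hooked into
the front end would keep firing; only the message to GuC is skipped.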

Regards,

Tvrtko

>   	/* Serializes with i915_request_retire() */
>   	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
> -		struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>   
>   		if (rps_uses_slpc(rps)) {
>   			slpc = rps_to_slpc(rps);