[Intel-gfx] [PATCH] drm/i915/slpc: Optimize waitboost for SLPC
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Oct 19 07:40:54 UTC 2022
On 18/10/2022 23:15, Vinay Belgaumkar wrote:
> Waitboost (when SLPC is enabled) results in a H2G message. This can result
> in thousands of messages during a stress test and fill up an already full
> CTB. There is no need to request for RP0 if GuC is already requesting the
> same.
>
> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar at intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_rps.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> index fc23c562d9b2..a20ae4fceac8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -1005,13 +1005,20 @@ void intel_rps_dec_waiters(struct intel_rps *rps)
> void intel_rps_boost(struct i915_request *rq)
> {
> struct intel_guc_slpc *slpc;
> + struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>
> if (i915_request_signaled(rq) || i915_request_has_waitboost(rq))
> return;
>
> + /* If GuC is already requesting RP0, skip */
> + if (rps_uses_slpc(rps)) {
> + slpc = rps_to_slpc(rps);
> + if (intel_rps_get_requested_frequency(rps) == slpc->rp0_freq)
> + return;
> + }
> +
Feels a little bit like a layering violation. Wait boost reference
counts and request markings will change based on asynchronous state - an
mmio read.
Also, a little below we have this:
"""
	/* Serializes with i915_request_retire() */
	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
		struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;

		if (rps_uses_slpc(rps)) {
			slpc = rps_to_slpc(rps);

			/* Return if old value is non zero */
			if (!atomic_fetch_inc(&slpc->num_waiters))

***>>>> Wouldn't it skip doing anything here already? <<<<***

				schedule_work(&slpc->boost_work);

			return;
		}

		if (atomic_fetch_inc(&rps->num_waiters))
			return;
"""
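To illustrate the point about the existing check, here is a minimal userspace sketch in C11 of the same "fetch the old value, then increment" pattern. The names demo_waiters and demo_boost are hypothetical stand-ins and not i915 code; C11 atomic_fetch_add() is used here in place of the kernel's atomic_fetch_inc(), on the assumption that both return the value held before the increment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the waiter bookkeeping; not the i915 structure. */
struct demo_waiters {
	atomic_int num_waiters;	/* number of requests currently boosting */
	int work_scheduled;	/* how many times the worker was queued */
};

/*
 * atomic_fetch_add() returns the value before the increment, so only the
 * 0 -> 1 transition queues the worker. Every later caller still bumps the
 * count but queues nothing. Returns true if this call queued the worker.
 */
static bool demo_boost(struct demo_waiters *w)
{
	if (atomic_fetch_add(&w->num_waiters, 1) == 0) {
		w->work_scheduled++;	/* stands in for schedule_work() */
		return true;
	}
	return false;
}
```

In this sketch a second waiter arriving while the first is still counted does not queue more work, which is the behaviour the question above is getting at.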
But I wonder if this is not a layering violation already. It looks like
one to me at the moment. And as it happens, there is an ongoing debug of
clvk slowness where I was a bit puzzled by the lack of "boost fence" in
the trace_printk logs - but now I see how that happens. It does not feel
right to me that we lose that tracing with SLPC.
So in general - why wouldn't the correct approach be to solve this in
the worker - which perhaps should fork to an SLPC-specific branch and do
the consolidations/skips based on mmio reads in there?
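A rough sketch of what such a worker-side skip could look like, written as standalone C so it can be compiled outside the kernel. Everything here is an assumption made for illustration: demo_slpc_state, demo_send_h2g_boost and demo_boost_work are hypothetical names, and the requested_freq field merely stands in for whatever an mmio read of the requested frequency would return.

```c
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not i915 API. */
struct demo_slpc_state {
	unsigned int rp0_freq;		/* boost (maximum) frequency */
	unsigned int requested_freq;	/* stands in for the mmio read result */
	unsigned int h2g_sent;		/* number of H2G messages issued */
};

/* Stands in for the H2G message that asks GuC to boost to RP0. */
static void demo_send_h2g_boost(struct demo_slpc_state *slpc)
{
	slpc->h2g_sent++;
	slpc->requested_freq = slpc->rp0_freq;
}

/*
 * Worker body: the caller's waiter counts and fence flags are left alone;
 * only the message is skipped when RP0 is already being requested.
 * Returns true if a message was sent.
 */
static bool demo_boost_work(struct demo_slpc_state *slpc)
{
	if (slpc->requested_freq == slpc->rp0_freq)
		return false;

	demo_send_h2g_boost(slpc);
	return true;
}
```

With the check placed in the worker, the request marking and reference counting in the boost path no longer depend on the asynchronous read; only the decision to send the message does.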
Regards,
Tvrtko
> /* Serializes with i915_request_retire() */
> if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
> - struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>
> if (rps_uses_slpc(rps)) {
> slpc = rps_to_slpc(rps);