[Intel-gfx] [PATCH] drm/i915/guc/slpc: Use non-blocking H2G for waitboost

Belgaumkar, Vinay <vinay.belgaumkar@intel.com>
Thu May 5 17:21:10 UTC 2022


On 5/5/2022 5:13 AM, Tvrtko Ursulin wrote:
>
> On 05/05/2022 06:40, Vinay Belgaumkar wrote:
>> SLPC min/max frequency updates require H2G calls. We are seeing
>> timeouts when the GuC channel is backed up and unable to respond
>> in a timely fashion, causing warnings and affecting CI.
>
> Is it the "Unable to force min freq" error? Do you have a link to the 
> GitLab issue to add to the commit message?
We don't have a specific error for this one, but we have seen similar 
issues with other blocking H2G calls.
>
>> This is seen when waitboosting happens during a stress test.
>> This patch updates the waitboost path to use a non-blocking
>> H2G call instead, which returns as soon as the message is
>> successfully transmitted.
>
> AFAIU with this approach, when the CT channel is congested, you 
> instead achieve silent dropping of the waitboost request, right?
We are hoping the request still makes it through; we are just not 
waiting for it to complete.
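
Roughly, the difference at the send layer looks like this (a sketch 
only, not part of the patch; based on my reading of intel_guc_ct.c, so 
the exact error codes should be double-checked):

/*
 * Blocking send: queues the request and then waits for the G2H
 * status reply. On a congested channel this wait is where the
 * timeout/warning in intel_guc_ct.c fires.
 */
err = intel_guc_send(guc, request, ARRAY_SIZE(request));

/*
 * Non-blocking send: returns as soon as the request is queued in
 * the CT send buffer, or with an error (e.g. -EBUSY) if it cannot
 * be queued. The caller never waits for the eventual GuC status.
 */
err = intel_guc_send_nb(guc, request, ARRAY_SIZE(request), 0);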
>
> It sounds like potentially important feedback from the field to lose 
> so easily. How about adding a drm_notice to the worker when it fails?
>
> Or simply a "one line patch" to replace i915_probe_error (!?) with 
> drm_notice and keep the blocking behavior. (I have no idea what the 
> typical time to drain the CT buffer is, and so whether waiting or 
> dropping makes more sense for the effectiveness of waitboosting.)
>
> Or, since the congestion /should not/ happen in production, the 
> argument is why complicate things with more code, in which case going 
> with the one line patch is an easy way forward?

Even if we soften the blow here, the actual timeout error occurs in the 
intel_guc_ct.c code, so we cannot hide that error anyway. Making this 
call non-blocking avoids the stall while still letting that error surface.
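
That said, if we also want the drop to be visible at the SLPC level, a 
rough, untested sketch of your worker suggestion on top of this patch 
could look like the following (assuming the slpc_to_i915() helper in 
this file; whether drm_notice is the right severity is open):

static void slpc_boost_work(struct work_struct *work)
{
    struct intel_guc_slpc *slpc = container_of(work, typeof(*slpc), boost_work);
    int err;

    mutex_lock(&slpc->lock);
    if (atomic_read(&slpc->num_waiters)) {
        err = slpc_force_min_freq(slpc, slpc->boost_freq);
        if (!err)
            slpc->num_boosts++;
        else
            /* Surface the dropped boost instead of failing silently */
            drm_notice(&slpc_to_i915(slpc)->drm,
                       "Failed to send waitboost request: %d\n", err);
    }
    mutex_unlock(&slpc->lock);
}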

Thanks,

Vinay.

>
> Regards,
>
> Tvrtko
>
>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 38 ++++++++++++++++-----
>>   1 file changed, 30 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>> index 1db833da42df..c852f73cf521 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>> @@ -98,6 +98,30 @@ static u32 slpc_get_state(struct intel_guc_slpc *slpc)
>>       return data->header.global_state;
>>   }
>> 
>> +static int guc_action_slpc_set_param_nb(struct intel_guc *guc, u8 id, u32 value)
>> +{
>> +    u32 request[] = {
>> +        GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
>> +        SLPC_EVENT(SLPC_EVENT_PARAMETER_SET, 2),
>> +        id,
>> +        value,
>> +    };
>> +    int ret;
>> +
>> +    ret = intel_guc_send_nb(guc, request, ARRAY_SIZE(request), 0);
>> +
>> +    return ret > 0 ? -EPROTO : ret;
>> +}
>> +
>> +static int slpc_set_param_nb(struct intel_guc_slpc *slpc, u8 id, u32 value)
>> +{
>> +    struct intel_guc *guc = slpc_to_guc(slpc);
>> +
>> +    GEM_BUG_ON(id >= SLPC_MAX_PARAM);
>> +
>> +    return guc_action_slpc_set_param_nb(guc, id, value);
>> +}
>> +
>>   static int guc_action_slpc_set_param(struct intel_guc *guc, u8 id, u32 value)
>>   {
>>       u32 request[] = {
>> @@ -208,12 +232,10 @@ static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq)
>>        */
>> 
>>       with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
>> -        ret = slpc_set_param(slpc,
>> -                     SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
>> -                     freq);
>> -        if (ret)
>> -            i915_probe_error(i915, "Unable to force min freq to %u: %d",
>> -                     freq, ret);
>> +        /* Non-blocking request will avoid stalls */
>> +        ret = slpc_set_param_nb(slpc,
>> +                    SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
>> +                    freq);
>>       }
>> 
>>       return ret;
>> @@ -231,8 +253,8 @@ static void slpc_boost_work(struct work_struct *work)
>>        */
>>       mutex_lock(&slpc->lock);
>>       if (atomic_read(&slpc->num_waiters)) {
>> -        slpc_force_min_freq(slpc, slpc->boost_freq);
>> -        slpc->num_boosts++;
>> +        if (!slpc_force_min_freq(slpc, slpc->boost_freq))
>> +            slpc->num_boosts++;
>>       }
>>       mutex_unlock(&slpc->lock);
>>   }

