[PATCH] drm/i915/guc: Add Compute context hint

Thu Feb 22 23:31:44 UTC 2024

On 2/22/2024 7:32 AM, Tvrtko Ursulin wrote:
>
> On 21/02/2024 21:28, Rodrigo Vivi wrote:
>> On Wed, Feb 21, 2024 at 09:42:34AM +0000, Tvrtko Ursulin wrote:
>>>
>>> On 21/02/2024 00:14, Vinay Belgaumkar wrote:
>>>> Allow user to provide a context hint. When this is set, KMD will
>>>> send a hint to GuC which results in special handling for this
>>>> context. SLPC will ramp the GT frequency aggressively every time
>>>> it switches to this context. The down freq threshold will also be
>>>> lower so GuC will ramp down the GT freq for this context more slowly.
>>>> We also disable waitboost for this context as that will interfere with
>>>> the strategy.
>>>>
>>>> We need to enable the use of Compute strategy during SLPC init, but
>>>> it will apply only to contexts that set this bit during context
>>>> creation.
>>>>
>>>> Userland can check whether this feature is supported using a new 
>>>> param-
>>>> I915_PARAM_HAS_COMPUTE_CONTEXT. This flag is true for all guc 
>>>> submission
>>>> enabled platforms since they use SLPC for freq management.
>>>>
>>>> The Mesa usage model for this flag is here -
>>>> https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint
>>>
>>> This allows for setting it for the whole application, correct? Upsides,
>>> downsides? Are there any plans for per context?
>>
>> Currently there's no extension on a high level API 
>> (Vulkan/OpenGL/OpenCL/etc)
>> that would allow the application to hint for power/freq/latency. So 
>> Mesa cannot
>> decide when to hint. So their solution was to use .drirc and make 
>> per-application
>> decision.
>>
>> I would prefer a high level extension for a more granular and 
>> informative
>> decision. We need to work with that goal, but for now I don't see any
>> cons on this approach.
>
> In principle yeah I doesn't harm to have the option. I am just not 
> sure how useful this intermediate step this is with its lack of 
> intra-process granularity.
>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar at intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/gem/i915_gem_context.c   |  8 +++++++
>>>>    .../gpu/drm/i915/gem/i915_gem_context_types.h |  1 +
>>>>    drivers/gpu/drm/i915/gt/intel_rps.c           |  8 +++++++
>>>>    .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h | 21 
>>>> +++++++++++++++++++
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 17 +++++++++++++++
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  1 +
>>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  7 +++++++
>>>>    drivers/gpu/drm/i915/i915_getparam.c          | 11 ++++++++++
>>>>    include/uapi/drm/i915_drm.h                   | 15 +++++++++++++
>>>>    9 files changed, 89 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>> index dcbfe32fd30c..ceab7dbe9b47 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>> @@ -879,6 +879,7 @@ static int set_proto_ctx_param(struct 
>>>> drm_i915_file_private *fpriv,
>>>>                       struct i915_gem_proto_context *pc,
>>>>                       struct drm_i915_gem_context_param *args)
>>>>    {
>>>> +    struct drm_i915_private *i915 = fpriv->i915;
>>>>        int ret = 0;
>>>>        switch (args->param) {
>>>> @@ -904,6 +905,13 @@ static int set_proto_ctx_param(struct 
>>>> drm_i915_file_private *fpriv,
>>>>                pc->user_flags &= ~BIT(UCONTEXT_BANNABLE);
>>>>            break;
>>>> +    case I915_CONTEXT_PARAM_IS_COMPUTE:
>>>> +        if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
>>>> +            ret = -EINVAL;
>>>> +        else
>>>> +            pc->user_flags |= BIT(UCONTEXT_COMPUTE);
>>>> +        break;
>>>> +
>>>>        case I915_CONTEXT_PARAM_RECOVERABLE:
>>>>            if (args->size)
>>>>                ret = -EINVAL;
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
>>>> index 03bc7f9d191b..db86d6f6245f 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
>>>> @@ -338,6 +338,7 @@ struct i915_gem_context {
>>>>    #define UCONTEXT_BANNABLE        2
>>>>    #define UCONTEXT_RECOVERABLE        3
>>>>    #define UCONTEXT_PERSISTENCE        4
>>>> +#define UCONTEXT_COMPUTE        5
>>>
>>> What is the GuC behaviour when SLPC_CTX_FREQ_REQ_IS_COMPUTE is set for
>>> non-compute engines? Wondering if per intel_context is what we want 
>>> instead.
>>> (Which could then be the i915_context_param_engines extension to mark
>>> individual contexts as compute strategy.)
>>
>> Perhaps we should rename this? This is a freq-decision-strategy inside
>> GuC that is there mostly targeting compute workloads that needs lower
>> latency with short burst execution. But the engine itself doesn't 
>> matter.
>> It can be applied to any engine.
>
> I have no idea if it makes sense for other engines, such as video, and 
> what would be pros and cons in terms of PnP. But in the case we end up 
> allowing it on any engine, then at least userspace name shouldn't be 
> compute. :)
Yes, one of the suggestions from Daniele was to have something along the 
lines of UCONTEXT_HIFREQ or something along those lines so we don't 
confuse it with the Compute Engine.
>
> Or if we decide to call it compute and only apply to compute engines, 
> then I would strongly suggest making the uapi per intel_context i.e. 
> the set engines extension instead of the GEM context param. Otherwise 
> it would be odd that some engines get it and some don't. With explicit 
> configuration userspace gets to see the clear picture of what is what.

It will not be per engine, so may be better to keep it at the 
gem_context level. Trying to percolate it to the intel_context level 
seems to be more complicated. We process the gem_context_param flags 
first and then create the intel_context per engine. Not sure if we want 
to keep 2 copies of the same flag in gem_context and intel_context as well.

>
> Regards,
>
> Tvrtko
>
>>>>        /**
>>>>         * @flags: small set of booleans
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c 
>>>> b/drivers/gpu/drm/i915/gt/intel_rps.c
>>>> index 4feef874e6d6..1ed40cd61b70 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
>>>> @@ -24,6 +24,7 @@
>>>>    #include "intel_pcode.h"
>>>>    #include "intel_rps.h"
>>>>    #include "vlv_sideband.h"
>>>> +#include "../gem/i915_gem_context.h"
>>>>    #include "../../../platform/x86/intel_ips.h"
>>>>    #define BUSY_MAX_EI    20u /* ms */
>>>> @@ -1018,6 +1019,13 @@ void intel_rps_boost(struct i915_request *rq)
>>>>            struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>>>>            if (rps_uses_slpc(rps)) {
>>>> +            const struct i915_gem_context *ctx;
>>>> +
>>>> +            ctx = i915_request_gem_context(rq);
>>>> +            if (ctx &&
>>>> +                test_bit(UCONTEXT_COMPUTE, &ctx->user_flags))
>>>> +                return;
>>>> +
>>>
>>> I think request and intel_context do not own a strong reference to GEM
>>> context. So at minimum you need a local one obtained under a RCU 
>>> lock with
>>> kref_get_unless_zero, as do some other places do.
>>>
>>> However.. it may be simpler to just store the flag in 
>>> intel_context->flags.
>>> If you carry it over at the time GEM context is assigned to 
>>> intel_context,
>>> not only you simplify runtime rules, but you get the ability to not 
>>> set the
>>> compute flags for video etc.
>>
>> +1 on the intel_context->flags
>>
>>>
>>> It may even make sense to add a "don't waitboost" flag on top of the 
>>> "is
>>> compute" so this call site becomes self-documenting (otherwise I ask 
>>> to add
>>> a comment here please). Then you could even move it out from the SLPC
>>> special case.
>>
>> +1 on the dont_waitboost flag as well. might be worth for other cases
>> like display metrics for instance.

We could define another disable_waitboost flag in intel_context, but 
seems redundant if we already have this info in the gem_context. We 
don't need to check for SLPC special case, just need to check this flag 
as we won't enable it for the non-slpc case anyways.

Thanks,

Vinay.

>>
>>>
>>>>                slpc = rps_to_slpc(rps);
>>>>                if (slpc->min_freq_softlimit >= slpc->boost_freq)
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h 
>>>> b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
>>>> index 811add10c30d..c34674e797c6 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
>>>> @@ -207,6 +207,27 @@ struct slpc_shared_data {
>>>>        u8 reserved_mode_definition[4096];
>>>>    } __packed;
>>>> +struct slpc_context_frequency_request {
>>>> +    u32 frequency_request:16;
>>>> +    u32 reserved:12;
>>>> +    u32 is_compute:1;
>>>> +    u32 ignore_busyness:1;
>>>> +    u32 is_minimum:1;
>>>> +    u32 is_predefined:1;
>>>> +} __packed;
>>>> +
>>>> +#define SLPC_CTX_FREQ_REQ_IS_COMPUTE        REG_BIT(28)
>>>> +
>>>> +struct slpc_optimized_strategies {
>>>> +    u32 compute:1;
>>>> +    u32 async_flip:1;
>>>> +    u32 media:1;
>>>> +    u32 vsync_flip:1;
>>>> +    u32 reserved:28;
>>>> +} __packed;
>>>> +
>>>> +#define SLPC_OPTIMIZED_STRATEGY_COMPUTE        REG_BIT(0)
>>>> +
>>>>    /**
>>>>     * DOC: SLPC H2G MESSAGE FORMAT
>>>>     *
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c 
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>> index 3e681ab6fbf9..706fffca698b 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>> @@ -537,6 +537,20 @@ int intel_guc_slpc_get_min_freq(struct 
>>>> intel_guc_slpc *slpc, u32 *val)
>>>>        return ret;
>>>>    }
>>>> +int intel_guc_slpc_set_strategy(struct intel_guc_slpc *slpc, u32 val)
>>>> +{
>>>> +    struct drm_i915_private *i915 = slpc_to_i915(slpc);
>>>> +    intel_wakeref_t wakeref;
>>>> +    int ret = 0;
>>>> +
>>>> +    with_intel_runtime_pm(&i915->runtime_pm, wakeref)
>>>> +        ret = slpc_set_param(slpc,
>>>> +                     SLPC_PARAM_STRATEGIES,
>>>> +                     val);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>>    int intel_guc_slpc_set_media_ratio_mode(struct intel_guc_slpc 
>>>> *slpc, u32 val)
>>>>    {
>>>>        struct drm_i915_private *i915 = slpc_to_i915(slpc);
>>>> @@ -711,6 +725,9 @@ int intel_guc_slpc_enable(struct intel_guc_slpc 
>>>> *slpc)
>>>>        /* Set cached media freq ratio mode */
>>>>        intel_guc_slpc_set_media_ratio_mode(slpc, 
>>>> slpc->media_ratio_mode);
>>>> +    /* Enable SLPC Optimized Strategy for compute */
>>>> +    intel_guc_slpc_set_strategy(slpc, 
>>>> SLPC_OPTIMIZED_STRATEGY_COMPUTE);
>>>> +
>>>>        return 0;
>>>>    }
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h 
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
>>>> index 6ac6503c39d4..1cb5fd44f05c 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
>>>> @@ -45,5 +45,6 @@ void intel_guc_pm_intrmsk_enable(struct intel_gt 
>>>> *gt);
>>>>    void intel_guc_slpc_boost(struct intel_guc_slpc *slpc);
>>>>    void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc);
>>>>    int intel_guc_slpc_set_ignore_eff_freq(struct intel_guc_slpc 
>>>> *slpc, bool val);
>>>> +int intel_guc_slpc_set_strategy(struct intel_guc_slpc *slpc, u32 
>>>> val);
>>>>    #endif
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> index f3dcae4b9d45..bbabfa5532e5 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> @@ -2645,6 +2645,7 @@ MAKE_CONTEXT_POLICY_ADD(execution_quantum, 
>>>> EXECUTION_QUANTUM)
>>>>    MAKE_CONTEXT_POLICY_ADD(preemption_timeout, PREEMPTION_TIMEOUT)
>>>>    MAKE_CONTEXT_POLICY_ADD(priority, SCHEDULING_PRIORITY)
>>>>    MAKE_CONTEXT_POLICY_ADD(preempt_to_idle, 
>>>> PREEMPT_TO_IDLE_ON_QUANTUM_EXPIRY)
>>>> +MAKE_CONTEXT_POLICY_ADD(slpc_ctx_freq_req, SLPM_GT_FREQUENCY)
>>>>    #undef MAKE_CONTEXT_POLICY_ADD
>>>> @@ -2662,8 +2663,10 @@ static int 
>>>> guc_context_policy_init_v70(struct intel_context *ce, bool loop)
>>>>        struct intel_engine_cs *engine = ce->engine;
>>>>        struct intel_guc *guc = &engine->gt->uc.guc;
>>>>        struct context_policy policy;
>>>> +    struct i915_gem_context *ctx = rcu_dereference(ce->gem_context);
>>>>        u32 execution_quantum;
>>>>        u32 preemption_timeout;
>>>> +    u32 slpc_ctx_freq_req = 0;
>>>>        unsigned long flags;
>>>>        int ret;
>>>> @@ -2675,11 +2678,15 @@ static int 
>>>> guc_context_policy_init_v70(struct intel_context *ce, bool loop)
>>>>        execution_quantum = engine->props.timeslice_duration_ms * 1000;
>>>>        preemption_timeout = engine->props.preempt_timeout_ms * 1000;
>>>> +    if (ctx && (ctx->user_flags & BIT(UCONTEXT_COMPUTE)))
>>>> +        slpc_ctx_freq_req |= SLPC_CTX_FREQ_REQ_IS_COMPUTE;
>>>> +
>>>>        __guc_context_policy_start_klv(&policy, ce->guc_id.id);
>>>>        __guc_context_policy_add_priority(&policy, ce->guc_state.prio);
>>>> __guc_context_policy_add_execution_quantum(&policy, 
>>>> execution_quantum);
>>>> __guc_context_policy_add_preemption_timeout(&policy, 
>>>> preemption_timeout);
>>>> +    __guc_context_policy_add_slpc_ctx_freq_req(&policy, 
>>>> slpc_ctx_freq_req);
>>>>        if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
>>>> __guc_context_policy_add_preempt_to_idle(&policy, 1);
>>>> diff --git a/drivers/gpu/drm/i915/i915_getparam.c 
>>>> b/drivers/gpu/drm/i915/i915_getparam.c
>>>> index 5c3fec63cb4c..0f12e36b2a12 100644
>>>> --- a/drivers/gpu/drm/i915/i915_getparam.c
>>>> +++ b/drivers/gpu/drm/i915/i915_getparam.c
>>>> @@ -155,6 +155,17 @@ int i915_getparam_ioctl(struct drm_device 
>>>> *dev, void *data,
>>>>             */
>>>>            value = 1;
>>>>            break;
>>>> +    case I915_PARAM_HAS_COMPUTE_CONTEXT:
>>>> +        /* This feature has been available in GuC for a while but
>>>> +         * a use case now required the use of this feature. We
>>>> +         * return true now since this is now being supported from
>>>> +         * the kernel side as well.
>>>> +         */
>>>
>>> Nit - stick to the multi-line comment style i915 uses please.
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> +        if (intel_uc_uses_guc_submission(&to_gt(i915)->uc))
>>>> +            value = 1;
>>>> +        else
>>>> +            value = -EINVAL;
>>>> +        break;
>>>>        case I915_PARAM_HAS_CONTEXT_ISOLATION:
>>>>            value = intel_engines_has_context_isolation(i915);
>>>>            break;
>>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>> index 2ee338860b7e..1bd12f536108 100644
>>>> --- a/include/uapi/drm/i915_drm.h
>>>> +++ b/include/uapi/drm/i915_drm.h
>>>> @@ -806,6 +806,12 @@ typedef struct drm_i915_irq_wait {
>>>>     */
>>>>    #define I915_PARAM_PXP_STATUS         58
>>>> +/*
>>>> + * Query if kernel allows marking a context as a Compute context. 
>>>> This will
>>>> + * result in more aggressive GT frequency ramping for this context.
>>>> + */
>>>> +#define I915_PARAM_HAS_COMPUTE_CONTEXT 59
>>>> +
>>>>    /* Must be kept compact -- no holes and well documented */
>>>>    /**
>>>> @@ -2148,6 +2154,15 @@ struct drm_i915_gem_context_param {
>>>>     * -EIO: The firmware did not succeed in creating the protected 
>>>> context.
>>>>     */
>>>>    #define I915_CONTEXT_PARAM_PROTECTED_CONTENT    0xd
>>>> +
>>>> +/*
>>>> + * I915_CONTEXT_PARAM_IS_COMPUTE:
>>>> + *
>>>> + * Mark this context as a Compute related workload which requires 
>>>> aggressive GT
>>>> + * frequency scaling. Query I915_PARAM_HAS_CONTEXT_COMPUTE to 
>>>> check if the kernel
>>>> + * supports this functionality.
>>>> + */
>>>> +#define I915_CONTEXT_PARAM_IS_COMPUTE        0xe
>>>>    /* Must be kept compact -- no holes and well documented */
>>>>        /** @value: Context parameter value to be set or queried */