[PATCH] drm/i915/guc: Add Compute context hint

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Feb 23 08:51:12 UTC 2024


On 22/02/2024 23:31, Belgaumkar, Vinay wrote:
> 
> On 2/22/2024 7:32 AM, Tvrtko Ursulin wrote:
>>
>> On 21/02/2024 21:28, Rodrigo Vivi wrote:
>>> On Wed, Feb 21, 2024 at 09:42:34AM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/02/2024 00:14, Vinay Belgaumkar wrote:
>>>>> Allow user to provide a context hint. When this is set, KMD will
>>>>> send a hint to GuC which results in special handling for this
>>>>> context. SLPC will ramp the GT frequency aggressively every time
>>>>> it switches to this context. The down freq threshold will also be
>>>>> lower so GuC will ramp down the GT freq for this context more slowly.
>>>>> We also disable waitboost for this context as that will interfere with
>>>>> the strategy.
>>>>>
>>>>> We need to enable the use of Compute strategy during SLPC init, but
>>>>> it will apply only to contexts that set this bit during context
>>>>> creation.
>>>>>
>>>>> Userland can check whether this feature is supported using a new
>>>>> param - I915_PARAM_HAS_COMPUTE_CONTEXT. This flag is true for all
>>>>> GuC submission enabled platforms since they use SLPC for freq
>>>>> management.
>>>>>
>>>>> The Mesa usage model for this flag is here -
>>>>> https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint
>>>>
>>>> This allows for setting it for the whole application, correct? Upsides,
>>>> downsides? Are there any plans for per context?
>>>
>>> Currently there's no extension in a high level API
>>> (Vulkan/OpenGL/OpenCL/etc) that would allow the application to hint
>>> for power/freq/latency. So Mesa cannot decide when to hint, and their
>>> solution was to use .drirc and make a per-application decision.
>>>
>>> I would prefer a high level extension for a more granular and
>>> informative decision. We need to work towards that goal, but for now
>>> I don't see any cons to this approach.
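For reference, such a per-application .drirc override would look roughly like the sketch below. The option name here is made up for illustration; the actual one is whatever the Mesa branch linked above defines.

```xml
<!-- Hypothetical .drirc fragment enabling the hint for one application;
     "intel_compute_context_hint" is a placeholder option name. -->
<driconf>
  <device driver="iris">
    <application name="SomeComputeApp" executable="some_compute_app">
      <option name="intel_compute_context_hint" value="true"/>
    </application>
  </device>
</driconf>
```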
>>
>> In principle yeah, it doesn't harm to have the option. I am just not
>> sure how useful this intermediate step is, given its lack of
>> intra-process granularity.
>>
>>>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar at intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/gem/i915_gem_context.c   |  8 +++++++
>>>>>    .../gpu/drm/i915/gem/i915_gem_context_types.h |  1 +
>>>>>    drivers/gpu/drm/i915/gt/intel_rps.c           |  8 +++++++
>>>>>    .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h | 21 +++++++++++++++++++
>>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 17 +++++++++++++++
>>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  1 +
>>>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  7 +++++++
>>>>>    drivers/gpu/drm/i915/i915_getparam.c          | 11 ++++++++++
>>>>>    include/uapi/drm/i915_drm.h                   | 15 +++++++++++++
>>>>>    9 files changed, 89 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>> index dcbfe32fd30c..ceab7dbe9b47 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>> @@ -879,6 +879,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,
>>>>>                       struct i915_gem_proto_context *pc,
>>>>>                       struct drm_i915_gem_context_param *args)
>>>>>    {
>>>>> +    struct drm_i915_private *i915 = fpriv->i915;
>>>>>        int ret = 0;
>>>>>        switch (args->param) {
>>>>> @@ -904,6 +905,13 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,
>>>>>                pc->user_flags &= ~BIT(UCONTEXT_BANNABLE);
>>>>>            break;
>>>>> +    case I915_CONTEXT_PARAM_IS_COMPUTE:
>>>>> +        if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
>>>>> +            ret = -EINVAL;
>>>>> +        else
>>>>> +            pc->user_flags |= BIT(UCONTEXT_COMPUTE);
>>>>> +        break;
>>>>> +
>>>>>        case I915_CONTEXT_PARAM_RECOVERABLE:
>>>>>            if (args->size)
>>>>>                ret = -EINVAL;
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
>>>>> index 03bc7f9d191b..db86d6f6245f 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
>>>>> @@ -338,6 +338,7 @@ struct i915_gem_context {
>>>>>    #define UCONTEXT_BANNABLE        2
>>>>>    #define UCONTEXT_RECOVERABLE        3
>>>>>    #define UCONTEXT_PERSISTENCE        4
>>>>> +#define UCONTEXT_COMPUTE        5
>>>>
>>>> What is the GuC behaviour when SLPC_CTX_FREQ_REQ_IS_COMPUTE is set for
>>>> non-compute engines? Wondering if per intel_context is what we want 
>>>> instead.
>>>> (Which could then be the i915_context_param_engines extension to mark
>>>> individual contexts as compute strategy.)
>>>
>>> Perhaps we should rename this? This is a freq-decision-strategy inside
>>> GuC that mostly targets compute workloads that need lower latency with
>>> short burst execution. But the engine itself doesn't matter. It can be
>>> applied to any engine.
>>
>> I have no idea if it makes sense for other engines, such as video, and
>> what the pros and cons would be in terms of PnP. But in case we end up
>> allowing it on any engine, then at least the userspace name shouldn't
>> be compute. :)
> Yes, one of the suggestions from Daniele was to have something along the
> lines of UCONTEXT_HIFREQ, so we don't confuse it with the Compute engine.

Okay, but an additional question is whether anyone would ever want to set 
it for video or blitter. Would it harm, benefit, or be neutral?

>> Or if we decide to call it compute and only apply to compute engines, 
>> then I would strongly suggest making the uapi per intel_context i.e. 
>> the set engines extension instead of the GEM context param. Otherwise 
>> it would be odd that some engines get it and some don't. With explicit 
>> configuration userspace gets to see the clear picture of what is what.
> 
> It will not be per engine, so it may be better to keep it at the 

Why? Just because..

> gem_context level. Trying to percolate it to the intel_context level 
> seems to be more complicated. We process the gem_context_param flags 
> first and then create the intel_context per engine. Not sure if we want 
> to keep 2 copies of the same flag in gem_context and intel_context as well.

.. it is complicated? It is not complicated at all. There is 
intel_context_set_gem which is just for uses like that.

Once you have this, the only difference is whether you go from the GEM 
context setparam to the intel_context, or the flag goes directly to the 
intel_context as they are created via custom engine maps.

Regards,

Tvrtko

>>
>> Regards,
>>
>> Tvrtko
>>
>>>>>        /**
>>>>>         * @flags: small set of booleans
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
>>>>> index 4feef874e6d6..1ed40cd61b70 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
>>>>> @@ -24,6 +24,7 @@
>>>>>    #include "intel_pcode.h"
>>>>>    #include "intel_rps.h"
>>>>>    #include "vlv_sideband.h"
>>>>> +#include "../gem/i915_gem_context.h"
>>>>>    #include "../../../platform/x86/intel_ips.h"
>>>>>    #define BUSY_MAX_EI    20u /* ms */
>>>>> @@ -1018,6 +1019,13 @@ void intel_rps_boost(struct i915_request *rq)
>>>>>            struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>>>>>            if (rps_uses_slpc(rps)) {
>>>>> +            const struct i915_gem_context *ctx;
>>>>> +
>>>>> +            ctx = i915_request_gem_context(rq);
>>>>> +            if (ctx &&
>>>>> +                test_bit(UCONTEXT_COMPUTE, &ctx->user_flags))
>>>>> +                return;
>>>>> +
>>>>
>>>> I think the request and intel_context do not own a strong reference to
>>>> the GEM context. So at minimum you need a local reference obtained
>>>> under an RCU lock with kref_get_unless_zero, as some other places do.
>>>>
>>>> However.. it may be simpler to just store the flag in
>>>> intel_context->flags. If you carry it over at the time the GEM context
>>>> is assigned to the intel_context, not only do you simplify the runtime
>>>> rules, but you also get the ability not to set the compute flag for
>>>> video etc.
>>>
>>> +1 on the intel_context->flags
>>>
>>>>
>>>> It may even make sense to add a "don't waitboost" flag on top of the
>>>> "is compute" one, so this call site becomes self-documenting (otherwise
>>>> please add a comment here). Then you could even move it out of the SLPC
>>>> special case.
>>>
>>> +1 on the dont_waitboost flag as well. It might be worthwhile for other
>>> cases, like display metrics for instance.
> 
> We could define another disable_waitboost flag in intel_context, but it 
> seems redundant if we already have this info in the gem_context. We don't 
> need to check for the SLPC special case, just this flag, as we won't 
> enable it for the non-SLPC case anyway.
> 
> Thanks,
> 
> Vinay.
> 
>>>
>>>>
>>>>>                slpc = rps_to_slpc(rps);
>>>>>                if (slpc->min_freq_softlimit >= slpc->boost_freq)
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
>>>>> index 811add10c30d..c34674e797c6 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
>>>>> @@ -207,6 +207,27 @@ struct slpc_shared_data {
>>>>>        u8 reserved_mode_definition[4096];
>>>>>    } __packed;
>>>>> +struct slpc_context_frequency_request {
>>>>> +    u32 frequency_request:16;
>>>>> +    u32 reserved:12;
>>>>> +    u32 is_compute:1;
>>>>> +    u32 ignore_busyness:1;
>>>>> +    u32 is_minimum:1;
>>>>> +    u32 is_predefined:1;
>>>>> +} __packed;
>>>>> +
>>>>> +#define SLPC_CTX_FREQ_REQ_IS_COMPUTE        REG_BIT(28)
>>>>> +
>>>>> +struct slpc_optimized_strategies {
>>>>> +    u32 compute:1;
>>>>> +    u32 async_flip:1;
>>>>> +    u32 media:1;
>>>>> +    u32 vsync_flip:1;
>>>>> +    u32 reserved:28;
>>>>> +} __packed;
>>>>> +
>>>>> +#define SLPC_OPTIMIZED_STRATEGY_COMPUTE        REG_BIT(0)
>>>>> +
>>>>>    /**
>>>>>     * DOC: SLPC H2G MESSAGE FORMAT
>>>>>     *
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>>> index 3e681ab6fbf9..706fffca698b 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
>>>>> @@ -537,6 +537,20 @@ int intel_guc_slpc_get_min_freq(struct intel_guc_slpc *slpc, u32 *val)
>>>>>        return ret;
>>>>>    }
>>>>> +int intel_guc_slpc_set_strategy(struct intel_guc_slpc *slpc, u32 val)
>>>>> +{
>>>>> +    struct drm_i915_private *i915 = slpc_to_i915(slpc);
>>>>> +    intel_wakeref_t wakeref;
>>>>> +    int ret = 0;
>>>>> +
>>>>> +    with_intel_runtime_pm(&i915->runtime_pm, wakeref)
>>>>> +        ret = slpc_set_param(slpc,
>>>>> +                     SLPC_PARAM_STRATEGIES,
>>>>> +                     val);
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>>    int intel_guc_slpc_set_media_ratio_mode(struct intel_guc_slpc *slpc, u32 val)
>>>>>    {
>>>>>        struct drm_i915_private *i915 = slpc_to_i915(slpc);
>>>>> @@ -711,6 +725,9 @@ int intel_guc_slpc_enable(struct intel_guc_slpc *slpc)
>>>>>        /* Set cached media freq ratio mode */
>>>>>        intel_guc_slpc_set_media_ratio_mode(slpc, slpc->media_ratio_mode);
>>>>> +    /* Enable SLPC Optimized Strategy for compute */
>>>>> +    intel_guc_slpc_set_strategy(slpc, SLPC_OPTIMIZED_STRATEGY_COMPUTE);
>>>>> +
>>>>>        return 0;
>>>>>    }
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
>>>>> index 6ac6503c39d4..1cb5fd44f05c 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
>>>>> @@ -45,5 +45,6 @@ void intel_guc_pm_intrmsk_enable(struct intel_gt *gt);
>>>>>    void intel_guc_slpc_boost(struct intel_guc_slpc *slpc);
>>>>>    void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc);
>>>>>    int intel_guc_slpc_set_ignore_eff_freq(struct intel_guc_slpc *slpc, bool val);
>>>>> +int intel_guc_slpc_set_strategy(struct intel_guc_slpc *slpc, u32 val);
>>>>>    #endif
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> index f3dcae4b9d45..bbabfa5532e5 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> @@ -2645,6 +2645,7 @@ MAKE_CONTEXT_POLICY_ADD(execution_quantum, EXECUTION_QUANTUM)
>>>>>    MAKE_CONTEXT_POLICY_ADD(preemption_timeout, PREEMPTION_TIMEOUT)
>>>>>    MAKE_CONTEXT_POLICY_ADD(priority, SCHEDULING_PRIORITY)
>>>>>    MAKE_CONTEXT_POLICY_ADD(preempt_to_idle, PREEMPT_TO_IDLE_ON_QUANTUM_EXPIRY)
>>>>> +MAKE_CONTEXT_POLICY_ADD(slpc_ctx_freq_req, SLPM_GT_FREQUENCY)
>>>>>    #undef MAKE_CONTEXT_POLICY_ADD
>>>>> @@ -2662,8 +2663,10 @@ static int guc_context_policy_init_v70(struct intel_context *ce, bool loop)
>>>>>        struct intel_engine_cs *engine = ce->engine;
>>>>>        struct intel_guc *guc = &engine->gt->uc.guc;
>>>>>        struct context_policy policy;
>>>>> +    struct i915_gem_context *ctx = rcu_dereference(ce->gem_context);
>>>>>        u32 execution_quantum;
>>>>>        u32 preemption_timeout;
>>>>> +    u32 slpc_ctx_freq_req = 0;
>>>>>        unsigned long flags;
>>>>>        int ret;
>>>>> @@ -2675,11 +2678,15 @@ static int guc_context_policy_init_v70(struct intel_context *ce, bool loop)
>>>>>        execution_quantum = engine->props.timeslice_duration_ms * 1000;
>>>>>        preemption_timeout = engine->props.preempt_timeout_ms * 1000;
>>>>> +    if (ctx && (ctx->user_flags & BIT(UCONTEXT_COMPUTE)))
>>>>> +        slpc_ctx_freq_req |= SLPC_CTX_FREQ_REQ_IS_COMPUTE;
>>>>> +
>>>>>        __guc_context_policy_start_klv(&policy, ce->guc_id.id);
>>>>>        __guc_context_policy_add_priority(&policy, ce->guc_state.prio);
>>>>>        __guc_context_policy_add_execution_quantum(&policy, execution_quantum);
>>>>>        __guc_context_policy_add_preemption_timeout(&policy, preemption_timeout);
>>>>> +    __guc_context_policy_add_slpc_ctx_freq_req(&policy, slpc_ctx_freq_req);
>>>>>        if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
>>>>> __guc_context_policy_add_preempt_to_idle(&policy, 1);
>>>>> diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
>>>>> index 5c3fec63cb4c..0f12e36b2a12 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_getparam.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_getparam.c
>>>>> @@ -155,6 +155,17 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
>>>>>             */
>>>>>            value = 1;
>>>>>            break;
>>>>> +    case I915_PARAM_HAS_COMPUTE_CONTEXT:
>>>>> +        /* This feature has been available in GuC for a while but
>>>>> +         * a use case now required the use of this feature. We
>>>>> +         * return true now since this is now being supported from
>>>>> +         * the kernel side as well.
>>>>> +         */
>>>>
>>>> Nit - stick to the multi-line comment style i915 uses please.
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>>> +        if (intel_uc_uses_guc_submission(&to_gt(i915)->uc))
>>>>> +            value = 1;
>>>>> +        else
>>>>> +            value = -EINVAL;
>>>>> +        break;
>>>>>        case I915_PARAM_HAS_CONTEXT_ISOLATION:
>>>>>            value = intel_engines_has_context_isolation(i915);
>>>>>            break;
>>>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>>> index 2ee338860b7e..1bd12f536108 100644
>>>>> --- a/include/uapi/drm/i915_drm.h
>>>>> +++ b/include/uapi/drm/i915_drm.h
>>>>> @@ -806,6 +806,12 @@ typedef struct drm_i915_irq_wait {
>>>>>     */
>>>>>    #define I915_PARAM_PXP_STATUS         58
>>>>> +/*
>>>>> + * Query if kernel allows marking a context as a Compute context. 
>>>>> This will
>>>>> + * result in more aggressive GT frequency ramping for this context.
>>>>> + */
>>>>> +#define I915_PARAM_HAS_COMPUTE_CONTEXT 59
>>>>> +
>>>>>    /* Must be kept compact -- no holes and well documented */
>>>>>    /**
>>>>> @@ -2148,6 +2154,15 @@ struct drm_i915_gem_context_param {
>>>>>     * -EIO: The firmware did not succeed in creating the protected 
>>>>> context.
>>>>>     */
>>>>>    #define I915_CONTEXT_PARAM_PROTECTED_CONTENT    0xd
>>>>> +
>>>>> +/*
>>>>> + * I915_CONTEXT_PARAM_IS_COMPUTE:
>>>>> + *
>>>>> + * Mark this context as a Compute related workload which requires aggressive GT
>>>>> + * frequency scaling. Query I915_PARAM_HAS_COMPUTE_CONTEXT to check if the kernel
>>>>> + * supports this functionality.
>>>>> + */
>>>>> +#define I915_CONTEXT_PARAM_IS_COMPUTE        0xe
>>>>>    /* Must be kept compact -- no holes and well documented */
>>>>>        /** @value: Context parameter value to be set or queried */

