[Intel-gfx] [PATCH 5/8] drm/i915/perf: lock powergating configuration to default when active

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Aug 15 11:58:05 UTC 2018


On 14/08/2018 15:59, Lionel Landwerlin wrote:
> Hey Tvrtko,
> 
> Thanks for taking over this series.
> 
> I've been talking to developers using the i915/perf interface and from 
> their point of view, they expect the system to be in a stable 
> configuration when doing measurements.
> 
> One issue with this patch on Gen11 is that it will lock the system in a 
> configuration that isn't workable for media workloads (all subslices 
> enabled).
> So I think we should set the value for the locked configuration per 
> generation (gen8-10: all slices/subslices, gen11: only subslices that 
> contain VME samplers) so that we always get a functional configurations 
> for all workloads.
> Could we want to select that configuration when opening perf?

This would be via i915_perf.c/gen8_configure_all_contexts?

Sounds like an unfortunate but workable compromise. As long as there 
doesn't appear another asymmetric slice feature in the future, which 
doesn't align with VME. Like one workloads wants slices 0 & 2, another 
wants 1 & 3, or whatever. If that happens then I don't know what we do 
apart from locking out perf/OA.

> Another question is how do we expose the selected value to the user. But 
> that can be solved in a different series.

SSEU get_param would cut it? Although it would be perhaps be unexpected 
to get different results depending on whether perf/OA is active or not.. 
Hm... export device and available bitmasks via get param? Device bitmask 
would be fixed and available would change depending on whether perf/OA 
is active or not.

Regards,

Tvrtko

> Cheers,
> 
> -
> Lionel
> 
> On 14/08/18 15:40, Tvrtko Ursulin wrote:
>> From: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>
>> If some of the contexts submitting workloads to the GPU have been
>> configured to shutdown slices/subslices, we might loose the NOA
>> configurations written in the NOA muxes.
>>
>> One possible solution to this problem is to reprogram the NOA muxes
>> when we switch to a new context. We initially tried this in the
>> workaround batchbuffer but some concerns where raised about the cost
>> of reprogramming at every context switch. This solution is also not
>> without consequences from the userspace point of view. Reprogramming
>> of the muxes can only happen once the powergating configuration has
>> changed (which happens after context switch). This means for a window
>> of time during the recording, counters recorded by the OA unit might
>> be invalid. This requires userspace dealing with OA reports to discard
>> the invalid values.
>>
>> Minimizing the reprogramming could be implemented by tracking of the
>> last programmed configuration somewhere in GGTT and use MI_PREDICATE
>> to discard some of the programming commands, but the command streamer
>> would still have to parse all the MI_LRI instructions in the
>> workaround batchbuffer.
>>
>> Another solution, which this change implements, is to simply disregard
>> the user requested configuration for the period of time when i915/perf
>> is active. There is no known issue with this apart from a performance
>> penality for some media workloads that benefit from running on a
>> partially powergated GPU. We already prevent RC6 from affecting the
>> programming so it doesn't sound completely unreasonable to hold on
>> powergating for the same reason.
>>
>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>
>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>      More to_intel_context() (Tvrtko)
>>      s/dev_priv/i915/ (Tvrtko)
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h  | 15 +++++++++++++++
>>   drivers/gpu/drm/i915/i915_perf.c | 23 ++++++++++++++++++-----
>>   drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++----
>>   drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>>   4 files changed, 43 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index d6049c3f911b..5c12d2676435 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -3851,4 +3851,19 @@ static inline int 
>> intel_hws_csb_write_index(struct drm_i915_private *i915)
>>           return I915_HWS_CSB_WRITE_INDEX;
>>   }
>> +static inline struct intel_sseu
>> +intel_engine_prepare_sseu(struct intel_engine_cs *engine,
>> +              struct intel_sseu sseu)
>> +{
>> +    struct drm_i915_private *i915 = engine->i915;
>> +
>> +    /*
>> +     * If i915/perf is active, we want a stable powergating 
>> configuration
>> +     * on the system. The most natural configuration to take in that 
>> case
>> +     * is the default (i.e maximum the hardware can do).
>> +     */
>> +    return i915->perf.oa.exclusive_stream ?
>> +        intel_device_default_sseu(i915) : sseu;
>> +}
>> +
>>   #endif
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index ccb20230df2c..c2fc2399e0ed 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1631,7 +1631,8 @@ static void hsw_disable_metric_set(struct 
>> drm_i915_private *dev_priv)
>>    */
>>   static void gen8_update_reg_state_unlocked(struct i915_gem_context 
>> *ctx,
>>                          u32 *reg_state,
>> -                       const struct i915_oa_config *oa_config)
>> +                       const struct i915_oa_config *oa_config,
>> +                       struct intel_sseu sseu)
>>   {
>>       struct drm_i915_private *dev_priv = ctx->i915;
>>       u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
>> @@ -1677,6 +1678,9 @@ static void 
>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>           CTX_REG(reg_state, state_offset, flex_regs[i], value);
>>       }
>> +
>> +    CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>> +        gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu));
>>   }
>>   /*
>> @@ -1807,6 +1811,7 @@ static int 
>> gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_pr
>>   static int gen8_configure_all_contexts(struct drm_i915_private 
>> *dev_priv,
>>                          const struct i915_oa_config *oa_config)
>>   {
>> +    struct intel_sseu default_sseu = 
>> intel_device_default_sseu(dev_priv);
>>       struct intel_engine_cs *engine = dev_priv->engine[RCS];
>>       struct i915_gem_context *ctx;
>>       int ret;
>> @@ -1854,7 +1859,8 @@ static int gen8_configure_all_contexts(struct 
>> drm_i915_private *dev_priv,
>>           ce->state->obj->mm.dirty = true;
>>           regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
>> -        gen8_update_reg_state_unlocked(ctx, regs, oa_config);
>> +        gen8_update_reg_state_unlocked(ctx, regs, oa_config,
>> +                           oa_config ? default_sseu : ce->sseu);
>>           i915_gem_object_unpin_map(ce->state->obj);
>>       }
>> @@ -2226,14 +2232,21 @@ void i915_oa_init_reg_state(struct 
>> intel_engine_cs *engine,
>>                   struct i915_gem_context *ctx,
>>                   u32 *reg_state)
>>   {
>> +    struct drm_i915_private *i915 = engine->i915;
>>       struct i915_perf_stream *stream;
>>       if (engine->id != RCS)
>>           return;
>> -    stream = engine->i915->perf.oa.exclusive_stream;
>> -    if (stream)
>> -        gen8_update_reg_state_unlocked(ctx, reg_state, 
>> stream->oa_config);
>> +    stream = i915->perf.oa.exclusive_stream;
>> +    if (stream) {
>> +        struct intel_sseu default_sseu =
>> +            intel_device_default_sseu(i915);
>> +
>> +        gen8_update_reg_state_unlocked(ctx, reg_state,
>> +                           stream->oa_config,
>> +                           default_sseu);
>> +    }
>>   }
>>   /**
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 7b2f2d6bb057..8a2997be7ef7 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2484,8 +2484,8 @@ int logical_xcs_ring_init(struct intel_engine_cs 
>> *engine)
>>       return logical_ring_init(engine);
>>   }
>> -static u32 make_rpcs(const struct sseu_dev_info *sseu,
>> -             struct intel_sseu ctx_sseu)
>> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> +           struct intel_sseu ctx_sseu)
>>   {
>>       u32 rpcs = 0;
>> @@ -2635,10 +2635,13 @@ static void execlists_init_reg_state(u32 *regs,
>>       }
>>       if (rcs) {
>> +        struct intel_sseu sseu = to_intel_context(ctx, engine)->sseu;
>> +
>>           regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>>           CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>> -            make_rpcs(&INTEL_INFO(dev_priv)->sseu,
>> -                  to_intel_context(ctx, engine)->sseu));
>> +            gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu,
>> +                       intel_engine_prepare_sseu(engine,
>> +                                 sseu)));
>>           i915_oa_init_reg_state(engine, ctx, regs);
>>       }
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.h 
>> b/drivers/gpu/drm/i915/intel_lrc.h
>> index f5a5502ecf70..bf3acdc3d0af 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.h
>> +++ b/drivers/gpu/drm/i915/intel_lrc.h
>> @@ -104,4 +104,7 @@ void intel_lr_context_resume(struct 
>> drm_i915_private *dev_priv);
>>   void intel_execlists_set_default_submission(struct intel_engine_cs 
>> *engine);
>> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> +           struct intel_sseu ctx_sseu);
>> +
>>   #endif /* _INTEL_LRC_H_ */
> 
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx


More information about the Intel-gfx mailing list