[Intel-gfx] [PATCH 5/8] drm/i915/perf: lock powergating configuration to default when active
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Aug 15 11:58:05 UTC 2018
On 14/08/2018 15:59, Lionel Landwerlin wrote:
> Hey Tvrtko,
>
> Thanks for taking over this series.
>
> I've been talking to developers using the i915/perf interface and from
> their point of view, they expect the system to be in a stable
> configuration when doing measurements.
>
> One issue with this patch on Gen11 is that it will lock the system in a
> configuration that isn't workable for media workloads (all subslices
> enabled).
> So I think we should set the value for the locked configuration per
> generation (gen8-10: all slices/subslices, gen11: only subslices that
> contain VME samplers) so that we always get a functional configurations
> for all workloads.
> Could we want to select that configuration when opening perf?
This would be via i915_perf.c/gen8_configure_all_contexts?
Sounds like an unfortunate but workable compromise. As long as there
doesn't appear another asymmetric slice feature in the future, which
doesn't align with VME. Like one workloads wants slices 0 & 2, another
wants 1 & 3, or whatever. If that happens then I don't know what we do
apart from locking out perf/OA.
> Another question is how do we expose the selected value to the user. But
> that can be solved in a different series.
SSEU get_param would cut it? Although it would be perhaps be unexpected
to get different results depending on whether perf/OA is active or not..
Hm... export device and available bitmasks via get param? Device bitmask
would be fixed and available would change depending on whether perf/OA
is active or not.
Regards,
Tvrtko
> Cheers,
>
> -
> Lionel
>
> On 14/08/18 15:40, Tvrtko Ursulin wrote:
>> From: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>
>> If some of the contexts submitting workloads to the GPU have been
>> configured to shutdown slices/subslices, we might loose the NOA
>> configurations written in the NOA muxes.
>>
>> One possible solution to this problem is to reprogram the NOA muxes
>> when we switch to a new context. We initially tried this in the
>> workaround batchbuffer but some concerns where raised about the cost
>> of reprogramming at every context switch. This solution is also not
>> without consequences from the userspace point of view. Reprogramming
>> of the muxes can only happen once the powergating configuration has
>> changed (which happens after context switch). This means for a window
>> of time during the recording, counters recorded by the OA unit might
>> be invalid. This requires userspace dealing with OA reports to discard
>> the invalid values.
>>
>> Minimizing the reprogramming could be implemented by tracking of the
>> last programmed configuration somewhere in GGTT and use MI_PREDICATE
>> to discard some of the programming commands, but the command streamer
>> would still have to parse all the MI_LRI instructions in the
>> workaround batchbuffer.
>>
>> Another solution, which this change implements, is to simply disregard
>> the user requested configuration for the period of time when i915/perf
>> is active. There is no known issue with this apart from a performance
>> penality for some media workloads that benefit from running on a
>> partially powergated GPU. We already prevent RC6 from affecting the
>> programming so it doesn't sound completely unreasonable to hold on
>> powergating for the same reason.
>>
>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>
>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>> More to_intel_context() (Tvrtko)
>> s/dev_priv/i915/ (Tvrtko)
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_drv.h | 15 +++++++++++++++
>> drivers/gpu/drm/i915/i915_perf.c | 23 ++++++++++++++++++-----
>> drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++----
>> drivers/gpu/drm/i915/intel_lrc.h | 3 +++
>> 4 files changed, 43 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index d6049c3f911b..5c12d2676435 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -3851,4 +3851,19 @@ static inline int
>> intel_hws_csb_write_index(struct drm_i915_private *i915)
>> return I915_HWS_CSB_WRITE_INDEX;
>> }
>> +static inline struct intel_sseu
>> +intel_engine_prepare_sseu(struct intel_engine_cs *engine,
>> + struct intel_sseu sseu)
>> +{
>> + struct drm_i915_private *i915 = engine->i915;
>> +
>> + /*
>> + * If i915/perf is active, we want a stable powergating
>> configuration
>> + * on the system. The most natural configuration to take in that
>> case
>> + * is the default (i.e maximum the hardware can do).
>> + */
>> + return i915->perf.oa.exclusive_stream ?
>> + intel_device_default_sseu(i915) : sseu;
>> +}
>> +
>> #endif
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index ccb20230df2c..c2fc2399e0ed 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1631,7 +1631,8 @@ static void hsw_disable_metric_set(struct
>> drm_i915_private *dev_priv)
>> */
>> static void gen8_update_reg_state_unlocked(struct i915_gem_context
>> *ctx,
>> u32 *reg_state,
>> - const struct i915_oa_config *oa_config)
>> + const struct i915_oa_config *oa_config,
>> + struct intel_sseu sseu)
>> {
>> struct drm_i915_private *dev_priv = ctx->i915;
>> u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
>> @@ -1677,6 +1678,9 @@ static void
>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>> CTX_REG(reg_state, state_offset, flex_regs[i], value);
>> }
>> +
>> + CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>> + gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu));
>> }
>> /*
>> @@ -1807,6 +1811,7 @@ static int
>> gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_pr
>> static int gen8_configure_all_contexts(struct drm_i915_private
>> *dev_priv,
>> const struct i915_oa_config *oa_config)
>> {
>> + struct intel_sseu default_sseu =
>> intel_device_default_sseu(dev_priv);
>> struct intel_engine_cs *engine = dev_priv->engine[RCS];
>> struct i915_gem_context *ctx;
>> int ret;
>> @@ -1854,7 +1859,8 @@ static int gen8_configure_all_contexts(struct
>> drm_i915_private *dev_priv,
>> ce->state->obj->mm.dirty = true;
>> regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
>> - gen8_update_reg_state_unlocked(ctx, regs, oa_config);
>> + gen8_update_reg_state_unlocked(ctx, regs, oa_config,
>> + oa_config ? default_sseu : ce->sseu);
>> i915_gem_object_unpin_map(ce->state->obj);
>> }
>> @@ -2226,14 +2232,21 @@ void i915_oa_init_reg_state(struct
>> intel_engine_cs *engine,
>> struct i915_gem_context *ctx,
>> u32 *reg_state)
>> {
>> + struct drm_i915_private *i915 = engine->i915;
>> struct i915_perf_stream *stream;
>> if (engine->id != RCS)
>> return;
>> - stream = engine->i915->perf.oa.exclusive_stream;
>> - if (stream)
>> - gen8_update_reg_state_unlocked(ctx, reg_state,
>> stream->oa_config);
>> + stream = i915->perf.oa.exclusive_stream;
>> + if (stream) {
>> + struct intel_sseu default_sseu =
>> + intel_device_default_sseu(i915);
>> +
>> + gen8_update_reg_state_unlocked(ctx, reg_state,
>> + stream->oa_config,
>> + default_sseu);
>> + }
>> }
>> /**
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 7b2f2d6bb057..8a2997be7ef7 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2484,8 +2484,8 @@ int logical_xcs_ring_init(struct intel_engine_cs
>> *engine)
>> return logical_ring_init(engine);
>> }
>> -static u32 make_rpcs(const struct sseu_dev_info *sseu,
>> - struct intel_sseu ctx_sseu)
>> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> + struct intel_sseu ctx_sseu)
>> {
>> u32 rpcs = 0;
>> @@ -2635,10 +2635,13 @@ static void execlists_init_reg_state(u32 *regs,
>> }
>> if (rcs) {
>> + struct intel_sseu sseu = to_intel_context(ctx, engine)->sseu;
>> +
>> regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>> CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>> - make_rpcs(&INTEL_INFO(dev_priv)->sseu,
>> - to_intel_context(ctx, engine)->sseu));
>> + gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu,
>> + intel_engine_prepare_sseu(engine,
>> + sseu)));
>> i915_oa_init_reg_state(engine, ctx, regs);
>> }
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.h
>> b/drivers/gpu/drm/i915/intel_lrc.h
>> index f5a5502ecf70..bf3acdc3d0af 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.h
>> +++ b/drivers/gpu/drm/i915/intel_lrc.h
>> @@ -104,4 +104,7 @@ void intel_lr_context_resume(struct
>> drm_i915_private *dev_priv);
>> void intel_execlists_set_default_submission(struct intel_engine_cs
>> *engine);
>> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> + struct intel_sseu ctx_sseu);
>> +
>> #endif /* _INTEL_LRC_H_ */
>
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
More information about the Intel-gfx
mailing list