[Intel-gfx] [PATCH 1/2] drm/i915: Engine relative MMIO
Rodrigo Vivi
rodrigo.vivi at intel.com
Tue Sep 24 22:28:58 UTC 2019
On Tue, Sep 24, 2019 at 09:45:02AM +0100, Tvrtko Ursulin wrote:
>
> On 24/09/2019 00:51, John.C.Harrison at Intel.com wrote:
> > From: John Harrison <John.C.Harrison at Intel.com>
> >
> > With virtual engines, it is no longer possible to know which specific
> > physical engine a given request will be executed on at the time that
> > request is generated. This means that the request itself must be engine
> > agnostic - any direct register writes must be relative to the engine
> > and not absolute addresses.
> >
> > The LRI command has support for engine relative addressing. However,
> > the mechanism is not transparent to the driver. The scheme for Gen11
> > (MI_LRI_ADD_CS_MMIO_START) requires the LRI address to have no
> > absolute engine base component. The hardware then adds on the correct
> > engine offset at execution time.
> >
> > Due to the non-trivial and differing schemes on different hardware, it
> > is not possible to simply update the code that creates the LRI
> > commands to set a remap flag and let the hardware get on with it.
> > Instead, this patch adds function wrappers for generating the LRI
> > command itself and then for constructing the correct address to use
> > with the LRI.
> >
> > v2: Fix build break in GVT. Remove flags parameter [review feedback
> > from Chris W].
> >
> > v3: Fix build break in selftest. Rebase to newer base tree and fix
> > merge conflict.
> >
> > v4: More rebasing. Rmoved relative addressing support from Gen7-9 only
> > code paths [review feedback from Chris W].
> >
> > v5: More rebasing (new 'gt' directory). Fix white space issue. Use
> > COPY class rather than BCS0 id for checking against BCS engine.
> >
> > v6: Change to MI_LRI for generic code and __MI_LRI for legacy code.
> > MI_LRI is now a macro wrapper around an engine function pointer to
> > reduce runtime checks on LRI relative support. [review feedback from
> > Tvrtko]
> >
> > v7: Replace engine function pointer with a per engine flags field that
> > is added by a common function. [Daniele]
> >
> > v8: Re-write on top of patch series by Mika.
> >
> > v9: Fix workaround IGT failure. Not sure why but my local test runs
> > were passing by claiming no workarounds whereas the pre-commit check
> > failed. As all the workaround registers are currently engine
> > independent, it is safe to simply not add the engine relative flag
> > (which is what the selftest was expecting).
> >
> > Bspec: 45606
> > Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> > CC: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > CC: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > CC: Chris P Wilson <chris.p.wilson at intel.com>
> > CC: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> > ---
> > drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 ++--
> > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 24 ++++++++++++
> > drivers/gpu/drm/i915/gt/intel_engine_types.h | 3 ++
> > drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 7 +++-
> > drivers/gpu/drm/i915/gt/intel_lrc.c | 40 ++++++++++----------
> > drivers/gpu/drm/i915/gt/intel_mocs.c | 6 +--
> > drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 12 +++---
> > drivers/gpu/drm/i915/i915_perf.c | 8 +++-
> > 8 files changed, 73 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index 4a34c4f62065..c8d7075a481d 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -980,7 +980,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
> > {
> > struct i915_address_space *vm = rq->hw_context->vm;
> > struct intel_engine_cs *engine = rq->engine;
> > - u32 base = engine->mmio_base;
> > + u32 base = engine->lri_mmio_base;
> > u32 *cs;
> > int i;
> > @@ -992,7 +992,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
> > if (IS_ERR(cs))
> > return PTR_ERR(cs);
> > - *cs++ = MI_LOAD_REGISTER_IMM(2);
> > + *cs++ = MI_LOAD_REGISTER_IMM(2) | engine->lri_cmd_mode;
>
> I missed the part where we ended up with this new flavour and I can't say I
> am a fan.
>
> What's the point in having to remember to or-in the flags at so many call
> sites? Are you now touching all mi_lri, or most, or how many? If all or most
> then why not MI_LOAD_REGISTER_IMM(engine, 2)? If not most then the same with
> _REL suffix as discussed before and leave the legacy untouched.
Yeap, I also believe that it is better to not touch legacy and create the
new one only with new cases. The real change gets much clear.
>
> Don't know really on all the details since I lost track, but the current
> option leaves me concerned.
>
> Regards,
>
> Tvrtko
>
> > *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
> > *cs++ = upper_32_bits(pd_daddr);
> > @@ -1014,7 +1014,8 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
> > if (IS_ERR(cs))
> > return PTR_ERR(cs);
> > - *cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED;
> > + *cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) |
> > + MI_LRI_FORCE_POSTED | engine->lri_cmd_mode;
> > for (i = GEN8_3LVL_PDPES; i--; ) {
> > const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index f451d5076bde..5aa58e069806 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -236,6 +236,28 @@ static u32 __engine_mmio_base(struct drm_i915_private *i915,
> > return bases[i].base;
> > }
> > +static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
> > +{
> > + if (INTEL_GEN(engine->i915) < 11)
> > + return false;
> > +
> > + if (engine->class == COPY_ENGINE_CLASS)
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +static void lri_init(struct intel_engine_cs *engine)
> > +{
> > + if (i915_engine_has_relative_lri(engine)) {
> > + engine->lri_cmd_mode = MI_LRI_ADD_CS_MMIO_START_GEN11;
> > + engine->lri_mmio_base = 0;
> > + } else {
> > + engine->lri_cmd_mode = 0;
> > + engine->lri_mmio_base = engine->mmio_base;
> > + }
> > +}
> > +
> > static void __sprint_engine_name(struct intel_engine_cs *engine)
> > {
> > /*
> > @@ -320,6 +342,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
> > /* Nothing to do here, execute in order of dependencies */
> > engine->schedule = NULL;
> > + lri_init(engine);
> > +
> > seqlock_init(&engine->stats.lock);
> > ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 943f0663837e..38f486f7f7e3 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -306,6 +306,9 @@ struct intel_engine_cs {
> > u32 context_size;
> > u32 mmio_base;
> > + u32 lri_mmio_base;
> > + u32 lri_cmd_mode;
> > +
> > u32 uabi_capabilities;
> > struct rb_node uabi_node;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > index f78b13d74e17..bfb51d8d7ce2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > @@ -133,11 +133,16 @@
> > * simply ignores the register load under certain conditions.
> > * - One can actually load arbitrary many arbitrary registers: Simply issue x
> > * address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
> > + * - Newer hardware supports engine relative addressing but older hardware does
> > + * not. This is required for hw engine load balancing. Hence the MI_LRI
> > + * instruction itself is prefixed with '__' and should only be used on
> > + * legacy hardware code paths. Generic code must always use the MI_LRI
> > + * and i915_get_lri_reg() helper functions instead.
> > */
> > #define MI_LOAD_REGISTER_IMM(x) MI_INSTR(0x22, 2*(x)-1)
> > /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
> > -#define MI_LRI_CS_MMIO (1<<19)
> > #define MI_LRI_FORCE_POSTED (1<<12)
> > +#define MI_LRI_ADD_CS_MMIO_START_GEN11 BIT(19)
> > #define MI_STORE_REGISTER_MEM MI_INSTR(0x24, 1)
> > #define MI_STORE_REGISTER_MEM_GEN8 MI_INSTR(0x24, 2)
> > #define MI_SRM_LRM_GLOBAL_GTT (1<<22)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 6cfdc0f9f2b9..7cd59d29c4e4 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -2010,11 +2010,12 @@ struct lri {
> > u32 value;
> > };
> > -static u32 *emit_lri(u32 *batch, const struct lri *lri, unsigned int count)
> > +static u32 *emit_lri(struct intel_engine_cs *engine, u32 *batch,
> > + const struct lri *lri, unsigned int count)
> > {
> > GEM_BUG_ON(!count || count > 63);
> > - *batch++ = MI_LOAD_REGISTER_IMM(count);
> > + *batch++ = MI_LOAD_REGISTER_IMM(count) | engine->lri_cmd_mode;
> > do {
> > *batch++ = i915_mmio_reg_offset(lri->reg);
> > *batch++ = lri->value;
> > @@ -2054,7 +2055,7 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
> > /* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
> > batch = gen8_emit_flush_coherentl3_wa(engine, batch);
> > - batch = emit_lri(batch, lri, ARRAY_SIZE(lri));
> > + batch = emit_lri(engine, batch, lri, ARRAY_SIZE(lri));
> > /* WaMediaPoolStateCmdInWABB:bxt,glk */
> > if (HAS_POOLED_EU(engine->i915)) {
> > @@ -3282,7 +3283,7 @@ static void init_common_reg_state(u32 * const regs,
> > struct intel_engine_cs *engine,
> > struct intel_ring *ring)
> > {
> > - const u32 base = engine->mmio_base;
> > + const u32 base = engine->lri_mmio_base;
> > CTX_REG(regs, CTX_CONTEXT_CONTROL, RING_CONTEXT_CONTROL(base),
> > _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT) |
> > @@ -3307,7 +3308,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
> > u32 pos_bb_per_ctx)
> > {
> > struct i915_ctx_workarounds * const wa_ctx = &engine->wa_ctx;
> > - const u32 base = engine->mmio_base;
> > + const u32 base = engine->lri_mmio_base;
> > const u32 pos_indirect_ctx = pos_bb_per_ctx + 2;
> > const u32 pos_indirect_ctx_offset = pos_indirect_ctx + 2;
> > @@ -3335,6 +3336,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
> > }
> > static void init_ppgtt_reg_state(u32 *regs, u32 base,
> > + struct intel_engine_cs *engine,
> > struct i915_ppgtt *ppgtt)
> > {
> > /* PDP values well be assigned later if needed */
> > @@ -3376,14 +3378,12 @@ static void gen8_init_reg_state(u32 * const regs,
> > {
> > struct i915_ppgtt * const ppgtt = vm_alias(ce->vm);
> > const bool rcs = engine->class == RENDER_CLASS;
> > - const u32 base = engine->mmio_base;
> > - const u32 lri_base =
> > - intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
> > + const u32 base = engine->lri_mmio_base;
> > regs[CTX_LRI_HEADER_0] =
> > MI_LOAD_REGISTER_IMM(rcs ? 14 : 11) |
> > MI_LRI_FORCE_POSTED |
> > - lri_base;
> > + engine->lri_cmd_mode;
> > init_common_reg_state(regs, ppgtt, engine, ring);
> > CTX_REG(regs, CTX_SECOND_BB_HEAD_U, RING_SBBADDR_UDW(base), 0);
> > @@ -3395,15 +3395,17 @@ static void gen8_init_reg_state(u32 * const regs,
> > regs[CTX_LRI_HEADER_1] =
> > MI_LOAD_REGISTER_IMM(9) |
> > MI_LRI_FORCE_POSTED |
> > - lri_base;
> > + engine->lri_cmd_mode;
> > CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
> > - init_ppgtt_reg_state(regs, base, ppgtt);
> > + init_ppgtt_reg_state(regs, base, engine, ppgtt);
> > if (rcs) {
> > - regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) | lri_base;
> > - CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
> > + regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) |
> > + engine->lri_cmd_mode;
> > + CTX_REG(regs, CTX_R_PWR_CLK_STATE,
> > + GEN8_R_PWR_CLK_STATE, 0);
> > }
> > regs[CTX_END] = MI_BATCH_BUFFER_END;
> > @@ -3418,14 +3420,12 @@ static void gen12_init_reg_state(u32 * const regs,
> > {
> > struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(ce->vm);
> > const bool rcs = engine->class == RENDER_CLASS;
> > - const u32 base = engine->mmio_base;
> > - const u32 lri_base =
> > - intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
> > + const u32 base = engine->lri_mmio_base;
> > regs[CTX_LRI_HEADER_0] =
> > MI_LOAD_REGISTER_IMM(rcs ? 11 : 9) |
> > MI_LRI_FORCE_POSTED |
> > - lri_base;
> > + engine->lri_cmd_mode;
> > init_common_reg_state(regs, ppgtt, engine, ring);
> > @@ -3435,15 +3435,15 @@ static void gen12_init_reg_state(u32 * const regs,
> > regs[CTX_LRI_HEADER_1] =
> > MI_LOAD_REGISTER_IMM(9) |
> > MI_LRI_FORCE_POSTED |
> > - lri_base;
> > + engine->lri_cmd_mode;
> > CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
> > - init_ppgtt_reg_state(regs, base, ppgtt);
> > + init_ppgtt_reg_state(regs, base, engine, ppgtt);
> > if (rcs) {
> > regs[GEN12_CTX_LRI_HEADER_3] =
> > - MI_LOAD_REGISTER_IMM(1) | lri_base;
> > + MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
> > CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
> > /* TODO: oa_init_reg_state ? */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
> > index 728704bbbe18..8e8fe3deb95c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_mocs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
> > @@ -368,9 +368,6 @@ static u32 get_entry_control(const struct drm_i915_mocs_table *table,
> > /**
> > * intel_mocs_init_engine() - emit the mocs control table
> > * @engine: The engine for whom to emit the registers.
> > - *
> > - * This function simply emits a MI_LOAD_REGISTER_IMM command for the
> > - * given table starting at the given address.
> > */
> > void intel_mocs_init_engine(struct intel_engine_cs *engine)
> > {
> > @@ -456,7 +453,8 @@ static int emit_mocs_control_table(struct i915_request *rq,
> > if (IS_ERR(cs))
> > return PTR_ERR(cs);
> > - *cs++ = MI_LOAD_REGISTER_IMM(table->n_entries);
> > + *cs++ = MI_LOAD_REGISTER_IMM(table->n_entries) |
> > + rq->engine->lri_cmd_mode;
> > for (index = 0; index < table->size; index++) {
> > u32 value = get_entry_control(table, index);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> > index 0747b8c9f768..7027367b1f1a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> > @@ -1536,12 +1536,13 @@ static int load_pd_dir(struct i915_request *rq, const struct i915_ppgtt *ppgtt)
> > if (IS_ERR(cs))
> > return PTR_ERR(cs);
> > - *cs++ = MI_LOAD_REGISTER_IMM(1);
> > - *cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->mmio_base));
> > + /* Can these not be merged into a single LRI??? */
> > + *cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
> > + *cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->lri_mmio_base));
> > *cs++ = PP_DIR_DCLV_2G;
> > - *cs++ = MI_LOAD_REGISTER_IMM(1);
> > - *cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
> > + *cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
> > + *cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->lri_mmio_base));
> > *cs++ = px_base(ppgtt->pd)->ggtt_offset << 10;
> > intel_ring_advance(rq, cs);
> > @@ -1709,7 +1710,8 @@ static int remap_l3_slice(struct i915_request *rq, int slice)
> > * here because no other code should access these registers other than
> > * at initialization time.
> > */
> > - *cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE/4);
> > + *cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE / 4) |
> > + rq->engine->lri_cmd_mode;
> > for (i = 0; i < GEN7_L3LOG_SIZE/4; i++) {
> > *cs++ = i915_mmio_reg_offset(GEN7_L3LOG(slice, i));
> > *cs++ = remap_info[i];
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index c1b764233761..8b85cdfada21 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -1693,6 +1693,11 @@ gen8_update_reg_state_unlocked(struct i915_perf_stream *stream,
> > };
> > int i;
> > + /*
> > + * NB: The LRI instruction is generated by the hardware.
> > + * Should we read it in and assert that the offset flag is set?
> > + */
> > +
> > CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
> > (stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
> > (stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) |
> > @@ -1752,8 +1757,9 @@ gen8_load_flex(struct i915_request *rq,
> > if (IS_ERR(cs))
> > return PTR_ERR(cs);
> > - *cs++ = MI_LOAD_REGISTER_IMM(count);
> > + *cs++ = MI_LOAD_REGISTER_IMM(count) | rq->engine->lri_cmd_mode;
> > do {
> > + /* FIXME: Is this table LRI remap/offset friendly? */
> > *cs++ = i915_mmio_reg_offset(flex->reg);
> > *cs++ = flex->value;
> > } while (flex++, --count);
> >
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
More information about the Intel-gfx
mailing list