[Intel-gfx] [PATCH 1/2] drm/i915/selftests: Verify the LRC register layout between init and HW
Chris Wilson
chris at chris-wilson.co.uk
Tue Sep 24 10:43:57 UTC 2019
Quoting Mika Kuoppala (2019-09-24 11:21:38)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > Before we submit the first context to HW, we need to construct a valid
> > image of the register state. This layout is defined by the HW and should
> > match the layout generated by HW when it saves the context image.
> > Asserting that this should be equivalent should help avoid any undefined
> > behaviour and verify that we haven't missed anything important!
> >
> > Of course, having insisted that the initial register state within the
> > LRC should match that returned by HW, we need to ensure that it does.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> > +static u32 *set_offsets(u32 *regs,
> > + const u8 *data,
> > + const struct intel_engine_cs *engine)
> > +#define NOP(x) (BIT(7) | (x))
> > +#define LRI(count, flags) ((flags) << 6 | (count))
> > +#define POSTED BIT(0)
> > +#define REG(x) (((x) >> 2) | BUILD_BUG_ON_ZERO(x >= 0x200))
> > +#define REG16(x) \
> > + (((x) >> 9) | BIT(7) | BUILD_BUG_ON_ZERO(x >= 0x10000)), \
> > + (((x) >> 2) & 0x7f)
>
> I am still not sure if the actual saving are worth the complexity.
>
> > +#define END() 0
> > +{
> > + const u32 base = engine->mmio_base;
> > +
> > + while (*data) {
> > + u8 count, flags;
> > +
> > + if (*data & BIT(7)) { /* skip */
> > + regs += *data++ & ~BIT(7);
> > + continue;
> > + }
> > +
> > + count = *data & 0x3f;
> > + flags = *data >> 6;
> > + data++;
> > +
> > + *regs = MI_LOAD_REGISTER_IMM(count);
> > + if (flags & POSTED)
> > + *regs |= MI_LRI_FORCE_POSTED;
> > + if (INTEL_GEN(engine->i915) >= 11)
> > + *regs |= MI_LRI_CS_MMIO;
> > + regs++;
> > +
> > + GEM_BUG_ON(!count);
> > + do {
> > + u32 offset = 0;
> > + u8 v;
> > +
> > + do {
> > + v = *data++;
> > + offset <<= 7;
> > + offset |= v & ~BIT(7);
> > + } while (v & BIT(7));
>
> ...but perhaps this amount of extra can be tolerated.
>
> Did you check how this would play out with just REG being wide enough?
When I started, I thought we could get away with only one REG16. Looking
at the context image I think we might want a few non engine->mmio_base
regs in there (if I read it right, some of the 0x4000 range are per
engine). That will need a slightly different encoding as well :|
No, I haven't but since you ask, I shall.
> > +
> > + *regs = base + (offset << 2);
>
> In here reader is yearning for an asserts of not trampling
> on wrong territory.
If you have an idea for a good assert, go for it :)
What range should be checked. offset < 0x1000 ?
> But I would guess that you want this part to be like
> oiled lightning and test the machinery with selftest..as the
> subject seems to promise.
The importance is certainly placed on having a selftest and the
confidence in keeping our offsets in line with the HW. The goal was to
have a compact description for the register offsets, in terms of
readability I think the emphasis should be on the tables
(gen8_xcs_offsets[]).
> > @@ -3092,7 +3451,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
> > engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> > }
> >
> > - if (engine->class != COPY_ENGINE_CLASS && INTEL_GEN(engine->i915) >= 12)
> > + if (engine->class != COPY_ENGINE_CLASS && INTEL_GEN(engine->i915) >= 11)
> > engine->flags |= I915_ENGINE_HAS_RELATIVE_MMIO;
>
> Ok, first I thought this was unintentional. But prolly not.
> Do you need it for the verifier to work?
No, I ended up completely ignoring this flag as the HW does not
differentiate between engines. On gen11+, it sets the LRI flag everywhere
in the context image.
> Could we still rip it out to be a first in the series.
> Just would want to differiante possible icl hickups apart
> from this patch.
Sure.
> > +static int live_lrc_layout(void *arg)
> > +{
> > + struct intel_gt *gt = arg;
> > + struct intel_engine_cs *engine;
> > + enum intel_engine_id id;
> > + u32 *mem;
> > + int err;
> > +
> > + /*
> > + * Check the registers offsets we use to create the initial reg state
> > + * match the layout saved by HW.
> > + */
> > +
> > + mem = kmalloc(PAGE_SIZE, GFP_KERNEL);
> > + if (!mem)
> > + return -ENOMEM;
> > +
> > + err = 0;
> > + for_each_engine(engine, gt->i915, id) {
> > + u32 *hw, *lrc;
> > + int dw;
> > +
> > + if (!engine->default_state)
> > + continue;
> > +
> > + hw = i915_gem_object_pin_map(engine->default_state,
> > + I915_MAP_WB);
>
> This default state is not pristine as we have trampled
> it with our first submission, right?
It is the context image saved after the first request.
> But being succeeded at doing so, the next context
> save should overwrite our trampling and it would
> then represent the hw accurate context save
> state.
>
> Against which we will compare of our reg state
> writer.
Right, default_state is the HW version of our init_reg_state.
> > + if (IS_ERR(hw)) {
> > + err = PTR_ERR(hw);
> > + break;
> > + }
> > + hw += LRC_STATE_PN * PAGE_SIZE / sizeof(*hw);
> > +
> > + lrc = memset(mem, 0, PAGE_SIZE);
> > + execlists_init_reg_state(lrc,
> > + engine->kernel_context,
> > + engine,
> > + engine->kernel_context->ring,
> > + true);
> > +
> > + dw = 0;
> > + do {
> > + u32 lri = hw[dw];
> > +
> > + if (lri == 0) {
> > + dw++;
> > + continue;
> > + }
> > +
> > + if ((lri & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) {
> > + pr_err("%s: Expected LRI command at dword %d, found %08x\n",
> > + engine->name, dw, lri);
> > + err = -EINVAL;
> > + break;
> > + }
> > +
> > + if (lrc[dw] != lri) {
> > + pr_err("%s: LRI command mismatch at dword %d, expected %08x found %08x\n",
> > + engine->name, dw, lri, lrc[dw]);
> > + err = -EINVAL;
> > + break;
> > + }
> > +
> > + lri &= 0x7f;
> > + lri++;
> > + dw++;
> > +
> > + while (lri) {
> > + if (hw[dw] != lrc[dw]) {
> > + pr_err("%s: Different registers found at dword %d, expected %x, found %x\n",
> > + engine->name, dw, hw[dw], lrc[dw]);
> > + err = -EINVAL;
> > + break;
> > + }
> > +
> > + /*
> > + * Skip over the actual register value as we
> > + * expect that to differ.
> > + */
> > + dw += 2;
> > + lri -= 2;
>
> This makes me wonder if we could use this machinery post hang. Just to
> get a little more triage data out, ie 'your context looks corrupted at
> offset %x'...
Certainly possible, but what we check here is _mostly_ the privileged
registers that are not really meant to be changed by the user -- and we
are only checking the offsets, so unlikely there to be just one wrong.
The general principle was that we should provide raw information and
have the smarts in userspace (so that we could always enhance our
processing and reanalyse existing dumps). But at the end of the day,
whatever allows us to prevent bugs or fix bugs is paramount.
But I'm not yet sold this helps. Maybe if we find an example where it
proves useful...
> > + }
> > + } while ((lrc[dw] & ~BIT(0)) != MI_BATCH_BUFFER_END);
>
> Ok, you tie up always the generate image. For future work add the hw batch
> endpoint be a part of checker?
It's not always in the first page, I'm not even sure if a BB_END is
always included in the older gen. (I have a feeling the HW definitely
started including it ~gen10.)
-Chris
More information about the Intel-gfx
mailing list