[Intel-gfx] [PATCH 1/2] drm/i915/selftests: Verify the LRC register layout between init and HW
Mika Kuoppala
mika.kuoppala at linux.intel.com
Tue Sep 24 15:07:01 UTC 2019
Chris Wilson <chris at chris-wilson.co.uk> writes:
> Quoting Mika Kuoppala (2019-09-24 11:21:38)
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>>
>> > Before we submit the first context to HW, we need to construct a valid
>> > image of the register state. This layout is defined by the HW and should
>> > match the layout generated by HW when it saves the context image.
>> > Asserting that this should be equivalent should help avoid any undefined
>> > behaviour and verify that we haven't missed anything important!
>> >
>> > Of course, having insisted that the initial register state within the
>> > LRC should match that returned by HW, we need to ensure that it does.
>> >
>> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>
>> > +static u32 *set_offsets(u32 *regs,
>> > + const u8 *data,
>> > + const struct intel_engine_cs *engine)
>> > +#define NOP(x) (BIT(7) | (x))
>> > +#define LRI(count, flags) ((flags) << 6 | (count))
>> > +#define POSTED BIT(0)
>> > +#define REG(x) (((x) >> 2) | BUILD_BUG_ON_ZERO(x >= 0x200))
>> > +#define REG16(x) \
>> > + (((x) >> 9) | BIT(7) | BUILD_BUG_ON_ZERO(x >= 0x10000)), \
>> > + (((x) >> 2) & 0x7f)
>>
>> I am still not sure if the actual saving are worth the complexity.
>>
>> > +#define END() 0
>> > +{
>> > + const u32 base = engine->mmio_base;
>> > +
>> > + while (*data) {
>> > + u8 count, flags;
>> > +
>> > + if (*data & BIT(7)) { /* skip */
>> > + regs += *data++ & ~BIT(7);
>> > + continue;
>> > + }
>> > +
>> > + count = *data & 0x3f;
>> > + flags = *data >> 6;
>> > + data++;
>> > +
>> > + *regs = MI_LOAD_REGISTER_IMM(count);
>> > + if (flags & POSTED)
>> > + *regs |= MI_LRI_FORCE_POSTED;
>> > + if (INTEL_GEN(engine->i915) >= 11)
>> > + *regs |= MI_LRI_CS_MMIO;
>> > + regs++;
>> > +
>> > + GEM_BUG_ON(!count);
>> > + do {
>> > + u32 offset = 0;
>> > + u8 v;
>> > +
>> > + do {
>> > + v = *data++;
>> > + offset <<= 7;
>> > + offset |= v & ~BIT(7);
>> > + } while (v & BIT(7));
>>
>> ...but perhaps this amount of extra can be tolerated.
>>
>> Did you check how this would play out with just REG being wide enough?
>
> When I started, I thought we could get away with only one REG16. Looking
> at the context image I think we might want a few non engine->mmio_base
> regs in there (if I read it right, some of the 0x4000 range are per
> engine). That will need a slightly different encoding as well :|
>
> No, I haven't but since you ask, I shall.
Now we know the bloat diff and the complexity addition
is tiny so I am fine with using the tighter REG/REG16
split.
>
>> > +
>> > + *regs = base + (offset << 2);
>>
>> In here reader is yearning for an asserts of not trampling
>> on wrong territory.
>
> If you have an idea for a good assert, go for it :)
>
> What range should be checked. offset < 0x1000 ?
>
I am fine at selftest trying to take the burden.
(that can be read like that I can't make up good asserts)
>> But I would guess that you want this part to be like
>> oiled lightning and test the machinery with selftest..as the
>> subject seems to promise.
>
> The importance is certainly placed on having a selftest and the
> confidence in keeping our offsets in line with the HW. The goal was to
> have a compact description for the register offsets, in terms of
> readability I think the emphasis should be on the tables
> (gen8_xcs_offsets[]).
>
>> > @@ -3092,7 +3451,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
>> > engine->flags |= I915_ENGINE_HAS_PREEMPTION;
>> > }
>> >
>> > - if (engine->class != COPY_ENGINE_CLASS && INTEL_GEN(engine->i915) >= 12)
>> > + if (engine->class != COPY_ENGINE_CLASS && INTEL_GEN(engine->i915) >= 11)
>> > engine->flags |= I915_ENGINE_HAS_RELATIVE_MMIO;
>>
>> Ok, first I thought this was unintentional. But prolly not.
>> Do you need it for the verifier to work?
>
> No, I ended up completely ignoring this flag as the HW does not
> differentiate between engines. On gen11+, it sets the LRI flag everywhere
> in the context image.
>
>> Could we still rip it out to be a first in the series.
>> Just would want to differiante possible icl hickups apart
>> from this patch.
With the relative MMIO for gen11 lifted as a separate
patch prior to this one,
Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>
> Sure.
>
>> > +static int live_lrc_layout(void *arg)
>> > +{
>> > + struct intel_gt *gt = arg;
>> > + struct intel_engine_cs *engine;
>> > + enum intel_engine_id id;
>> > + u32 *mem;
>> > + int err;
>> > +
>> > + /*
>> > + * Check the registers offsets we use to create the initial reg state
>> > + * match the layout saved by HW.
>> > + */
>> > +
>> > + mem = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> > + if (!mem)
>> > + return -ENOMEM;
>> > +
>> > + err = 0;
>> > + for_each_engine(engine, gt->i915, id) {
>> > + u32 *hw, *lrc;
>> > + int dw;
>> > +
>> > + if (!engine->default_state)
>> > + continue;
>> > +
>> > + hw = i915_gem_object_pin_map(engine->default_state,
>> > + I915_MAP_WB);
>>
>> This default state is not pristine as we have trampled
>> it with our first submission, right?
>
> It is the context image saved after the first request.
>
>> But being succeeded at doing so, the next context
>> save should overwrite our trampling and it would
>> then represent the hw accurate context save
>> state.
>>
>> Against which we will compare of our reg state
>> writer.
>
> Right, default_state is the HW version of our init_reg_state.
>
>> > + if (IS_ERR(hw)) {
>> > + err = PTR_ERR(hw);
>> > + break;
>> > + }
>> > + hw += LRC_STATE_PN * PAGE_SIZE / sizeof(*hw);
>> > +
>> > + lrc = memset(mem, 0, PAGE_SIZE);
>> > + execlists_init_reg_state(lrc,
>> > + engine->kernel_context,
>> > + engine,
>> > + engine->kernel_context->ring,
>> > + true);
>> > +
>> > + dw = 0;
>> > + do {
>> > + u32 lri = hw[dw];
>> > +
>> > + if (lri == 0) {
>> > + dw++;
>> > + continue;
>> > + }
>> > +
>> > + if ((lri & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) {
>> > + pr_err("%s: Expected LRI command at dword %d, found %08x\n",
>> > + engine->name, dw, lri);
>> > + err = -EINVAL;
>> > + break;
>> > + }
>> > +
>> > + if (lrc[dw] != lri) {
>> > + pr_err("%s: LRI command mismatch at dword %d, expected %08x found %08x\n",
>> > + engine->name, dw, lri, lrc[dw]);
>> > + err = -EINVAL;
>> > + break;
>> > + }
>> > +
>> > + lri &= 0x7f;
>> > + lri++;
>> > + dw++;
>> > +
>> > + while (lri) {
>> > + if (hw[dw] != lrc[dw]) {
>> > + pr_err("%s: Different registers found at dword %d, expected %x, found %x\n",
>> > + engine->name, dw, hw[dw], lrc[dw]);
>> > + err = -EINVAL;
>> > + break;
>> > + }
>> > +
>> > + /*
>> > + * Skip over the actual register value as we
>> > + * expect that to differ.
>> > + */
>> > + dw += 2;
>> > + lri -= 2;
>>
>> This makes me wonder if we could use this machinery post hang. Just to
>> get a little more triage data out, ie 'your context looks corrupted at
>> offset %x'...
>
> Certainly possible, but what we check here is _mostly_ the privileged
> registers that are not really meant to be changed by the user -- and we
> are only checking the offsets, so unlikely there to be just one wrong.
>
> The general principle was that we should provide raw information and
> have the smarts in userspace (so that we could always enhance our
> processing and reanalyse existing dumps). But at the end of the day,
> whatever allows us to prevent bugs or fix bugs is paramount.
>
> But I'm not yet sold this helps. Maybe if we find an example where it
> proves useful...
>
>> > + }
>> > + } while ((lrc[dw] & ~BIT(0)) != MI_BATCH_BUFFER_END);
>>
>> Ok, you tie up always the generate image. For future work add the hw batch
>> endpoint be a part of checker?
>
> It's not always in the first page, I'm not even sure if a BB_END is
> always included in the older gen. (I have a feeling the HW definitely
> started including it ~gen10.)
> -Chris
More information about the Intel-gfx
mailing list