[Intel-gfx] [PATCH 1/2] drm/i915/selftests: Verify the LRC register layout between init and HW

Mika Kuoppala mika.kuoppala at linux.intel.com
Tue Sep 24 15:07:01 UTC 2019


Chris Wilson <chris at chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2019-09-24 11:21:38)
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>> 
>> > Before we submit the first context to HW, we need to construct a valid
>> > image of the register state. This layout is defined by the HW and should
>> > match the layout generated by HW when it saves the context image.
>> > Asserting that this should be equivalent should help avoid any undefined
>> > behaviour and verify that we haven't missed anything important!
>> >
>> > Of course, having insisted that the initial register state within the
>> > LRC should match that returned by HW, we need to ensure that it does.
>> >
>> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>
>> > +static u32 *set_offsets(u32 *regs,
>> > +                     const u8 *data,
>> > +                     const struct intel_engine_cs *engine)
>> > +#define NOP(x) (BIT(7) | (x))
>> > +#define LRI(count, flags) ((flags) << 6 | (count))
>> > +#define POSTED BIT(0)
>> > +#define REG(x) (((x) >> 2) | BUILD_BUG_ON_ZERO(x >= 0x200))
>> > +#define REG16(x) \
>> > +     (((x) >> 9) | BIT(7) | BUILD_BUG_ON_ZERO(x >= 0x10000)), \
>> > +     (((x) >> 2) & 0x7f)
>> 
>> I am still not sure if the actual saving are worth the complexity.
>> 
>> > +#define END() 0
>> > +{
>> > +     const u32 base = engine->mmio_base;
>> > +
>> > +     while (*data) {
>> > +             u8 count, flags;
>> > +
>> > +             if (*data & BIT(7)) { /* skip */
>> > +                     regs += *data++ & ~BIT(7);
>> > +                     continue;
>> > +             }
>> > +
>> > +             count = *data & 0x3f;
>> > +             flags = *data >> 6;
>> > +             data++;
>> > +
>> > +             *regs = MI_LOAD_REGISTER_IMM(count);
>> > +             if (flags & POSTED)
>> > +                     *regs |= MI_LRI_FORCE_POSTED;
>> > +             if (INTEL_GEN(engine->i915) >= 11)
>> > +                     *regs |= MI_LRI_CS_MMIO;
>> > +             regs++;
>> > +
>> > +             GEM_BUG_ON(!count);
>> > +             do {
>> > +                     u32 offset = 0;
>> > +                     u8 v;
>> > +
>> > +                     do {
>> > +                             v = *data++;
>> > +                             offset <<= 7;
>> > +                             offset |= v & ~BIT(7);
>> > +                     } while (v & BIT(7));
>> 
>> ...but perhaps this amount of extra can be tolerated.
>> 
>> Did you check how this would play out with just REG being wide enough?
>
> When I started, I thought we could get away with only one REG16. Looking
> at the context image I think we might want a few non engine->mmio_base
> regs in there (if I read it right, some of the 0x4000 range are per
> engine). That will need a slightly different encoding as well :|
>
> No, I haven't but since you ask, I shall.

Now we know the bloat diff and the complexity addition
is tiny so I am fine with using the tighter REG/REG16
split.

>
>> > +
>> > +                     *regs = base + (offset << 2);
>> 
>> In here reader is yearning for an asserts of not trampling
>> on wrong territory.
>
> If you have an idea for a good assert, go for it :)
>
> What range should be checked. offset < 0x1000 ?
>

I am fine at selftest trying to take the burden.
(that can be read like that I can't make up good asserts)

>> But I would guess that you want this part to be like
>> oiled lightning and test the machinery with selftest..as the
>> subject seems to promise.
>
> The importance is certainly placed on having a selftest and the
> confidence in keeping our offsets in line with the HW. The goal was to
> have a compact description for the register offsets, in terms of
> readability I think the emphasis should be on the tables
> (gen8_xcs_offsets[]).
>
>> > @@ -3092,7 +3451,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
>> >                       engine->flags |= I915_ENGINE_HAS_PREEMPTION;
>> >       }
>> >  
>> > -     if (engine->class != COPY_ENGINE_CLASS && INTEL_GEN(engine->i915) >= 12)
>> > +     if (engine->class != COPY_ENGINE_CLASS && INTEL_GEN(engine->i915) >= 11)
>> >               engine->flags |= I915_ENGINE_HAS_RELATIVE_MMIO;
>> 
>> Ok, first I thought this was unintentional. But prolly not.
>> Do you need it for the verifier to work?
>
> No, I ended up completely ignoring this flag as the HW does not
> differentiate between engines. On gen11+, it sets the LRI flag everywhere
> in the context image.
>
>> Could we still rip it out to be a first in the series.
>> Just would want to differiante possible icl hickups apart
>> from this patch.

With the relative MMIO for gen11 lifted as a separate
patch prior to this one,

Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>


>
> Sure.
>
>> > +static int live_lrc_layout(void *arg)
>> > +{
>> > +     struct intel_gt *gt = arg;
>> > +     struct intel_engine_cs *engine;
>> > +     enum intel_engine_id id;
>> > +     u32 *mem;
>> > +     int err;
>> > +
>> > +     /*
>> > +      * Check the registers offsets we use to create the initial reg state
>> > +      * match the layout saved by HW.
>> > +      */
>> > +
>> > +     mem = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> > +     if (!mem)
>> > +             return -ENOMEM;
>> > +
>> > +     err = 0;
>> > +     for_each_engine(engine, gt->i915, id) {
>> > +             u32 *hw, *lrc;
>> > +             int dw;
>> > +
>> > +             if (!engine->default_state)
>> > +                     continue;
>> > +
>> > +             hw = i915_gem_object_pin_map(engine->default_state,
>> > +                                          I915_MAP_WB);
>> 
>> This default state is not pristine as we have trampled
>> it with our first submission, right?
>
> It is the context image saved after the first request.
>  
>> But being succeeded at doing so, the next context
>> save should overwrite our trampling and it would
>> then represent the hw accurate context save
>> state.
>> 
>> Against which we will compare of our reg state
>> writer.
>
> Right, default_state is the HW version of our init_reg_state.
>
>> > +             if (IS_ERR(hw)) {
>> > +                     err = PTR_ERR(hw);
>> > +                     break;
>> > +             }
>> > +             hw += LRC_STATE_PN * PAGE_SIZE / sizeof(*hw);
>> > +
>> > +             lrc = memset(mem, 0, PAGE_SIZE);
>> > +             execlists_init_reg_state(lrc,
>> > +                                      engine->kernel_context,
>> > +                                      engine,
>> > +                                      engine->kernel_context->ring,
>> > +                                      true);
>> > +
>> > +             dw = 0;
>> > +             do {
>> > +                     u32 lri = hw[dw];
>> > +
>> > +                     if (lri == 0) {
>> > +                             dw++;
>> > +                             continue;
>> > +                     }
>> > +
>> > +                     if ((lri & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) {
>> > +                             pr_err("%s: Expected LRI command at dword %d, found %08x\n",
>> > +                                    engine->name, dw, lri);
>> > +                             err = -EINVAL;
>> > +                             break;
>> > +                     }
>> > +
>> > +                     if (lrc[dw] != lri) {
>> > +                             pr_err("%s: LRI command mismatch at dword %d, expected %08x found %08x\n",
>> > +                                    engine->name, dw, lri, lrc[dw]);
>> > +                             err = -EINVAL;
>> > +                             break;
>> > +                     }
>> > +
>> > +                     lri &= 0x7f;
>> > +                     lri++;
>> > +                     dw++;
>> > +
>> > +                     while (lri) {
>> > +                             if (hw[dw] != lrc[dw]) {
>> > +                                     pr_err("%s: Different registers found at dword %d, expected %x, found %x\n",
>> > +                                            engine->name, dw, hw[dw], lrc[dw]);
>> > +                                     err = -EINVAL;
>> > +                                     break;
>> > +                             }
>> > +
>> > +                             /*
>> > +                              * Skip over the actual register value as we
>> > +                              * expect that to differ.
>> > +                              */
>> > +                             dw += 2;
>> > +                             lri -= 2;
>> 
>> This makes me wonder if we could use this machinery post hang. Just to
>> get a little more triage data out, ie 'your context looks corrupted at
>> offset %x'...
>
> Certainly possible, but what we check here is _mostly_ the privileged
> registers that are not really meant to be changed by the user -- and we
> are only checking the offsets, so unlikely there to be just one wrong.
>
> The general principle was that we should provide raw information and
> have the smarts in userspace (so that we could always enhance our
> processing and reanalyse existing dumps). But at the end of the day,
> whatever allows us to prevent bugs or fix bugs is paramount.
>
> But I'm not yet sold this helps. Maybe if we find an example where it
> proves useful...
>
>> > +                     }
>> > +             } while ((lrc[dw] & ~BIT(0)) != MI_BATCH_BUFFER_END);
>> 
>> Ok, you tie up always the generate image. For future work add the hw batch
>> endpoint be a part of checker?
>
> It's not always in the first page, I'm not even sure if a BB_END is
> always included in the older gen. (I have a feeling the HW definitely
> started including it ~gen10.)
> -Chris


More information about the Intel-gfx mailing list