[Intel-gfx] [igt-dev] [PATCH i-g-t v13 2/2] tests/gem_ctx_sseu: Dynamic (sub)slice programming tests

Mon Sep 17 10:38:39 UTC 2018

Quoting Tvrtko Ursulin (2018-09-17 10:33:47)
> 
> On 14/09/2018 17:17, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-09-14 17:04:38)
> >> +static igt_spin_t *
> >> +__post_set(int fd, unsigned int flags, uint32_t ctx, igt_spin_t *spin,
> >> +          unsigned int expected)
> >> +{
> >> +       bool busy_reset = (flags & TEST_BUSY) &&
> >> +                         (flags & (TEST_RESET | TEST_HANG));
> >> +
> >> +       if (flags & TEST_RESET)
> >> +               igt_force_gpu_reset(fd);
> >> +       else if (flags & TEST_HANG)
> >> +               gem_sync(fd, spin->handle);
> > 
> > Hmm, in both cases this is before we submit the read/post-sseu batch.
> > 
> > I was thinking more for the hang you would simply opt to not terminate
> > the spinner.
> 
> Okay I can do that as well. Hang will then be processed while we try to 
> pread the rpcs eb result so I think it works.
> 
> > 
> > Being utterly pedantic, we may want to check per-engine resets and
> > device level.
> 
> Isn't the per-engine default, so the hang test flavour would trigger it? 
> And to explicitly ask for whole device we have some debugfs api for it?

i915.reset=2 (default if supported) per-engine
i915.reset=1 global
i915.reset=0 none, will wedge on hang.

Alas no temporary override via debugfs yet. Be my guest :)
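
For reference, a rough sketch of how a test could tell which of the three
modes above is active before picking a hang flavour. This is not part of
the patch. The helper names are hypothetical, and it assumes the modparam
is readable at /sys/module/i915/parameters/reset (typically root only).

```c
#include <stdio.h>

/*
 * Hypothetical helper: map an i915.reset value to the behaviour
 * listed above (2 per-engine, 1 global, 0 none).
 */
static const char *i915_reset_mode_name(int reset)
{
	switch (reset) {
	case 0:
		return "none (wedge on hang)";
	case 1:
		return "global";
	case 2:
		return "per-engine";
	default:
		return "unknown";
	}
}

/*
 * Hypothetical helper: read an integer modparam from a sysfs-style
 * file. Returns -1 if the file cannot be opened or parsed.
 */
static int read_reset_param(const char *path)
{
	FILE *f = fopen(path, "r");
	int val = -1;

	if (!f)
		return -1;

	if (fscanf(f, "%d", &val) != 1)
		val = -1;

	fclose(f);
	return val;
}
```

A caller would do something like
read_reset_param("/sys/module/i915/parameters/reset") and skip the
per-engine flavour unless the result is 2. Note this only reads the
current setting; it is not the temporary override mentioned above.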

> >> +       if (spin && !busy_reset) {
> >> +               igt_assert_eq(read_slice_count_busy(fd, ctx, 0, spin),
> >> +                             expected);
> >> +       } else {
> >> +               igt_assert_eq(read_slice_count(fd, ctx, 0), expected);
> >> +       }
> >> +
> >> +       igt_assert_eq(read_slice_count(fd, 0, 0), __slice_count__);
> >> +
> >> +       if (spin)
> >> +               igt_spin_batch_free(fd, spin);
> >> +
> >> +       if (flags & TEST_IDLE)
> >> +               igt_drop_caches_set(fd, DROP_RETIRE | DROP_IDLE | DROP_ACTIVE);
> >> +
> >> +       if (flags & TEST_SUSPEND) {
> >> +               igt_set_autoresume_delay(5);
> > 
> > Brave. Opting for a faster wakeup than default...
> 
> I think I started with copy paste and then lowered it when testing the 
> test. Is the right thing to omit specifying it?

Yes. The default (15s) should be the conservative minimum required for
all machines to be sure we don't miss the wakeup (or have it trigger at the
wrong time and cause -EBUSY). There are a few tests where we know
(e.g. we're waiting on hangcheck) that 15s is not enough.

We might suggest to Tomi to do some trial and error and see if we can
stably reduce it across the board.

> >> +               igt_system_suspend_autoresume(SUSPEND_STATE_MEM,
> >> +                                             SUSPEND_TEST_NONE);
> >> +       }
> >> +
> >> +       igt_assert_eq(read_slice_count(fd, ctx, 0), expected);
> >> +       igt_assert_eq(read_slice_count(fd, 0, 0), __slice_count__);
> >> +
> >> +       return NULL;
> >> +}
> > 
> > Overall, nothing else strikes me as being absent.
> > 
> > Oh, one more thing; do we care what state we leave the system in on
> > module unload? So long as the GPU reset clears it we should be ok.
> 
> I don't think it applies when device is off. And on load running the 
> kernel context restores the default full enablement.

That's not strictly true... On module load the kernel context will be
marked with RESTORE_INHIBIT as it is created before the default_state
is. It's the next context to be used that will have the full setup.

I think the reset is good enough as the defaults should be sane (or we
have a w/a to set up sane register state), and once we start using the
device proper all is specified. Needless worry on my part.
-Chris