[Intel-gfx] [CI 1/5] igt/gem_workarounds: Read the workaround registers from the active context

Wed Oct 4 13:53:17 UTC 2017

Chris Wilson <chris at chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2017-10-03 16:19:10)
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>> 
>> > The workarounds are only valid whilst the GPU is active. To be sure we
>> > are reading the registers in the right state, issue the reads from the GPU.
>> >
>> 
>> Yay, this is the right way :)
>> 
>> Some comments and findings below...
>> 
>> > v2: Show ignored write-only failures as debug.
>> >
>> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> > ---
>> >  tests/gem_workarounds.c | 147 ++++++++++++++++++++++++++----------------------
>> >  1 file changed, 81 insertions(+), 66 deletions(-)
>> >
>> > diff --git a/tests/gem_workarounds.c b/tests/gem_workarounds.c
>> > index 5e30a7b8..95ec250a 100644
>> > --- a/tests/gem_workarounds.c
>> > +++ b/tests/gem_workarounds.c
>> > @@ -61,20 +61,6 @@ static struct write_only_list {
>> >  static struct intel_wa_reg *wa_regs;
>> >  static int num_wa_regs;
>> >  
>> > -static void wait_gpu(void)
>> > -{
>> > -     int fd = drm_open_driver(DRIVER_INTEL);
>> > -     gem_quiescent_gpu(fd);
>> > -     close(fd);
>> > -}
>> > -
>> > -static void test_hang_gpu(void)
>> > -{
>> > -     int fd = drm_open_driver(DRIVER_INTEL);
>> > -     igt_post_hang_ring(fd, igt_hang_ring(fd, I915_EXEC_DEFAULT));
>> > -     close(fd);
>> > -}
>> > -
>> >  static void test_suspend_resume(void)
>> >  {
>> >       igt_info("Suspending the device ...\n");
>> > @@ -96,49 +82,95 @@ static bool write_only(const uint32_t addr)
>> >       return false;
>> >  }
>> >  
>> > -static int workaround_fail_count(void)
>> > -{
>> > -     int i, fail_count = 0;
>> > -
>> > -     /* There is a small delay after coming ot of rc6 to the correct
>> > -        render context values will get loaded by hardware (bdw,chv).
>> > -        This here ensures that we have the correct context loaded before
>> > -        we start to read values */
>> > -     wait_gpu();
>> > +#define MI_STORE_REGISTER_MEM (0x24 << 23)
>> >  
>> > -     igt_debug("Address    val        mask        read        result\n");
>> > +static int workaround_fail_count(int fd)
>> > +{
>> > +     struct drm_i915_gem_exec_object2 obj[2];
>> > +     struct drm_i915_gem_relocation_entry *reloc;
>> > +     struct drm_i915_gem_execbuffer2 execbuf;
>> > +     uint32_t result_sz, batch_sz;
>> > +     uint32_t *base, *out;
>> > +     int fail_count = 0;
>> > +
>> > +     reloc = calloc(num_wa_regs, sizeof(*reloc));
>> > +     igt_assert(reloc);
>> > +
>> > +     result_sz = 4 * num_wa_regs;
>> > +     result_sz = (result_sz + 4095) & -4096;
>> 
>> Macro for align?
>
> I never remember if we have PAGE_ALIGN() or not. Quicker to write than
> grep.
>
>> Further, why do even need it. For
>> what I can gather, the mapping should work for smaller
>> objects also.
>
> Our mmap interfaces we like to operate on pages and tend to complain for
> some interfaces if not, or not whole object. So it is just simpler to
> think in pages and not worry about which work on less.
>
>> > -static void check_workarounds(enum operation op)
>> > +static void check_workarounds(int fd, enum operation op)
>> >  {
>> > -     igt_assert_eq(workaround_fail_count(), 0);
>> > +     igt_assert_eq(workaround_fail_count(fd), 0);
>> >  
>> >       switch (op) {
>> >       case GPU_RESET:
>> > -             test_hang_gpu();
>> > +             igt_force_gpu_reset(fd);
>> 
>> My kbl fails with the tests as you need some mechanism
>> to wait that the reset really did happen?
>
> It waits for the reset to complete (double checked, otherwise we have a
> number of nasty races around).
>
>> Hmm the kernel should ensure that the next reading batch
>> is post reset and everything should be fine.
>> 
>> (gem_workarounds:7286) WARNING: 0x024D0 0x00002248      0xFFFFFFFF      0x00002094      FAIL
>> (gem_workarounds:7286) WARNING: 0x024D4 0x00002580      0xFFFFFFFF      0x00002094      FAIL
>> (gem_workarounds:7286) WARNING: 0x024D8 0x00007304      0xFFFFFFFF      0x00002094      FAIL
>
> These are RING_FORCE_TO_NONPRIV, hence the thread about moving them from
> the WA_WRITE to I915_WRITE as they are not part of the context image.
>
> https://patchwork.freedesktop.org/series/31099/

Bikescheds aside, NONPRIV and the MMCD_MISC_CTRL changes are indeed
needed, which this more precise method found out.

Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>