[Intel-gfx] [igt-dev] [PATCH i-g-t v2 2/2] tests/gem_eio: Add reset and unwedge stress testing

Wed Apr 4 10:06:12 UTC 2018

Quoting Tvrtko Ursulin (2018-04-04 10:58:14)
> 
> On 03/04/2018 19:34, Antonio Argenziano wrote:
> > 
> > 
> > On 03/04/18 11:24, Antonio Argenziano wrote:
> >>
> >>
> >> On 03/04/18 04:36, Tvrtko Ursulin wrote:
> >>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>
> >>> Reset and unwedge stress testing is supposed to trigger wedging or resets
> >>> at inconvenient times and then re-use the context so either the context or
> >>> driver tracking might get confused and break.
> >>>
> >>> v2:
> >>>   * Renamed for more sensible naming.
> >>>   * Added some comments to explain what the test is doing. (Chris Wilson)
> >>>
> >>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>> ---
> >>>   tests/gem_eio.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>   1 file changed, 74 insertions(+)
> >>>
> >>> diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> >>> index b7c5047f0816..9599e73db736 100644
> >>> --- a/tests/gem_eio.c
> >>> +++ b/tests/gem_eio.c
> >>> @@ -591,6 +591,74 @@ static void test_inflight_internal(int fd, unsigned int wait)
> >>>       close(fd);
> >>>   }
> >>> +/*
> >>> + * Verify that we can submit and execute work after unwedging the GPU.
> >>> + */
> >>> +static void test_reset_stress(int fd, unsigned int flags)
> >>> +{
> >>> +    uint32_t ctx0 = gem_context_create(fd);
> >>> +
> >>> +    igt_until_timeout(5) {
> >>> +        struct drm_i915_gem_execbuffer2 execbuf = { };
> >>> +        struct drm_i915_gem_exec_object2 obj = { };
> >>> +        uint32_t bbe = MI_BATCH_BUFFER_END;
> >>> +        igt_spin_t *hang;
> >>> +        unsigned int i;
> >>> +        uint32_t ctx;
> >>> +
> >>> +        gem_quiescent_gpu(fd);
> >>> +
> >>> +        igt_require(i915_reset_control(flags & TEST_WEDGE ?
> >>> +                           false : true));
> >>> +
> >>> +        ctx = context_create_safe(fd);
> >>> +
> >>> +        /*
> >>> +         * Start executing a spin batch with some queued batches
> >>> +         * against a different context after it.
> >>> +         */
> >>
> >> Aren't all batches queued on ctx0? Or is this a reference to the check 
> >> on ctx you have later in the test?
> 
> Yes, a mistake in comment text.
> 
> >>> +        hang = spin_sync(fd, ctx0, 0);
> > 
> > I think you meant to send this^ on ctx.
> 
> Why do you think so? Did you find a different or better way to trigger 
> the bug this test is trying to hit?

You might need to explain that this test was trying to reproduce a
kernel bug around unwedging you found earlier, and instead managed to
find a similar one. ;)
-Chris