[Intel-gfx] [igt-dev] [PATCH i-g-t 2/3] tests/gem_eio: Speed up test execution

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Thu Mar 22 12:36:58 UTC 2018


On 22/03/2018 11:39, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-03-22 11:17:11)
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> If we stop relying on regular GPU hangs to be detected, but trigger them
>> manually as soon as we know our batch of interest is actually executing
>> on the GPU, we can dramatically speed up various subtests.
>>
>> This is enabled by the pollable spin batch added in the previous patch.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> Suggested-by: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Antonio Argenziano <antonio.argenziano at intel.com>
>> ---
>> Note that the 'wait' subtest is mysteriously hanging for me in the no-op
>> batch sent by gem_test_engines, but only on the RCS engine. TBD while I am
>> getting some CI results.
>> ---
>>   lib.tar         | Bin 0 -> 102400 bytes
>>   tests/gem_eio.c |  97 ++++++++++++++++++++++++++++++++++++++++----------------
>>   2 files changed, 70 insertions(+), 27 deletions(-)
>>   create mode 100644 lib.tar
>>
>> diff --git a/lib.tar b/lib.tar
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ea04fad219a87f2e975852989526f8da4c9b7d6d
>> GIT binary patch
>> literal 102400
>> zcmeHw>vkJQlBWMsPf=!{kwJ=gUAkMAJc3A2!kPrQASAWM<5LF&3M57#fW}3X+V;H9
> 
> Looks correct.

Just backing up in the cloud. :))

>> diff --git a/tests/gem_eio.c b/tests/gem_eio.c
>> index 4bcc5937db39..93400056124b 100644
>> --- a/tests/gem_eio.c
>> +++ b/tests/gem_eio.c
>> @@ -71,26 +71,23 @@ static void trigger_reset(int fd)
>>          gem_quiescent_gpu(fd);
>>   }
>>   
>> -static void wedge_gpu(int fd)
>> +static void manual_hang(int drm_fd)
>>   {
>> -       /* First idle the GPU then disable GPU resets before injecting a hang */
>> -       gem_quiescent_gpu(fd);
>> -
>> -       igt_require(i915_reset_control(false));
>> +       int dir = igt_debugfs_dir(drm_fd);
>>   
>> -       igt_debug("Wedging GPU by injecting hang\n");
>> -       igt_post_hang_ring(fd, igt_hang_ring(fd, I915_EXEC_DEFAULT));
>> +       igt_sysfs_set(dir, "i915_wedged", "-1");
>>   
>> -       igt_assert(i915_reset_control(true));
>> +       close(dir);
>>   }
> 
> Ok.
> 
>> -static void wedgeme(int drm_fd)
>> +static void wedge_gpu(int fd)
>>   {
>> -       int dir = igt_debugfs_dir(drm_fd);
>> -
>> -       igt_sysfs_set(dir, "i915_wedged", "-1");
>> +       /* First idle the GPU then disable GPU resets before injecting a hang */
>> +       gem_quiescent_gpu(fd);
>>   
>> -       close(dir);
>> +       igt_require(i915_reset_control(false));
>> +       manual_hang(fd);
>> +       igt_assert(i915_reset_control(true));
>>   }
> 
> Ok.

Well done reading that awful diff!

>>   
>>   static int __gem_throttle(int fd)
>> @@ -149,29 +146,66 @@ static int __gem_wait(int fd, uint32_t handle, int64_t timeout)
>>          return err;
>>   }
>>   
>> +static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
>> +{
>> +       if (gem_can_store_dword(fd, flags))
>> +               return __igt_spin_batch_new_poll(fd, ctx, flags);
>> +       else
>> +               return __igt_spin_batch_new(fd, ctx, flags, 0);
>> +}
>> +
>> +static void __spin_wait(int fd, igt_spin_t *spin)
>> +{
>> +       if (spin->running) {
>> +               while (!*((volatile bool *)spin->running))
>> +                       ;
>> +       } else {
>> +               igt_debug("__spin_wait - usleep mode\n");
>> +               usleep(500e3); /* Better than nothing! */
>> +       }
>> +}
>> +
>> +/*
>> + * Wedge the GPU when we know our batch is running.
>> + */
>> +static void wedge_after_running(int fd, igt_spin_t *spin)
>> +{
>> +       __spin_wait(fd, spin);
>> +       manual_hang(fd);
>> +}
>> +
>>   static void test_wait(int fd)
>>   {
>> -       igt_hang_t hang;
>> +       struct timespec ts = { };
>> +       igt_spin_t *hang;
>>   
>>          igt_require_gem(fd);
>>   
>> +       igt_nsec_elapsed(&ts);
>> +
>>          /* If the request we wait on completes due to a hang (even for
>>           * that request), the user expects the return value to 0 (success).
>>           */
>> -       hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
>> -       igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);
>> -       igt_post_hang_ring(fd, hang);
>> +       igt_require(i915_reset_control(true));
>> +       hang = __spin_poll(fd, 0, I915_EXEC_DEFAULT);
>> +       wedge_after_running(fd, hang);
>> +       igt_assert_eq(__gem_wait(fd, hang->handle, -1), 0);
>> +       igt_spin_batch_free(fd, hang);
> 
>>   
>>          /* If the GPU is wedged during the wait, again we expect the return
>>           * value to be 0 (success).
>>           */
>>          igt_require(i915_reset_control(false));
>> -       hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
>> -       igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);
>> -       igt_post_hang_ring(fd, hang);
>> +       hang = __spin_poll(fd, 0, I915_EXEC_DEFAULT);
>> +       wedge_after_running(fd, hang);
>> +       igt_assert_eq(__gem_wait(fd, hang->handle, -1), 0);
>> +       igt_spin_batch_free(fd, hang);
>>          igt_require(i915_reset_control(true));
> 
> Hmm. These are not equivalent to the original test. The test
> requires hangcheck to kick in while the test is blocked on igt_wait.
> To do a fast equivalent, we need to kick off a timer. (Here we are just
> asking if a wait on an already completed request doesn't block, not how
> we handle the reset in the middle of a wait. Seems a reasonable addition
> though.)
> 
> I think that's a general pattern worth repeating for the rest of tests:
> don't immediately inject the hang, but leave it a few milliseconds to
> allow us to block on the subsequent wait. I would even repeat the tests
> a few times with different timeouts; 0, 1us, 10ms (thinking of the
> different phases for i915_request_wait).

True, it's not the same.

Makes sense to test with different timeouts. Will do.

> 
>>          trigger_reset(fd);
>> +
>> +       /* HACK for CI */
>> +       igt_assert(igt_nsec_elapsed(&ts) < 5e9);
> 
> igt_seconds_elapsed(); the approximation is worth it for the readability.
> 
> In this case you might like to try igt_set_timeout(), as I think each
> subtest and exithandlers are in place to make them robust against
> premature failures.

Well, this was just to see what will happen on the shards. As mentioned 
in the commit message, I get that as-yet-unexplained GPU hang at subtest 
exit here, so the assert above is just there to notice if the same 
happens on the shards.

Regards,

Tvrtko

