[Intel-gfx] [PATCH i-g-t] tests/gem_reset_stats.c: fix "ban" tests with scheduler

Mon Jul 13 02:35:04 PDT 2015

On Fri, Jul 10, 2015 at 02:26:59PM +0100, tim.gore at intel.com wrote:
> From: Tim Gore <tim.gore at intel.com>
> 
> The tests for context banning fail when the gpu scheduler
> is enabled. The test causes a hang (using an infinite loop
> batch) and then queues up some work behind it on both the
> hanging context and also on a second "good" context. On
> the "good" context it queues up 2 batch buffers. After the
> hanging ring has been reset (not a full gpu reset) the
> test checks the values of batch_active and batch_pending
> returned by the i915_get_reset_stats_ioctl. For the "good"
> context it expects to see batch_pending == 2, because two
> batch buffers we queued up behind the hang on this
> context. But, with the scheduler enabled (android, gen8),
> one of these batch buffers is still waiting in the
> scheduler and has not made it as far as the
> ring->request_list, so this batch buffer is unaffected by
> the ring reset, and batch_pending is only 1.
> 
> I considered putting in a test for the scheduler being
> enabled, but decided that a simpler solution is to only
> queue up 1 batch buffer on the good context. This does
> not change the test logic in any way and ensures that we
> should always have batch_pending=1, with or without the
> scheduler.

For the scheduler/tdr we probably need to split this up into two testcases
each:
- one where the 2nd batch depends upon the first (cross-context depency).
- one where the 2nd batch doesn't depend upon the first (should execute
  unhampered with scheduler/tdr).

Since we don't yet have a scheduler/tdr both of these will result in the
same outcome (since there's always the temporal depency). But with your
patch you only test the 2nd case (and incompletely, we should assert that
the 2nd batch did run) and ignore the first case.

In short this failure here is telling you that your test coverage for
these features is lacking. The correct fix is not to mangle the existing,
but fix it up to correctly cover the new expectations in all cases. And
that should be done as part of the tdr/scheduler submission.
-Daniel

> 
> Signed-off-by: Tim Gore <tim.gore at intel.com>
> ---
>  tests/gem_reset_stats.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/tests/gem_reset_stats.c b/tests/gem_reset_stats.c
> index 2bb4291..6529463 100644
> --- a/tests/gem_reset_stats.c
> +++ b/tests/gem_reset_stats.c
> @@ -468,7 +468,7 @@ static void test_rs_ctx(int num_fds, int num_ctx, int hang_index,
>  
>  static void test_ban(void)
>  {
> -	int h1,h2,h3,h4,h5,h6,h7;
> +	int h1,h2,h3,h4,h5,h6;
>  	int fd_bad, fd_good;
>  	int retry = 10;
>  	int active_count = 0, pending_count = 0;
> @@ -496,7 +496,6 @@ static void test_ban(void)
>  	pending_count++;
>  
>  	h6 = exec_valid(fd_good, 0);
> -	h7 = exec_valid(fd_good, 0);
>  
>          while (retry--) {
>                  h3 = inject_hang_no_ban_error(fd_bad, 0);
> @@ -525,7 +524,7 @@ static void test_ban(void)
>  	igt_assert_eq(h4, -EIO);
>  	assert_reset_status(fd_bad, 0, RS_BATCH_ACTIVE);
>  
> -	gem_sync(fd_good, h7);
> +	gem_sync(fd_good, h6);
>  	assert_reset_status(fd_good, 0, RS_BATCH_PENDING);
>  
>  	igt_assert_eq(gem_reset_stats(fd_good, 0, &rs_good), 0);
> @@ -534,12 +533,11 @@ static void test_ban(void)
>  	igt_assert(rs_bad.batch_active == active_count);
>  	igt_assert(rs_bad.batch_pending == pending_count);
>  	igt_assert(rs_good.batch_active == 0);
> -	igt_assert(rs_good.batch_pending == 2);
> +	igt_assert(rs_good.batch_pending == 1);
>  
>  	gem_close(fd_bad, h1);
>  	gem_close(fd_bad, h2);
>  	gem_close(fd_good, h6);
> -	gem_close(fd_good, h7);
>  
>  	h1 = exec_valid(fd_good, 0);
>  	igt_assert_lte(0, h1);
> @@ -554,7 +552,7 @@ static void test_ban(void)
>  
>  static void test_ban_ctx(void)
>  {
> -	int h1,h2,h3,h4,h5,h6,h7;
> +	int h1,h2,h3,h4,h5,h6;
>  	int ctx_good, ctx_bad;
>  	int fd;
>  	int retry = 10;
> @@ -587,7 +585,6 @@ static void test_ban_ctx(void)
>  	pending_count++;
>  
>  	h6 = exec_valid(fd, ctx_good);
> -	h7 = exec_valid(fd, ctx_good);
>  
>          while (retry--) {
>                  h3 = inject_hang_no_ban_error(fd, ctx_bad);
> @@ -616,7 +613,7 @@ static void test_ban_ctx(void)
>  	igt_assert_eq(h4, -EIO);
>  	assert_reset_status(fd, ctx_bad, RS_BATCH_ACTIVE);
>  
> -	gem_sync(fd, h7);
> +	gem_sync(fd, h6);
>  	assert_reset_status(fd, ctx_good, RS_BATCH_PENDING);
>  
>  	igt_assert_eq(gem_reset_stats(fd, ctx_good, &rs_good), 0);
> @@ -625,12 +622,11 @@ static void test_ban_ctx(void)
>  	igt_assert(rs_bad.batch_active == active_count);
>  	igt_assert(rs_bad.batch_pending == pending_count);
>  	igt_assert(rs_good.batch_active == 0);
> -	igt_assert(rs_good.batch_pending == 2);
> +	igt_assert(rs_good.batch_pending == 1);
>  
>  	gem_close(fd, h1);
>  	gem_close(fd, h2);
>  	gem_close(fd, h6);
> -	gem_close(fd, h7);
>  
>  	h1 = exec_valid(fd, ctx_good);
>  	igt_assert_lte(0, h1);
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch