[igt-dev] [PATCH i-g-t v2] tests/gem_watchdog: Initial set of tests for GPU watchdog

Antonio Argenziano antonio.argenziano at intel.com
Mon Oct 8 21:43:39 UTC 2018



On 05/10/18 18:05, Carlos Santa wrote:
> This test adds basic set of tests to reset the different
> GPU engines through the watchdog timer.
> 
> Credits to Antonio for the original codebase this is based on.
> 
> This was verified on SKL/ULT GT3:
> 
> $./gem_watchdog --run-subtest basic-vecs0
> IGT-Version: 1.23-gaaeb2007206d (x86_64) (Linux: 4.18.0-rc7+ x86_64)
> Starting subtest: basic-vecs0
> Subtest basic-vecs0: SUCCESS (2.402s)
> $ sudo cat /sys/kernel/debug/dri/0/i915_reset_info
> full gpu reset = 0
> GuC watchdog/media reset = 0
> rcs0 = 0
> bcs0 = 0
> vcs0 = 0
> vcs1 = 0
> vecs0 = 1
> 
> v2: (Review comments from Chris Wilson)
>   * Replace send_canary() by timestamps before/after the hang
>     and measure dt. Use dt < 2*threshold + reset + submission
>     to check watchdog vs hangcheck
>   * Initialize drm_i915_gem_context_param args only once at
>     the struct declaration
>   * Avoid using MAX_ENGINES implicitly to declare engines_thresholds
>     array
>   * Remove unnecessary igt_assert(!check_error_state(fd))
>   * Use the class:instance interface when looping through the engines
> 
>    (Review by Petri Latvala)
>   * Update the correct patch's year timestamp
>   * Include IGT_DESCRIPTION() label
> 
> Cc: Antonio Argenziano <antonio.argenziano at intel.com>
> Cc: Michel Thierry <michel.thierry at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Carlos Santa <carlos.santa at intel.com>
> ---
>   tests/Makefile.sources |   1 +
>   tests/gem_watchdog.c   | 186 +++++++++++++++++++++++++++++++++++++++++++++++++
>   tests/meson.build      |   1 +
>   3 files changed, 188 insertions(+)
>   create mode 100644 tests/gem_watchdog.c
> 


> +
> +static void media_hang_simple(int fd, const struct intel_execution_engine2 *e)
> +{
> +	uint32_t ctx;
> +	unsigned flags = HANG_ALLOW_CAPTURE;
> +	struct timeval start, end;
> +	double dt_msecs;
> +
> +	/* Submit on default context */
> +	ctx = 0;
> +	context_set_watchdog(fd, e, ctx, WATCHDOG_THRESHOLD);
> +
> +	clear_error_state(fd);
> +
> +	gettimeofday(&start, NULL);
> +	inject_hang(fd, ctx, e, flags);
> +	gettimeofday(&end, NULL);
> +	dt_msecs = elapsed(&start, &end)/1000;
> +
> +	/* reset time for watchdog should be less than 2*threshold + engine reset time + submission */
> +	igt_assert(dt_msecs < 2*WATCHDOG_THRESHOLD + 15);
> +
> +	/* Assert if error state is not clean */
> +	igt_assert(!check_error_state(fd));

If you have a small enough threshold you don't need the dmesg check for 
resets. IMO, ideally there would be a way to have the hang_detector 
running but that doesn't work because it would disable watchdog as well.

> +}
> +
> +igt_main
> +{
> +	int fd = -1;
> +
> +	igt_skip_on_simulation();
> +
> +	igt_fixture {
> +		fd = drm_open_driver(DRIVER_INTEL);
> +		igt_require_gem(fd);
> +	}
> +
> +	for (const struct intel_execution_engine2 *e = intel_execution_engines2; e->name; e++) {
> +		igt_subtest_group {
> +			igt_fixture {
> +				gem_require_engine(fd, e->class, e->instance);

Move the require inside the subtest to maintain a consistent test list 
across platforms.

Thanks,
Antonio

> +			}
> +
> +			/* default exec-id is purely symbolic */
> +			if (strcmp(e->name, "bcs0") == 0)
> +				continue;
> +
> +			igt_subtest_f("basic-%s", e->name) {
> +				media_hang_simple(fd, e);
> +			}
> +		}
> +	}
> +
> +	igt_fixture {
> +		close(fd);
> +	}
> +}
> diff --git a/tests/meson.build b/tests/meson.build
> index 17deb945ec95..3b864d891a08 100644
> --- a/tests/meson.build
> +++ b/tests/meson.build
> @@ -130,6 +130,7 @@ test_progs = [
>   	'gem_unref_active_buffers',
>   	'gem_userptr_blits',
>   	'gem_wait',
> +        'gem_watchdog',
>   	'gem_workarounds',
>   	'gem_write_read_ring_switch',
>   	'gen3_mixed_blits',
> 


More information about the igt-dev mailing list