[igt-dev] [PATCH i-g-t v2] tests/gem_watchdog: Initial set of tests for GPU watchdog
Antonio Argenziano
antonio.argenziano at intel.com
Mon Oct 8 21:43:39 UTC 2018
On 05/10/18 18:05, Carlos Santa wrote:
> This patch adds a basic set of tests that reset the different
> GPU engines through the watchdog timer.
>
> Credits to Antonio for the original codebase this is based on.
>
> This was verified on SKL/ULT GT3:
>
> $ ./gem_watchdog --run-subtest basic-vecs0
> IGT-Version: 1.23-gaaeb2007206d (x86_64) (Linux: 4.18.0-rc7+ x86_64)
> Starting subtest: basic-vecs0
> Subtest basic-vecs0: SUCCESS (2.402s)
> $ sudo cat /sys/kernel/debug/dri/0/i915_reset_info
> full gpu reset = 0
> GuC watchdog/media reset = 0
> rcs0 = 0
> bcs0 = 0
> vcs0 = 0
> vcs1 = 0
> vecs0 = 1
>
> v2: (Review comments from Chris Wilson)
> * Replace send_canary() by timestamps before/after the hang
> and measure dt. Use dt < 2*threshold + reset + submission
> to check watchdog vs hangcheck
> * Initialize drm_i915_gem_context_param args only once at
> the struct declaration
> * Avoid using MAX_ENGINES implicitly to declare engines_thresholds
> array
> * Remove unnecessary igt_assert(!check_error_state(fd))
> * Use the class:instance interface when looping through the engines
>
> (Review by Petri Latvala)
> * Correct the year in the patch's timestamp
> * Include IGT_DESCRIPTION() label
>
> Cc: Antonio Argenziano <antonio.argenziano at intel.com>
> Cc: Michel Thierry <michel.thierry at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Carlos Santa <carlos.santa at intel.com>
> ---
> tests/Makefile.sources | 1 +
> tests/gem_watchdog.c | 186 +++++++++++++++++++++++++++++++++++++++++++++++++
> tests/meson.build | 1 +
> 3 files changed, 188 insertions(+)
> create mode 100644 tests/gem_watchdog.c
>
> +
> +static void media_hang_simple(int fd, const struct intel_execution_engine2 *e)
> +{
> + uint32_t ctx;
> + unsigned flags = HANG_ALLOW_CAPTURE;
> + struct timeval start, end;
> + double dt_msecs;
> +
> + /* Submit on default context */
> + ctx = 0;
> + context_set_watchdog(fd, e, ctx, WATCHDOG_THRESHOLD);
> +
> + clear_error_state(fd);
> +
> + gettimeofday(&start, NULL);
> + inject_hang(fd, ctx, e, flags);
> + gettimeofday(&end, NULL);
> + dt_msecs = elapsed(&start, &end)/1000;
> +
> + /* reset time for watchdog should be less than 2*threshold + engine reset time + submission */
> + igt_assert(dt_msecs < 2*WATCHDOG_THRESHOLD + 15);
> +
> + /* Assert if error state is not clean */
> + igt_assert(!check_error_state(fd));
If the threshold is small enough, you don't need the dmesg check for
resets. IMO, ideally the hang_detector would be running as well, but
that doesn't work because it would also disable the watchdog.
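
For reference, the timing bound in the assert above can be sketched in
isolation as below. This is only a rough sketch under assumptions:
elapsed_ms() and within_watchdog_budget() are hypothetical names, not
helpers from the patch or from IGT, and slack_ms stands in for the
engine reset plus submission time (the patch hardcodes 15). It also
assumes the threshold is expressed in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/time.h>

/* Hypothetical helper: wall-clock time between two timevals, in ms. */
static double elapsed_ms(const struct timeval *start,
			 const struct timeval *end)
{
	assert(start && end);
	return (end->tv_sec - start->tv_sec) * 1000.0 +
	       (end->tv_usec - start->tv_usec) / 1000.0;
}

/*
 * Hypothetical check: true if recovery took less than
 * 2 * threshold plus a slack allowance (reset + submission).
 * A reset that takes longer was presumably not triggered by
 * the watchdog.
 */
static bool within_watchdog_budget(double dt_ms, double threshold_ms,
				   double slack_ms)
{
	return dt_ms < 2.0 * threshold_ms + slack_ms;
}
```

In the test this would be called with the timestamps taken around
inject_hang(), e.g. within_watchdog_budget(elapsed_ms(&start, &end),
WATCHDOG_THRESHOLD, 15).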
> +}
> +
> +igt_main
> +{
> + int fd = -1;
> +
> + igt_skip_on_simulation();
> +
> + igt_fixture {
> + fd = drm_open_driver(DRIVER_INTEL);
> + igt_require_gem(fd);
> + }
> +
> + for (const struct intel_execution_engine2 *e = intel_execution_engines2; e->name; e++) {
> + igt_subtest_group {
> + igt_fixture {
> + gem_require_engine(fd, e->class, e->instance);
Move the gem_require_engine() call inside the subtest so that the test
list stays consistent across platforms.
Thanks,
Antonio
> + }
> +
> + /* default exec-id is purely symbolic */
> + if (strcmp(e->name, "bcs0") == 0)
> + continue;
> +
> + igt_subtest_f("basic-%s", e->name) {
> + media_hang_simple(fd, e);
> + }
> + }
> + }
> +
> + igt_fixture {
> + close(fd);
> + }
> +}
> diff --git a/tests/meson.build b/tests/meson.build
> index 17deb945ec95..3b864d891a08 100644
> --- a/tests/meson.build
> +++ b/tests/meson.build
> @@ -130,6 +130,7 @@ test_progs = [
> 'gem_unref_active_buffers',
> 'gem_userptr_blits',
> 'gem_wait',
> + 'gem_watchdog',
> 'gem_workarounds',
> 'gem_write_read_ring_switch',
> 'gen3_mixed_blits',
>