[Intel-gfx] [PATCH i-g-t] i915/gem_eio: Flush RCU before timing our own critical sections
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Mon Nov 11 15:49:32 UTC 2019
On 11/11/2019 11:40, Chris Wilson wrote:
> We cannot control how long RCU takes to find a quiescent point as that
> depends upon the background load and so may take an arbitrary time.
> Instead, let's try to avoid that impacting our measurements by inserting
> an rcu_barrier() before our critical timing sections and hope that hides
> the issue, letting us always perform a fast reset. Fwiw, we do the
> expedited RCU synchronize, but that is not always enough.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> ---
> tests/i915/gem_eio.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/tests/i915/gem_eio.c b/tests/i915/gem_eio.c
> index 8d6cb9760..49d2a99e9 100644
> --- a/tests/i915/gem_eio.c
> +++ b/tests/i915/gem_eio.c
> @@ -71,6 +71,7 @@ static void trigger_reset(int fd)
> {
> struct timespec ts = { };
>
> + rcu_barrier(fd); /* flush any excess work before we start timing */
> igt_nsec_elapsed(&ts);
>
> igt_kmsg(KMSG_DEBUG "Forcing GPU reset\n");
> @@ -227,6 +228,10 @@ static void hang_handler(union sigval arg)
> igt_debug("hang delay = %.2fus\n",
> igt_nsec_elapsed(&ctx->delay) / 1000.0);
>
> + /* flush any excess work before we start timing our reset */
> + igt_assert(igt_sysfs_printf(ctx->debugfs, "i915_drop_caches",
> + "%d", DROP_RCU));
> +
> igt_nsec_elapsed(ctx->ts);
> igt_assert(igt_sysfs_set(ctx->debugfs, "i915_wedged", "-1"));
>
>
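For readers unfamiliar with the helper: the rcu_barrier(fd) call added in the
first hunk is presumably a thin wrapper around the same debugfs write shown in
the hang_handler hunk, i.e. it asks i915 to flush outstanding RCU work via
i915_drop_caches before the timed section begins. A minimal sketch of such a
helper, assuming igt_debugfs_dir() and igt_sysfs_printf() from the IGT library
and a locally defined DROP_RCU bit (the exact value here is an assumption and
must match the kernel's i915_drop_caches flag):

  #include <unistd.h>
  #include "igt.h"
  #include "igt_sysfs.h"

  #ifndef DROP_RCU
  #define DROP_RCU 0x200 /* assumed value; must match the kernel's DROP_RCU bit */
  #endif

  static void rcu_barrier(int i915)
  {
  	int dir = igt_debugfs_dir(i915); /* open this device's debugfs dir */

  	/* Ask i915 to flush RCU work so it does not skew the timed reset. */
  	igt_assert(igt_sysfs_printf(dir, "i915_drop_caches", "%d", DROP_RCU));

  	close(dir);
  }

Per the commit message, the point of doing this write just before
igt_nsec_elapsed() starts the clock is that any deferred RCU work queued by
earlier activity is drained up front, rather than being charged to the reset
time measured immediately afterwards.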
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
You could avoid scoring demerit points by adding a reference to the bugzilla
entry, presumably linking to CI results, showing this was known to be flaky. :)
Regards,
Tvrtko