[Intel-gfx] [PATCH] drm/i915/selftests: Increase timeout in requests perf selftest
Thomas Hellström
thomas.hellstrom at linux.intel.com
Thu Oct 21 05:36:43 UTC 2021
On Wed, 2021-10-20 at 13:34 -0700, John Harrison wrote:
> On 10/11/2021 10:57, Matthew Brost wrote:
> > perf_parallel_engines is a micro benchmark to test i915 request
> > scheduling. The test creates a thread per physical engine, and each
> > thread submits NOP requests and waits for them to complete in a loop.
> > In execlists mode this works perfectly fine, as a powerful CPU has
> > enough cores to feed each engine and process the CSBs. With GuC
> > submission the uC gets overwhelmed, as all threads feed into a single
> > CTB channel and the GuC gets bombarded with CSBs as contexts are
> > immediately switched in and out on the engines due to the zero
> > runtime of the requests. When the GuC is overwhelmed, scheduling of
> > contexts is unfair due to the nature of the GuC scheduling algorithm.
> > This behavior is understood and deemed acceptable, as this micro
> > benchmark isn't close to a real world use case. Increase the timeout
> > of the wait period for requests to complete. This makes the test
> > understand that it is ok for contexts to get starved in this
> > scenario.
> >
> > A future patch / cleanup may just delete these micro benchmark tests
> > as they basically mean nothing. We care about real workloads, not
> > made up ones.
> >
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> Reviewed-by: John Harrison <John.C.Harrison at Intel.com>
Also
Reviewed-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
I think one important thing to keep in mind here is that this selftest
actually *did* find a flaw, albeit one that upon analysis turned out not
to be significant.
But given that user-space clients like, for example, gem_exec_suspend
seem to be able to trigger similar delays as well, at least to some
extent, with a huge number of small requests submitted from user-space,
we should probably verify at some point that this isn't exploitable by a
malicious client starving other clients on the same system.
/Thomas
>
> > ---
> > drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> > index d67710d10615..6496671a113c 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> > @@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
> > i915_request_add(rq);
> >
> > err = 0;
> > - if (i915_request_wait(rq, 0, HZ / 5) < 0)
> > + if (i915_request_wait(rq, 0, HZ) < 0)
> > err = -ETIME;
> > i915_request_put(rq);
> > if (err)
> > @@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
> > i915_request_add(rq);
> >
> > err = 0;
> > - if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
> > + if (prev && i915_request_wait(prev, 0, HZ) < 0)
> > err = -ETIME;
> > i915_request_put(prev);
> > prev = rq;
>
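For reference, the timeout argument to i915_request_wait() is given in
jiffies, and HZ is the number of jiffies per second, so the hunks above
move the per-request wait budget from HZ / 5 (200 ms) to HZ (1 second).
A minimal userspace sketch of that arithmetic follows. It is not kernel
code: jiffies_to_ms_sketch() and print_timeouts() are hypothetical
helpers made up for this illustration, and the hz parameter stands in
for the kernel's CONFIG_HZ.

```c
#include <stdio.h>

/*
 * Userspace sketch, not kernel code: convert a timeout expressed in
 * jiffies to milliseconds for a given tick rate. "hz" stands in for the
 * kernel's CONFIG_HZ; this helper is hypothetical and only illustrates
 * the arithmetic behind the HZ / 5 -> HZ change.
 */
static unsigned long jiffies_to_ms_sketch(unsigned long timeout,
					  unsigned long hz)
{
	return timeout * 1000UL / hz;
}

/* Print the old (HZ / 5) and new (HZ) wait budgets for one tick rate. */
static void print_timeouts(unsigned long hz)
{
	printf("HZ=%lu: old %lu ms, new %lu ms\n", hz,
	       jiffies_to_ms_sketch(hz / 5, hz),
	       jiffies_to_ms_sketch(hz, hz));
}
```

Calling print_timeouts() with common tick rates such as 100, 250 or 1000
gives the same two budgets each time, since both timeouts are expressed
as a fraction of HZ rather than as a fixed jiffy count.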