[Intel-gfx] [PATCH i-g-t v2] tests/kms_frontbuffer_tracking: increase FBC wait timeout to 5s

Mon Sep 4 10:45:35 UTC 2017

Quoting Paulo Zanoni (2017-09-01 20:12:01)
> On Fri, 2017-08-25 at 14:11 +0100, Chris Wilson wrote:
> > Quoting Lofstedt, Marta (2017-08-25 13:50:16)
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: Lofstedt, Marta
> > > > Sent: Friday, August 25, 2017 2:54 PM
> > > > To: 'Chris Wilson' <chris at chris-wilson.co.uk>; intel-gfx at lists.freedesktop.org
> > > > Subject: RE: [Intel-gfx] [PATCH i-g-t v2] tests/kms_frontbuffer_tracking: increase FBC wait timeout to 5s
> > > > 
> > > > 
> > > > 
> > > > > -----Original Message-----
> > > > > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > > > > Sent: Friday, August 25, 2017 1:47 PM
> > > > > To: Lofstedt, Marta <marta.lofstedt at intel.com>; intel-gfx at lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [PATCH i-g-t v2] tests/kms_frontbuffer_tracking: increase FBC wait timeout to 5s
> > > > > 
> > > > > Quoting Marta Lofstedt (2017-08-25 11:40:29)
> > > > > > From: "Lofstedt, Marta" <marta.lofstedt at intel.com>
> > > > > > 
> > > > > > The subtests igt@kms_frontbuffer_tracking@fbc-*draw* have
> > > > > > inconsistent results, alternating between fail and pass.
> > > > > > The failures are always due to "FBC disabled".
> > > > > > With this increased timeout, the flip-flop behavior is no
> > > > > > longer reproducible.
> > > > > > 
> > > > > > This is a partial revert of
> > > > > > 64590c7b768dc8d8dd962f812d5ff5a39e7e8b54,
> > > > > > where the timeout was decreased from 5s to 2s.
> > > > > > After investigating the timeout needed, the conclusion is that
> > > > > > the longer timeout is only needed when the test switches between
> > > > > > specific draw domains, typically blt vs. mmap_cpu.
> > > > > > The objective of the FBC part of the tests is not to benchmark
> > > > > > draw domain changes; it is to check that FBC was (re-)enabled.
> > > > > > 
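The patch only changes a timeout value inside fbc_wait_until_enabled(), and the hunk body is trimmed in the quote. To make concrete what that timeout bounds, here is a minimal, self-contained sketch of a poll-until-true-or-timeout loop in plain C. poll_until() and the fake_fbc stand-in are hypothetical names, not IGT code; a real test would read the FBC status from debugfs instead of counting polls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in milliseconds. */
static uint64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static void sleep_ms(unsigned int ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Hypothetical helper: poll cond(ctx) every interval_ms until it returns
 * true or timeout_ms has elapsed. It checks once more after the deadline
 * so a condition that became true during the last sleep is not missed.
 */
static bool poll_until(bool (*cond)(void *), void *ctx,
		       unsigned int timeout_ms, unsigned int interval_ms)
{
	uint64_t deadline = now_ms() + timeout_ms;

	while (now_ms() < deadline) {
		if (cond(ctx))
			return true;
		sleep_ms(interval_ms);
	}
	return cond(ctx);
}

/* Hypothetical stand-in for "is FBC enabled?": turns true after n polls. */
struct fake_fbc {
	int polls;
	int enable_after;
};

static bool fake_fbc_enabled(void *ctx)
{
	struct fake_fbc *f = ctx;

	return ++f->polls > f->enable_after;
}
```

Going from a 2000 ms to a 5000 ms timeout leaves the loop itself unchanged; it only changes how long a slow re-enable is tolerated before the wait reports failure.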
> > > > > > V2: Added documentation
> > > > > > 
> > > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101623
> > > > > > Signed-off-by: Marta Lofstedt <marta.lofstedt at intel.com>
> > > > > > Acked-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
> > > > > > ---
> > > > > >  tests/kms_frontbuffer_tracking.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/tests/kms_frontbuffer_tracking.c b/tests/kms_frontbuffer_tracking.c
> > > > > > index e03524f1..2538450c 100644
> > > > > > --- a/tests/kms_frontbuffer_tracking.c
> > > > > > +++ b/tests/kms_frontbuffer_tracking.c
> > > > > > @@ -924,7 +924,7 @@ static bool fbc_stride_not_supported(void)
> > > > > > 
> > > > > >  static bool fbc_wait_until_enabled(void)
> > > > > >  {
> > > > > 
> > > > > Try igt_drop_caches_set(device, DROP_RETIRE); instead of relaxing the timeout.
> > > > > -Chris
> > > > 
> > > > OK, I will test that and do a V3 if it works!
> > > > /Marta
> > > 
> > > I did some initial testing with igt_drop_caches_set inside
> > > fbc_wait_until_enabled and it looks good; I will add this to my
> > > weekend tests to get more results. This also appears to improve the
> > > runtime of the tests quite a bit. So maybe the igt_drop_caches_set
> > > call should be placed somewhere else, so that it gives runtime
> > > improvements for more than just the FBC-related subtests.
> > 
> > Sure, all the waits could do with the retire first; give it a common
> > function and a comment explaining the rationale (which should be
> > pretty much the same as given in the changelog).
> 
> We can do that, sure, especially if it makes the tests faster...
> 
> > Any time we use the GPU to invalidate the frontbuffer tracking, we
> > have to wait for a retire to do the flush. Retirement is lazy and is
> > normally driven by GPU activity, but we also have a background kworker
> > to make sure we notice when the system becomes idle independently of
> > userspace; however, it runs infrequently.
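A toy, self-contained model of the two options being discussed: a longer timeout versus a common wait helper that retires first. Everything here is hypothetical and is not i915 or IGT code. Simulated ticks stand in for wall-clock time, and toy_retire() stands in for the igt_drop_caches_set(device, DROP_RETIRE) call suggested earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the driver behaviour described above (hypothetical). */
struct toy_driver {
	bool flush_pending;		/* GPU write invalidated the frontbuffer */
	bool fbc_enabled;
	unsigned int ticks;		/* simulated time */
	unsigned int worker_period;	/* lazy retire worker runs every N ticks */
};

/* A GPU draw invalidates frontbuffer tracking, so FBC goes off. */
static void toy_gpu_draw(struct toy_driver *d)
{
	d->flush_pending = true;
	d->fbc_enabled = false;
}

/* Retiring performs the deferred flush, which lets FBC come back. */
static void toy_retire(struct toy_driver *d)
{
	if (d->flush_pending) {
		d->flush_pending = false;
		d->fbc_enabled = true;
	}
}

/* Advance simulated time; the background worker retires periodically. */
static void toy_tick(struct toy_driver *d)
{
	d->ticks++;
	if (d->worker_period && d->ticks % d->worker_period == 0)
		toy_retire(d);
}

/*
 * Hypothetical common wait helper: optionally force a retire first (the
 * spot where an IGT test would issue DROP_RETIRE), then poll for up to
 * max_polls ticks.
 */
static bool toy_wait_for_fbc(struct toy_driver *d, bool retire_first,
			     unsigned int max_polls)
{
	if (retire_first)
		toy_retire(d);

	for (unsigned int i = 0; i < max_polls; i++) {
		if (d->fbc_enabled)
			return true;
		toy_tick(d);
	}
	return d->fbc_enabled;
}
```

In this model, forcing the retire makes the wait succeed on the first check. Without it, the wait only succeeds if the timeout spans at least one run of the background worker.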
> 
> ... but our current 2s timeout should have been enough for that,
> shouldn't it? If I'm looking at the right part of the code, retirement
> should happen once per second, so 2s should have been enough. But it
> looks like it's not.
> 
> Unless I'm misinterpreting the round_up part, which could convert the
> 1s to 2s, which would still probably be fine...

It can bump the wait by up to a second (it tries to align wakeups on
second boundaries). We may also skip the work if the device is busy
elsewhere.
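A rough worst-case estimate for the lazy retire worker. It assumes, per the description above, a nominal 1 s period with each wakeup rounded up to the next whole-second boundary, and it further assumes that a skipped run simply re-arms with the same rule. Jiffy granularity and other kernel timer details are ignored, and the helper names are hypothetical.

```c
#include <assert.h>

/* Round t_ms up to the next whole-second boundary (unchanged if aligned). */
static unsigned long round_up_to_second(unsigned long t_ms)
{
	return (t_ms + 999) / 1000 * 1000;
}

/*
 * Simplified model: the worker is armed for "1 s from now", rounded up to
 * a second boundary. Returns how long after armed_at_ms it actually runs,
 * assuming it is skipped `skips` times (device busy) and each skip re-arms
 * with the same rule.
 */
static unsigned long retire_delay_ms(unsigned long armed_at_ms,
				     unsigned int skips)
{
	unsigned long t = armed_at_ms;

	for (unsigned int i = 0; i <= skips; i++)
		t = round_up_to_second(t + 1000);
	return t - armed_at_ms;
}
```

For example, if the worker is armed 1 ms past a second boundary, it runs 1999 ms later, just under the old 2 s timeout, and a single skipped run pushes that to 2999 ms. In this simplified model, 2 s leaves essentially no margin and 3 s covers at most one skipped run.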

> Anyway, 3s looks definitely safe even in this case. Maybe we could go
> with 3s?
> 
> We can both increase the timeout *and* do cache dropping. However, I
> think the case without cache dropping definitely needs to be tested
> too, so dropping the caches every time may not be a good idea.

You are not dropping the caches; it is just doing a retire.

The real question is: what is the expectation? If we want the test simply
to state that FBC et al. will be re-enabled once ready, then just add a
synchronous debugfs call that establishes the condition in the driver
under which FBC should be ready (at the moment that is DROP_RETIRE, but
you will probably want a better-specified knob). If the test is meant to
make sure that FBC is re-enabled automatically, then we need to think
some more. In a normal workload, this should be the case (the retire
worker you are relying on exists for hostile userspace). If you are only
looking at the hostile-userspace case (and you already are, for the
frontbuffer writes), then a longer timeout is definitely acceptable, but
how long? What is that limit?

If you define an upper bound for how long you allow FBC et al. to remain
off, then we will need an explicit timer to match.
-Chris
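A toy model of what such an explicit upper bound could look like. The deadline is armed when the GPU invalidates the frontbuffer, and tracking is flushed at whichever comes first: the lazy retire or the deadline. The thread does not describe an implementation. The struct and function names are hypothetical, and simulated milliseconds stand in for a real kernel timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical frontbuffer-tracking state with an explicit flush deadline. */
struct toy_fb_tracking {
	bool dirty;			/* GPU write not yet flushed */
	unsigned long deadline_ms;	/* explicit flush deadline, valid while dirty */
};

/* GPU invalidation: mark dirty and arm the deadline (an armed one is kept). */
static void toy_invalidate(struct toy_fb_tracking *t, unsigned long now_ms,
			   unsigned long bound_ms)
{
	if (!t->dirty) {
		t->dirty = true;
		t->deadline_ms = now_ms + bound_ms;
	}
}

/* Flush on retire or once the deadline passes; true when FBC may be on. */
static bool toy_update(struct toy_fb_tracking *t, unsigned long now_ms,
		       bool retired)
{
	if (t->dirty && (retired || now_ms >= t->deadline_ms))
		t->dirty = false;
	return !t->dirty;
}

/* Invalidate at t=0; return the first millisecond FBC may be re-enabled. */
static unsigned long toy_first_enabled_ms(unsigned long retire_at_ms,
					  unsigned long bound_ms)
{
	struct toy_fb_tracking t = { false, 0 };

	toy_invalidate(&t, 0, bound_ms);
	for (unsigned long now = 0;; now++) {
		if (toy_update(&t, now, now == retire_at_ms))
			return now;
	}
}
```

With a bound like this in place, a test's timeout could be derived from the bound plus some margin, rather than from the retire worker's period.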