[Intel-gfx] [PATCH i-g-t 2/2] tests/perf_pmu: Exercise busy stats and lite-restore

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Jan 10 10:37:51 UTC 2018


On 09/01/2018 21:39, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-09 16:16:21)
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> While developing a fix for an accounting hole in busy stats we realized
>> lite-restore is a potential edge case which would be interesting to check
>> is properly handled.
>>
>> It is unfortunately quite timing sensitive to hit lite-restore in the
>> fashion the test needs, so the downside of this test is that it suffers
>> from a high rate of false negatives.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> ---
>>   tests/perf_pmu.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 85 insertions(+)
>>
>> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
>> index e1f449d48808..7db5059d202e 100644
>> --- a/tests/perf_pmu.c
>> +++ b/tests/perf_pmu.c
>> @@ -192,6 +192,84 @@ busy_start(int gem_fd, const struct intel_execution_engine2 *e)
>>          gem_quiescent_gpu(gem_fd);
>>   }
>>   
>> +/*
>> + * This test has a potentially low rate of catching the issue it is trying to
>> + * catch. Or in other words, quite high rate of false negative successes. We
>> + * will depend on the CI systems running it a lot to detect issues.
>> + */
>> +static void
>> +busy_double_start(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> +       unsigned long slept;
>> +       igt_spin_t *spin[2];
>> +       uint64_t val, val2;
>> +       bool execlists;
>> +       uint32_t ctx;
>> +       int fd;
>> +
>> +       ctx = gem_context_create(gem_fd);
>> +
>> +       if (gem_has_execlists(gem_fd)) {
>> +               /*
>> +                * Defeat the busy stats delayed disable, we need to guarantee
>> +                * we are the first user.
>> +                */
>> +               execlists = true;
>> +               sleep(2);
>> +       } else {
>> +               execlists = false;
>> +       }
>> +
>> +       /*
>> +        * Submit two contexts, with a pause in between targeting the ELSP
>> +        * re-submission in execlists mode. Make sure busyness is correctly
>> +        * reported with the engine busy, and after the engine went idle.
>> +        */
>> +       spin[0] = __igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
>> +       usleep(500000);
>> +       spin[1] = __igt_spin_batch_new(gem_fd, ctx, e2ring(gem_fd, e), 0);
> 
> If you want to avoid the second spinner, you can resubmit the first with
> a second context.

I know you like that but I prefer the pedestrian way. :)

>> +       /*
>> +        * Open PMU as fast as possible after the second spin batch in attempt
>> +        * to be faster than the driver handling lite-restore.
>> +        */
>> +       fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
>> +
>> +       slept = measured_usleep(batch_duration_ns / 1000);
>> +       val = pmu_read_single(fd);
>> +
>> +       igt_spin_batch_end(spin[0]);
>> +       igt_spin_batch_end(spin[1]);
>> +
>> +       gem_sync(gem_fd, spin[0]->handle);
>> +       gem_sync(gem_fd, spin[1]->handle);
>> +
>> +       /* Wait for context complete to avoid accounting residual busyness. */
>> +       if (execlists)
>> +               usleep(100000);
> 
> Then you want a gem_quiescent_gpu() instead of the gem_sync()? Though
> I'm scratching my head here as to what you mean. If there's an issue
> here, won't we see it in our other measurements as well?

Primarily because we want to measure idle after busy here and not in 
other tests, *and*, with execlists, the time between the user interrupt 
and the context save counts as busyness. So without this delay it is 
possible to observe a few microseconds of engine busyness in the below 
check.

>> +       val2 = pmu_read_single(fd);
>> +       usleep(batch_duration_ns / 1000);
>> +       val2 = pmu_read_single(fd) - val2;
> 
> But the engine is idle at this point? Why sleep for execlists and not
> legacy? Isn't the execlists case where we know that just by waiting for
> idle (gem_quiescent_gpu() not an arbitrary usleep) then the pmu counter
> is idle. It's legacy where we would have to wait for an sample interval
> following gpu idle to be sure the pmu is idle.

Hm, legacy and sampling... I think we don't have to wait, since there 
is no context save there; the idleness comes straight with the user 
interrupt, or no?

> I think I half understand what you want to assert (an idle gpu after
> being busy, reports idle?) but I'm not entirely sure. :)

Yep!

Regards,

Tvrtko

