[Intel-gfx] [PATCH i-g-t 2/2] tests/perf_pmu: Add tests for engine queued stat

Chris Wilson chris at chris-wilson.co.uk
Wed Nov 22 13:51:59 UTC 2017


Quoting Tvrtko Ursulin (2017-11-22 13:42:04)
> 
> On 22/11/2017 12:56, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-11-22 12:47:05)
> >> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >> Simple test to check correct queue-depth is reported per engine.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >> ---
> >>   tests/perf_pmu.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 79 insertions(+)
> >>
> >> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> >> index 8585ed7bcee8..17f0afca6fe1 100644
> >> --- a/tests/perf_pmu.c
> >> +++ b/tests/perf_pmu.c
> >> @@ -87,6 +87,17 @@ static uint64_t pmu_read_single(int fd)
> >>          return data[0];
> >>   }
> >>   
> >> +static uint64_t pmu_sample_single(int fd, uint64_t *val)
> >> +{
> >> +       uint64_t data[2];
> >> +
> >> +       igt_assert_eq(read(fd, data, sizeof(data)), sizeof(data));
> >> +
> >> +       *val = data[0];
> >> +
> >> +       return data[1];
> >> +}
> >> +
> >>   static void pmu_read_multi(int fd, unsigned int num, uint64_t *val)
> >>   {
> >>          uint64_t buf[2 + num];
> >> @@ -655,6 +666,65 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
> >>          assert_within_epsilon(val[1], slept, tolerance);
> >>   }
> >>   
> >> +static double calc_queued(uint64_t d_val, uint64_t d_ns)
> >> +{
> >> +       return (double)d_val * 1e9 * I915_SAMPLE_QUEUED_SCALE / d_ns;
> >> +}
> >> +
> >> +static void
> >> +queued(int gem_fd, const struct intel_execution_engine2 *e)
> >> +{
> >> +       const unsigned long duration_ns = 500e6;
> > 
> > 0.5s.
> 
> Not sure what you mean? Should I express it in a different way, using 
> some NSECS_PER_SEC define?

I made a note for myself. Adding /* 0.5s */ would save me commenting out
loud :)
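
To be concrete, the annotation itself is all I was after:

        const unsigned long duration_ns = 500e6; /* 0.5s */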

> 
> >> +       igt_spin_t *spin[2];
> >> +       uint64_t val[2];
> >> +       uint64_t ts[2];
> >> +       int fd;
> >> +
> >> +       fd = open_pmu(I915_PMU_ENGINE_QUEUED(e->class, e->instance));
> >> +
> >> +       /*
> >> +        * First check on an idle engine.
> >> +        */
> >> +       ts[0] = pmu_sample_single(fd, &val[0]);
> >> +       usleep(duration_ns / 3000);
> >> +       ts[1] = pmu_sample_single(fd, &val[1]);
> >> +       assert_within_epsilon(calc_queued(val[1] - val[0], ts[1] - ts[0]),
> >> +                             0.0, tolerance);
> >> +
> >> +       /*
> >> +        * First spin batch will be immediately executing.
> >> +        */
> >> +       spin[0] = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
> >> +       igt_spin_batch_set_timeout(spin[0], duration_ns);
> >> +
> >> +       ts[0] = pmu_sample_single(fd, &val[0]);
> >> +       usleep(duration_ns / 3000);
> >> +       ts[1] = pmu_sample_single(fd, &val[1]);
> >> +       assert_within_epsilon(calc_queued(val[1] - val[0], ts[1] - ts[0]),
> >> +                             1.0, tolerance);
> >> +
> > 
> > What I would like here is a for(n=1; n < 10; n++)
> > where max_n is chosen so that we terminate within 5s, changing sample
> > intervals to match if we want to increase N.
> > 
> > Hmm.
> > 
> > for (n = 1; n < 10; n++)
> >       ctx = gem_context_create()
> >       for (m = 0; m < n; m++)
> >               ...etc...
> > 
> > (We probably either want to measure ring_size and avoid that, or use a
> > timeout that interrupts the last execbuf... Ok, that's better overall.)
> > 
> > And have qd geometrically increase. Basically just want to avoid hitting
> > magic numbers inside HW, ELSP/guc depth of 2 being the first magic
> > number we want to miss.
> 
> I get the suggestion to test different queue depths and that's a good 
> one. But I failed to follow the rest of what you wrote, including why 
> to bring contexts into the picture?

Throwing contexts into the picture was to be sure that the counter works
across contexts. (I was thinking about the complexity of per-context
timelines versus the counter living on the engine, which had started to
worry me.)
Next up would be adding waiting requests and demonstrating that they
aren't counted as queued.
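
Something along these lines is what I had in mind - a rough, untested
sketch reusing the helpers from this patch; the expected value follows
your single-spinner check, which counts the executing request as queued:

        unsigned int n, m;

        for (n = 1; n <= 8; n *= 2) {
                uint32_t ctx[n];
                igt_spin_t *spin[n];

                /* One spinner per context: only the first can execute,
                 * the rest stack up behind it on the same engine.
                 */
                for (m = 0; m < n; m++) {
                        ctx[m] = gem_context_create(gem_fd);
                        spin[m] = igt_spin_batch_new(gem_fd, ctx[m],
                                                     e2ring(gem_fd, e), 0);
                }

                ts[0] = pmu_sample_single(fd, &val[0]);
                usleep(duration_ns / 3000);
                ts[1] = pmu_sample_single(fd, &val[1]);
                assert_within_epsilon(calc_queued(val[1] - val[0],
                                                  ts[1] - ts[0]),
                                      n, tolerance);

                /* igt_spin_batch_free() terminates the spinner for us. */
                for (m = 0; m < n; m++) {
                        igt_spin_batch_free(gem_fd, spin[m]);
                        gem_context_destroy(gem_fd, ctx[m]);
                }
        }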
 
> How about simply growing the queue depth exponentially up to a set 
> limit? With a 5s time budget we could go to quite a high qd, much more 
> than we actually need.

I'd just use the time limit. As you say, we should be able to grow quite
large within a few seconds; it's just a matter of coordinating that with
the sample interval whilst keeping under the 6s limit to prevent a GPU
hang.
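
(For example, doubling qd from 1 to 64 gives seven sample windows; at
~700ms each that is ~4.9s in total, inside a 5s budget, and terminating
the spinners at each step keeps every batch well under the 6s limit.)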
 
> We do have a facility to terminate the spin batch, I think, so we 
> don't have to wait for all of them to complete.

One trick that is quite fun is to keep submitting the same spin batch.
Then you only ever have one to worry about.
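
A rough sketch, untested - the execbuf mirrors what igt_spin_batch_new()
already set up, and the qd of 8 is only illustrative:

        igt_spin_t *spin = igt_spin_batch_new(gem_fd, 0,
                                              e2ring(gem_fd, e), 0);
        struct drm_i915_gem_exec_object2 obj = {
                .handle = spin->handle,
        };
        struct drm_i915_gem_execbuffer2 eb = {
                .buffers_ptr = to_user_pointer(&obj),
                .buffer_count = 1,
                .flags = e2ring(gem_fd, e),
        };
        const unsigned int qd = 8; /* illustrative */
        unsigned int n;

        /* Resubmitting the same spinner stacks qd - 1 more copies of
         * it behind the one that is executing.
         */
        for (n = 1; n < qd; n++)
                gem_execbuf(gem_fd, &eb);

        /* ... sample as above, expecting qd ... */

        /* Freeing the spinner ends it, completing every copy at once. */
        igt_spin_batch_free(gem_fd, spin);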
-Chris

