[Intel-gfx] [PATCH i-g-t 2/2] tests/perf_pmu: Add tests for engine queued stat
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Nov 22 13:42:04 UTC 2017
On 22/11/2017 12:56, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-11-22 12:47:05)
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> Simple test to check correct queue-depth is reported per engine.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> ---
>> tests/perf_pmu.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 79 insertions(+)
>>
>> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
>> index 8585ed7bcee8..17f0afca6fe1 100644
>> --- a/tests/perf_pmu.c
>> +++ b/tests/perf_pmu.c
>> @@ -87,6 +87,17 @@ static uint64_t pmu_read_single(int fd)
>> return data[0];
>> }
>>
>> +static uint64_t pmu_sample_single(int fd, uint64_t *val)
>> +{
>> + uint64_t data[2];
>> +
>> + igt_assert_eq(read(fd, data, sizeof(data)), sizeof(data));
>> +
>> + *val = data[0];
>> +
>> + return data[1];
>> +}
>> +
>> static void pmu_read_multi(int fd, unsigned int num, uint64_t *val)
>> {
>> uint64_t buf[2 + num];
>> @@ -655,6 +666,65 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
>> assert_within_epsilon(val[1], slept, tolerance);
>> }
>>
>> +static double calc_queued(uint64_t d_val, uint64_t d_ns)
>> +{
>> + return (double)d_val * 1e9 * I915_SAMPLE_QUEUED_SCALE / d_ns;
>> +}
>> +
>> +static void
>> +queued(int gem_fd, const struct intel_execution_engine2 *e)
>> +{
>> + const unsigned long duration_ns = 500e6;
>
> 0.5s.
Not sure what you mean? Express it in a different way using some
NSECS_PER_SEC define?
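For illustration only, one way the duration could be spelled with such a define. This is a sketch, not part of the patch: the names NSEC_PER_SEC, NSEC_PER_USEC, QUEUED_DURATION_NS and sample_sleep_us() are assumptions made up for this example, and the patch itself uses the literal 500e6 and usleep(duration_ns / 3000).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defines for this sketch; not taken from the patch. */
#define NSEC_PER_SEC		1000000000ULL
#define NSEC_PER_USEC		1000ULL

/* 0.5s, the same value as the 500e6 literal in the patch. */
#define QUEUED_DURATION_NS	(NSEC_PER_SEC / 2)

/*
 * Hypothetical helper: sleep time between the two PMU samples, in
 * microseconds. Equivalent to the duration_ns / 3000 used in the patch,
 * i.e. one third of the spin duration converted from ns to us.
 */
static uint64_t sample_sleep_us(uint64_t duration_ns)
{
	assert(duration_ns >= 3 * NSEC_PER_USEC);

	return duration_ns / 3 / NSEC_PER_USEC;
}
```

With these names the sleep would read usleep(sample_sleep_us(QUEUED_DURATION_NS)), which makes both the 0.5s duration and the one-third sampling window explicit.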
>> + igt_spin_t *spin[2];
>> + uint64_t val[2];
>> + uint64_t ts[2];
>> + int fd;
>> +
>> + fd = open_pmu(I915_PMU_ENGINE_QUEUED(e->class, e->instance));
>> +
>> + /*
>> + * First check on an idle engine.
>> + */
>> + ts[0] = pmu_sample_single(fd, &val[0]);
>> + usleep(duration_ns / 3000);
>> + ts[1] = pmu_sample_single(fd, &val[1]);
>> + assert_within_epsilon(calc_queued(val[1] - val[0], ts[1] - ts[0]),
>> + 0.0, tolerance);
>> +
>> + /*
>> + * First spin batch will be immediately executing.
>> + */
>> + spin[0] = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
>> + igt_spin_batch_set_timeout(spin[0], duration_ns);
>> +
>> + ts[0] = pmu_sample_single(fd, &val[0]);
>> + usleep(duration_ns / 3000);
>> + ts[1] = pmu_sample_single(fd, &val[1]);
>> + assert_within_epsilon(calc_queued(val[1] - val[0], ts[1] - ts[0]),
>> + 1.0, tolerance);
>> +
>
> What I would like here is a for(n=1; n < 10; n++)
> where max_n is chosen so that we terminate within 5s, changing sample
> intervals to match if we want to increase N.
>
> Hmm.
>
> for (n = 1; n < 10; n++)
> ctx = gem_context_create()
> for (m = 0; m < n; m++)
> ...etc...
>
> (We probably either want to measure ring_size and avoid that, or use a
> timeout that interrupts the last execbuf... Ok, that's better overall.)
>
> And have qd geometrically increase. Basically just want to avoid hitting
> magic numbers inside HW, ELSP/guc depth of 2 being the first magic
> number we want to miss.
I get the suggestion to test different queue depths, and that's a good
one. I failed to follow the rest of what you wrote, though, including
why contexts need to come into the picture.
How about simply growing the queue depth exponentially up to a set
limit? With a 5s time budget we could go to quite a high qd, much more
than we actually need.
I think we do have a facility to terminate the spin batch, so we don't
have to wait for all of them to complete.
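A rough sketch of how the queue depths and the per-step time could be derived, assuming a doubling sequence and an even split of the time budget. The helpers queue_depths() and step_duration_ns() are hypothetical names for this example and do not exist in IGT; submitting the spin batches and sampling the PMU at each depth is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: fill qd[] with queue depths 1, 2, 4, ... that do
 * not exceed max_qd, storing at most len entries. Returns the number of
 * entries written.
 */
static unsigned int
queue_depths(unsigned int max_qd, unsigned int *qd, unsigned int len)
{
	unsigned int n = 0, d = 1;

	while (n < len && d <= max_qd) {
		qd[n++] = d;
		/* Stop before a doubling would pass the limit or overflow. */
		if (d > max_qd / 2)
			break;
		d *= 2;
	}

	return n;
}

/*
 * Hypothetical helper: split the total time budget evenly between the
 * queue-depth steps, so the whole subtest stays within budget_ns.
 */
static uint64_t step_duration_ns(uint64_t budget_ns, unsigned int steps)
{
	assert(steps > 0);

	return budget_ns / steps;
}
```

For example, a limit of 16 gives the five depths 1, 2, 4, 8 and 16, and a 5s budget then leaves 1s per depth. Ending the spin batches early at each step would shorten the total run time further.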
Regards,
Tvrtko