[Intel-gfx] [PATCH igt] igt/perf_pmu: Speed up frequency measurement

Tue Dec 19 12:57:54 UTC 2017

On 15/12/2017 21:05, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-12-15 18:24:42)
>>
>> On 15/12/2017 17:05, Chris Wilson wrote:
>>> Use the normal batch_duration_ns and display the sampled frequency:
>>>
>>>        Frequency: min=100, max=750, boost=750 MHz
>>>        Min frequency: requested 100.0, actual 100.0
>>>        Max frequency: requested 755.6, actual 755.6
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>>    tests/perf_pmu.c | 40 +++++++++++++++++++++++++---------------
>>>    1 file changed, 25 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
>>> index d88287c17..61ae96d9a 100644
>>> --- a/tests/perf_pmu.c
>>> +++ b/tests/perf_pmu.c
>>> @@ -931,9 +931,10 @@ test_interrupts(int gem_fd)
>>>    static void
>>>    test_frequency(int gem_fd)
>>>    {
>>> -     const uint64_t duration_ns = 2e9;
>>
>> I think I remember why it was this long - because in the early days test
>> was actually applying load, not modifying frequencies directly, so had
>> to wait for the frequency to ramp up.
>>
>>>        uint32_t min_freq, max_freq, boost_freq;
>>> -     uint64_t min[2], max[2], start[2];
>>> +     uint64_t val[2], start[2];
>>> +     double min[2], max[2];
>>> +     unsigned long slept;
>>>        igt_spin_t *spin;
>>>        int fd, sysfs;
>>>    
>>> @@ -962,17 +963,19 @@ test_frequency(int gem_fd)
>>>        igt_require(igt_sysfs_set_u32(sysfs, "gt_boost_freq_mhz", min_freq));
>>>        igt_require(igt_sysfs_get_u32(sysfs, "gt_boost_freq_mhz") == min_freq);
>>>    
>>> +     gem_quiescent_gpu(gem_fd); /* Idle to be sure the change takes effect */
>>> +     spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
>>>        pmu_read_multi(fd, 2, start);
>>>    
>>> -     spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
>>> -     igt_spin_batch_set_timeout(spin, duration_ns);
>>> -     gem_sync(gem_fd, spin->handle);
>>> +     slept = measured_usleep(batch_duration_ns / 1000);
>>
>> Wouldn't it be more precise to read the pmu at this point?
> 
> Yes. That makes sense, I was still in busy mode where we wanted it to
> stop being busy simultaneously with the timer ceasing. Thinking maybe
> tieing the two together and include the time for gem_busy to report
> idle. Hopefully the discrepancy is less than a microsecond at this point.
> 
>   
>>> +     igt_spin_batch_end(spin);
>>>    
>>> -     pmu_read_multi(fd, 2, min);
>>> -     min[0] -= start[0];
>>> -     min[1] -= start[1];
>>> +     pmu_read_multi(fd, 2, val);
>>> +     min[0] = 1e9*(val[0] - start[0]) / slept;
>>> +     min[1] = 1e9*(val[1] - start[1]) / slept;
>>>    
>>>        igt_spin_batch_free(gem_fd, spin);
>>> +     gem_quiescent_gpu(gem_fd); /* Don't leak busy bo into the next phase */
>>>    
>>>        usleep(1e6);
>>>    
>>> @@ -987,17 +990,19 @@ test_frequency(int gem_fd)
>>>        igt_require(igt_sysfs_set_u32(sysfs, "gt_min_freq_mhz", max_freq));
>>>        igt_require(igt_sysfs_get_u32(sysfs, "gt_min_freq_mhz") == max_freq);
>>>    
>>> +     gem_quiescent_gpu(gem_fd);
>>> +     spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
>>>        pmu_read_multi(fd, 2, start);
>>>    
>>> -     spin = igt_spin_batch_new(gem_fd, 0, I915_EXEC_RENDER, 0);
>>> -     igt_spin_batch_set_timeout(spin, duration_ns);
>>> -     gem_sync(gem_fd, spin->handle);
>>> +     slept = measured_usleep(batch_duration_ns / 1000);
>>> +     igt_spin_batch_end(spin);
>>>    
>>> -     pmu_read_multi(fd, 2, max);
>>> -     max[0] -= start[0];
>>> -     max[1] -= start[1];
>>> +     pmu_read_multi(fd, 2, val);
>>> +     max[0] = 1e9*(val[0] - start[0]) / slept;
>>> +     max[1] = 1e9*(val[1] - start[1]) / slept;
> 
> I was also thinking that we should
> assert_within_epsilon(max[0], ref, 5), i.e. that the reported average
> frequency is what we expected. That would be better than asserting that
> the actual frequency was less than the requested (who knows what the
> future holds).

Sounds ok.

>>>        igt_spin_batch_free(gem_fd, spin);
>>> +     gem_quiescent_gpu(gem_fd);
>>
>> Ah I see.. only for the spin batch. Why not then gem_sync or maybe we
>> should add igt_spin_batch_free_sync?
> 
> gem_quiescent_gpu goes one step further than gem_sync and says the system is
> idle / parked afterwards. Which is often quite important
> 
> Yes, seems like I'm repeating this pattern often enough that throwing it
> into igt_spin_batch is worthwhile. Also I want to include a spin_batch
> variant that guarantees it has started executing before returning.
> Sadly will require MI_STORE_DWORD so limit it's availability. I think
> I'll wait for the spin_batch options to land before adding more
> parameters.

Not sure what is the status. I'd be OK with either gem_quiescent_gpu at 
the start of subtests, or making igt_spin_batch_free ensure batch 
finished. That should also stop all leaks between tests AFAICT.

Regards,

Tvrtko