[PATCH 5/9] drm/i915: Enable i915 perf stream for Haswell OA unit

Tue May 3 20:03:45 UTC 2016

On Tue, May 3, 2016 at 8:34 PM, Robert Bragg <robert at sixbynine.org> wrote:

> Sorry for the delay replying to this, I missed it.
>
> On Sat, Apr 23, 2016 at 11:34 AM, Martin Peres <martin.peres at free.fr>
> wrote:
>
>> On 20/04/16 17:23, Robert Bragg wrote:
>>
>>> Gen graphics hardware can be set up to periodically write snapshots of
>>> performance counters into a circular buffer via its Observation
>>> Architecture and this patch exposes that capability to userspace via the
>>> i915 perf interface.
>>>
>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>> Signed-off-by: Robert Bragg <robert at sixbynine.org>
>>> Signed-off-by: Zhenyu Wang <zhenyuw at linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h         |  56 +-
>>>   drivers/gpu/drm/i915/i915_gem_context.c |  24 +-
>>>   drivers/gpu/drm/i915/i915_perf.c        | 940 +++++++++++++++++++++++++++++++-
>>>   drivers/gpu/drm/i915/i915_reg.h         | 338 ++++++++++++
>>>   include/uapi/drm/i915_drm.h             |  70 ++-
>>>   5 files changed, 1408 insertions(+), 20 deletions(-)
>>>
>>> +
>>> +
>>> +       /* It takes a fairly long time for a new MUX configuration to
>>> +        * be applied after these register writes. This delay
>>> +        * duration was derived empirically based on the render_basic
>>> +        * config but hopefully it covers the maximum configuration
>>> +        * latency...
>>> +        */
>>> +       mdelay(100);
>>>
>>
>> With such a HW and SW design, how can we ever hope to get any kind of
>> performance when we are trying to monitor different metrics on each
>> draw call? This may be acceptable for system monitoring, but it is
>> problematic for the GL extensions :s
>>
>
>> Since it seems like we are going for a perf API, it means that for every
>> change of metrics, we need to flush the commands, wait for the GPU to be
>> done, then program the new set of metrics via an IOCTL, wait 100 ms, and
>> then we may resume rendering ... until the next change. We are talking
>> about a latency of 6-7 frames at 60 Hz here... this is non-negligible...
>>
>
>> I understand that we have a ton of counters and we may hide latency by not
>> allowing more than half of the counters to be used for every draw call or
>> frame, but even then, this 100ms delay is killing this approach altogether.
>>
>
>
>
Revisiting this to double-check how things fail with my latest driver/tests
when the delay is removed, I apparently can't reproduce the test failures
any more...

I think the explanation is that since first adding the delay to the driver
I also made the driver a bit more careful not to forward spurious
reports that look invalid due to a zeroed report id field, and that
mechanism keeps the unit tests happy, even though some number of invalid
reports are still generated if we don't wait.
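
To illustrate the kind of filtering I mean, here is a minimal standalone
sketch, not the actual driver code. It assumes a report can be treated as an
array of 32-bit words with the report id in the first word, and that a zero
id marks a report the hardware hasn't properly written. The names
(struct oa_report, REPORT_WORDS, forward_valid_reports) are hypothetical and
only exist for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical report size, chosen only for this illustration. */
#define REPORT_WORDS 64

/* Assumption: a report is a block of 32-bit words, id in words[0]. */
struct oa_report {
	uint32_t words[REPORT_WORDS];
};

/* Assumption: a zeroed report id means the report is spurious. */
static int oa_report_is_valid(const struct oa_report *report)
{
	return report->words[0] != 0;
}

/*
 * Copy only the valid reports from src to dst, preserving their order.
 * Returns the number of reports forwarded and stores the number of
 * skipped (invalid) reports in *skipped.
 */
static size_t forward_valid_reports(const struct oa_report *src, size_t n,
				    struct oa_report *dst, size_t *skipped)
{
	size_t forwarded = 0;

	assert(src && dst && skipped);
	*skipped = 0;

	for (size_t i = 0; i < n; i++) {
		if (!oa_report_is_valid(&src[i])) {
			/* This is where the driver would log the skip. */
			(*skipped)++;
			continue;
		}
		dst[forwarded++] = src[i];
	}

	return forwarded;
}
```

The skipped count is kept separate on purpose: dropping invalid reports
silently would hide exactly the kind of problem the error message is there
to surface.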

One problem with simply having no delay is that the driver prints an error
whenever it sees an invalid report, so I get a lot of 'Skipping spurious,
invalid OA report' dmesg spam. Also, this filtering was intended more as a
last-resort mechanism, and I wouldn't feel too happy about squashing the
error message and potentially sweeping other error cases under the carpet.

Experimenting to see if the delay can at least be reduced, I brought the
delay up in millisecond increments and found that although I still see a
lot of spurious reports when only waiting 1 or 5 milliseconds, at 10
milliseconds it's reduced quite a bit and at 15 milliseconds I don't seem to
have any errors.

15 milliseconds is still a long time, but at least not as long as 100.
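
For scale, here is a rough sketch of the arithmetic in terms of frames. It
assumes a fixed 60 Hz refresh and only counts the delay itself, not the
command flush or the wait for the GPU to go idle. delay_in_frames is a
hypothetical helper, not something in the driver.

```c
/*
 * Convert a delay in milliseconds into a number of frames at the given
 * refresh rate. Assumes a fixed refresh rate and ignores any other
 * latency around the delay.
 */
static double delay_in_frames(double delay_ms, double refresh_hz)
{
	return delay_ms * refresh_hz / 1000.0;
}
```

Under those assumptions delay_in_frames(100.0, 60.0) comes out at 6 frames,
while delay_in_frames(15.0, 60.0) is 0.9, so a bit under one frame per
metric set change.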

Regards,
- Robert