[Mesa-dev] [RFC 0/6] i965: INTEL_performance_query re-work

Robert Bragg robert at sixbynine.org
Tue May 12 10:32:17 PDT 2015


On Wed, May 6, 2015 at 9:36 AM, Samuel Pitoiset
<samuel.pitoiset at gmail.com> wrote:
>
>
> On 05/06/2015 02:53 AM, Robert Bragg wrote:
>>
>> As we've learned more about the observability capabilities of Gen
>> graphics we've found that it's not enough to only try and configure the
>> OA unit from userspace without any dedicated support from the kernel.
>
>
> Hi Robert,
>
> Yeah, this is the same idea for performance counters on Nouveau.
>
> We also need to implement dedicated support in the kernel for
> configuring/sampling hardware performance counters. Then we can
> expose a list of available counters through a set of ioctls, and Mesa
> configures a hardware event by sending its "configuration" to the kernel.

OK, that makes sense. Just for reference, if you look at my current
i915_oa kernel driver you'll see that our configurations are statically
declared in the driver itself (currently there's only one "3D"
configuration to start with), but a topic that keeps coming up is the
idea of also exposing an interface for userspace to send the kernel new
counter configurations. There are a number of extra details to consider
before we can enable that, so I decided to defer it for now.

>
>>
>> As things currently stand, the i965 backends for both
>> AMD_performance_monitor and INTEL_performance_query aren't able to
>> report normalized metrics useful to application developers, due to
>> the limitations of configuring the OA unit from userspace via LRIs
>> (MI_LOAD_REGISTER_IMM commands).
>>
>> More recently we've developed a perf PMU (performance monitoring unit)
>> driver within the drm i915 driver ("i915_oa") that lets userspace
>> configure and open an event fd via the perf_event_open syscall, which
>> gives us a more complete interface for configuring the Gen graphics
>> OA unit.
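For anyone who hasn't used perf_event_open before, here's a rough C sketch of opening an event on a dynamic PMU. It assumes the PMU registers under the name "i915_oa" and publishes its type id via the usual /sys/bus/event_source/devices/<name>/type sysfs file, and that raw OA reports arrive as PERF_SAMPLE_RAW samples; the i915_oa-specific attr.config encoding isn't shown and is left as a hypothetical placeholder.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Dynamic PMUs publish their perf "type" id in sysfs. Returns -1 if the
 * PMU (e.g. the assumed "i915_oa") isn't registered. */
static int
read_pmu_type(const char *pmu_name)
{
   char path[256];
   int type = -1;

   assert(pmu_name != NULL);
   snprintf(path, sizeof(path), "/sys/bus/event_source/devices/%s/type",
            pmu_name);

   FILE *f = fopen(path, "r");
   if (!f)
      return -1;
   if (fscanf(f, "%d", &type) != 1)
      type = -1;
   fclose(f);
   return type;
}

/* glibc has no perf_event_open() wrapper, so invoke the syscall directly.
 * Returns an event fd, or -1 with errno set. */
static int
open_oa_event(int pmu_type, int cpu)
{
   struct perf_event_attr attr;

   memset(&attr, 0, sizeof(attr));
   attr.size = sizeof(attr);
   attr.type = pmu_type;
   attr.config = 0;                    /* hypothetical: metric set / report format */
   attr.sample_type = PERF_SAMPLE_RAW; /* assumed: raw OA reports as samples */
   attr.disabled = 1;                  /* enable later via PERF_EVENT_IOC_ENABLE */

   /* pid = -1 with a specific cpu: a system-wide event, not per-process. */
   return (int)syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0UL);
}
```

A caller would check that read_pmu_type("i915_oa") didn't return -1 before passing it to open_oa_event(); whether the open then succeeds depends on running the kernel branch below and on the system's perf_event_paranoid setting.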
>>
>> With help from the kernel we can support periodic sampling (where the
>> hardware writes reports into a GPU-mapped circular buffer that we can
>> forward as perf samples), deal with the clock gating and power
>> management limitations imposed by the observability hardware, and
>> manage and maintain the selection of performance counters.
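To illustrate the circular buffer part, here's a minimal C sketch of draining fixed-size reports written between a tail and a head offset. The 256-byte report size, the byte-offset head/tail convention and drain_reports() itself are assumptions for illustration, not the i915_oa implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OA_REPORT_SIZE 256 /* assumed report size in bytes */

/* Hypothetical helper: copy complete reports written between *tail and
 * head (byte offsets into a circular buffer of buf_size bytes). Because
 * buf_size is a multiple of OA_REPORT_SIZE and offsets stay aligned, a
 * report never straddles the end of the buffer. Returns the number of
 * reports copied and advances *tail past them. */
static size_t
drain_reports(const uint8_t *buf, size_t buf_size, size_t head,
              size_t *tail, uint8_t *out, size_t max_reports)
{
   size_t n = 0;

   assert(buf_size % OA_REPORT_SIZE == 0);
   while (*tail != head && n < max_reports) {
      memcpy(out + n * OA_REPORT_SIZE, buf + *tail, OA_REPORT_SIZE);
      *tail = (*tail + OA_REPORT_SIZE) % buf_size; /* wrap to the start */
      n++;
   }
   return n;
}
```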
>>
>> The perf_event_open(2) man page is a good starting point for anyone
>> wanting to learn about the Linux perf interface. Something to beware of
>> is that there's currently no upstream precedent for exposing device
>> metrics via a perf PMU, and although early feedback was sought for this
>> work, some of it may be subject to change based on feedback from the
>> core perf maintainers as well as the i915 drm driver maintainers.
>
>
> Performance counters on Nouveau won't be exposed through perf (at least
> in the near future) since they need to be tied to the GPU's command
> stream, and perf only works with ioctl calls.

Yeah the reports we currently collect from perf are periodically
sampled by the hardware, not tied to the command stream.

For metrics synchronized with the command stream (CS) we have an
MI_REPORT_PERF_COUNT (MI_RPC) command, which generates a report using
the same configuration and report format as periodic sampling. So even
when we're only interested in these MI_RPC-based reports, it's
convenient to have a common kernel interface for configuring the OA
unit.

A few related things here:

Even when we're primarily interested in MI_RPC-based reports around
work of interest, we combine them with periodic reports too, because
our 32-bit counters can overflow relatively quickly on some hardware
generations (Haswell at least). Since we know, theoretically, the
fastest that any counter can overflow for a particular generation, we
can collect periodic samples at an interval just shorter than that
worst-case overflow time.
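As a rough worked example of picking that sampling period, here's a C sketch under a hypothetical model where a counter advances by at most max_per_cycle every GPU clock. The inputs are illustrative rather than real Haswell figures, and the 1/8 safety margin is an arbitrary choice. As long as fewer than 2^32 increments happen between two reports, plain unsigned 32-bit subtraction recovers the delta even across a wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 32-bit counter advances by at most max_per_cycle
 * each GPU clock. Returns the worst-case time in ns until it wraps. */
static uint64_t
counter_overflow_ns(uint32_t max_per_cycle, uint64_t gpu_hz)
{
   assert(max_per_cycle > 0 && gpu_hz > 0);
   const double increments_to_wrap = 4294967296.0; /* 2^32 */
   const double increments_per_sec = (double)max_per_cycle * (double)gpu_hz;
   return (uint64_t)(increments_to_wrap / increments_per_sec * 1e9);
}

/* Sample a bit more often than the worst-case wrap time (1/8 margin, an
 * arbitrary choice) so fewer than 2^32 increments occur between reports. */
static uint64_t
choose_sample_period_ns(uint64_t overflow_ns)
{
   return overflow_ns - overflow_ns / 8;
}

/* With fewer than 2^32 increments between reports, unsigned 32-bit
 * subtraction yields the true delta even if the counter wrapped. */
static uint32_t
counter_delta(uint32_t earlier, uint32_t later)
{
   return later - earlier;
}
```

With hypothetical inputs of 40 increments per clock at 1.3 GHz, counter_overflow_ns() comes out at roughly 82.6 ms, so samples would be collected roughly every 72 ms.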

We're experimenting with a model where the kernel automatically
inserts MI_RPC commands around batches and forwards those reports via
perf, in-line with periodic reports. This lets us track work
per-context with more flexibility than the hardware's single-context
filtering, which is useful for workloads that are spread across
multiple contexts of interest. (I see lots of interest in this from
teams looking to profile media workloads.) To also help track
different stages of work within a single context, userspace has a way
of tagging the work it submits, and those tags can be matched via the
perf samples too.
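Here's a hypothetical C sketch of how a tool might attribute counter deltas once reports carry a context ID and a userspace tag: each interval between consecutive reports is charged to the (context, tag) of the report that opens it. The struct oa_sample fields and sum_for_ctx_tag() are invented for illustration and are not the i915_oa sample layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded sample; not the i915_oa record layout. */
struct oa_sample {
   uint32_t ctx_id;  /* context active when the report was written */
   uint32_t tag;     /* userspace-supplied tag for that work */
   uint32_t counter; /* one raw 32-bit OA counter value */
};

/* Sum the wrap-safe deltas of one counter over every interval (between
 * consecutive, time-ordered reports) that opens with a report belonging
 * to (ctx_id, tag). */
static uint64_t
sum_for_ctx_tag(const struct oa_sample *s, size_t n,
                uint32_t ctx_id, uint32_t tag)
{
   uint64_t total = 0;

   assert(s != NULL || n == 0);
   for (size_t i = 0; i + 1 < n; i++) {
      if (s[i].ctx_id == ctx_id && s[i].tag == tag)
         total += (uint32_t)(s[i + 1].counter - s[i].counter);
   }
   return total;
}
```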

For tooling it's very helpful to be able to collect periodic and CS
synchronized reports, potentially across multiple contexts, via one
event fd, as opposed to collecting per-context results in userspace
possibly across several drivers (e.g. media + GL + CL) and then needing
to devise a scheme for forwarding all that data to tools in userspace.

>
>>
>> This PRM is a good starting point for anyone wanting to learn about the
>> Gen graphics Observability hardware. Some important information is
>> currently missing and this should be updated soon, but that's more
>> directly related to the i915_oa perf driver. Notably, though, the report
>> formats described here need to be understood by Mesa, since the perf
>> samples simply forward the raw reports from the OA hardware.
>>
>> https://01.org/sites/default/files/documentation/observability_performance_counters_haswell.pdf
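Since Mesa has to decode those raw reports itself, here's a C sketch of accumulating deltas between two Haswell reports. The layout it assumes (a 64-dword, 256-byte report with the timestamp in dword 1 and 45 A + 8 B + 8 C counters starting at dword 3) should be checked against the PRM linked above; accumulate_oa_reports() is an illustration, not the code in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Haswell A45_B8_C8 layout (verify against the PRM):
 * dword 0 = report ID/reason, dword 1 = timestamp, dword 2 not used here,
 * dwords 3..63 = 45 A + 8 B + 8 C counters, all 32 bits wide. */
#define OA_REPORT_DWORDS   64
#define OA_TIMESTAMP_DWORD 1
#define OA_COUNTER_DWORD0  3
#define OA_N_COUNTERS      61

_Static_assert(OA_COUNTER_DWORD0 + OA_N_COUNTERS == OA_REPORT_DWORDS,
               "counters must fill the rest of the report");

/* accum[0] += timestamp delta, accum[1 + i] += delta of counter i.
 * Unsigned 32-bit subtraction handles a single wrap between reports. */
static void
accumulate_oa_reports(const uint32_t *start, const uint32_t *end,
                      uint64_t accum[1 + OA_N_COUNTERS])
{
   assert(start && end && accum);
   accum[0] += (uint32_t)(end[OA_TIMESTAMP_DWORD] -
                          start[OA_TIMESTAMP_DWORD]);
   for (int i = 0; i < OA_N_COUNTERS; i++)
      accum[1 + i] += (uint32_t)(end[OA_COUNTER_DWORD0 + i] -
                                 start[OA_COUNTER_DWORD0 + i]);
}
```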
>>
>> This series re-works the i965 driver's support for exposing performance
>> counters, taking advantage of this i915_oa perf event interface.
>>
>> A corresponding kernel branch with an initial i915_oa driver for Haswell
>> can be found here:
>>
>> https://github.com/rib/linux  wip/rib/oa-hsw-4.0.0
>>
>> A corresponding libdrm branch can be found here:
>>
>> https://github.com/rib/drm  wip/rib/oa-hsw-4.0.0
>>
>> In case it's helpful to see another example using the i915_oa perf
>> interface, I've also been developing a 'gputop' tool that both lets me
>> test the INTEL_performance_query interface to collect per-context
>> metrics from Mesa and can also visualize system-wide metrics (i.e.
>> across all GPU contexts) using perf directly:
>>
>> https://github.com/rib/gputop
>
>
> This is pretty good for testing OA counters without mesa.

I've been told that this isn't building cleanly for others at the
moment, so I'll try to check for some silly build script issues so that
it's straightforward for anyone else who wants to try it out. I think
one issue is that it depends on the kernel headers from my
oa-hsw-4.0.0 branch; my workaround is to add -isystem
/path/to/drm-intel/include to CFLAGS when running ./configure, but it
would be more convenient to have a local copy of the required drm
header in the gputop repo.

>
>>
>> Although I haven't updated the branches in a while, I could share some
>> initial code adding support for Broadwell if anyone's interested in
>> getting a sense of what's involved in supporting later hardware generations.
>>
>> I still anticipate some (hopefully relatively minor) tweaking of
>> implementation details based on review feedback for the i915_oa driver,
>> but I hope that this is a good point to ask for some feedback on the
>> Mesa changes.
>>
>> If it's more convenient, these patches can also be fetched from here:
>>
>> https://github.com/rib/mesa  wip/rib/oa-hsw-4.0.0
>
>
> Great work Robert. :-)
>
> I'll try to give you my feedback in the next few days.

That would be great if you get a chance; thanks a lot!

- Robert

>
>
>>
>> Regards,
>> - Robert
>>
>> Robert Bragg (6):
>>    i965: Remove perf monitor/query backend
>>    Separate INTEL_performance_query frontend
>>    Model INTEL perf query backend after query object BE
>>    i965: Implement INTEL_performance_query extension
>>    i965: Expose OA counters via INTEL_performance_query
>>    i965: Adds further support for "3D" OA counters
>>
>>   src/mapi/glapi/gen/gl_genexec.py                   |    1 +
>>   src/mesa/Makefile.sources                          |    2 +
>>   src/mesa/drivers/dri/i965/Makefile.sources         |    2 +-
>>   src/mesa/drivers/dri/i965/brw_context.c            |    5 +-
>>   src/mesa/drivers/dri/i965/brw_context.h            |  101 +-
>>   .../drivers/dri/i965/brw_performance_monitor.c     | 1472 ------------
>>   src/mesa/drivers/dri/i965/brw_performance_query.c  | 2356 ++++++++++++++++++++
>>   src/mesa/drivers/dri/i965/intel_batchbuffer.c      |   10 +-
>>   src/mesa/drivers/dri/i965/intel_extensions.c       |   69 +-
>>   src/mesa/main/context.c                            |    3 +
>>   src/mesa/main/dd.h                                 |   39 +
>>   src/mesa/main/mtypes.h                             |   28 +
>>   src/mesa/main/performance_monitor.c                |  579 -----
>>   src/mesa/main/performance_monitor.h                |   39 -
>>   src/mesa/main/performance_query.c                  |  608 +++++
>>   src/mesa/main/performance_query.h                  |   80 +
>>   16 files changed, 3197 insertions(+), 2197 deletions(-)
>>   delete mode 100644 src/mesa/drivers/dri/i965/brw_performance_monitor.c
>>   create mode 100644 src/mesa/drivers/dri/i965/brw_performance_query.c
>>   create mode 100644 src/mesa/main/performance_query.c
>>   create mode 100644 src/mesa/main/performance_query.h
>>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

