[Intel-gfx] [RFC 04/10] drm/i915: Expose a PMU interface for perf queries

Wed Aug 23 18:22:14 UTC 2017

On Wed, Aug 23, 2017 at 05:51:38PM +0000, Rogozhkin, Dmitry V wrote:

> Anyhow, returning to the metrics i915 exposes. Some metrics are just
> exposure of some counters supported already inside i915 PMU which do not
> require any special sampling: at any given moment you can request the
> counter value (these are interrupt counts, i915 power consumption).

> Other metrics are similar to the always-available ones I just described,
> but they require activation for i915 to start to count them - this is
> done on the event initialization (these are engine busy stats).

Right, so depending on how expensive this activation is and if it can be
done without scheduling, there are two options:

 1) activate/deactivate from pmu::start()/pmu::stop()
 2) activate/deactivate from pmu::event_init()/event->destroy() and
    disregard all counting between pmu::stop() and pmu::start().
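To make the difference between the two options concrete, here is a rough
userspace model in C. It is not i915 or perf core code: mock_hw,
mock_event and the opt1_*/opt2_* helpers are made-up names, and the
"hardware" is just a counter that only advances while activated. Both
variants see the same schedule and report the same count; they differ
only in when the device is kept active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device counter: it only advances while 'active' is set. */
struct mock_hw {
	bool active;
	uint64_t raw;
};

static void hw_tick(struct mock_hw *hw, uint64_t n)
{
	if (hw->active)
		hw->raw += n;
}

struct mock_event {
	struct mock_hw *hw;
	uint64_t prev;	/* raw value when counting last (re)started */
	uint64_t count;	/* value reported to the reader */
};

/* Option 1: activation happens in start()/stop(). */
static void opt1_event_init(struct mock_event *ev, struct mock_hw *hw)
{
	ev->hw = hw;
	ev->prev = hw->raw;
	ev->count = 0;
}

static void opt1_start(struct mock_event *ev)
{
	ev->hw->active = true;
	ev->prev = ev->hw->raw;
}

static void opt1_stop(struct mock_event *ev)
{
	ev->hw->active = false;
	ev->count += ev->hw->raw - ev->prev;
}

/* Option 2: activation happens in event_init()/destroy(). */
static void opt2_event_init(struct mock_event *ev, struct mock_hw *hw)
{
	ev->hw = hw;
	ev->count = 0;
	hw->active = true;
	ev->prev = hw->raw;
}

static void opt2_start(struct mock_event *ev)
{
	/* Drop whatever the device counted while we were stopped. */
	ev->prev = ev->hw->raw;
}

static void opt2_stop(struct mock_event *ev)
{
	ev->count += ev->hw->raw - ev->prev;
}

static void opt2_destroy(struct mock_event *ev)
{
	ev->hw->active = false;
}

/* Schedule: 5 ticks stopped, 3 running, 7 stopped, 2 running. */
uint64_t run_option1(void)
{
	struct mock_hw hw = { false, 0 };
	struct mock_event ev;

	opt1_event_init(&ev, &hw);
	hw_tick(&hw, 5);
	opt1_start(&ev);
	hw_tick(&hw, 3);
	opt1_stop(&ev);
	hw_tick(&hw, 7);
	opt1_start(&ev);
	hw_tick(&hw, 2);
	opt1_stop(&ev);
	assert(hw.raw == 5);	/* device idle whenever the event was stopped */
	return ev.count;
}

uint64_t run_option2(void)
{
	struct mock_hw hw = { false, 0 };
	struct mock_event ev;

	opt2_event_init(&ev, &hw);
	hw_tick(&hw, 5);
	opt2_start(&ev);
	hw_tick(&hw, 3);
	opt2_stop(&ev);
	hw_tick(&hw, 7);
	opt2_start(&ev);
	hw_tick(&hw, 2);
	opt2_stop(&ev);
	opt2_destroy(&ev);
	assert(hw.raw == 17);	/* device counted throughout the event's life */
	return ev.count;
}
```

In this model option 2 keeps the device counting for the whole lifetime
of the event and pays for that with the extra bookkeeping in start(),
which is the trade-off when activation is too expensive or cannot be
done from the start()/stop() context.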

> Finally, there is a third group which requires sampled counting: they
> need to be initialized, and the i915 PMU starts an internal timer to
> count these values (these are some engine characteristics referenced
> in the code as QUEUED, SEMA, WAIT).

So uncore PMUs can't really do sampling. That is, perf defines sampling
as interrupting the relevant task and then providing things like the
%RIP value at interrupt time. Since uncore activity cannot be associated
with any one task, sampling is not allowed.
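In driver terms this usually shows up as an event_init() that refuses
any event asking for a sample period. A minimal sketch of that check
with stand-in types (mock_attr, mock_event and mock_uncore_event_init
are hypothetical names, not the kernel's structures):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-ins for the relevant bits of the perf attr and event. */
struct mock_attr {
	uint64_t sample_period;	/* non-zero means "sample every N events" */
};

struct mock_event {
	struct mock_attr attr;
};

/* Accept pure counting events, reject anything that wants sampling. */
int mock_uncore_event_init(const struct mock_event *event)
{
	if (event->attr.sample_period)
		return -EINVAL;

	return 0;
}
```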

Now, I'm thinking that what i915 does is slightly different: it doesn't
provide registers to read out the counter state, but instead
periodically writes state snapshots into some memory buffer, right?

That's a bit tricky, maybe the best fit would be what PPC HV 24x7 does.
They create an event-group, that is a set of counters that are
co-scheduled, matching the set of counters they get from the HV
interface (or a subset) and then sys_read() will use a TXN_READ to
group-read the entire thing at once. In your case it could consume the
last state snapshot instead of requesting one (or wait for the next,
whatever works best).
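Roughly, the shape of such a transactional group read, as a userspace
model in C. All names here (mock_pmu, start_read_txn, event_read,
commit_read_txn, device_write_snapshot) are invented for the sketch and
the snapshot layout is an assumption; the point is only that reads
inside the transaction are queued and then all filled from one snapshot
at commit time, so the group members are mutually consistent.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_COUNTERS 3	/* e.g. QUEUED, SEMA, WAIT */

/* Hypothetical state snapshot the device periodically writes out. */
struct snapshot {
	uint64_t val[NR_COUNTERS];
};

struct mock_event {
	int idx;	/* which slot of the snapshot this event maps to */
	uint64_t count;
};

struct mock_pmu {
	struct snapshot latest;
	struct mock_event *pending[NR_COUNTERS];
	int nr_pending;
	int in_read_txn;
};

static void device_write_snapshot(struct mock_pmu *pmu, const uint64_t *vals)
{
	memcpy(pmu->latest.val, vals, sizeof(pmu->latest.val));
}

static void start_read_txn(struct mock_pmu *pmu)
{
	pmu->nr_pending = 0;
	pmu->in_read_txn = 1;
}

static void event_read(struct mock_pmu *pmu, struct mock_event *ev)
{
	if (pmu->in_read_txn) {
		/* Defer: just remember the event until commit. */
		if (pmu->nr_pending < NR_COUNTERS)
			pmu->pending[pmu->nr_pending++] = ev;
		return;
	}
	ev->count = pmu->latest.val[ev->idx];
}

static void commit_read_txn(struct mock_pmu *pmu)
{
	int i;

	/* Fill every queued event from the same (last) snapshot. */
	for (i = 0; i < pmu->nr_pending; i++)
		pmu->pending[i]->count = pmu->latest.val[pmu->pending[i]->idx];

	pmu->nr_pending = 0;
	pmu->in_read_txn = 0;
}

/* Read a 3-event group while the device writes a new snapshot mid-way. */
uint64_t group_read_demo(void)
{
	const uint64_t first[NR_COUNTERS] = { 10, 20, 30 };
	const uint64_t second[NR_COUNTERS] = { 11, 21, 31 };
	struct mock_event ev[NR_COUNTERS] = { { 0, 0 }, { 1, 0 }, { 2, 0 } };
	struct mock_pmu pmu;
	uint64_t sum = 0;
	int i;

	memset(&pmu, 0, sizeof(pmu));
	device_write_snapshot(&pmu, first);

	start_read_txn(&pmu);
	for (i = 0; i < NR_COUNTERS; i++)
		event_read(&pmu, &ev[i]);
	device_write_snapshot(&pmu, second);	/* arrives before commit */
	commit_read_txn(&pmu);

	for (i = 0; i < NR_COUNTERS; i++)
		sum += ev[i].count;
	return sum;
}
```

Here every member of the group ends up with values from the second
snapshot; none of them mixes in a value from the first one.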

Would that work?