[Intel-gfx] [RFC 04/10] drm/i915: Expose a PMU interface for perf queries

Peter Zijlstra peterz at infradead.org
Tue Aug 22 18:17:55 UTC 2017


On Sat, Aug 12, 2017 at 02:15:13AM +0000, Rogozhkin, Dmitry V wrote:
> $ perf stat -e instructions,i915/rcs0-busy/ workload.sh
> <... workload.sh output ...>
> 
> Performance counter stats for 'workload.sh':
>      1,204,616,268      instructions
>                  0      i915/rcs0-busy/
> 
>        1.869169153 seconds time elapsed
> 
> As you can see instructions event works pretty well, i915/rcs0-busy/
> doesn't.
> 
> I am afraid that our current understanding of how a PMU should work is
> not fully correct.

Can we start off by explaining to me how this i915 stuff works? All I
have is ~750 lines of patch without comments, which sort of leaves me
confused.

The above command tries to add an event 'i915/rcs0-busy/' to a task. How
are i915 resources associated with any one particular task?

Is there a unique i915 resource for each task? If not, I don't see how
a per-task event can ever work as expected.

> I think so, because the way PMU entry points init(),
> add(), del(), start(), stop(), read() are implemented do not correlate
> with how many times they are called. I have counted them and here is the
> result:
> init()=19, add()=44310, del()=43900, start()=44534, stop()=0, read()=0
> 
> Which means that we regularly attempt to start/stop the timer and/or
> busy stats calculations. Another thing worth paying attention to is that
> read() was not called at all. How is perf supposed to get the counter value?

Both stop() and del() are supposed to update event->count. Only if we do
sys_read() while the event is active (something perf-stat never does
IIRC) will it issue pmu::read() to get an up-to-date number.
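
A minimal userspace sketch of that contract follows. It is only a model:
every name in it (model_event, model_hw_busy_ns, model_run() and so on)
is hypothetical and none of it is the kernel's or i915's actual code. It
assumes a free-running busy counter that is snapshotted at start() and
folded into the count at stop(), with del() going through stop().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one event; not a kernel structure. */
struct model_event {
	uint64_t count;   /* total that the tool eventually reports */
	uint64_t prev;    /* counter snapshot taken at start() */
	int running;      /* non-zero between start() and stop() */
};

/* Stand-in for a free-running device busy counter (assumption). */
uint64_t model_hw_busy_ns;

/* Fold everything since the last snapshot into count. */
void model_update(struct model_event *e)
{
	uint64_t now = model_hw_busy_ns;

	assert(now >= e->prev);   /* the model's counter never goes backwards */
	e->count += now - e->prev;
	e->prev = now;
}

void model_start(struct model_event *e)
{
	e->prev = model_hw_busy_ns;
	e->running = 1;
}

/* stop() must leave count up to date. */
void model_stop(struct model_event *e)
{
	if (!e->running)
		return;
	model_update(e);
	e->running = 0;
}

/* add()/del() bracket each period the task is scheduled in. */
void model_add(struct model_event *e) { model_start(e); }
void model_del(struct model_event *e) { model_stop(e); }

/* read() only matters while the event is still running. */
void model_read(struct model_event *e)
{
	if (e->running)
		model_update(e);
}

/* Run `cycles` add()/del() pairs with no read() at all. */
uint64_t model_run(unsigned int cycles, uint64_t busy_per_cycle)
{
	struct model_event e = { 0, 0, 0 };
	unsigned int i;

	for (i = 0; i < cycles; i++) {
		model_add(&e);
		model_hw_busy_ns += busy_per_cycle;   /* device was busy meanwhile */
		model_del(&e);
	}
	return e.count;
}
```

In this model the total is correct without read() ever being called,
because each del() folds its delta into the count. If model_del() skipped
the update, the count would stay at 0 no matter how often add() and del()
ran.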

> Yet another thing: where are we supposed to initialize our internal
> stuff? The numbers above are from a single run, and even init is called
> multiple times. Where are we supposed to de-init our stuff? Each time on
> del()? That hardly makes sense.

init happens in pmu::event_init(), which can set an optional
event->destroy() function for de-init.

init() is called once for each event created. The above creates an
inherited per-task event (I think; I lost track of what the perf tool
does), and 19 seems to suggest you did some 18 fork()/clone() calls after
that, resulting in your 1 parent event with 18 children.
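
To make the arithmetic concrete, here is a small userspace model of that
lifecycle. All names in it are hypothetical and it is not the perf core:
it assumes that every inherited child goes through the same init hook as
the parent, and that the optional destroy hook runs once when each event
is freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an event; not the kernel's struct perf_event. */
struct model_event {
	struct model_event *parent;               /* NULL for the original event */
	void (*destroy)(struct model_event *e);   /* optional de-init hook */
};

int model_init_calls;   /* how many times the init hook has run */
int model_live_users;   /* stand-in for driver state held per event */

/* De-init: release whatever the init hook took. */
void model_destroy(struct model_event *e)
{
	(void)e;
	model_live_users--;
}

/* Init: runs once per event created, children included. */
int model_event_init(struct model_event *e, struct model_event *parent)
{
	model_init_calls++;
	model_live_users++;
	e->parent = parent;
	e->destroy = model_destroy;
	return 0;
}

/* Freeing an event calls its destroy hook, if one was set. */
void model_free_event(struct model_event *e)
{
	if (e->destroy)
		e->destroy(e);
}

/* One parent event plus one inherited child per fork; returns init calls. */
int model_inherit(int forks)
{
	struct model_event parent, child;
	int before = model_init_calls;
	int i;

	model_event_init(&parent, NULL);
	for (i = 0; i < forks; i++) {
		model_event_init(&child, &parent);
		model_free_event(&child);   /* child task exits */
	}
	model_free_event(&parent);
	return model_init_calls - before;
}
```

With 18 forks the model counts 19 init calls, and every init is paired
with exactly one destroy, so per-event setup belongs in the init hook and
teardown in destroy rather than in add()/del().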

> I should note that if perf is run with the -I 10 option, then read()
> is called: init_c()=265, add_c()=132726, del_c()=131482,
> start_c()=133412, stop()=0, read()=71. However, the i915 counter is still 0.
> I have tried to print counter values from within read() and these values
> are non-zero. Actually read() returns a sequence of <non_zero>, 0, 0, 0, ...,
> <non_zero> because with our add(), del() code we regularly start/stop our
> counter and execution in read() follows different branches.
> 
> Thus, I think that right now we do not implement the PMU correctly and do
> not meet perf's expectations of the PMU. Unfortunately, right now I have
> no idea what these expectations are.

Please clarify how i915 works; as it is, I have no idea where to go.

