[Intel-gfx] [PATCH] drm/i915: add a tracepoint for gpu frequency changes

Mon Sep 3 00:53:41 PDT 2012

On Sat, Sep 01, 2012 at 06:36:32PM -0700, Ben Widawsky wrote:
> On 2012-09-01 12:14, Daniel Vetter wrote:
> >On Sat, Sep 1, 2012 at 8:35 PM, Ben Widawsky <ben at bwidawsk.net>
> >wrote:
> >>On 2012-09-01 11:28, Arjan van de Ven wrote:
> >>>
> >>>On 9/1/2012 11:26 AM, Ben Widawsky wrote:
> >>>>
> >>>>On 2012-08-30 04:26, Daniel Vetter wrote:
> >>>>>
> >>>>>We've had and still have too many issues where the gpu turbo
> >>>>>doesn't quite do what it's supposed to do (or what we want it to do).
> >>>>>
> >>>>>Adding a tracepoint to track when the desired gpu frequency changes
> >>>>>should help a lot in characterizing and understanding problematic
> >>>>>workloads.
> >>>>>
> >>>>>Also, this should be fairly interesting for power tuning (and
> >>>>>especially noticing when the gpu is stuck in high frequencies, as has
> >>>>>happened in the past) and hence for integration into powertop and
> >>>>>similar tools.
> >>>>>
> >>>>>Cc: Arjan van de Ven <arjan at linux.intel.com>
> >>>>>Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> >>>>
> >>>>
> >>>>I can't help but think it's equally interesting to know when we
> >>>>queue the work as well.
> >>>>
> >>>>
> >>>
> >>>btw... if the userspace interface (e.g. the actual event) is not
> >>>controversial and very unlikely to change,
> >>>I'd like to start coding the powertop support for this already....
> >>
> >>
> >>I have no problem with Daniel's patch. It's just a matter of cutting
> >>through some scheduler BS of "when the GPU wants to change frequency"
> >>vs. "when we actually change the GPU frequency." I think *both* are
> >>interesting.
> >
> >Hm, aren't there some neat tracepoints to measure the latency of work
> >items around already? I agree that this might be useful, but just
> >adding a tracepoint for this one workqueue item feels like overkill.
> 
> It depends on what you're trying to measure. I think this patch is
> quite useful but I think I'll make you defend your patch now since
> you're the maintainer and you took your own patch and you're
> shooting down my idea. So please tell me what PowerTOP should do
> with this patch other than notice we're stuck (and furthermore, even
> if we're stuck, what should it tell us to do)?

Actually it shouldn't only notice that we're stuck but e.g. also notice
that a blinking cursor keeps us at high gpu clock (which together with the
low rc6 residency should explain the power usage in these scenarios). Or
maybe integrate it into a graphical overview (but to make that useful we
first need to figure out how to add precise tracepoints for
batch_begin/end) so that interactions between gpu/cpu stand out more.

> As you may recall we can get multiple up and down interrupts, and we
> coalesce them and only do one thing. Information is lost there that
> could have been useful; caveat to that - I haven't looked at the
> code in a while, but that's what we used to do. What I mean though
> is, if we get an up then down interrupt, in that order, it will go
> out as a trace event that we're raising the GPU frequency (again,
> previous caveat applies). So I think this event + the current GPU
> frequency is more interesting than just when we change the
> frequency; however all 3 would be even better for finding driver
> bugs.

The added tracepoint gives us an event when we actually change the hw
state - which is imo the important thing to measure for performance tuning
and diagnosing issues. Figuring out _why_ things are amiss is then the
usual gfx driver debug madness, and I think adding a bunch of tracepoints
specifically just for that feels like overkill.

> More on tracing at the interrupt time: I think getting this info to
> userspace is somewhat less useful than tying it into some kind of
> CPU governor hooks. For example, if we get multiple consecutive RP
> down interrupts, we can probably add it to a list of reasons we
> might want to lower the CPU frequency, and the converse is
> also true.

I didn't add the tracepoint at irq time, but only where we change the gpu
clock. And the tracepoint dumps the new gpu clock freq we've just set, so
no way to get out of sync with down/up irqs, either.
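
To illustrate what a userspace consumer could do with such events, here is a
minimal sketch in Python. It is not powertop code and does not use the real
trace format: it assumes the events have already been parsed into hypothetical
(timestamp_s, new_freq_mhz) tuples, one per frequency change. The function
names, the 1000 MHz cutoff used in the example and the threshold are all made
up for illustration.

```python
from collections import defaultdict


def frequency_residency(events, end_time):
    """Return the seconds spent at each frequency.

    events: list of (timestamp_s, new_freq_mhz) tuples sorted by time, one
    per frequency-change event (hypothetical, pre-parsed input).
    end_time: timestamp at which the trace stops.
    Time before the first event is not counted, since the frequency in that
    window is unknown.
    """
    residency = defaultdict(float)
    # Each frequency is held from its own event until the next event.
    for (start, freq), (next_start, _) in zip(events, events[1:]):
        residency[freq] += next_start - start
    # The last frequency is held until the end of the trace.
    if events:
        last_start, last_freq = events[-1]
        residency[last_freq] += end_time - last_start
    return dict(residency)


def stuck_high(events, end_time, high_mhz, threshold=0.9):
    """Return True if at least `threshold` of the traced time was spent at
    or above `high_mhz`. Both parameters are illustrative assumptions."""
    residency = frequency_residency(events, end_time)
    total = sum(residency.values())
    if total <= 0:
        return False
    high = sum(t for f, t in residency.items() if f >= high_mhz)
    return high / total >= threshold
```

For example, with events [(0.0, 350), (1.0, 1100), (9.0, 350)] and a trace end
of 10.0, frequency_residency() reports 2 seconds at 350 MHz and 8 seconds at
1100 MHz. Since each event carries the frequency that was just set, the
consumer needs no knowledge of the up/down irqs.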
-Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48