[Intel-gfx] INSTDONE instrumentation (patch in progress)
Peter Clifton
pcjc2 at cam.ac.uk
Sun Oct 31 16:04:07 CET 2010
On Sun, 2010-10-31 at 08:10 +0000, Chris Wilson wrote:
> On Sun, 31 Oct 2010 02:24:06 +0000, Peter Clifton <pcjc2 at cam.ac.uk> wrote:
> > Hi guys,
> >
> > I thought I'd attach this, as it is now gone 2AM and I doubt I'm going
> > to finish it "tonight". I was hoping to elicit some initial review to
> > suggest whether the design was sane or not.
>
> Been here, done something similar and ripped it out.
Doh.. and I was feeling pleased with myself there ;) It took me nearly a
day, but it was the first serious kernel driver work I've done... I had
to look up a lot of APIs!
FWIW, the patch (or rather the version I have here, which uses
double-buffering for the samples) does produce vaguely useful-looking
data, but I'm not sure I can get the hrtimer to fire fast enough.
Spinning up a separate scheduled process would be one (nasty) way to go.
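In case it helps the discussion, here's a minimal sketch of the hrtimer
approach I'm describing. instdone_read() and record_sample() are made-up
placeholders for the real MMIO read and the double-buffer push, and the
10us period is purely illustrative:

#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/types.h>

#define SAMPLE_PERIOD_NS 10000 /* 10us -- possibly faster than hrtimer copes with */

static struct hrtimer sample_timer;

static enum hrtimer_restart sample_fn(struct hrtimer *timer)
{
	u32 instdone = instdone_read();	/* placeholder: MMIO read of INSTDONE */

	record_sample(instdone);	/* placeholder: push into double buffer */

	/* Re-arm relative to now; overruns are silently absorbed. */
	hrtimer_forward_now(timer, ns_to_ktime(SAMPLE_PERIOD_NS));
	return HRTIMER_RESTART;
}

static void sampling_start(void)
{
	hrtimer_init(&sample_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	sample_timer.function = sample_fn;
	hrtimer_start(&sample_timer, ns_to_ktime(SAMPLE_PERIOD_NS),
		      HRTIMER_MODE_REL);
}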
> What we want to do is
> integrate an additional set of sampling with perf. The last time I looked,
> it required a few extra lines to perf-core to allow devices to register
> their own counters, and then you get the advantage of a reasonable
> interface (plus the integration with CPU profiling and timecharts etc).
Sounds good. I had wondered about integration with tracing (hence the
name of my code), so it could "somehow" be tied in with something like
sysprof output.
I'm as yet undecided whether to attempt to stream events/data fast
enough to capture live 0/1 data on whether a unit is busy or not, or
whether to take a burst of samples periodically and compute a %busy
figure within the kernel at each time step, e.g.:
| | | | | | | | | | | | | | | | |    <-- sampling raw data
0 1 1 0 0 1 0 1 0 1 1 1 0 0 0 1 1

OR

|||||    |||||    |||||    |||||     <-- sampling bursts
 20%      50%      45%      20%
Somehow the latter seems like it might reduce the I/O down to the
userspace app, but it does artificially blur the lines between units,
and stops you from seeing exactly when units synchronise their
busy/idle transitions. It would probably still produce pictures along
the lines of the ones I posted to the list, which I think looked useful.
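For the second scheme, the in-kernel reduction per time step would be
trivial; something like this, with the burst length plucked from thin
air:

#include <linux/types.h>

#define BURST_LEN 32	/* samples per burst -- arbitrary choice */

static unsigned int busy_percent(const u8 *samples)
{
	unsigned int i, busy = 0;

	for (i = 0; i < BURST_LEN; i++)
		busy += samples[i];	/* each sample is 0 (idle) or 1 (busy) */

	return busy * 100 / BURST_LEN;
}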
[snip]
> You can use the current trace points to get timings for
> submit + complete + retire. What's missing here is being able to mark
> individual batch buffers for profiling. I think adding a new TIMING_FENCE
> ioctl (it could just be a fence ;-) that capture various stats at the
> point of submission and completion and then fired off an event (to be read
> on the /dev/dri/card0 fd) would be the more flexible solution.
Stupid question: what do you mean by "fence"? I vaguely understand the
term for URB allocation boundaries, and for tiling boundaries (I think).
Do you mean noting down some ring buffer sequence IDs which we can pick
up when they start/complete to enable tracing?
Could you just add a flag to the exec batchbuffer IOCTL to enable
tracing, or do you want to pass more information to control how things
are traced?
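Just to illustrate the sort of thing I mean by a flag (the name and bit
below are entirely made up):

#include <string.h>
#include <xf86drm.h>
#include <i915_drm.h>

#define I915_EXEC_TRACE		(1 << 15)	/* hypothetical flag */

static void submit_traced_batch(int fd)
{
	struct drm_i915_gem_execbuffer2 execbuf;

	memset(&execbuf, 0, sizeof(execbuf));
	/* ... buffers_ptr, buffer_count, batch_len etc. filled in as usual ... */
	execbuf.flags |= I915_EXEC_TRACE;	/* ask the kernel to trace this batch */

	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
}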
On the one hand, it would be really fun to see how individual
batchbuffers utilise the GPU; in reality, though, some workloads submit
multiple batchbuffers for a given rendering frame.
I assume it is entirely possible for some other client to slip a
batchbuffer in between my app's batchbuffers, so you'd really want to
see that (and see who it belongs to) in order to explain the resulting
timings.
At present, I'm uncertain whether the perf read-out needs to be from
within the app we're trying to profile (so we know which batchbuffers
belong to it), or whether it works best as some external application.
The latter lets us look at system-wide GPU usage, but if we wanted to
narrow in on a frame of rendering from a particular application, we'd
need some means to identify the appropriate batches / times.
I'd love to see a real-time (not post-processed) graph of frame timing
for the GPU, but it amused me to realise the app would have to be pretty
self-aware of its own rendering usage to avoid profiling its own
graph-drawing routines.
--
Peter Clifton
Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA
Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)