[Intel-gfx] Design of a GPU profiling debug interface
jbarnes at virtuousgeek.org
Tue Nov 9 09:15:54 PST 2010
On Sat, 30 Oct 2010 14:04:11 +0100
Peter Clifton <pcjc2 at cam.ac.uk> wrote:
> I think I'll need some help with this. I'm by no means a kernel
> programmer, so I'm feeling my way in the dark with this.
> I want to design an interface so I can synchronise my GPU idle flags
> polling with batchbuffer execution. I'm imagining at a high level, doing
> something like this in my application (or mesa). (Hand-wavey-pseudocode)
> expose_event_handler ()
> {
>     static bool one_shot_trace = true;
>
>     if (one_shot_trace)
>         mesa_debug_i915_trace_idle (TRUE);
>
>     /* RENDERING COMMANDS IN HERE */
>
>     if (one_shot_trace)
>         mesa_debug_i915_trace_idle (FALSE);
>
>     one_shot_trace = false;
> }
> I was imagining adding a flag to the EXECBUFFER2 IOCTL, or perhaps
> adding a new EXECBUFFER3 IOCTL (which I'm playing with locally now).
> Basically I just want to flag execbuffers which I'm interested in seeing
> profiling data for.
> In order to get really high-resolution profiling, it would be
> advantageous to confine it to the time period of interest; otherwise the
> data rate is too high. I guesstimated about 10MB/s for a binary
> representation of the data I'm currently polling in user-space. More
> spatial resolution would be nice too, so this could increase.
Would be very cool to be able to correlate the data...
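To make the "flag this batch" idea concrete, I'd imagine something along
these lines (the flag name, bit value, and struct here are entirely made
up for illustration, not real i915 defines; it's just a userspace-compilable
sketch of the check the kernel would do):

```c
#include <stdint.h>

/* Hypothetical flag bit -- not a real i915 define.  Userspace would set
 * it in the execbuffer 'flags' field to mark a batch it wants idle-flag
 * profiling data for. */
#define I915_EXEC_PROFILE_IDLE (1u << 15)

/* Stand-in for the flags field of drm_i915_gem_execbuffer2. */
struct fake_execbuffer2 {
    uint64_t flags;
};

/* Kernel side: decide whether to start/stop idle polling around this
 * batch's execution. */
static int want_idle_trace(const struct fake_execbuffer2 *eb)
{
    return (eb->flags & I915_EXEC_PROFILE_IDLE) != 0;
}
```

That way you don't need a whole new EXECBUFFER3 ioctl unless the flag
space in execbuffer2 turns out to be too tight.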
> I think I have a vague idea how to do the GPU and logging parts, even if
> I end up having to start the polling before the batchbuffer starts.
> What I have little to no clue about is how to manage allocation of the
> memory to store the results in.
> Should userspace (mesa?) be passing buffers for the kernel to return
> profiling data in? Then retrieving it somehow when it "knows" the
> batchbuffer is finished? This will probably require over-allocation,
> with a guesstimate of the memory needed to log the given batchbuffer.
> What about exporting via debugfs? Assuming the above code fragment, we
> could leave the last "frame" of polled data available, with the data
> being overwritten when the next request to start logging comes in.
> (That would perhaps require some kind of sequence number assigned if we
> have multiple batches which come under the same request... or a separate
> IOCTL to turn on / off logging).
There's also relayfs, which is made for high-bandwidth kernel->user
communication. I'm not sure whether it will make this any easier, but
there's some documentation about it in the kernel tree
(Documentation/filesystems/relay.txt).
A ring buffer with the last N timestamps might also be a good way of
exposing things. Having more than one entry available means that if
userspace didn't get scheduled at the right time it would still have a
good chance of getting all the data it missed since the last read.
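Roughly what I have in mind (a single-threaded userspace sketch; a real
in-kernel version would also need locking or memory barriers between
producer and consumer, and all the names here are invented):

```c
#include <stdint.h>
#include <stddef.h>

#define RING_SIZE 8  /* power of two so wrapping is a cheap mask */

struct sample {
    uint64_t seqno;     /* monotonically increasing; 0 = empty slot */
    uint64_t timestamp; /* e.g. time of the idle-flag poll */
    uint32_t idle_flags;
};

struct sample_ring {
    struct sample slots[RING_SIZE];
    uint64_t head; /* seqno of the newest sample written */
};

/* Producer: overwrite the oldest slot unconditionally. */
static void ring_push(struct sample_ring *r, uint64_t ts, uint32_t flags)
{
    struct sample *s = &r->slots[r->head & (RING_SIZE - 1)];
    s->timestamp = ts;
    s->idle_flags = flags;
    s->seqno = ++r->head;
}

/* Consumer: copy out every sample newer than last_seen, and report via
 * *dropped how many were overwritten before we got to them.  Returns
 * the number of samples copied. */
static size_t ring_read_since(const struct sample_ring *r, uint64_t last_seen,
                              struct sample *out, size_t out_len,
                              uint64_t *dropped)
{
    uint64_t newest = r->head;
    uint64_t oldest = newest > RING_SIZE ? newest - RING_SIZE : 0;
    uint64_t first = last_seen > oldest ? last_seen : oldest;
    size_t n = 0;

    *dropped = first - last_seen;
    for (uint64_t s = first + 1; s <= newest && n < out_len; s++)
        out[n++] = r->slots[(s - 1) & (RING_SIZE - 1)];
    return n;
}
```

The nice property is that a late reader finds out exactly how many
samples it lost instead of silently getting a gap in the data.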
> Also, I'm not sure how the locking would work if userspace is reading
> out the debugfs file whilst another frame is being executed. (We'd
> probably need a secondary logging buffer allocated in that case).
The kernel implementation of the read() side of the file could do some
locking to prevent new data from corrupting a read in progress.
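As a rough illustration of the double-buffer approach you mentioned, the
read() side could swap buffers under a lock so the writer keeps logging
into the other one while the read copies data out. This is a userspace
sketch with a pthread mutex standing in for whatever lock the driver
would actually use; struct and function names are made up:

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE 4096

/* Two logging buffers: the poll path appends to 'active' while a
 * debugfs read() drains the other, quiescent one. */
struct log_pair {
    pthread_mutex_t lock;
    uint8_t bufs[2][LOG_BUF_SIZE];
    size_t fill[2];
    int active; /* index the writer appends to */
};

static void log_append(struct log_pair *lp, const void *data, size_t len)
{
    pthread_mutex_lock(&lp->lock);
    if (lp->fill[lp->active] + len <= LOG_BUF_SIZE) {
        memcpy(lp->bufs[lp->active] + lp->fill[lp->active], data, len);
        lp->fill[lp->active] += len;
    } /* else: drop the record (a real driver would account for this) */
    pthread_mutex_unlock(&lp->lock);
}

/* read() side: swap buffers under the lock, then copy the now-idle
 * buffer out without holding the lock for the whole copy. */
static size_t log_drain(struct log_pair *lp, void *out, size_t out_len)
{
    size_t n;
    int old;

    pthread_mutex_lock(&lp->lock);
    old = lp->active;
    lp->active ^= 1;
    lp->fill[lp->active] = 0; /* writer starts fresh in the other buffer */
    pthread_mutex_unlock(&lp->lock);

    n = lp->fill[old] < out_len ? lp->fill[old] : out_len;
    memcpy(out, lp->bufs[old], n);
    return n;
}
```

The lock is only held for the pointer swap, so a large read doesn't
stall the logging path.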
Jesse Barnes, Intel Open Source Technology Center