[RFC] [PATCH 1/2] Add a basic support of logging cpu and gpu time
Zack Rusin
zack at kde.org
Wed Aug 24 15:15:58 PDT 2011
On Wednesday, August 24, 2011 03:28:00 PM Chris Fester wrote:
> On Wed, Aug 24, 2011 at 12:00 PM, Zack Rusin <zack at kde.org> wrote:
> > Long term it would probably make sense to just use:
> >
> > http://www.fftw.org/cycle.h
>
> Yup, sounds good.
>
> > but my main concern for CPU timings is that compression and disk io will
> > skew them a bit.
>
> Yes, that is possible. I was thinking of reading the timestamp immediately
> before the call, and then immediately after.
But then you're measuring the CPU time of an asynchronous call to the GPU,
which probably won't tell you a lot. It possibly tells you how quick the driver
is at queuing the data, which isn't likely to be strongly related to the actual
performance. In other words, I'm not sure this particular timing is useful. For
visualization purposes, the CPU time between swapbuffers and the CPU
utilization between them are going to tell you a lot more than CPU times for
each driver call.
> > For example, what I'd love to know:
> >
> > - what exactly it is that we're trying to measure,
>
> In my case, trying to measure call times to see if there's gross
> differences between driver versions. I'll admit it's kinda specialized,
> and most folks may not find it useful.
I think the information is useful, but as mentioned above I don't think it's
what you'd be measuring by checking the CPU time.
I think for your particular use case measuring actual GPU times (e.g. using
the timer_query extension) during retracing would be a lot more meaningful. In
fact I think it'd be exactly what you're looking for.
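For reference, a timer_query measurement during retracing could look roughly like this. This is an untested sketch, not apitrace code: it assumes a current OpenGL context with GL 3.3 / ARB_timer_query and that the query entry points have been loaded (includes and loaders vary by platform).

```cpp
// Untested sketch: wrap a replayed call in a GL_TIME_ELAPSED query so the
// retracer reports actual GPU time rather than CPU-side queuing time.
#include <GL/gl.h>
#include <GL/glext.h>

GLuint64 timeReplayedCall()
{
    GLuint query;
    glGenQueries(1, &query);

    glBeginQuery(GL_TIME_ELAPSED, query);
    // ... replay the traced call here, e.g. a glDrawElements(...) ...
    glEndQuery(GL_TIME_ELAPSED);

    // Reading the result here blocks until the GPU finishes, which
    // serializes execution; a real retracer would keep a pool of queries
    // and collect results a frame or two later.
    GLuint64 nanoseconds = 0;
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &nanoseconds);
    glDeleteQueries(1, &query);
    return nanoseconds;
}
```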
I think that computing timings during tracing could be useful if your
application is CPU bound - a huge discrepancy between tracing and retracing
timings would mean that the app is spending too much time creating the data.
And the biggest deltas would point you to the places in your app that are the
bottlenecks.
Other than this case I'm not sure that computing the timings while tracing is
particularly useful, because in all other cases it should be a lot better to do
it while retracing.
> > - what's the performance of tracing while fetching both cpu and gpu time
> > for every call,
>
> The RDTSC instruction itself takes a clock cycle or two of cpu time. In
> total, after transferring the registers to memory, the latency is on the
> order of nanoseconds. 'Course, there's still definite drawbacks to RDTSC,
> especially WRT dynamic frequency scaling and sleep states.
That sounds pretty good.
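For completeness, reading the TSC from C++ is a one-liner with the compiler intrinsic (`read_tsc` is my name for the wrapper, not apitrace's). The caveats from the thread still apply: on older CPUs the raw counter is affected by frequency scaling and sleep states, so treat the delta as cycles rather than wall time.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtsc(); x86-only.  MSVC uses <intrin.h> instead.

// Sketch: read the time-stamp counter before and after the traced call.
static inline uint64_t read_tsc()
{
    return __rdtsc();
}

// Usage:
//   uint64_t t0 = read_tsc();
//   /* ... the GL call being traced ... */
//   uint64_t cycles = read_tsc() - t0;
```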
> Has there been discussion on pushing the writer out to a dedicated thread?
Yea, I've already done it in the threaded-trace branch. Snappy compression
results in a much bigger performance improvement, though, so the compression
branch should be integrated first.
> Or is there too much of a risk that a tracing thread would outrun the
> writer?
Well, that's often the case. I use a 32 MB ring buffer in the branch, and if
the ring buffer is full I use mutexes and condition variables to stall the
rendering thread while the writer thread flushes the contents of the ring
buffer to disk.
> Experiments are fun. :)
Yea :) I think in particular for this it's going to take a few iterations to
make it useful - especially figuring out what we need in order to visualize
that data correctly and nicely.
z
More information about the apitrace mailing list