[RFC] [PATCH 1/2] Add a basic support of logging cpu and gpu time

Chris Fester camaronut at gmail.com
Wed Aug 24 12:28:00 PDT 2011


On Wed, Aug 24, 2011 at 12:00 PM, Zack Rusin <zack at kde.org> wrote:

> Long term it would probably make sense to just use:
>
> http://www.fftw.org/cycle.h
>
Yup, sounds good.


> but my main concern for CPU timings is that compression and disk io will
> skew
> them a bit.

Yes, that is possible.  I was thinking of reading the timestamp immediately
before the call, and then immediately after.  Something like:

diff --git a/trace_writer.cpp b/trace_writer.cpp
index af8d62f..56aa4fb 100644
--- a/trace_writer.cpp
+++ b/trace_writer.cpp
@@ -33,6 +33,7 @@
 #include <zlib.h>

 #include "os.hpp"
+#include "tsc.h"
 #include "trace_writer.hpp"
 #include "trace_format.hpp"

@@ -206,12 +207,19 @@ void Writer::endEnter(void) {
     _writeByte(Trace::CALL_END);
     gzflush(g_gzFile, Z_SYNC_FLUSH);
     OS::ReleaseMutex();
+    //collect call begin timing here (TLS protects us, don't need mutex)
+    startTimestamp = getTSC();
 }

 void Writer::beginLeave(unsigned call) {
+    //collect call end timing here
+    uint64_t endTimestamp = getTSC();
     OS::AcquireMutex();
     _writeByte(Trace::EVENT_LEAVE);
     _writeUInt(call);
+    _writeByte(Trace::CALL_TIMING);
+    _writeUInt(startTimestamp);
+    _writeUInt(endTimestamp);
 }

 void Writer::endLeave(void) {

I was hoping to use thread local storage for startTimestamp.  Unfortunately
the application I'm tracing was not compiled with -std=gnu++0x, and the
runtime linker complains that it can't find the startTimestamp symbol.  So I
may have to move the timestamp reads into the python generator code and
create a Writer::WriteTiming(a,b), or something close to that, so that the
startTimestamp variable is on the stack.  That's probably more
multi-platform friendly anyway.
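For illustration, here's a minimal self-contained sketch of that stack-based approach (the names writeTiming, getTSC, and tracedCall are hypothetical, not actual apitrace API, and getTSC is stubbed with a plain counter rather than a real cycle read):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a real cycle counter (e.g. RDTSC); here it is
// just a monotonically increasing counter so the sketch is self-contained.
static uint64_t getTSC() {
    static uint64_t fake = 0;
    return ++fake;
}

// Toy writer: records start/end pairs instead of serializing to a trace file.
struct Writer {
    std::vector<std::pair<uint64_t, uint64_t>> timings;
    void writeTiming(uint64_t start, uint64_t end) {
        timings.emplace_back(start, end);
    }
};

static Writer writer;

// The generated wrapper reads the counter immediately around the real call,
// keeping both timestamps on the wrapper's stack -- no TLS required.
static void tracedCall() {
    uint64_t start = getTSC();
    // realGlFunction(...);  // the actual API call would go here
    uint64_t end = getTSC();
    writer.writeTiming(start, end);
}
```

Since both timestamps are locals of the generated wrapper, each thread naturally gets its own copies without any `__thread` or `-std=gnu++0x` dependency.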

By doing the collection as above, you're effectively timing only the call,
not the compression.  Granted, the scheduler may still penalize you because
you're eating up CPU cycles for the compression and IO.  And your overall
frame time will be larger (as you mention below).

> So I think we need to figure out the use case for this stuff. For
> example what I'd love to know:
>
> - what exactly it is that we're trying to measure,
>

In my case, I'm trying to measure call times to see if there are gross
differences between driver versions.  I'll admit it's kinda specialized, and
most folks may not find it useful.  I'll take the opportunity to thank you
guys for getting me this far tho.  :)

> - what's the performance of tracing while fetching both cpu and gpu time for
> every call,
>

The RDTSC instruction itself is cheap: on the order of tens of cycles.  In
total, after transferring the result registers to memory, the latency is on
the order of nanoseconds.  Of course, there are still definite drawbacks to
RDTSC, especially with respect to dynamic frequency scaling and sleep states.
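As a rough sketch of what a getTSC() might look like (this is my own illustration, not the cycle.h code; it assumes GCC/Clang and uses the `__rdtsc` intrinsic on x86, with a monotonic-clock fallback elsewhere so it stays portable):

```cpp
#include <chrono>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() intrinsic (GCC/Clang, x86 only)
#endif

// Read a raw cycle-ish timestamp.  Note the caveats above: on CPUs without
// an invariant TSC, frequency scaling and sleep states can skew deltas.
static inline uint64_t readTSC() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();  // time-stamp counter
#else
    // Portable fallback: nanoseconds from a monotonic clock.
    return std::chrono::steady_clock::now().time_since_epoch().count();
#endif
}
```

cycle.h does essentially this kind of per-architecture dispatch, which is why it's attractive long term.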

> - how should the data be stored so that the tools can get the most out of
> them,
> for example as the TODO.markdown mentioned we'd really love to have a
> "Timeline view"
> ( http://msdn.microsoft.com/en-us/library/ee417180(v=vs.85).aspx )
> I haven't yet looked at how the data would need to be stored for this to
> happen.
>
> > My admittedly lofty goal was to trace an application, capturing the TSC
> > data.  Then run the retrace, also capturing TSC data.  One could then
> diff
> > frames/calls between the trace and retrace to see CPU utilization
> > differences.
>
> That sounds like a great idea. The issue is that between the app actually
> generating the data and tracing compressing the data (compression is slower
> than decompression), the tracing timings will always be slower than the
> retracing. Also the disk io read/write speed will heavily influence both.
>

Has there been discussion on pushing the writer out to a dedicated thread?
Or is there too much of a risk that a tracing thread would outrun the
writer?
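To make the question concrete, here's a minimal sketch (not apitrace code) of a dedicated writer thread fed by a bounded queue.  The bound is what answers the "outrun the writer" worry: when the queue is full, the tracing thread blocks in push(), so it can never get more than `capacity` buffers ahead:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Bounded producer/consumer queue.  A full queue applies backpressure to
// the tracing thread instead of growing without limit.
class BoundedQueue {
public:
    explicit BoundedQueue(size_t capacity) : capacity_(capacity) {}

    void push(std::string buf) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [&] { return queue_.size() < capacity_; });
        queue_.push(std::move(buf));
        notEmpty_.notify_one();
    }

    // Returns false once the queue is closed and fully drained.
    bool pop(std::string &buf) {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [&] { return !queue_.empty() || closed_; });
        if (queue_.empty())
            return false;
        buf = std::move(queue_.front());
        queue_.pop();
        notFull_.notify_one();
        return true;
    }

    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        notEmpty_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable notFull_, notEmpty_;
    std::queue<std::string> queue_;
    size_t capacity_;
    bool closed_ = false;
};

// Writer thread body: drain buffers and "write" them.  A real implementation
// would compress and write each buffer to the trace file here.
static std::vector<std::string> written;
static void writerThread(BoundedQueue &q) {
    std::string buf;
    while (q.pop(buf))
        written.push_back(std::move(buf));
}
```

The trade-off is that compression and disk IO move off the traced call's critical path, but a sustained burst still stalls the tracer once the queue fills, so the bound mostly smooths spikes rather than hiding a writer that's slower on average.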

> So I think there are a lot of open questions. Most likely initially we need
> push some implementation of this stuff to a branch and start experimenting
> with
> it.
>
> z
>

Experiments are fun.  :)
Chris


-- 
Oh, meltdown... It's one of these annoying buzzwords. We prefer to call it
an unrequested fission surplus.
-- Mr. Burns, The Simpsons