Hi Yuanhan,

Some questions:

What is the resolution of __glBeginQuery(GL_TIME_ELAPSED, ...)? Does that go out to the card hardware itself, or is it just a wrapper for gettimeofday() on POSIX systems?

More inline below:
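
(Before diving in, a quick aside on that resolution question: as far as I know you can at least ask the driver how wide the timer query counter is. Untested sketch, standalone GL rather than apitrace code:

    GLint bits = 0;
    /* GL_QUERY_COUNTER_BITS reports the bit width of the timer query
       counter; 0 means GL_TIME_ELAPSED queries aren't really usable. */
    glGetQueryiv(GL_TIME_ELAPSED, GL_QUERY_COUNTER_BITS, &bits);

That only tells you the counter width, not whether the value comes from the GPU itself or from the kernel, but it's a cheap sanity check.)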

On Tue, Aug 23, 2011 at 9:08 PM, Yuanhan Liu <yuanhan.liu@intel.com> wrote:
> Restructured to match the latest master branch. The implementation is
> quite ugly and dirty. But it didn't change the structure of the trace file.
<snip!>
> +        print '    if (query_index < MAX_QUERIES) {'
> +        if function.loggputime:
> +            print '        __glGenQueries(1, &gpu_queries[query_index]);'
> +            print '        __glBeginQuery(GL_TIME_ELAPSED, gpu_queries[query_index]);'
> +            print '        last_gpu_query = gpu_queries[query_index];'
> +        else:
> +            print '        gpu_queries[query_index] = 0;'
> +        print '        t0 = OS::GetTime();'
> +        print '    }'
>          self.dispatch_function(function)
> +        print '    if (query_index < MAX_QUERIES) {'
> +        print '        t1 = OS::GetTime();'
> +        if function.loggputime:
> +            print '        __glEndQuery(GL_TIME_ELAPSED);'
> +        print '        cpu_time[query_index] = (double)(t1 - t0);'
> +        print '        query_index++;'
> +        print '    }'
>          print '    __writer.beginLeave(__call);'
>          for arg in function.args:
>              if arg.output:

For Linux, I've seen the gettimeofday() call (which is what OS::GetTime() does) take a few tens of microseconds to complete (granted, I'm on an Intel Core 2 system that's three or so years old). I believe this is mostly due to the overhead of a system call.
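
If you want to put a number on that for a particular machine, a crude standalone microbenchmark along these lines (not apitrace code) will print the average cost per call:

    #include <stdio.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timeval t0, t1, tmp;
        const int iters = 1000000;

        gettimeofday(&t0, NULL);
        for (int i = 0; i < iters; i++)
            gettimeofday(&tmp, NULL);   /* the call being measured */
        gettimeofday(&t1, NULL);

        double usec = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        printf("average gettimeofday() cost: %.3f us\n", usec / iters);
        return 0;
    }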

In my local copy of apitrace I'm in the process of implementing CPU timing with the timestamp counter (TSC). This has the advantage of finer timing resolution and no system-call overhead. The disadvantage is that on some older CPUs the TSC tick rate varies with CPU frequency, and with newer CPUs one has to disable core "sleeping". Certainly not perfect and not for everyone. My implementation does modify the trace format and embeds the timing info with the rest of the trace.
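
For reference, reading the TSC on x86 boils down to something like this (GCC/Clang-style inline asm; the __rdtsc() intrinsic from <x86intrin.h> is equivalent):

    #include <stdint.h>

    /* Read the 64-bit timestamp counter. Note rdtsc is not a serializing
       instruction, so the CPU may reorder it relative to surrounding code;
       it also has the frequency-scaling and core-sleep caveats above. */
    static inline uint64_t read_tsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }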

My admittedly lofty goal was to trace an application while capturing the TSC data, then run the retrace, also capturing TSC data. One could then diff frames/calls between the trace and the retrace to see CPU utilization differences. This is mainly to debug a performance problem I'm seeing at the company I work for: it exists on the 11.x series of ATI drivers, but not on the 10.3 driver.

Lofty goals aside, I can only devote "spare" time to implementing TSC capture. I'll share whenever I get this puppy doing the initial trace capture, but unfortunately it won't be soon.

Thanks!
Chris

-- 
Oh, meltdown... It's one of these annoying buzzwords. We prefer to call it an unrequested fission surplus.
-- Mr. Burns, The Simpsons