[RFC] [PATCH 1/2] Add a basic support of logging cpu and gpu time

Chris Fester camaronut at gmail.com
Wed Aug 24 09:34:56 PDT 2011


Hi Yuanhan,

Some questions:

What is the resolution of __glBeginQuery(GL_TIME_ELAPSED,...)?  Does that go
out to the card hardware itself, or is it just a wrapper for gettimeofday()
for posix systems?

More inline below:


On Tue, Aug 23, 2011 at 9:08 PM, Yuanhan Liu <yuanhan.liu at intel.com> wrote:

> Restructured to match the latest master branch. The implementation is
> quite ugly and dirty. But it didn't change the structure of the trace file.
>
> <snip!>

> +       print '    if (query_index < MAX_QUERIES) {'
> +       if function.loggputime:
> +           print '        __glGenQueries(1, &gpu_queries[query_index]);'
> +           print '        __glBeginQuery(GL_TIME_ELAPSED,
> gpu_queries[query_index]);'
> +           print '        last_gpu_query = gpu_queries[query_index];'
> +       else:
> +           print '        gpu_queries[query_index] = 0;'
> +       print '        t0 = OS::GetTime();'
> +       print '    }'
>         self.dispatch_function(function)
> +       print '    if (query_index < MAX_QUERIES) {'
> +       print '        t1 = OS::GetTime();'
> +       if function.loggputime:
> +           print '        __glEndQuery(GL_TIME_ELAPSED);'
> +       print '        cpu_time[query_index] = (double)(t1 - t0);'
> +       print '        query_index++;'
> +       print '    }'
>         print '    __writer.beginLeave(__call);'
>         for arg in function.args:
>             if arg.output:
>

For Linux, I've seen the gettimeofday call (what OS::GetTime() does) take a
few tens of microseconds to complete (granted, I'm on an Intel Core2 system,
3 or so years old).  I believe this is mostly due to the overhead of a
system call.

In my local copy of apitrace I'm in the process of developing timing CPU
time with the timestamp counter.  This has the advantage of a finer timing
resolution and no system call overhead.  However the disadvantage is that
some older CPUs TSC tick rate will vary with CPU frequency.  Plus one has to
disable core "sleeping" with newer CPUs.  Certainly not perfect and not for
everyone.  My implementation does modify the trace format and embeds the
timing info with the rest of the trace.

My admittedly lofty goal was to trace an application, capturing the TSC
data.  Then run the retrace, also capturing TSC data.  One could then diff
frames/calls between the trace and retrace to see CPU utliization
differences.  This is mainly to debug a performance problem I'm seeing at
the company I work for.  The problem exists on the 11.x series of ATI
drivers, but not the 10.3 driver.

Lofty goals aside, I can only devote "spare" time to implementing TSC
capture.  I'll share whenever I get this puppy doing the initial trace
capture, but unfortunately it won't be soon.

Thanks!
Chris

-- 
Oh, meltdown... It's one of these annoying buzzwords. We prefer to call it
an unrequested fission surplus.
-- Mr. Burns, The Simpsons
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/apitrace/attachments/20110824/b0fae31b/attachment.htm>


More information about the apitrace mailing list