[PATCH] Per-call profiling support

José Fonseca jose.r.fonseca at gmail.com
Thu Jan 12 11:22:28 PST 2012


On Mon, Jan 9, 2012 at 6:04 AM, Ryan C. Gordon <icculus at icculus.org> wrote:
>
> I added profiling to glretrace, to fix a performance issue in a game I'm
> working on.
>
> The gist is that you make a trace as normal, but you run glretrace with the
> -p option...it'll re-run the GL calls as usual, but it'll note how long each
> call took and write a log. This has the benefit of being pretty precise for
> measuring GL performance per-call, and also removes all the overhead of the
> game, so you're measuring how fast you can push the GPU.
>
> In that game with the performance issue, looking at a slow frame, within
> seconds, I knew where the problem was...
>
> 507216 [1.708 usec] glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 3)
> 507217 [3431.36 usec] glMapBufferARB(target = GL_ARRAY_BUFFER, access =
> GL_READ_WRITE) = 0xc8406400
> 507218 [1.913 usec] glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 3)

It's unrelatted, but this call is a bit weird -- rebinding the same
buffer while it is mapped is unnecessary (and potentially dangerous).

> 507219 [639.253 usec] memcpy(dest = 0xc8406400, src = blob(3120000), n =
> 3120000)
> 507220 [1.994 usec] glUnmapBufferARB(target = GL_ARRAY_BUFFER) = true
>
> ...a buffer mapping/write/unmap that takes 4+ milliseconds?! (and many more
> like this over the frame) ... I swapped that out with a
> GL_ARB_map_buffer_range approach, and no more bottleneck!

Yes, it can be a win on certain hardware, specially if you only modify
a portion of the buffer size.

> Without this, I couldn't find a single tool on Linux that could do decent
> OpenGL profiling, but spending an hour adding this to ApiTrace changed my
> strategy from psychic debugging to immediate diagnosis and correction. Once
> I could profile the problem, it took five minutes to fix, which felt great.

Nice. I'm glad you found it useful.

> I've attached the apitrace patch; this was my first shot at it. Even if the
> idea makes sense to everyone, it probably needs at least some modest
> cleanups, so comments and criticism are appreciated.

Yes, it just needs some massaging, namely:

- instead of adding the new now() function the existing os::getTime()
function should be modified to return nanosecs instead of usecs, and
modified to use clock_gettime(CLOCK_REALTIME,..) on linux

- no need to touch/modify the model (ie., trace::Call object): the
elapse time should be returned in a global variable (eventually all
these ugly global variables will be refactored into a member variables
of a new Retracer object)

For the record, there was another patch to do cpu and gpu profiling,
but also needs more cleanup, in particular the gpu profiling part:

  https://github.com/apitrace/apitrace/commit/5d7aab888f4b4e3bb06cdc19afff8825af3b1e89

Jose


More information about the apitrace mailing list