[Mesa-dev] [PATCH] Add EXT_timer_query to the mesa state tracker and softpipe

Sun May 23 04:45:27 PDT 2010

On Sun, May 23, 2010 at 12:44:54PM +0200, Mathias Fröhlich wrote:
> 
> Hi,
> 
> On Monday 17 May 2010 20:51:09 Corbin Simpson wrote:
> > I'm going to be proactive here, and pull in both this patch and a docs
> > update.
> Ok.
> Now that the infrastructure is there.
> 
> My initial aim was to have something to profile the r300g driver.
> It already runs nicer than the classic one for plenty of stuff I tried. But
> it is still up to several times slower than the binary-only driver from
> AMD/ATI. So having a clue where to look for improvements would be a good
> thing. Also, OpenSceneGraph uses these timers for its helpful graphical
> scene graph profiling aids.
> 
> To do that I have been looking into the docs to find a cycle counter or
> something equivalent in the GPU, but so far without luck.
> If we had such a counter, we could dump it into the query object,
> similar to the occlusion query implementation.
> 
> Sure, we could alternatively trigger a soft interrupt and read the kernel
> timer in the interrupt handler. That would already give nanosecond timers.
> But I hope that this kind of functionality can be implemented less
> intrusively, as this requires changes to the kernel part of the driver, I
> think.
> 
> Thoughts?
> Knowledge about some undocumented registers fitting that purpose?
> 
> ... looking at AMD's Windows profiling tools makes me believe that there
> are such registers.
> 
> Greetings
> 
> Mathias

IMHO, using a profiler such as sysprof can already give clues as to
where we are slower. I think we are still CPU limited (1) rather
than only GPU limited, so profiling the GPU won't bring any significant
improvement. Also, adding support for hyper-z + fast clear is likely to
give a significant improvement (around 20-50%, iirc the ATI figures).
Lastly, pageflipping will also improve the fps for anything fullscreen.

(1) I did some microbenchmarking a few months ago, and sending the same
GPU rendering command stream 10000 times was around 4 times faster than
rendering the same scene through GL 10000 times (note that what gets
to the GPU is the same in both cases). Of course this is a microbenchmark,
so the result must be taken with care.
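
For anyone wanting to repeat that kind of comparison, here is a minimal
sketch of a wall-clock timing harness. It is not the code I used; the
names (bench_fn, time_loop, speedup, count_call) are made up for
illustration, and the two real workloads (replaying a recorded command
stream vs. rendering through GL) are assumed to be supplied by you as
callbacks.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: one iteration of the workload being timed. */
typedef void (*bench_fn)(void *ctx);

/* Monotonic wall-clock time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Run fn(ctx) `iterations` times and return the elapsed nanoseconds. */
static uint64_t time_loop(bench_fn fn, void *ctx, unsigned iterations)
{
    assert(fn != NULL);
    uint64_t start = now_ns();
    for (unsigned i = 0; i < iterations; i++)
        fn(ctx);
    return now_ns() - start;
}

/* Ratio of the two timings; returns 0.0 if fast_ns is zero. */
static double speedup(uint64_t slow_ns, uint64_t fast_ns)
{
    return fast_ns ? (double)slow_ns / (double)fast_ns : 0.0;
}

/* Placeholder workload (assumption): just counts how often it is called.
 * Replace with the command-stream replay or the GL rendering path. */
static void count_call(void *ctx)
{
    (*(unsigned *)ctx)++;
}
```

You would call time_loop() once per path with 10000 iterations and feed
both results to speedup(). Since this measures CPU-side wall time only,
the GPU should be idle (e.g. after a glFinish()) before each timing
starts and stops, otherwise queued work skews the numbers.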

Also, it's possible that the memory manager is making bad decisions
and wasting memory bandwidth. I haven't yet thought of a way to benchmark
the memory manager (I guess the only way is to test different memory
manager schemes).

Anyway my point is that GPU profiling is likely of limited interest
given the missing features.

Cheers,
Jerome