[Mesa-dev] Path to optimize (moving from create/bind/delete paradgim to set only ?)
Mathias.Froehlich at gmx.net
Tue Nov 16 23:43:30 PST 2010
On Tuesday, November 16, 2010 20:21:26 Jerome Glisse wrote:
> So I looked a bit more at which paths we should try to optimize in the
> mesa/gallium/pipe infrastructure. Here are some numbers gathered from
> games:
> drawcall / ps constant  vs constant  ps sampler  vs sampler
> doom3           1.45         1.39        9.24        9.86
> nexuiz          6.27         5.98        6.84        7.30
> openarena    2805.64         1.38        1.51
Just another observation:
I was doing some profiling on OpenSceneGraph based applications, one being
the plain osgviewer with numerous models and the other FlightGear.
The drivers and hardware I have are a FireGL 73???/R520/r300g and a
HD4890/RV770/r600g. Testing is done on the same CPU board.
One comparison is the draw time in osgviewer. Those who know this
application might remember the profiling graph, where you can see how long and
when cull, draw and, if available, GPU rendering happen.
In this case I was looking at the draw times, which is just the time from
the first state change in a frame to the last draw in a frame, *excluding*
the buffer swap/sync/flush and whatever serializes program execution with the
GPU.
Comparing these osgviewer draw times against fglrx, with my favourite test
model (fixed function) that is kind of representative of the usage in
FlightGear:
r300g, git: ~1.6ms
The profiling picture in my head is that r300g still spends a significant
amount of CPU time in current state attribute handling, which too often loops
over all possible state attributes. BTW: that was much worse before Francesco's
last copy-to-current patches. r300g also spends much time in the draw path in
mesa, where every draw loops over all 32 state attributes.
Doing some proof-of-concept work on these code paths improved the draw times
to 1.2ms on r300g.
The next CPU hog for r300g is the kernel side of the command stream parser. I
would expect that something which makes use of pre-evaluated and validated
command stream snippets in the kernel, held for each of the driver's state
objects and just referenced in the executed command stream, would help much
here. Something along the lines of recording command stream macros/substreams
that are just jumped into when executing the user-level command stream. I
believe that Jerome gave a talk about something very similar at this year's
Translating those performance numbers from an example application to a more
real-world one like FlightGear gives a framerate of ~85 frames for fglrx and
~60 with current mesa. With the proof-of-concept stuff I already saw 65-70 on
r300g.
Now the picture for r600g:
r600g, git 5-7ms
As you can see, fglrx is still about the same. But r600g is far off.
Also with r600g I can see the driver spending about as much time parsing and
validating in the kernel as it spends in the r600g backend code.
I do not remember the FlightGear framerates for RV770/fglrx, but I believe they
were comparable to the R520 ones; with r600g I still see just about 20-30
frames. Fiddling with this proof-of-concept stuff does not show up in r600g in
a noticeable way, since that driver is just dominated by its own backend CPU
usage.
So I cannot contribute to this discussion on which of the state objects are
more heavily used, but looking at the above I see that r300g is already at a
stage where it makes a lot of sense to improve some hot paths in mesa's top
layer. The r300 userspace backend code is visible but not high in profiles.
But r600g, using the same mesa/gallium infrastructure above, spends many CPU
cycles in its userspace code as well as in the parser/validator code.
Which makes me wonder: what is the fundamental difference between these two
backends that accounts for this gap?
Just my 2 cents