[Mesa-dev] vertex array regression

Mon Dec 19 23:10:33 PST 2011

Marek,

On Tuesday, December 20, 2011 01:40:46 Marek Olšák wrote:
> The problem back in March and earlier was that the vertex array state
> was completely recomputed for each draw call even if the user did not
> change any state at all! There was the validation in the vbo module
> and then in st/mesa, which were basically needless. It turned out to
> be the major bottleneck in the game Torcs (in the track called Forza).
> I checked the source code of Torcs and it confirmed my findings. There
> was a loop that looked like this:
> 
> for (i = 0; i < very large number; i++) glDrawElements(...);
> 
> From that point, it became clear that Mesa is underperforming and that
> it's fixable. Then I came up with a set of patches which did not
> address the issue completely, but some cases (like with Torcs) were
> optimized (well maybe not cleanly as it caused a few regressions too).
> There still is a lot of cases where the vertex array state is
> recomputed needlessly, so it still is an open issue. I did not fully
> understand how the entire vbo module works and why so much state
> validation was being done there. Even today, drivers have almost no
> way to know in draw_prims whether the vertex array state is _really_
> changed, because core Mesa almost always says "yes, it changed of
> course! why wouldn't it?" Sadly, performance is usually being put
> aside and energy is put into more important stuff like new features. I
> was profiling Mesa quite extensively because there was simply nothing
> better to do for r300g. I don't expect that's the case with the other
> drivers.
> 
> Back on topic. The reason why you don't see a performance regression may
> be: 1) The vertex array state management is not the bottleneck.
> 2) You already hit the slowest path, that is recomputing the state
> completely in every draw call.

Well, in general, to me performance *is* a feature, at least for software
like an OpenGL stack.
But correctness is not just a feature; correctness is a hard requirement.
So I believe I am entirely with you...

Let me explain: the reason I am starting work on the vertex arrays is that
I think this is one of the major bottlenecks in the draw path that we have
today.
I think the array object should behave more like a state object, one that
lets us do only the work required to be ready for the draw code path.
The first step towards this was to make the array object an array of client
arrays. The next is to exploit more bitmask/ffs driven loops to reduce the
overhead of iterating over all client arrays on almost every draw. Being able
to accumulate changed arrays in this kind of flag value is something that
could be done now. I have a proof of concept implementation sitting around
here that updates the traditional client array pointers used in vbo_draw in a
lazy way, based on an enabled bitmask and iterating only over the changed
values instead of all of them. We could also derive two implementations from
the generic array object: one for the old style vbo_draw, and another for a
gallium driver path where we are really able to track changes in the array
cso. The cso's currently unconditional application to the gallium state could
then probably be pulled out of the draw path for the gallium drivers, which I
expect to be a major win for plenty of use cases...
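
A minimal sketch of the bitmask/ffs idea in C, for illustration only. It is
neither the proof of concept mentioned above nor actual Mesa code: struct
client_array, struct array_object, MAX_ARRAYS and every function name here
are hypothetical. The assumption is that each setter records its change in a
dirty bitmask, so the draw path visits only the arrays that changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>   /* ffs() (POSIX) */

#define MAX_ARRAYS 16  /* hypothetical limit, not Mesa's */

/* Hypothetical stand-ins for a client array and the array object. */
struct client_array {
    const void *ptr;
    int stride;
};

struct array_object {
    struct client_array arrays[MAX_ARRAYS];
    uint32_t enabled;  /* bit i set: array i is enabled */
    uint32_t dirty;    /* bit i set: array i changed since the last update */
};

/* Every state change only records which array was touched. */
static void set_array_pointer(struct array_object *obj, unsigned index,
                              const void *ptr, int stride)
{
    assert(index < MAX_ARRAYS);
    obj->arrays[index].ptr = ptr;
    obj->arrays[index].stride = stride;
    obj->dirty |= UINT32_C(1) << index;
}

static void enable_array(struct array_object *obj, unsigned index)
{
    assert(index < MAX_ARRAYS);
    obj->enabled |= UINT32_C(1) << index;
    obj->dirty |= UINT32_C(1) << index;
}

static void disable_array(struct array_object *obj, unsigned index)
{
    assert(index < MAX_ARRAYS);
    obj->enabled &= ~(UINT32_C(1) << index);
    obj->dirty |= UINT32_C(1) << index;
}

/*
 * Lazily refresh the draw-side input pointers. Only arrays whose dirty
 * bit is set are visited; returns how many arrays were visited.
 */
static unsigned update_draw_inputs(struct array_object *obj,
                                   const struct client_array *inputs[MAX_ARRAYS])
{
    uint32_t mask = obj->dirty;
    unsigned visited = 0;

    while (mask) {
        int i = ffs((int)mask) - 1;        /* index of lowest set bit */
        uint32_t bit = UINT32_C(1) << i;

        inputs[i] = (obj->enabled & bit) ? &obj->arrays[i] : NULL;
        mask &= ~bit;
        visited++;
    }
    obj->dirty = 0;
    return visited;
}
```

With nothing changed between two draws, the dirty mask is zero and the loop
body never runs, which is the case the glDrawElements loop quoted above
would benefit from.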

So much for what could be done. But experience shows, at least to me, that
this kind of work, which sits right at the front end to the OpenGL API spec
and is used by so many different drivers in so many different ways, is better
done in very small increments. So the current increment is about getting back
to a stable state after the array change.

OK, back on topic: have I understood correctly? You have fixed Torcs so that
it no longer uses the slow path? Which old version is the one that still has
this issue?
Do you have any test program that could be used to see the impact of this kind
of change?

Also, your comment mentions that the _Tnl program already takes care of
invalidating the required state. Do you remember where this happens? Maybe we
could use that place to trigger state invalidation then?

Thanks

Mathias