[Mesa-dev] Path to optimize (moving from create/bind/delete paradigm to set only?)

Wed Nov 17 06:26:09 PST 2010

2010/11/17 Mathias Fröhlich <Mathias.Froehlich at gmx.net>:
>
> Hi,
>
> On Tuesday, November 16, 2010 20:21:26 Jerome Glisse wrote:
> >> So I looked a bit more at which paths we should try to optimize in the
> >> mesa/gallium/pipe infrastructure. Here are some numbers gathered from
> >> games:
> >> drawcall /    ps constant   vs constant   ps sampler   vs sampler
> >> doom3                1.45          1.39         9.24         9.86
> >> nexuiz               6.27          5.98         6.84         7.30
> >> openarena         2805.64          1.38         1.51         1.54
> [...]
>
> Just another observation:
> I was doing some profiling on OpenSceneGraph-based applications: one is the
> plain osgviewer with numerous models, the other is flightgear.
> The drivers and hardware I have are a FireGL 73??? (R520, r300g) and an
> HD4890 (RV770, r600g). Testing is done on the same CPU board.
>
> One comparison is the draw time in osgviewer. Those who know this
> application might remember the profiling graph, which shows when and for how
> long cull, draw and, if available, GPU rendering happen.
> In this case I was looking at the draw times, which is just the time from
> the first state change in a frame to the last draw in a frame, *excluding*
> the buffer swap/sync/flush and whatever else serializes program execution
> with the GPU.
>
> Comparing osgviewer draw times of r300g and fglrx with my favourite test model
> (fixed function), which is fairly representative of usage in flightgear:
>
> R520
> fglrx         ~0.7ms
> r300g (git)   ~1.6ms
>
> The profiling picture in my head is that r300g still spends a significant
> amount of CPU time in current state attribute handling, which too often loops
> over all possible state attributes. BTW: that was much worse before Fancesco's
> last copy-to-current patches. r300g also spends much time in the draw path in
> Mesa, where every draw loops over all 32 state attributes.
> Some proof-of-concept work on these code paths improved the draw times to
> 1.2ms on r300g.
> The next CPU hog for r300g is the kernel side of the command stream parser. I
> would expect it to help a lot here if the kernel held pre-evaluated and
> pre-validated command stream snippets for each of the driver's state objects
> and simply reused them in the executed command stream: something along the
> lines of recording command stream macros/substreams that are just jumped into
> when executing the user-level command stream. I believe that Jerome gave a
> talk about something very similar at this year's FOSDEM.
>
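Mathias's observation that every draw loops over all 32 state attributes points at a common remedy: keep a bitmask of the enabled (or dirty) attributes and visit only its set bits, so the per-draw cost scales with the number of attributes actually in use. The following C is a minimal sketch of that pattern, not Mesa's actual vbo code; `struct attrib_state`, `MAX_ATTRIBS` and both functions are hypothetical names, and `__builtin_ctz` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: 32 attribute slots, matching the draw path described above. */
#define MAX_ATTRIBS 32

struct attrib_state {
    uint32_t enabled_mask;        /* bit i set => attribute i is enabled */
    unsigned size[MAX_ATTRIBS];   /* per-attribute component count (example payload) */
};

/* Visit only enabled attributes: the loop runs once per set bit,
 * not MAX_ATTRIBS times. Returns the summed component count as example work. */
static unsigned sum_enabled_sizes(const struct attrib_state *st)
{
    uint32_t mask = st->enabled_mask;
    unsigned total = 0;

    while (mask) {
        unsigned i = (unsigned)__builtin_ctz(mask); /* index of lowest set bit (GCC/Clang) */
        assert(i < MAX_ATTRIBS);
        total += st->size[i];
        mask &= mask - 1;                           /* clear the lowest set bit */
    }
    return total;
}

/* Naive variant for comparison: always touches every one of the MAX_ATTRIBS slots. */
static unsigned sum_enabled_sizes_naive(const struct attrib_state *st)
{
    unsigned total = 0;

    for (unsigned i = 0; i < MAX_ATTRIBS; i++)
        if (st->enabled_mask & (1u << i))
            total += st->size[i];
    return total;
}
```

With three attributes enabled, the bitmask loop runs three iterations where the naive loop runs 32; the same idea applies to a dirty mask that is cleared after each draw.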

I think that before committing to any improvement to command submission we
should think it through carefully, and also write all the code from the kernel
to userspace so we can benchmark it. The CS design was maybe good on paper,
but the command checking and relocation are just killing us.
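To make the validate-once idea from this thread concrete, here is a toy user-space model in C. It is an assumption-laden sketch, not the radeon kernel CS interface: `struct cs_snippet`, `SAFE_REG_LIMIT` and all functions are hypothetical. A snippet of (register, value) pairs is checked once when it is created, after which each submission referencing it costs a flag test, whereas the current style re-checks the same dwords on every submit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register whitelist: registers below this bound are "safe". */
#define SAFE_REG_LIMIT 0x1000u

/* Hypothetical pre-recorded snippet: (register, value) dword pairs. */
struct cs_snippet {
    const uint32_t *dwords;
    size_t ndw;          /* must be even: reg, value, reg, value, ... */
    bool validated;
};

/* Running count of dwords inspected by the checker, to make the cost visible. */
static size_t dwords_checked;

/* Full check: every register write must target a whitelisted register. */
static bool check_dwords(const uint32_t *dw, size_t ndw)
{
    assert(dw != NULL || ndw == 0);
    if (ndw % 2 != 0)
        return false;
    for (size_t i = 0; i < ndw; i += 2) {
        dwords_checked += 2;
        if (dw[i] >= SAFE_REG_LIMIT)
            return false;
    }
    return true;
}

/* Validate once, e.g. when the corresponding state object is created. */
static bool snippet_create(struct cs_snippet *s, const uint32_t *dw, size_t ndw)
{
    s->dwords = dw;
    s->ndw = ndw;
    s->validated = check_dwords(dw, ndw);
    return s->validated;
}

/* Per submit: a pre-validated snippet is accepted without re-checking its
 * contents; in a real design this is where the stream would jump into it. */
static bool submit_snippet(const struct cs_snippet *s)
{
    return s->validated;
}

/* Baseline: re-check the same dwords on every submit. */
static bool submit_raw(const uint32_t *dw, size_t ndw)
{
    return check_dwords(dw, ndw);
}
```

The model only holds if the snippet memory is immutable to userspace once validated (for example, kernel-owned); otherwise a client could rewrite it after the check. It also ignores relocations: any buffer addresses inside a snippet would still need patching, or re-validation, when buffers move, which is exactly the cost Jerome points at.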

Cheers,
Jerome