[Mesa-dev] r600g: status of the r600-sb branch

Fri Apr 19 07:48:43 PDT 2013

Hi,

In the previous status update I said that the r600-sb branch was not
ready to be merged yet. Since then I've done some cleanups and rework,
and though I haven't finished everything I initially planned, I think
it's now in a better state and may be considered for merging.

I'd like to know whether people think that merging the r600-sb branch
makes sense at all. I'll try to explain here why it makes sense to me.

Although I understand that development of the llvm backend is a primary
goal for the r600g developers, it's a complicated process and may
require quite some time to achieve good results in terms of
shader/compiler performance. Meanwhile, this branch already works and
provides good results in many cases. That's why I think it makes sense
to merge this branch as a non-default backend, at least as a temporary
solution for shader performance problems. We can always get rid of it
if it becomes too much of a maintenance burden, or when the llvm
backend catches up in terms of shader performance and compilation
speed/overhead.

Regarding support and maintenance of this code, I'll do my best to fix
any issues that come up; so far there are no known unfixed issues. I
tested it with many apps on evergreen and fixed all the issues with
other chips that were reported to me on the list or privately after the
last status announcement. There are no piglit regressions on evergreen
when this branch is used with either the default or the llvm backend.

This code was intentionally kept as separate as possible from the other
parts of the driver. Basically, there are just two functions used from
r600g, and the shader code is passed to/from r600-sb as hardware
bytecode, which is not going to change. I think it won't require any
modifications at all to keep it in sync with most changes in r600g.

Some work might be required, though, if we want to add support for hw
features that are currently unused, e.g. geometry shaders, new
instruction types for compute shaders, etc., but I think I'll be able to
catch up once they are implemented in the driver and in the default or
llvm backend. For example, this branch already works for me on evergreen
with some simple OpenCL kernels, including bfgminer, where it increases
the performance of the kernel compiled with the llvm backend by more
than 20%.

Besides the performance benefits, I think an alternative backend might
also help with debugging the default or llvm backend. In some cases it
has helped me by exposing bugs that are not very obvious otherwise. For
example, it may be hard to compare the dumps from the default and llvm
backends to spot a regression because they are too different, but after
processing both shaders with r600-sb the code is usually transformed
into a more common form, and this often makes it easier to compare them
and find the differences in shader logic.

One additional feature that might help with llvm backend debugging is
the disassembler, which works on the hardware bytecode instead of the
internal r600g bytecode structs. This results in more readable shader
dumps for instructions passed in native hw encoding from the llvm
backend. I think this can also help to catch more potential bugs related
to bytecode building in r600g/llvm. Currently r600-sb uses its bytecode
disassembler for all shader dumps, including the fetch shaders, even
when optimization is not enabled. Basically, it can replace
r600_bytecode_disasm and related code completely.

Below are some quick benchmarks for shader performance and compilation 
time, to demonstrate that currently r600-sb might provide better 
performance for users, at least in some cases.

As an example of shaders with good optimization opportunities I used
an application that computes and renders atmospheric scattering
effects. It was mentioned in the previous thread:
http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html

Here are current results for that app (Main.noprecompute, frames per 
second) with default backend, default backend + r600-sb, and llvm backend:
    def     def+sb  llvm
    240     590     248

Another quick benchmark is OpenCL kernel performance with bfgminer
(megahash/s):
    llvm    llvm+sb
    68      87

One more benchmark is for compilation speed/overhead. I used two piglit
tests: the first compiles a lot of shaders (IIRC more than a thousand),
the second compiles a few huge shaders. The result is the test run time
in seconds; this includes more than just the compilation time, but it
still shows the difference:
                        def     def+sb  llvm
    tfb max-varyings    10      14      53
    fp-long-alu         0.17    0.38    0.68

This is especially important for GL apps, because longer compilation
times result in more noticeable freezes in games etc. As for the
quality of the compiled code in this test, the llvm backend is of course
already able to produce better code in some cases, but e.g. for the
longest shader from the fp-long-alu test both backends optimize it down
to two alu instructions.
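
As a rough illustration of how the figures above compare, here is a
small Python sketch that turns them into ratios relative to a baseline.
The numbers are copied from the tables in this message; the helper name
relative_to is made up for the example and is not part of Mesa or
piglit.

```python
# Figures copied from the tables in this message.
fps = {"def": 240, "def+sb": 590, "llvm": 248}      # frames per second
mhash = {"llvm": 68, "llvm+sb": 87}                 # megahash/s
runtime = {                                         # test run time, seconds
    "tfb max-varyings": {"def": 10, "def+sb": 14, "llvm": 53},
    "fp-long-alu": {"def": 0.17, "def+sb": 0.38, "llvm": 0.68},
}


def relative_to(results, baseline_key):
    """Return every entry of results divided by the baseline entry.

    relative_to is a hypothetical helper for this example only.
    """
    baseline = results[baseline_key]
    return {name: value / baseline for name, value in results.items()}


if __name__ == "__main__":
    # Throughput figures: higher is better.
    print("fps vs def:      ", relative_to(fps, "def"))
    print("mhash/s vs llvm: ", relative_to(mhash, "llvm"))
    # Run times: lower is better, so a ratio above 1 means extra overhead.
    for test, results in runtime.items():
        print(test, "vs def:", relative_to(results, "def"))
```

With these inputs, def+sb comes out at roughly 2.46x the frame rate of
the default backend, llvm+sb at roughly 1.28x the hash rate of plain
llvm, and the tfb max-varyings run takes 1.4x (def+sb) and 5.3x (llvm)
as long as with the default backend.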

Of course this branch won't magically make all applications faster.
Many older apps are not really limited by shader performance at all, but
I think it might improve performance for many relatively modern
applications/engines, e.g. for applications based on the Unigine and
Source engines.

The branch itself can be found here:

http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb

You might prefer to browse new files in a tree instead of reading a huge 
patch:

http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb?h=r600-sb

If you'd like to test it: optimization for GL shaders is currently
enabled by default and can be disabled with R600_SB=0. Optimization for
compute shaders is not enabled by default because it's still very
limited and experimental; it can be enabled with R600_SB_CL=1. The
disassembly of the optimized shaders is printed with
R600_DUMP_SHADERS=2.
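
For anyone scripting test runs, the switches above could be wrapped in
something like the following Python sketch. It only uses the three
variables and values named in this message; the function name
sb_environment and its parameters are hypothetical, and leaving a
variable unset is assumed to mean "use the default".

```python
import os


def sb_environment(gl_optimize=True, cl_optimize=False,
                   dump_shaders=False, base=None):
    """Return a copy of the environment with the r600-sb switches applied.

    sb_environment is a hypothetical helper, not part of Mesa.
    """
    env = dict(os.environ if base is None else base)

    # GL shader optimization is on by default; R600_SB=0 disables it.
    if gl_optimize:
        env.pop("R600_SB", None)
    else:
        env["R600_SB"] = "0"

    # Compute shader optimization is off by default; R600_SB_CL=1 enables it.
    if cl_optimize:
        env["R600_SB_CL"] = "1"
    else:
        env.pop("R600_SB_CL", None)

    # R600_DUMP_SHADERS=2 prints the disassembly of the optimized shaders.
    if dump_shaders:
        env["R600_DUMP_SHADERS"] = "2"
    else:
        env.pop("R600_DUMP_SHADERS", None)

    return env


if __name__ == "__main__":
    env = sb_environment(gl_optimize=False, dump_shaders=True, base={})
    for name in sorted(env):
        print(f"{name}={env[name]}")
```

The returned dict can be passed as the env argument of subprocess.run
when launching the application under test, so that runs with and without
r600-sb can be compared from one script.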

If you think that merging the branch makes sense, any
comments/suggestions about what is required to prepare it for merging
are welcome.

Vadim