[Mesa-dev] r600g: status of the r600-sb branch

Fri Apr 19 09:18:08 PDT 2013

On 04/19/2013 07:13 PM, Christian wrote:
> Hi Vadim,
>
> from your description it seems to be a post processing stage working on
> the bytecode of the shaders and additional to that is quite separated
> from the rest of the driver.

Yes, currently it's more like a post-processing stage, though on the
other hand the only thing missing for it to be considered a complete
backend is an initial TGSI translator (that is, a sort of instruction
selection pass). Basically that's exactly what the default backend in
r600g does. I thought about writing a direct translator from TGSI to my
IR, but it would take some time and the benefits aren't very clear,
apart from slightly reduced translation time. It's easier to rely on the
default backend for that, and it also simplifies debugging by making it
possible to see and compare both the source bytecode (after the default
backend) and the optimized bytecode.

>
> If that's the case then I don't really see a reason why we shouldn't
> merge it, but at least at the beginning it should probably be disabled
> by default.

Yes, I agree that it's better to have it disabled by default. It's
currently enabled in my branch just to simplify testing, but I'll change
that if we merge the branch.

>
> On the other hand we should question if there are any optimizations in
> there that could be done on earlier stages, something like on the GLSL
> level for example?

In theory, yes, some optimizations in this branch are typically used in
the earlier compilation stages, not on the target machine code. On the
other hand, there are some differences that might make that harder, e.g.
many algorithms require SSA form, and though it's possible to do similar
optimizations without SSA, it would be hard to implement. Also, I wanted
to support both the default backend and the llvm backend for increased
testing coverage and to be able to compare the efficiency of the
algorithms in my experiments etc.
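
To illustrate the SSA point, here is a minimal sketch in Python. It is
not code from the branch: the tuple-based instruction format and the
function names are made up for this example, and it only handles
straight-line code. It renames the code so that every value is defined
once, after which a simple value-numbering pass can find a redundant
computation just by comparing operand names.

```python
def to_ssa(instrs):
    """Rename straight-line code so each variable is assigned once.

    instrs: list of (dest, op, src1, src2) tuples (hypothetical format).
    Sources never written inside the block are inputs and keep their names.
    """
    version = {}   # variable -> latest version number
    current = {}   # variable -> current SSA name
    out = []
    for dest, op, a, b in instrs:
        a = current.get(a, a)
        b = current.get(b, b)
        version[dest] = version.get(dest, 0) + 1
        name = f"{dest}.{version[dest]}"
        current[dest] = name
        out.append((name, op, a, b))
    return out


def value_number(ssa):
    """Replace a repeated (op, src1, src2) computation with a copy."""
    seen = {}    # (op, src1, src2) -> SSA name already holding the value
    alias = {}   # SSA name -> equivalent earlier SSA name
    out = []
    for dest, op, a, b in ssa:
        a = alias.get(a, a)
        b = alias.get(b, b)
        key = (op, a, b)
        if key in seen:
            alias[dest] = seen[key]
            out.append((dest, "mov", seen[key], None))
        else:
            seen[key] = dest
            out.append((dest, op, a, b))
    return out


code = [
    ("t", "add", "x", "y"),
    ("x", "mul", "t", "t"),   # x is overwritten here
    ("u", "add", "x", "y"),   # not the same value as the first add
    ("v", "mul", "t", "t"),   # same value as the second instruction
]
ssa = to_ssa(code)
opt = value_number(ssa)
```

Without the renaming, the first and third instructions look identical
even though x has changed in between, so the pass would need extra
bookkeeping to know which definitions are still valid. After renaming,
the third instruction reads x.1 instead of x, and only the fourth one
is recognized as redundant.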

Vadim

>
> Cheers,
> Christian.
>
> On 19.04.2013 16:48, Vadim Girlin wrote:
>> Hi,
>>
>> In the previous status update I said that the r600-sb branch is not
>> ready to be merged yet, but recently I've done some cleanups and
>> reworks, and though I haven't finished everything that I planned
>> initially, I think now it's in a better state and may be considered
>> for merging.
>>
>> I'm interested to know whether people think that merging the r600-sb
>> branch makes sense at all. I'll try to explain here why it makes sense
>> to me.
>>
>> Although I understand that the development of the llvm backend is a
>> primary goal for the r600g developers, it's a complicated process and
>> may require quite some time to achieve good results regarding
>> shader/compiler performance, and at the same time this branch already
>> works and provides good results in many cases. That's why I think it
>> makes sense to merge this branch as a non-default backend, at least as
>> a temporary solution for shader performance problems. We can always
>> get rid of it if it becomes too much of a maintenance burden or when
>> the llvm backend catches up in terms of shader performance and
>> compilation speed/overhead.
>>
>> Regarding the support and maintenance of this code, I'll do my best
>> to fix possible issues, and so far there are no known unfixed issues.
>> I tested it with many apps on evergreen and fixed all issues with
>> other chips that were reported to me on the list or privately after
>> the last status announcement. There are no piglit regressions on
>> evergreen when this branch is used with both the default and llvm
>> backends.
>>
>> This code was intentionally separated as much as possible from the
>> other parts of the driver. Basically there are just two functions used
>> from r600g, and the shader code is passed to/from r600-sb as hardware
>> bytecode, which is not going to change. I think it won't require any
>> modifications at all to keep it in sync with most changes in r600g.
>>
>> Some work might be required, though, if we want to add support for
>> new hw features that are currently unused, e.g. geometry shaders,
>> new instruction types for compute shaders, etc., but I think I'll be
>> able to catch up when they're implemented in the driver and in the
>> default or llvm backend. E.g. this branch already works for me on
>> evergreen with some simple OpenCL kernels, including bfgminer, where it
>> increases the performance of the kernel compiled with the llvm backend
>> by more than 20% for me.
>>
>> Besides the performance benefits, I think that an alternative backend
>> might also help with debugging of the default or llvm backend. In some
>> cases it helped me by exposing bugs that are not very obvious
>> otherwise. E.g. it may be hard to compare the dumps from the default
>> and llvm backends to spot a regression because they are too different,
>> but after processing both shaders with r600-sb the code is usually
>> transformed to a more common form, and often this makes it easier
>> to compare and find the differences in shader logic.
>>
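
Once both dumps have gone through r600-sb, an ordinary text diff is
often enough for this kind of comparison. A minimal sketch in Python,
assuming the two dumps have already been captured as strings; the
function name and the labels are made up for the example, and nothing
here depends on the actual dump format.

```python
import difflib


def diff_dumps(dump_a, dump_b, name_a="default+sb", name_b="llvm+sb"):
    """Return a unified diff of two shader dumps given as strings."""
    # Trailing whitespace is ignored so that only real differences show.
    a = [line.rstrip() for line in dump_a.splitlines()]
    b = [line.rstrip() for line in dump_b.splitlines()]
    return list(difflib.unified_diff(a, b, name_a, name_b, lineterm=""))
```

An empty result means the two dumps are textually identical apart from
trailing whitespace.
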
>> One additional feature that might help with llvm backend debugging is
>> the disassembler, which works on the hardware bytecode instead of the
>> internal r600g bytecode structs. This results in more readable shader
>> dumps for instructions passed in native hw encoding from the llvm
>> backend. I think this can also help to catch more potential bugs
>> related to bytecode building in r600g/llvm. Currently r600-sb uses its
>> bytecode disassembler for all shader dumps, including the fetch
>> shaders, even when optimization is not enabled. Basically it can
>> replace r600_bytecode_disasm and related code completely.
>>
>> Below are some quick benchmarks for shader performance and compilation
>> time, to demonstrate that currently r600-sb might provide better
>> performance for users, at least in some cases.
>>
>> As an example of shaders with good optimization opportunities I used
>> an application that computes and renders atmospheric scattering
>> effects. It was mentioned in the previous thread:
>> http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html
>>
>> Here are current results for that app (Main.noprecompute, frames per
>> second) with default backend, default backend + r600-sb, and llvm
>> backend:
>>     def    def+sb    llvm
>>     240    590       248
>>
>> Another quick benchmark is OpenCL kernel performance with bfgminer
>> (megahash/s):
>>     llvm    llvm+sb
>>     68      87
>>
>> One more benchmark is for compilation speed/overhead. I used two
>> piglit tests: the first compiles a lot of shaders (IIRC more than a
>> thousand), the second compiles a few huge shaders. The result is the
>> test run time in seconds; this includes more than just the compilation
>> time, but it still shows the difference:
>>                       def     def+sb    llvm
>> tfb max-varyings      10      14        53
>> fp-long-alu           0.17    0.38      0.68
>>
>> This is especially important for GL apps, because longer compilation
>> time results in more noticeable freezes in games etc. As for the
>> quality of the compiled code in these tests, of course the llvm backend
>> is generally already able to produce better code in some cases, but
>> e.g. for the longest shader from the fp-long-alu test both backends
>> optimize it down to two alu instructions.
>>
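
For reference, the ratios implied by the three tables above can be
recomputed with a few lines of Python. The numbers are copied from the
tables; the dictionary layout and the names are just for this example.

```python
# Frames per second and megahash/s: higher is better.
fps = {"def": 240, "def+sb": 590, "llvm": 248}
mhash = {"llvm": 68, "llvm+sb": 87}

# Test run time in seconds: lower is better.
run_time = {
    "tfb max-varyings": {"def": 10, "def+sb": 14, "llvm": 53},
    "fp-long-alu": {"def": 0.17, "def+sb": 0.38, "llvm": 0.68},
}


def ratio(value, baseline):
    """Return value / baseline rounded to two decimals."""
    return round(value / baseline, 2)


fps_speedup = ratio(fps["def+sb"], fps["def"])           # sb vs. default
mhash_speedup = ratio(mhash["llvm+sb"], mhash["llvm"])   # sb on top of llvm

# Run time of each configuration relative to the default backend.
slowdown = {
    test: {name: ratio(t, times["def"]) for name, t in times.items()}
    for test, times in run_time.items()
}

print(fps_speedup, mhash_speedup)
for test, ratios in slowdown.items():
    print(test, ratios)
```

That works out to about 2.46x the frame rate of the default backend in
the scattering demo and about 1.28x the hash rate on top of llvm for
bfgminer. In test run time, def+sb costs 1.4x and 2.24x relative to the
default backend, compared to 5.3x and 4.0x for llvm.
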
>> Of course this branch won't magically make all applications faster.
>> Many older apps are not really limited by shader performance at
>> all, but I think it might improve performance for many relatively
>> modern applications/engines, e.g. for applications based on the
>> Unigine and Source engines.
>>
>> The branch itself can be found here:
>>
>> http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb
>>
>> You might prefer to browse new files in a tree instead of reading a
>> huge patch:
>>
>> http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb?h=r600-sb
>>
>>
>> If you'd like to test it, the optimization for GL shaders is currently
>> enabled by default and can be disabled with R600_SB=0. Optimization for
>> compute shaders is not enabled by default because it's still very
>> limited and experimental; it can be enabled with R600_SB_CL=1.
>> Disassembly of the optimized shaders is printed with R600_DUMP_SHADERS=2.
>>
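
For A/B testing it can be convenient to wrap these switches in a small
script. A minimal sketch in Python, assuming the defaults of the branch
as described above (GL optimization on, compute optimization off). Only
the three environment variables come from the message; the helper
names, the keyword arguments and the application path are invented for
the example.

```python
import os
import subprocess


def sb_env(gl_opt=True, cl_opt=False, dump=False, base=None):
    """Build an environment for running an app against the branch.

    base: starting environment (defaults to a copy of os.environ).
    """
    env = dict(os.environ if base is None else base)
    if not gl_opt:
        env["R600_SB"] = "0"            # GL optimization is on by default
    if cl_opt:
        env["R600_SB_CL"] = "1"         # compute optimization is opt-in
    if dump:
        env["R600_DUMP_SHADERS"] = "2"  # disassembly of optimized shaders
    return env


def run(cmd, **kwargs):
    """Run cmd (a list of arguments) with the requested switches."""
    return subprocess.call(cmd, env=sb_env(**kwargs))


# Example usage (the application name is a placeholder):
# run(["./my_gl_app"], gl_opt=False)   # baseline without r600-sb
# run(["./my_gl_app"], dump=True)      # optimized, with shader dumps
```
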
>> If you think that merging of the branch makes sense, any
>> comments/suggestions about what is required to prepare the branch for
>> merging are welcome.
>>
>> Vadim