[Mesa-dev] r600g: status of the r600-sb branch

Sat Apr 20 02:02:34 PDT 2013

On 04/20/2013 03:11 AM, Marek Olšák wrote:
> Please don't add any new environment variables and use R600_DEBUG
> instead. The other environment variables are deprecated.

I agree, those vars probably need some cleanup, they were added before 
R600_DEBUG appeared.

Though I'm afraid some of my options won't fit well into the R600_DEBUG 
flags, unless we'll add support for the name/value pairs with optional 
custom parsers.

E.g. I have a group of env vars to define the range of included/excluded 
shaders for optimization and mode (include/exclude/off), I thought about 
doing this with a single var and custom parser to specify the range e.g. 
as "10-20", but after all it's just a debug feature, not intended for 
everyday use, and so far I failed to convince myself that it's worth the 
efforts.

I can implement the support for custom parsers for R600_DEBUG, but do we 
really need it? Maybe it would be enough to add e.g."sb" instead of 
R600_SB var to the R600_DEBUG flags for enabling it (probably together 
with other boolean options such as R600_SB_USE_NEW_BYTECODE) but leave 
more complicated internal debug options as is?

Vadim

> There is a table for R600_DEBUG in r600_pipe.c and it even comes with
> a help feature: R600_DEBUG=help
>
> Marek
>
> On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
>> Hi,
>>
>> In the previous status update I said that the r600-sb branch is not ready to
>> be merged yet, but recently I've done some cleanups and reworks, and though
>> I haven't finished everything that I planned initially, I think now it's in
>> a better state and may be considered for merging.
>>
>> I'm interested to know if the people think that merging of the r600-sb
>> branch makes sense at all. I'll try to explain here why it makes sense to
>> me.
>>
>> Although I understand that the development of llvm backend is a primary goal
>> for the r600g developers, it's a complicated process and may require quite
>> some time to achieve good results regarding the shader/compiler performance,
>> and at the same time this branch already works and provides good results in
>> many cases. That's why I think it makes sense to merge this branch as a
>> non-default backend at least as a temporary solution for shader performance
>> problems. We can always get rid of it if it becomes too much a maintenance
>> burden or when llvm backend catches up in terms of shader performance and
>> compilation speed/overhead.
>>
>> Regarding the support and maintenance of this code, I'll try to do my best
>> to fix possible issues, and so far there are no known unfixed issues. I
>> tested it with many apps on evergreen and fixed all issues with other chips
>> that were reported to me on the list or privately after the last status
>> announce. There are no piglit regressions on evergreen when this branch is
>> used with both default and llvm backends.
>>
>> This code was intentionally separated as much as possible from the other
>> parts of the driver, basically there are just two functions used from r600g,
>> and the shader code is passed to/from r600-sb as a hardware bytecode that is
>> not going to change. I think it won't require any modifications at all to
>> keep it in sync with the most changes in r600g.
>>
>> Some work might be required though if we'll want to add support for the new
>> hw features that are currently unused, e.g. geometry shaders, new
>> instruction types for compute shaders, etc, but I think I'll be able to
>> catch up when it's implemented in the driver and default or llvm backend.
>> E.g. this branch already works for me on evergreen with some simple OpenCL
>> kernels, including bfgminer where it increases performance of the kernel
>> compiled with llvm backend by more than 20% for me.
>>
>> Besides the performance benefits, I think that alternative backend also
>> might help with debugging of the default or llvm backend, in some cases it
>> helped me by exposing the bugs that are not very obvious otherwise, e.g. it
>> may be hard to compare the dumps from default and llvm backend to spot the
>> regression because they are too different, but after processing both shaders
>> with r600-sb the code is usually transformed to some more common form, and
>> often this makes it easier to compare and find the differences in shader
>> logic.
>>
>> One additional feature that might help with llvm backend debugging is the
>> disassembler that works on the hardware bytecode instead of the internal
>> r600g bytecode structs. This results in the more readable shader dumps for
>> instructions passed in native hw encoding from llvm backend. I think this
>> also can help to catch more potential bugs related to bytecode building in
>> r600g/llvm. Currently r600-sb uses its bytecode disassembler for all shader
>> dumps, including the fetch shaders, even when optimization is not enabled.
>> Basically it can replace r600_bytecode_disasm and related code completely.
>>
>> Below are some quick benchmarks for shader performance and compilation time,
>> to demonstrate that currently r600-sb might provide better performance for
>> users, at least in some cases.
>>
>> As an example of the shaders with good optimization opportunities I used the
>> application that computes and renders atmospheric scattering effects, it was
>> mentioned in the previous thread:
>> http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html
>>
>> Here are current results for that app (Main.noprecompute, frames per second)
>> with default backend, default backend + r600-sb, and llvm backend:
>>          def     def+sb  llvm
>>          240     590     248
>>
>> Another quick benchmark is an OpenCL kernel performance with bfgminer
>> (megahash/s):
>>          llvm    llvm+sb
>>          68      87
>>
>> One more benchmark is for compilation speed/overhead - I used two piglit
>> tests, first compiles a lot of shaders (IIRC more than thousand), second
>> compiles a few huge shaders. Result is a test run time in seconds, this
>> includes not only the compilation time but anyway shows the difference:
>>                          def     def+sb  llvm
>> tfb max-varyings        10      14      53
>> fp-long-alu             0.17    0.38    0.68
>>
>> This is especially important for GL apps, because longer compilation time
>> results in the more significant freezes in the games etc. As for the quality
>> of the compiled code in this test, of course generally llvm backend is
>> already able to produce better code in some cases, but e.g. for the longest
>> shader from the fp-long-alu test both backends optimize it to the two alu
>> instructions.
>>
>> Of course this branch won't magically make all applications faster, many
>> older apps are not really limited by the shader performance at all, but I
>> think it might improve performance for many relatively modern
>> applications/engines, e.g. for the applications based on the Unigine and
>> Source engines.
>>
>> The branch itself can be found here:
>>
>> http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb
>>
>> You might prefer to browse new files in a tree instead of reading a huge
>> patch:
>>
>> http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb?h=r600-sb
>>
>> If you'd like to test it, currently the optimization for GL shaders is
>> enabled by default, can be disabled with R600_SB=0. Optimization for compute
>> shaders is not enabled by default because it's still very limited and
>> experimental, can be enabled with R600_SB_CL=1. Disassemble of the optimized
>> shaders is printed with R600_DUMP_SHADERS=2.
>>
>> If you think that merging of the branch makes sense, any
>> comments/suggestions about what is required to prepare the branch for
>> merging are welcome.
>>
>> Vadim
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev