[Mesa-dev] r600g: status of my work on the shader optimization

Fri Feb 15 13:43:20 PST 2013

On 02/15/2013 06:31 PM, Tom Stellard wrote:
> On Fri, Feb 15, 2013 at 03:00:24PM +0400, Vadim Girlin wrote:
>> On 02/14/2013 02:42 PM, Christian König wrote:
>>> Hi Vadim,
>>>
>>> nice work, I think you've made quite a lot of progress here, but on the
>>> other hand it should be clear that the LLVM backend is the future and
>>> we should concentrate on that.
>>
>> "LLVM backend is the future" is a pretty abstract argument. I prefer
>> to work with real facts. After a year of LLVM backend development,
>> what are the real benefits for the users? What are the real use
>> cases where users might prefer the LLVM backend? To me it looks like
>> the use of LLVM requires a lot more time and development effort than
>> the custom solution, despite the initial expectations. Maybe you are
>> right and the LLVM backend will become the best alternative for users
>> sometime in the future, but all I have are today's results:
>>
>> Heaven 3.0, all settings high/enabled, 1280x720, HD5750:
>>    default backend : 20.0 fps
>>    llvm backend    : 18.8 fps
>>    r600-sb         : 38.0 fps
>>
>
> Hi Vadim,
>
> A month or so ago you wrote an initial machine scheduler implementation
> for the R600 LLVM backend and said it had a big impact on performance.
> When you tested the LLVM backend in these tests, did you have that patch
> applied?
>

No, that patch was not used. One of the reasons is that testing an old
branch of LLVM requires finding the corresponding point in Mesa where
the two are in sync, and this may not be a trivial quest when you are
not following the development. After this mail I tried to find an
up-to-date version of that patch in your and Vincent's repos, and it
seems the 'scheduling' branch in Vincent's repo works. The result with
the same benchmark is 23.2 fps, that is, better scheduling with
Vincent's improvements takes the LLVM backend from 18.8 to 23.2 fps,
about 23% faster.
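The ~23% figure is simply the ratio of the two LLVM frame rates. A minimal C sketch of that arithmetic (the helper `percent_gain` is a made-up name, not part of Mesa):

```c
#include <assert.h>

/* Relative change of new_fps over base_fps, in percent. */
double percent_gain(double base_fps, double new_fps)
{
    assert(base_fps > 0.0);   /* a zero baseline has no meaningful ratio */
    return (new_fps / base_fps - 1.0) * 100.0;
}
```

With the Heaven 3.0 numbers from this thread, percent_gain(18.8, 23.2) comes out at about 23.4, and percent_gain(23.2, 38.0) at about 63.8, so r600-sb was still well ahead of the LLVM backend even with the scheduling branch.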

Vadim

> -Tom
>
>
>> When I look at these results, the benefits of an LLVM-based
>> solution are not very clear to me.
>>
>> I'm not trying to persuade anyone, just wanted to explain why I
>> decided to switch back to work on the non-LLVM solution.
>>
>> Anyway, it's absolutely not a problem for me if this branch never
>> makes it into Mesa; I was ready for this before I started. One of
>> the goals of this branch was just to show that using LLVM is
>> possibly not the best way to compile GL shaders for r600g. And
>> another goal, of course, is to get better performance with r600g
>> *today*, not in the future.
>>
>> Vadim
>>
>>>
>>> To sum it up I'm not sure what we should do with this branch :)
>>>
>>> As Dragomir already wrote, even if the code won't be used much, the
>>> know-how you gained while coding it will stay; believe me, that is
>>> of far more value than the code itself.
>>>
>>> Christian.
>>>
>>> On 14.02.2013 11:10, Dragomir Ivanov wrote:
>>>> Greetings,
>>>> I hope that, even if your work turns out to be short-lived (e.g. until
>>>> the LLVM bytecode compiler takes off), the know-how will still be very
>>>> useful.
>>>>
>>>>
>>>> On Thu, Feb 14, 2013 at 4:04 AM, Vadim Girlin <vadimgirlin at gmail.com>
>>>> wrote:
>>>>
>>>>     Hi,
>>>>
>>>>     Last month I finally found the time to rewrite my previous shader
>>>>     optimization branch. It's now mostly done in terms of the
>>>>     correctness of the produced code and feature support (at least on
>>>>     evergreen), though it's still a work in progress in terms of the
>>>>     efficiency of the generated shader code and of the backend itself.
>>>>
>>>>     I spent some time last year studying the LLVM infrastructure and
>>>>     the R600 LLVM backend and trying to improve it, but in the end I
>>>>     came to the conclusion that it might be easier for me to implement
>>>>     everything I wanted in a custom backend. This allows for a simpler
>>>>     and more efficient implementation: e.g. I don't have to deal with
>>>>     arbitrary CFGs, because we in fact have structured code, so
>>>>     simpler and more efficient algorithms can be used.
>>>>
>>>>     Currently the branch has no regressions with piglit's
>>>>     quick-driver.tests on evergreen (unlike the previous branch, it
>>>>     doesn't rely on falling back to unoptimized code for shaders with
>>>>     relative addressing and other cases), and so far I haven't seen
>>>>     any rendering issues with the apps I used for testing: Lightsmark
>>>>     2008, Unigine Heaven 3.0 and some others. There are also some
>>>>     performance improvements with GPU-bound apps.
>>>>
>>>>     I tried to keep in mind the differences between chip classes, so
>>>>     I hope only minor fixes will be needed to make it work on
>>>>     non-evergreen chips, but I doubt it will work out of the box:
>>>>     support for some non-evergreen hw-specific features is still
>>>>     missing. For example, I'm sure that indirect addressing currently
>>>>     won't work on R6xx, though basic tests might work in theory.
>>>>     Fixing this shouldn't require a lot of work, though.
>>>>
>>>>     The branch can be found in my freedesktop repo:
>>>>
>>>>     http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb
>>>>
>>>>     Regarding the differences from the previous branch: there are
>>>>     some additional optimizations, e.g. global value numbering with
>>>>     basic support for constant folding (not all instructions are
>>>>     handled yet, but it's easy to extend), global code motion that
>>>>     can hoist invariant code out of loops, etc. Some optimizations
>>>>     that were implemented in the previous branch are not (yet)
>>>>     implemented in the new one, e.g. propagation of modifiers (I'm
>>>>     not even sure it has any noticeable effect on performance).
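For readers unfamiliar with value numbering: every computed value gets a number, an identical computation seen again reuses that number, and an operation whose operands are all constants is folded into a constant. A deliberately simplified C sketch over a made-up toy IR (all names are hypothetical); it does local, straight-line numbering with a linear scan, whereas the branch does this globally over the real r600 instruction set:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy IR (hypothetical, not r600-sb's): every value is a shader input,
 * a constant, or an ADD/MUL of two earlier values, named by its index
 * ("value number") in the table. */
enum vn_op { VN_INPUT, VN_CONST, VN_ADD, VN_MUL };

struct vn_value {
    enum vn_op op;
    int a, b;     /* operand value numbers (ADD/MUL only) */
    float imm;    /* constant value, or input slot for VN_INPUT */
};

#define VN_MAX 64

struct vn_table {
    struct vn_value v[VN_MAX];
    int count;
};

static bool vn_is_leaf(enum vn_op op)
{
    return op == VN_INPUT || op == VN_CONST;
}

/* Linear search for an identical, already numbered value. */
static int vn_lookup(const struct vn_table *t, const struct vn_value *x)
{
    for (int i = 0; i < t->count; i++) {
        const struct vn_value *y = &t->v[i];
        if (y->op != x->op)
            continue;
        if (vn_is_leaf(x->op) ? y->imm == x->imm
                              : (y->a == x->a && y->b == x->b))
            return i;
    }
    return -1;
}

/* Numbers a new value: folds ADD/MUL of two constants, puts the operands
 * of commutative ops in a canonical order, and reuses an existing number
 * when an identical value was already seen. */
int vn_add(struct vn_table *t, enum vn_op op, int a, int b, float imm)
{
    struct vn_value x = { op, 0, 0, imm };

    if (!vn_is_leaf(op)) {
        assert(a >= 0 && a < t->count && b >= 0 && b < t->count);
        const struct vn_value *va = &t->v[a], *vb = &t->v[b];
        if (va->op == VN_CONST && vb->op == VN_CONST) {
            /* Constant folding: replace the op by its result. */
            x.op = VN_CONST;
            x.imm = (op == VN_ADD) ? va->imm + vb->imm : va->imm * vb->imm;
        } else {
            /* ADD and MUL commute, so a+b and b+a get the same number. */
            x.a = a < b ? a : b;
            x.b = a < b ? b : a;
        }
    }

    int found = vn_lookup(t, &x);
    if (found >= 0)
        return found;
    assert(t->count < VN_MAX);
    t->v[t->count] = x;
    return t->count++;
}
```

Numbering x*y and then y*x returns the same number, and numbering 2+3 yields a constant 5 that is shared with any later literal 5. A full GVN would typically use a hash table and walk the dominator tree instead of scanning one straight-line block.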
>>>>
>>>>     Unlike the previous branch, there is support for indirect
>>>>     addressing of registers. Currently it uses my previously posted
>>>>     patch (which was not very welcome) to obtain information about
>>>>     addressable register ranges, but that patch isn't required and
>>>>     can be dropped; I just used it for testing. Without that
>>>>     information the opportunities for optimization are limited,
>>>>     though, and perhaps it makes sense not to optimize shaders with
>>>>     indirect GPR addressing at all and to rely on the old backend
>>>>     until we have a proper way to pass that information to the
>>>>     drivers.
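Why missing range information limits optimization can be sketched as follows, assuming (hypothetically) that the driver receives a list of GPR ranges that may be accessed through relative addressing. A register inside such a range, or every register when no information is available, has to be treated conservatively by the optimizer. The struct and function names are illustrative, not taken from the branch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: one GPR array that may be accessed via relative
 * addressing, as an inclusive range of register indices. */
struct gpr_range {
    int first, last;
};

/* Returns true if `gpr` may be read or written indirectly.
 * ranges == NULL means "no information": then any register may be the
 * target of an indirect access and must be treated conservatively. */
bool gpr_may_be_indirect(int gpr, const struct gpr_range *ranges, size_t n)
{
    assert(ranges != NULL || n == 0);
    if (ranges == NULL)
        return true;
    for (size_t i = 0; i < n; i++) {
        if (gpr >= ranges[i].first && gpr <= ranges[i].last)
            return true;
    }
    return false;
}
```

An optimization pass could, for example, leave registers alone whenever this returns true; with no range information that means leaving all of them alone, which is why falling back to the old backend for such shaders is a reasonable option.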
>>>>
>>>>     There is also initial support for ALU predication, but it's
>>>>     incomplete and currently unused. I'm not sure predication support
>>>>     will have a large enough effect on performance to justify more
>>>>     complex and expensive register allocation and scheduling
>>>>     algorithms; I'll probably look into it later, but I consider it
>>>>     a low priority. For predicated source code (from the LLVM
>>>>     backend), the predication is eliminated using speculative
>>>>     execution and conditional moves, the same way as in the simple
>>>>     if-conversion pass that is also implemented.
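If-conversion with speculative execution and conditional moves turns a branch into straight-line code. A minimal C model of the transformation, assuming a side-effect-free then/else pair; the select is modeled loosely on the R600 CNDE instruction (the result is src1 when src0 equals zero, otherwise src2), and all function names are illustrative:

```c
#include <assert.h>

/* Branchy form: only one side is evaluated. */
float select_branchy(float cond, float a, float b)
{
    if (cond != 0.0f)
        return a * b;   /* "then" side */
    return a + b;       /* "else" side */
}

/* CNDE-style conditional move: src1 if src0 == 0, otherwise src2.
 * On the GPU this is a single ALU instruction; here it is modeled in C. */
static float cnde(float src0, float src1, float src2)
{
    return src0 == 0.0f ? src1 : src2;
}

/* If-converted form: both sides are executed speculatively, which is only
 * legal because they have no side effects, then the conditional move
 * picks the live result, so the shader needs no branch. */
float select_if_converted(float cond, float a, float b)
{
    float then_val = a * b;
    float else_val = a + b;
    return cnde(cond, else_val, then_val);
}

/* Both forms must agree (for non-NaN a and b). */
void check_if_conversion(float cond, float a, float b)
{
    assert(select_branchy(cond, a, b) == select_if_converted(cond, a, b));
}
```

The trade-off is that both sides always execute, so if-conversion generally pays off only for short branch bodies.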
>>>>
>>>>     The branch currently takes as its input the bytecode built by the
>>>>     old backend (which may also come from the LLVM backend) plus some
>>>>     additional information (about inputs etc.); the final bytecode is
>>>>     built by the new builder in the branch. Building two versions of
>>>>     the bytecode doesn't look very efficient, but for now it
>>>>     simplifies debugging. I'm planning to implement translation from
>>>>     TGSI directly to my representation, which should simplify the
>>>>     translator and allow getting rid of unnecessary intermediate
>>>>     passes.
>>>>
>>>>     Some old and new environment variables can be used to control the
>>>>     behavior of this backend:
>>>>
>>>>     R600_SB: 0 disables the new backend completely; 1 enables it
>>>>     (default).
>>>>
>>>>     R600_SB_USE_NEW_BYTECODE: 0 disables use of the produced
>>>>     bytecode (useful if you only want to look at the dump of the
>>>>     optimized shader without passing it to the hw); 1 enables it
>>>>     (default).
>>>>
>>>>     R600_DUMP_SHADERS: also dumps the disassembly of the optimized
>>>>     shader after the original bytecode (unless the backend is
>>>>     disabled with R600_SB=0).
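A minimal C sketch of how these variables could be interpreted, based only on the descriptions above. The struct, the function names and the rule that any nonzero integer means "on" are assumptions for illustration, not the branch's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical option struct; the branch's real code may differ. */
struct sb_options {
    bool enabled;          /* R600_SB, default 1 */
    bool use_new_bytecode; /* R600_SB_USE_NEW_BYTECODE, default 1 */
    bool dump_optimized;   /* R600_DUMP_SHADERS set and backend enabled */
};

/* Unset or empty means "default"; otherwise any nonzero integer is true
 * (an assumption about how the values are parsed). */
static bool flag_value(const char *s, bool def)
{
    if (s == NULL || *s == '\0')
        return def;
    return atoi(s) != 0;
}

/* Pure parsing step, kept separate from getenv() so it is easy to test. */
struct sb_options sb_parse_options(const char *sb, const char *use_new,
                                   const char *dump)
{
    struct sb_options o;
    o.enabled = flag_value(sb, true);
    /* With R600_SB=0 there is no new bytecode to use or to dump. */
    o.use_new_bytecode = o.enabled && flag_value(use_new, true);
    o.dump_optimized = o.enabled && flag_value(dump, false);
    assert(o.enabled || (!o.use_new_bytecode && !o.dump_optimized));
    return o;
}

struct sb_options sb_read_options(void)
{
    return sb_parse_options(getenv("R600_SB"),
                            getenv("R600_SB_USE_NEW_BYTECODE"),
                            getenv("R600_DUMP_SHADERS"));
}
```

Under these assumptions, running an app with R600_SB_USE_NEW_BYTECODE=0 R600_DUMP_SHADERS=1 would dump the optimized shaders while the hardware keeps running the original bytecode.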
>>>>
>>>>     The produced shader code is not ideal: e.g. you may notice
>>>>     unnecessary MOVs inserted before DOT4 instructions. It's a known
>>>>     issue and I'm going to look into it; this may require reworking
>>>>     the regalloc/scheduler. I had to sacrifice some features to make
>>>>     it work correctly with Heaven first, so that now I can try to
>>>>     improve it while being able to test for regressions.
>>>>
>>>>     There are also probably some issues with the cleanliness of the
>>>>     code: I had to rework some parts a few times while fixing
>>>>     problems, so there may be unused code and other remnants of the
>>>>     previous versions. Anyway, I still consider it a work in progress,
>>>>     and some things are going to be reworked.
>>>>
>>>>     I'm not sure what the fate of this branch will be, given that we
>>>>     also have an actively developed LLVM backend that is required for
>>>>     OpenCL anyway. Your opinions are welcome.
>>>>
>>>>     Vadim
>>>>     _______________________________________________
>>>>     mesa-dev mailing list
>>>>     mesa-dev at lists.freedesktop.org
>>>>     http://lists.freedesktop.org/mailman/listinfo/mesa-dev