[Mesa-dev] r600g: status of my work on the shader optimization
Vadim Girlin
vadimgirlin at gmail.com
Fri Feb 15 18:31:33 PST 2013
On 02/15/2013 03:22 PM, Christian König wrote:
> On 15.02.2013 12:00, Vadim Girlin wrote:
>> On 02/14/2013 02:42 PM, Christian König wrote:
>>> Hi Vadim,
>>>
>>> nice work, I think you've made quite some progress here, but on the other
>>> hand it should be clear that the LLVM backend is the future and we
>>> should concentrate on that.
>>
>> "LLVM backend is the future" is a pretty abstract argument. I prefer
>> to operate with real facts. After a year of LLVM backend development
>> what are the real benefits for the users? What are the real use cases
>> where the users might prefer LLVM backend? To me this situation looks
>> like the use of LLVM requires a lot more time and development efforts
>> than the custom solution, despite the initial expectations. Maybe you
>> are right and the LLVM backend will become the best alternative for
>> users sometime in the future, but I only have today's results:
>>
>> Heaven 3.0, all settings high/enabled, 1280x720, HD5750:
>> default backend : 20.0 fps
>> llvm backend : 18.8 fps
>> r600-sb : 38.0 fps
>
> Quite impressive. What's actually doing better than the LLVM backend?
>
I've tried to disable some passes/features to see how much it affects
the performance with Heaven.
With everything enabled, the result is 38.0-38.3 fps.
Without fetch instruction grouping: 32.9 fps
Without if-conversion: 34.4 fps
Without GVN: 37.5 fps
Use of temporary registers seems to have no noticeable effect.
Without fetch grouping, if-conversion, GVN, temp GPRs: 29.3 fps
The remaining passes are required and can't be disabled - basically these
are dead code elimination, the global scheduler (GCM), regalloc, and the
ALU scheduler.
I hope this information is useful. Due to the lack of
performance-related info about the hardware, the source of some
improvements is not obvious even to me, and some results from above are
not exactly what I expected.
Also I did only one run for each case above, so probably there are some
statistical errors, and it's better to check everything more thoroughly
before relying on these results.
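To put the numbers above in relative terms, here is a minimal sketch that computes how much of the frame rate is lost in each configuration. It assumes 38.0 fps (the lower end of the reported 38.0-38.3 range) as the baseline; the dictionary keys and the helper name `relative_loss` are only illustrative.

```python
# Frame rates reported above (Heaven 3.0, fps). Single runs, so treat as rough.
BASELINE_FPS = 38.0  # assumption: lower end of the reported 38.0-38.3 range

results = {
    "no fetch grouping": 32.9,
    "no if-conversion": 34.4,
    "no GVN": 37.5,
    "no fetch grouping, if-conversion, GVN, temp GPRs": 29.3,
}


def relative_loss(fps, baseline=BASELINE_FPS):
    """Fraction of the baseline frame rate lost in a given configuration."""
    return (baseline - fps) / baseline


losses = {name: relative_loss(fps) for name, fps in results.items()}

for name, loss in losses.items():
    print(f"{name}: {loss * 100:.1f}% slower")
```

With single runs per configuration, differences of a percent or so (such as the GVN case) are probably within the noise.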
Vadim
>>
>>
>> When I'm looking at these results, the benefits of LLVM-based solution
>> are not very clear to me.
>>
>> I'm not trying to persuade anyone, just wanted to explain why I
>> decided to switch back to work on the non-LLVM solution.
>>
>> Anyway, it's absolutely not a problem for me if this branch never
>> makes it to mesa, I was ready for this before I started. One of the
>> goals of this branch was just to show that the use of LLVM is possibly
>> not the best way of compiling GL shaders for r600g. And
>> another goal, of course, is to get better performance with r600g
>> *today*, not in the future.
>
> Yeah, that's why I wrote I'm not sure what to do with it. On one hand
> it's a quite nice improvement that's already working and somewhat
> stable, on the other hand if we merge it we also need to support it. I
> suggest that you try to stabilize it a bit more first and then we see.
>
> Christian.
>
>>
>> Vadim
>>
>>>
>>> To sum it up I'm not sure what we should do with this branch :)
>>>
>>> As Dragomir already wrote even if the code won't be used much the
>>> know-how you gained while coding it will stay, believe me that this is
>>> of far more value than the code itself.
>>>
>>> Christian.
>>>
>>> On 14.02.2013 11:10, Dragomir Ivanov wrote:
>>>> Greetings,
>>>> I hope that, even if your work will be short-lived, e.g. until LLVM
>>>> bytecode compiler takes off, the know-how is still very useful.
>>>>
>>>>
>>>> On Thu, Feb 14, 2013 at 4:04 AM, Vadim Girlin <vadimgirlin at gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Last month I finally found the time to work on the rewrite of my
>>>> previous shader optimization branch, now it's mostly done in terms
>>>> of the correctness of produced code and feature support (at least
>>>> on evergreen), though it's still a work in progress in terms of
>>>> the efficiency of generated shader code and the efficiency of the
>>>> backend itself.
>>>>
>>>> I spent some time last year studying the LLVM infrastructure and
>>>> R600 LLVM backend and trying to improve it, but after all I came
>>>> to the conclusion that for me it might be easier to implement all
>>>> that I wanted in the custom backend. This allows for a simpler
>>>> and more efficient implementation - e.g. I don't have to deal with CFGs
>>>> because in fact we have structured code, so it's possible to use
>>>> simpler and more efficient algorithms.
>>>>
>>>> Currently the branch has no regressions with piglit's
>>>> quick-driver.tests on evergreen (it doesn't rely on the fallback
>>>> to unoptimized code for the shaders with relative addressing and
>>>> other cases unlike the previous branch), and so far I don't see
>>>> any rendering issues with the apps that I used for testing -
>>>> Lightsmark 2008, Unigine Heaven 3.0 and some others. There are
>>>> also some performance improvements with the gpu-bound apps.
>>>>
>>>> I tried to keep in mind the differences between chip classes, so I
>>>> hope it should only require minor fixes to make it work on
>>>> non-evergreen chips, but I doubt that it will work out of the box
>>>> - support for some non-evergreen hw-specific features is still
>>>> missing, e.g. I'm sure that indirect addressing currently won't
>>>> work on R6xx, though basic tests might work in theory. Fixing this
>>>> shouldn't require a lot of work though.
>>>>
>>>> The branch can be found in my freedesktop repo:
>>>>
>>>> http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb
>>>>
>>>> Regarding the differences from the previous branch - there are
>>>> some additional optimizations, e.g. global value numbering with
>>>> some basic support for constant folding (not all instructions are
>>>> currently handled, but it's easy to extend), global code motion
>>>> that can hoist invariant code out of the loops etc. Some
>>>> optimizations that were implemented in the previous branch are not
>>>> implemented in the new branch (yet), e.g. propagation of modifiers
>>>> (I'm not even sure if it has any noticeable effect on performance).
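For readers unfamiliar with value numbering and constant folding, the idea can be sketched on a toy straight-line instruction list. This is a simplified local illustration, not the code from the branch: the tuple-based IR, the `ADD`/`MUL` opcode names, and the function `value_number` are all hypothetical, and each destination is assumed to be defined only once.

```python
import operator

# Hypothetical opcodes for this sketch; both happen to be commutative.
FOLDABLE = {"ADD": operator.add, "MUL": operator.mul}
COMMUTATIVE = {"ADD", "MUL"}


def is_const(v):
    return isinstance(v, (int, float))


def value_number(instrs):
    """instrs: list of (dest, op, src1, src2); sources are names or constants.

    Assumes each dest is defined exactly once. Returns the rewritten
    instruction list and a map from eliminated dests to their replacement.
    """
    table = {}    # (op, src1, src2) -> name that first computed this value
    replace = {}  # eliminated dest -> earlier equivalent name or constant
    out = []
    for dest, op, a, b in instrs:
        # Rewrite sources through values that were already eliminated.
        a = replace.get(a, a)
        b = replace.get(b, b)
        # Constant folding: both operands known at compile time.
        if is_const(a) and is_const(b) and op in FOLDABLE:
            replace[dest] = FOLDABLE[op](a, b)
            continue
        # Canonical operand order so that a+b and b+a share one key.
        operands = sorted((a, b), key=repr) if op in COMMUTATIVE else [a, b]
        key = (op, *operands)
        if key in table:
            replace[dest] = table[key]  # redundant computation, reuse
        else:
            table[key] = dest
            out.append((dest, op, a, b))
    return out, replace


instrs = [
    ("t0", "MUL", 2.0, 4.0),    # folds to the constant 8.0
    ("t1", "ADD", "x", "t0"),   # becomes ADD x, 8.0
    ("t2", "ADD", "t0", "x"),   # same value as t1
    ("t3", "MUL", "t1", "t2"),  # becomes MUL t1, t1
]
optimized, replaced = value_number(instrs)
```

A global version additionally has to decide which earlier computations are visible at each point of the program, which this sketch does not attempt.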
>>>>
>>>> Unlike the previous branch, there is support for indirect
>>>> addressing on registers - currently it uses my previously posted
>>>> patch (that was not very welcome) for obtaining the information
>>>> about addressable register ranges, but it's not required and can
>>>> be dropped, I just used that patch for testing. Without that
>>>> information opportunities for optimization are limited though, and
>>>> perhaps it makes sense to not try to optimize the shaders with
>>>> indirect gpr addressing at all and rely on the old backend until
>>>> we'll have the proper solution to pass that information to the
>>>> drivers.
>>>>
>>>> There is also initial support for ALU predication, but it's not
>>>> complete and currently unused. I'm not sure if predication support
>>>> will have a significant enough effect on performance to justify more
>>>> complex and expensive algorithms for the register allocator and
>>>> scheduler. I'll probably look into it later; I consider this a
>>>> low priority. In the case of predicated source code (from the LLVM
>>>> backend) the predication is eliminated using speculative execution
>>>> and conditional moves, same as with the simple if-conversion pass
>>>> that is also implemented.
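As a rough illustration of what if-conversion with speculative execution and a conditional move means, here is a minimal sketch. The function names are hypothetical, and `select` merely stands in for a hardware conditional move; it is not how the branch represents shaders.

```python
def branchy(c, a, b):
    """Original form: only one side of the branch is executed."""
    if c:
        x = a + b
    else:
        x = a - b
    return x


def select(c, t, f):
    """Stand-in for a conditional move instruction."""
    return t if c else f


def if_converted(c, a, b):
    """If-converted form: both sides are computed speculatively,
    then a conditional move picks the result. No branch remains."""
    t = a + b  # speculatively executed 'then' side
    f = a - b  # speculatively executed 'else' side
    return select(c, t, f)
```

The transformation is only safe when both sides are free of side effects, and it only pays off when the speculated work is cheaper than the control flow it replaces.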
>>>>
>>>> The branch currently uses as source the bytecode built by the old
>>>> backend (that may also come from LLVM backend) and some additional
>>>> information (about inputs etc), final bytecode is built by the new
>>>> builder in the branch. Building two versions of the bytecode
>>>> doesn't look very efficient, but currently it simplifies
>>>> debugging. I'm planning to implement translation from TGSI
>>>> directly to my representation, which should simplify the translator
>>>> and make it possible to get rid of unnecessary intermediate passes.
>>>>
>>>> Some old and new environment variables can be used to control the
>>>> behavior of this backend:
>>>>
>>>> R600_SB - 0 - disable new backend completely, 1 - enable (default)
>>>> R600_SB_USE_NEW_BYTECODE - 0 - disable use of the produced
>>>> bytecode (useful if you only want to look at the dump of the
>>>> optimized shader without passing it to hw), 1 - enable (default)
>>>> R600_DUMP_SHADERS - will also dump the disassembly of the
>>>> optimized shader after the original bytecode (if the backend is not
>>>> disabled with R600_SB=0).
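The variables listed above could be set from a small launcher script, sketched below. The helper name `sb_environment` is hypothetical, and using the value "1" for R600_DUMP_SHADERS is an assumption, since the description does not say which value enables it.

```python
import os


def sb_environment(enable=True, use_new_bytecode=True, dump_shaders=False,
                   base=None):
    """Build an environment dict for running a GL application with r600-sb.

    base defaults to a copy of the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env["R600_SB"] = "1" if enable else "0"
    env["R600_SB_USE_NEW_BYTECODE"] = "1" if use_new_bytecode else "0"
    if dump_shaders:
        # Assumption: "1" enables dumping; the exact value is not documented
        # in the text above.
        env["R600_DUMP_SHADERS"] = "1"
    return env


# Example: inspect the optimized shader dump without passing the new
# bytecode to the hardware. Pass the result as env= to subprocess.run().
inspect_env = sb_environment(use_new_bytecode=False, dump_shaders=True, base={})
```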
>>>>
>>>> Produced shader code is not ideal - e.g. you may notice
>>>> unnecessary MOVs inserted before DOT4 instructions, it's a known
>>>> issue and I'm going to look into it - this may require rework of
>>>> the regalloc/scheduler. I had to sacrifice some features to make
>>>> it work correctly with Heaven first, so that now I can try to
>>>> improve it while being able to test for regressions.
>>>>
>>>> Also probably there are some issues with the cleanness of the code
>>>> - I had to rework some parts a few times while fixing all
>>>> problems, so there is possibly unused code and other remnants of
>>>> the previous versions. Anyway, I still consider it as a work in
>>>> progress and some things are going to be reworked.
>>>>
>>>> I'm not sure what the fate of this branch will be, taking into
>>>> account that we also have an actively developed LLVM backend that is
>>>> required for OpenCL anyway. Your opinions are welcome.
>>>>
>>>> Vadim
>>>> _______________________________________________
>>>> mesa-dev mailing list
>>>> mesa-dev at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>>
>