[Mesa-dev] r600g: status of the r600-sb branch

Sat Apr 20 04:38:41 PDT 2013

Am 20.04.2013 13:12, schrieb Vadim Girlin:
> On 04/20/2013 01:42 PM, Christian König wrote:
>> Am 19.04.2013 18:50, schrieb Vadim Girlin:
>>> On 04/19/2013 08:35 PM, Christian König wrote:
>>>> Hey Vadim,
>>>>
>>>> Am 19.04.2013 18:18, schrieb Vadim Girlin:
>>>>> [SNIP]
>>>>>
>>>>> In theory, yes, some optimizations in this branch are typically used
>>>>> on the earlier compilation stages, not on the target machine code. On
>>>>> the other hand, there are some differences that might make it harder,
>>>>> e.g. many algorithms require SSA form, and though it's possible to do
>>>>> similar optimizations without SSA, it would be hard to implement. 
>>>>> Also
>>>>> I wanted to support both default backend and llvm backend for
>>>>> increased testing coverage and to be able to compare the 
>>>>> efficiency of
>>>>> the algorithms in my experiments etc.
>>>>
>>>> Yeah I know, missing an SSA implementation is also something that 
>>>> always
>>>> bothered me a bit with both TGSI and GLSL (while I haven't done much
>>>> with GLSL, so maybe I misjudge here).
>>>>
>>>> Can you name the different algorithms used?
>>>
>>> There is a short description of the algorithms and passes in the
>>> notes.markdown file [1] in that branch, there are also links in the
>>> end to the full description of some algorithms, though some of them
>>> were modified/adapted for this branch.
>>>
>>>> It's not a strict prerequisite, but I think we both agree that doing
>>>> things like LICM on R600 bytecode isn't the best idea over all (when
>>>> doing it on GLSL would be beneficial for all drivers not only r600).
>>>
>>> In fact there is no special LICM pass, it's done by the GCM (Global
>>> Code Motion, [2]), which probably could be also called global
>>> scheduler. In fact in my branch this pass is combined with some
>>> hw-specific scheduling logic, e.g. grouping fetch/alu instructions to
>>> reduce clause type switching in the code and the number of required CF
>>> instructions, potentially it can also schedule clauses to expose more
>>> parallelism with the BARRIER bit usage.
>>>
>>
>> Yeah I already thought that you're using something like this.
>>
>> On one hand that is really good, cause it is specialized on so produces
>> really optimal code for the r600 target. But on the other hand it's bad,
>> cause it is specialized on so produces really optimal code ONLY on the
>> r600 target....
>
> I think such pass on higher level (GLSL IR or TGSI) would at least 
> need some callbacks or caps to be tunable for the target.
>
> Anyway the result of GCM pass is affected by the CFG structure, so 
> when the target applies e.g. if-conversion or any other 
> target-specific control flow optimization, this means that you might 
> want to apply similar pass again on the target instruction level for 
> better results, and then previous pass on higher level IR looks not 
> very useful.
>
> Also there are some high level operations that are translated to the 
> bunch of target instructions, e.g. integer division on r600. 
> High-level pass can't hoist "i/5" (where i is loop counter) out of the 
> loop, but after translation to target instructions it's possible to 
> hoist some of the resulting instructions, producing more efficient code.
>
> One more point is that GCM allows to achieve best efficiency when used 
> with GVN (Global Value Numbering) pass, e.g. GCM allows GVN to not 
> care about code placement during elimination of redundant operations, 
> so you'll probably want to implement high-level GVN pass as well.
>
> I think it's possible to implement GVN-GCM on GLSL or TGSI level, but 
> I suspect it will require a lot more efforts than it was required by 
> implementation of these passes in my branch, and will be less efficient.
>
>>
>> Just speculating, what would it take to make those passes run on the
>> LLVM Machine Instruction representation instead of your own 
>> representation?
>
> Main difference between IRs is the representation of control flow, 
> r600-sb relies on the fact that r600 arch doesn't have arbitrary 
> control flow, this renders CFGs superfluous. Implementation of these 
> passes on CFGs will be more complicated, it will also require the 
> computation of dominance frontiers, loops detection and analysis, etc. 
> On the r600-sb's IR these passes are greatly simplified.
>
> Regarding the GCM, original algorithm as described in that pdf works 
> on the CFG, so it shouldn't be hard to implement in LLVM, but I'm not 
> sure how it will fit into the LLVM infrastructure. LLVM has GVN-PRE, 
> LICM and other passes that together do basically the same thing as 
> GVN-GCM, so if you implement it, you might want to get rid of LLVM's 
> own passes that duplicate the same functionality, and I'm not sure if 
> this would be easy, possibly there are some interdependencies etc. 
> Also I saw mentions of some plans (e.g. [1],[2]) regarding the 
> implementation of global code motion in LLVM, looks like there is 
> already some work in progress.
>

Oh, I wasn't taking about replacing any LLVM passes, more like extending 
them to provide the same amount of functionality. Also I hadn't had LLVM 
IR in mind while writing this, but more the machine instruction 
representation they use.

Well you have quite allot of C++ code here, and a big bunch of it just 
deals with bringing the bytecode into a representation where you can run 
your algorithms on it.

Christian.

> Vadim
>
> [1] 
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120709/146206.html
> [2] 
> http://markmail.org/message/2td3fnnggk6oripp#query:+page:1+mid:2td3fnnggk6oripp+state:results
>
>
>> Christian.
>>
>>> Vadim
>>>
>>>  [1]
>>> http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb 
>>>
>>>
>>>  [2]
>>> http://www.cs.washington.edu/education/courses/cse501/06wi/reading/click-pldi95.pdf 
>>>
>>>
>>>
>>>> Regards,
>>>> Christian.
>>>
>>>
>>
>
>