[Mesa-dev] r600g: status of the r600-sb branch

Sat Apr 20 04:12:52 PDT 2013

On 04/20/2013 01:42 PM, Christian König wrote:
> On 19.04.2013 18:50, Vadim Girlin wrote:
>> On 04/19/2013 08:35 PM, Christian König wrote:
>>> Hey Vadim,
>>>
>>> On 19.04.2013 18:18, Vadim Girlin wrote:
>>>> [SNIP]
>>>>
>>>> In theory, yes, some optimizations in this branch are typically used
>>>> on the earlier compilation stages, not on the target machine code. On
>>>> the other hand, there are some differences that might make it harder,
>>>> e.g. many algorithms require SSA form, and though it's possible to do
>>>> similar optimizations without SSA, it would be hard to implement. Also
>>>> I wanted to support both default backend and llvm backend for
>>>> increased testing coverage and to be able to compare the efficiency of
>>>> the algorithms in my experiments etc.
>>>
>>> Yeah I know, missing an SSA implementation is also something that always
>>> bothered me a bit with both TGSI and GLSL (while I haven't done much
>>> with GLSL, so maybe I misjudge here).
>>>
>>> Can you name the different algorithms used?
>>
>> There is a short description of the algorithms and passes in the
>> notes.markdown file [1] in that branch. There are also links at the
>> end to the full descriptions of some algorithms, though some of them
>> were modified/adapted for this branch.
>>
>>> It's not a strict prerequisite, but I think we both agree that doing
>>> things like LICM on R600 bytecode isn't the best idea overall (when
>>> doing it on GLSL would be beneficial for all drivers, not only r600).
>>
>> In fact there is no special LICM pass, it's done by the GCM (Global
>> Code Motion, [2]) pass, which could probably also be called a global
>> scheduler. In my branch this pass is combined with some hw-specific
>> scheduling logic, e.g. grouping fetch/alu instructions to reduce
>> clause type switching in the code and the number of required CF
>> instructions. Potentially it can also schedule clauses to expose more
>> parallelism with the BARRIER bit usage.
>>
>
> Yeah I already thought that you're using something like this.
>
> On one hand that is really good, because it is specialized and so
> produces really optimal code for the r600 target. But on the other hand
> it's bad, because it is specialized and so produces really optimal code
> ONLY on the r600 target....

I think such a pass at a higher level (GLSL IR or TGSI) would at least
need some callbacks or caps so that it could be tuned for the target.

Anyway, the result of the GCM pass is affected by the CFG structure, so
when the target applies e.g. if-conversion or any other target-specific
control flow optimization, you might want to apply a similar pass again
at the target instruction level for better results, and then the earlier
pass on the higher-level IR doesn't look very useful.

Also there are some high-level operations that are translated into a
bunch of target instructions, e.g. integer division on r600. A high-level
pass can't hoist "i/5" (where i is the loop counter) out of the loop, but
after translation to target instructions it's possible to hoist some of
the resulting instructions, producing more efficient code.
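
As a rough illustration of the division case, here is a simplified
model of a reciprocal-based lowering. The helper names (recip_u32,
udiv_lowered) and the exact sequence are assumptions made for the
sketch, not the real r600 instruction stream. The point is only that
the reciprocal depends on the divisor alone, so it is loop-invariant
even though "i/5" as a whole is not.

```python
def recip_u32(d):
    # Hypothetical stand-in for a reciprocal step: floor(2**32 / d).
    # Depends only on the divisor, so it is loop-invariant.
    return (1 << 32) // d

def udiv_lowered(n, d, r=None):
    # Lowered unsigned division n / d for 0 <= n < 2**32, d >= 1.
    if r is None:
        r = recip_u32(d)      # recomputed every time if not hoisted
    q = (n * r) >> 32         # "multiply-high" gives q or q - 1
    if n - q * d >= d:        # single correction step
        q += 1
    return q

def loop_naive(count, d):
    # Every iteration recomputes the reciprocal.
    return [udiv_lowered(i, d) for i in range(count)]

def loop_hoisted(count, d):
    r = recip_u32(d)          # hoisted out of the loop
    return [udiv_lowered(i, d, r) for i in range(count)]
```

Both loops compute the same quotients, the second one just does the
invariant part once.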

One more point is that GCM achieves the best efficiency when used
together with a GVN (Global Value Numbering) pass, e.g. GCM allows GVN to
not care about code placement while eliminating redundant operations, so
you'll probably want to implement a high-level GVN pass as well.
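
A minimal sketch of that idea, assuming SSA-style instructions given as
(dest, op, operands) tuples: plain hash-based value numbering over a
flat list that ignores which block each instruction lives in, on the
assumption that a later GCM-like pass picks a legal placement for each
surviving value. The function name and the tuple format are made up for
this example, and it is much simpler than a real GVN (no phis, no
loops).

```python
COMMUTATIVE = {"add", "mul"}

def value_number(instrs):
    """Map every dest to the first dest that computes the same value."""
    leader = {}   # dest -> canonical dest
    table = {}    # (op, canonical operands) -> canonical dest
    for dest, op, args in instrs:
        # Rewrite operands to their canonical names first.
        key_args = tuple(leader.get(a, a) for a in args)
        if op in COMMUTATIVE:
            key_args = tuple(sorted(key_args))
        # No check of block or position here: placement is left to
        # the code motion pass that runs afterwards.
        leader[dest] = table.setdefault((op, key_args), dest)
    return leader

instrs = [
    ("t1", "mul", ("a", "b")),
    ("t2", "add", ("t1", "c")),
    ("t3", "mul", ("b", "a")),   # same value as t1
    ("t4", "add", ("t3", "c")),  # same value as t2
]
leaders = value_number(instrs)
```

Here t3 and t4 collapse onto t1 and t2 regardless of where they were
originally placed.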

I think it's possible to implement GVN-GCM at the GLSL or TGSI level, but
I suspect it will require a lot more effort than the implementation of
these passes in my branch did, and will be less efficient.

>
> Just speculating, what would it take to make those passes run on the
> LLVM Machine Instruction representation instead of your own representation?

The main difference between the IRs is the representation of control
flow: r600-sb relies on the fact that the r600 arch doesn't have
arbitrary control flow, which renders CFGs superfluous. Implementing
these passes on CFGs will be more complicated, it will also require
computing dominance frontiers, loop detection and analysis, etc. On
r600-sb's IR these passes are greatly simplified.
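
To illustrate why structured control flow helps: if the IR is a tree of
nested regions, the loop nesting depth of every operation (which GCM
uses when choosing a placement) falls out of a single tree walk, with
no dominator or loop analysis. The Node class and the region kinds
below are hypothetical, not r600-sb's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                  # "seq", "if", "loop" or "op"
    name: str = ""
    children: list = field(default_factory=list)

def loop_depths(node, depth=0, out=None):
    """Return {op name: loop nesting depth} from one tree walk."""
    if out is None:
        out = {}
    if node.kind == "op":
        out[node.name] = depth
    # Only loop regions increase the depth of what they contain.
    inner = depth + 1 if node.kind == "loop" else depth
    for child in node.children:
        loop_depths(child, inner, out)
    return out

shader = Node("seq", children=[
    Node("op", "a"),
    Node("loop", children=[
        Node("op", "b"),
        Node("if", children=[Node("op", "c")]),
        Node("loop", children=[Node("op", "d")]),
    ]),
    Node("op", "e"),
])
depths = loop_depths(shader)
```

On a general CFG the same information needs back-edge detection and
loop analysis first.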

Regarding GCM, the original algorithm as described in that pdf works on
the CFG, so it shouldn't be hard to implement in LLVM, but I'm not sure
how it would fit into the LLVM infrastructure. LLVM has GVN-PRE, LICM and
other passes that together do basically the same thing as GVN-GCM, so if
you implement it, you might want to get rid of LLVM's own passes that
duplicate the same functionality, and I'm not sure this would be easy,
possibly there are some interdependencies etc. Also I saw mentions of
some plans (e.g. [1],[2]) regarding the implementation of global code
motion in LLVM, so it looks like there is already some work in progress.

Vadim

[1] 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120709/146206.html
[2] 
http://markmail.org/message/2td3fnnggk6oripp#query:+page:1+mid:2td3fnnggk6oripp+state:results

> Christian.
>
>> Vadim
>>
>>  [1]
>> http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb
>>
>>  [2]
>> http://www.cs.washington.edu/education/courses/cse501/06wi/reading/click-pldi95.pdf
>>
>>
>>> Regards,
>>> Christian.
>>
>>
>