[Mesa-dev] r600g: status of the r600-sb branch

Vadim Girlin vadimgirlin at gmail.com
Sat Apr 20 07:32:40 PDT 2013


On 04/20/2013 03:38 PM, Christian König wrote:
> Am 20.04.2013 13:12, schrieb Vadim Girlin:
>> On 04/20/2013 01:42 PM, Christian König wrote:
>>> Am 19.04.2013 18:50, schrieb Vadim Girlin:
>>>> On 04/19/2013 08:35 PM, Christian König wrote:
>>>>> Hey Vadim,
>>>>>
>>>>> Am 19.04.2013 18:18, schrieb Vadim Girlin:
>>>>>> [SNIP]
>>>>>>
>>>>>> In theory, yes, some optimizations in this branch are typically used
>>>>>> on the earlier compilation stages, not on the target machine code. On
>>>>>> the other hand, there are some differences that might make it harder,
>>>>>> e.g. many algorithms require SSA form, and though it's possible to do
>>>>>> similar optimizations without SSA, it would be hard to implement.
>>>>>> Also
>>>>>> I wanted to support both default backend and llvm backend for
>>>>>> increased testing coverage and to be able to compare the
>>>>>> efficiency of
>>>>>> the algorithms in my experiments etc.
>>>>>
>>>>> Yeah I know, missing an SSA implementation is also something that
>>>>> always
>>>>> bothered me a bit with both TGSI and GLSL (while I haven't done much
>>>>> with GLSL, so maybe I misjudge here).
>>>>>
>>>>> Can you name the different algorithms used?
>>>>
>>>> There is a short description of the algorithms and passes in the
>>>> notes.markdown file [1] in that branch, there are also links in the
>>>> end to the full description of some algorithms, though some of them
>>>> were modified/adapted for this branch.
>>>>
>>>>> It's not a strict prerequisite, but I think we both agree that doing
>>>>> things like LICM on R600 bytecode isn't the best idea over all (when
>>>>> doing it on GLSL would be beneficial for all drivers not only r600).
>>>>
>>>> In fact there is no special LICM pass, it's done by the GCM (Global
>>>> Code Motion, [2]), which probably could be also called global
>>>> scheduler. In fact in my branch this pass is combined with some
>>>> hw-specific scheduling logic, e.g. grouping fetch/alu instructions to
>>>> reduce clause type switching in the code and the number of required CF
>>>> instructions, potentially it can also schedule clauses to expose more
>>>> parallelism with the BARRIER bit usage.
>>>>
>>>
>>> Yeah I already thought that you're using something like this.
>>>
>>> On one hand that is really good, cause it is specialized on so produces
>>> really optimal code for the r600 target. But on the other hand it's bad,
>>> cause it is specialized on so produces really optimal code ONLY on the
>>> r600 target....
>>
>> I think such pass on higher level (GLSL IR or TGSI) would at least
>> need some callbacks or caps to be tunable for the target.
>>
>> Anyway the result of GCM pass is affected by the CFG structure, so
>> when the target applies e.g. if-conversion or any other
>> target-specific control flow optimization, this means that you might
>> want to apply similar pass again on the target instruction level for
>> better results, and then previous pass on higher level IR looks not
>> very useful.
>>
>> Also there are some high level operations that are translated to the
>> bunch of target instructions, e.g. integer division on r600.
>> High-level pass can't hoist "i/5" (where i is loop counter) out of the
>> loop, but after translation to target instructions it's possible to
>> hoist some of the resulting instructions, producing more efficient code.
>>
>> One more point is that GCM allows to achieve best efficiency when used
>> with GVN (Global Value Numbering) pass, e.g. GCM allows GVN to not
>> care about code placement during elimination of redundant operations,
>> so you'll probably want to implement high-level GVN pass as well.
>>
>> I think it's possible to implement GVN-GCM on GLSL or TGSI level, but
>> I suspect it will require a lot more efforts than it was required by
>> implementation of these passes in my branch, and will be less efficient.
>>
>>>
>>> Just speculating, what would it take to make those passes run on the
>>> LLVM Machine Instruction representation instead of your own
>>> representation?
>>
>> Main difference between IRs is the representation of control flow,
>> r600-sb relies on the fact that r600 arch doesn't have arbitrary
>> control flow, this renders CFGs superfluous. Implementation of these
>> passes on CFGs will be more complicated, it will also require the
>> computation of dominance frontiers, loops detection and analysis, etc.
>> On the r600-sb's IR these passes are greatly simplified.
>>
>> Regarding the GCM, original algorithm as described in that pdf works
>> on the CFG, so it shouldn't be hard to implement in LLVM, but I'm not
>> sure how it will fit into the LLVM infrastructure. LLVM has GVN-PRE,
>> LICM and other passes that together do basically the same thing as
>> GVN-GCM, so if you implement it, you might want to get rid of LLVM's
>> own passes that duplicate the same functionality, and I'm not sure if
>> this would be easy, possibly there are some interdependencies etc.
>> Also I saw mentions of some plans (e.g. [1],[2]) regarding the
>> implementation of global code motion in LLVM, looks like there is
>> already some work in progress.
>>
>
> Oh, I wasn't taking about replacing any LLVM passes, more like extending
> them to provide the same amount of functionality. Also I hadn't had LLVM
> IR in mind while writing this, but more the machine instruction
> representation they use.
>
> Well you have quite allot of C++ code here, and a big bunch of it just
> deals with bringing the bytecode into a representation where you can run
> your algorithms on it.

If you mean that I wouldn't need this code that brings everything into 
r600-sb's representation if I'd decide to implement everything in the 
LLVM backend, you are right.

On the other hand, then probably I would need either to implement more 
complicated and less efficient optimization passes on CFGs, or to 
collect the same information and construct something like r600-sb IR on 
top of the LLVM representation, both ways looked to me less efficient 
and more complicated by design, and would require more time.

Vadim

>
> Christian.
>
>> Vadim
>>
>> [1]
>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120709/146206.html
>>
>> [2]
>> http://markmail.org/message/2td3fnnggk6oripp#query:+page:1+mid:2td3fnnggk6oripp+state:results
>>
>>
>>
>>> Christian.
>>>
>>>> Vadim
>>>>
>>>>  [1]
>>>> http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb
>>>>
>>>>
>>>>  [2]
>>>> http://www.cs.washington.edu/education/courses/cse501/06wi/reading/click-pldi95.pdf
>>>>
>>>>
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>
>>>>
>>>
>>
>>
>



More information about the mesa-dev mailing list