[Mesa-dev] Backends and support for pow-instructions

Fri Jul 18 15:04:38 PDT 2014

On Wed, Jul 16, 2014 at 4:14 PM, Thomas Helland
<thomashelland90 at gmail.com> wrote:
> 2014-07-13 20:13 GMT+02:00 Matt Turner <mattst88 at gmail.com>:
>>
>> On Sun, Jul 13, 2014 at 10:50 AM, Thomas Helland
>> <thomashelland90 at gmail.com> wrote:
>> > I've considered writing an algebraic optimization to convert
>> > this into an ir_binop_pow. If my understanding is correct the backend
>> > will then implement this in a similar fashion as above if it does not
>> > have a native pow() instruction.
>> >
>> > If, on the other hand, we have a pow() instruction, my guess is
>> > we'd see reduced instruction-counts.
>> >
>> > Is my understanding correct? Is this something that's worth doing?
>>
>> Yes and yes :)
>>
>> It's something I've thought about doing for a while. The only hang-up
>> is that we don't get nice expression trees to match in opt_algebraic.
>> Ideally, we'd get an ir_instruction with an rvalue that looked like
>>
>> (assign (xyz) (var_ref r3) (expression vec3 exp2 (expression vec3 *
>> (expression vec3 log2 (swiz xyz (var_ref r3))) (constant vec3
>> (2.200000 2.200000 2.200000)))))
>>
>> and then the bit of code in opt_algebraic is simple. Unfortunately, r3
>> is likely a vec4 and is used repeatedly throughout the shader for many
>> unrelated things. If we were able to split up these variables (i.e.,
>> recognize that the use of r3 for log2/mul/exp2 is a distinct live
>> range from the other uses of r3, and give it a new variable name) then
>> tree grafting would be able to give us the expression tree that we
>> want.
>>
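
As a rough illustration of why the match is simple once the tree exists,
here is a minimal sketch based on the identity pow(x, y) = exp2(y * log2(x)).
It uses plain Python tuples as a stand-in for GLSL IR; the tuple layout and
the name rewrite_pow are hypothetical and are not Mesa's opt_algebraic API.

```python
# Toy expression trees: ("exp2", a), ("log2", a), ("mul", a, b), ("pow", a, b),
# or a leaf (a variable name string or a float constant).
# These tuples are an assumed stand-in for GLSL IR, not Mesa's ir_expression.

def rewrite_pow(expr):
    """Rewrite exp2(log2(x) * y) or exp2(y * log2(x)) into pow(x, y), bottom-up."""
    if not isinstance(expr, tuple):
        return expr  # leaf: nothing to rewrite
    op, *args = expr
    args = [rewrite_pow(a) for a in args]
    if op == "exp2" and isinstance(args[0], tuple) and args[0][0] == "mul":
        _, lhs, rhs = args[0]
        # Multiplication commutes, so accept log2 on either side.
        for a, b in ((lhs, rhs), (rhs, lhs)):
            if isinstance(a, tuple) and a[0] == "log2":
                return ("pow", a[1], b)
    return (op, *args)

tree = ("exp2", ("mul", ("log2", "r3.xyz"), 2.2))
print(rewrite_pow(tree))  # ('pow', 'r3.xyz', 2.2)
```

The match only fires when the whole exp2/mul/log2 chain is one tree, which
is why the variable splitting discussed here matters.
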
>
> So we would probably be helped with a UD-chain, and a pass to
> make new variables for each of the new definitions?
> As far as I've managed to acclimate to the code-base, we
> do not have such a feature yet in the glsl-compiler?

Right. UD chains would probably help a lot in solving this problem.

>> That would let a lot of existing optimization passes perform better as well.
>>
>> Ken and I worked on this kind of pass in the i965 backend [0]. It
>> looked for full register writes outside of control flow, assigned the
>> result to a new register, and rewrote future uses of the old with the
>> new register. Something like that at the GLSL IR level would do the
>> trick. One problem to solve is how to handle partial writes of
>> variables, since in the case you brought up the shader only uses 3
>> components of a vec4, but they're still a distinct live range.
>>
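
The renaming idea described above (a full write outside control flow
starts a new register, and later uses are rewritten) can be sketched
roughly as follows. This is not the i965 pass itself: the instruction
tuples, the name split_live_ranges, and the "_N" naming scheme are all
assumptions for illustration, and partial writes and control flow are
deliberately left out.

```python
# Each instruction is (dest, op, sources). Assumption: every write is a full
# write in straight-line code (no control flow, no partial writes).

def split_live_ranges(instrs):
    """Give each full write a fresh name and rewrite later reads to match."""
    current = {}  # original name -> name of its most recent definition
    counter = {}  # original name -> number of definitions seen so far
    out = []
    for dest, op, srcs in instrs:
        # Rename reads first: they refer to the definition reaching this point.
        new_srcs = tuple(current.get(s, s) for s in srcs)
        n = counter.get(dest, 0)
        counter[dest] = n + 1
        new_dest = f"{dest}_{n}"
        current[dest] = new_dest
        out.append((new_dest, op, new_srcs))
    return out

prog = [
    ("r3", "tex",  ("coord",)),
    ("r3", "log2", ("r3",)),
    ("r3", "mul",  ("r3", "c")),
    ("r3", "exp2", ("r3",)),
    ("r3", "add",  ("a", "b")),  # unrelated reuse of r3
]
for ins in split_live_ranges(prog):
    print(ins)
```

After renaming, each intermediate value of the log2/mul/exp2 chain has a
single definition and a single use, which is the shape tree grafting needs.
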
>
> I guess we would need to keep track of the uses and defs for
> each component in the vector, some kind of fancy UD-chain
> that works component-wise, and also globally on the vector.
>
> I accidentally stumbled across some work in Eric's git-repo that
> looks pretty useful as a basis for how to go about this. [1]
> It seems to implement live-variable analysis that is both
> control-flow and swizzle-aware, and works component-wise.
> I have only given it a short glimpse, but it seems promising.

I hadn't considered using that code, but yeah, that would probably be
really helpful.
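
To make the component-wise idea concrete, here is a minimal sketch of
per-component liveness over straight-line code. It is not the code from
Eric's repository and ignores control flow entirely; the instruction layout
(dest, writemask, reads) and the name componentwise_liveness are
hypothetical.

```python
# Instruction: (dest, writemask, reads), where reads is a list of
# (variable, components) pairs. Assumed layout for illustration only.

def componentwise_liveness(instrs, live_out):
    """Backward scan returning the live (var, component) set before each instruction."""
    live = set(live_out)
    live_before = [None] * len(instrs)
    for i in range(len(instrs) - 1, -1, -1):
        dest, mask, reads = instrs[i]
        # A write kills only the components named in its writemask.
        live -= {(dest, c) for c in mask}
        # A read makes exactly the swizzled components live.
        for var, comps in reads:
            live |= {(var, c) for c in comps}
        live_before[i] = set(live)
    return live_before

prog = [
    ("r3",  "xyzw", [("tex", "xyzw")]),  # r3      = tex
    ("out", "w",    [("r3", "w")]),      # out.w   = r3.w
    ("r3",  "xyz",  [("r3", "xyz")]),    # r3.xyz  = f(r3.xyz)
    ("out", "xyz",  [("r3", "xyz")]),    # out.xyz = r3.xyz
]
live_out = {("out", c) for c in "xyzw"}
live = componentwise_liveness(prog, live_out)
print(("r3", "w") in live[1])  # True: r3.w is still read by instruction 1
print(("r3", "w") in live[2])  # False: r3.w is dead from instruction 2 on
```

Tracking liveness per component shows that r3.w is dead by the time r3.xyz
is rewritten, so the partial write could start a distinct live range.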