[Mesa-dev] Ideas on loop unrolling, loop examples, and my GSoC-blog

Thu May 28 12:19:26 PDT 2015

Hi everyone,

For those interested I will be writing a blog at:
https://hellthom.wordpress.com/
where I will be documenting my progress in GSoC (at least bi-weekly).
It also has some info about me in the "about" section.
If there is anything else you want to know, or think I should add, just holla.

Now for the actual reason for the mail. I'm seeking opinions and thoughts on the
loop unrolling part of my GSoC project.

First, to get started, are some examples of loops from my shader-db:

for (i = -fsize.x; i <= fsize.x; i++)
{{
color += texture2D(tex, tc + i * texscale.xy, 0.0);
}} // EndBox1/2.

for (i.x = -fsize.x; i.x <= fsize.x; i.x++)
{
for (i.y = -fsize.y; i.y <= fsize.y; i.y++)
{
color += texture2D(tex, tc + i * texscale.xy, 0.0);
}} // EndBox1/2.

for (int rep1 = 0; rep1 < ps_i0.x; rep1++) {
t1_ps.zw = (t0_ps.yy * t1_ps.xy) + t0_ps.xz;
t3_ps = texture2D(g_samScreenImage, t1_ps.zw);
t2_ps.xyz = t2_ps.xyz + t3_ps.xyz;
t0_ps.y = t0_ps.y + ps_c0.x;
}

for (int i = 0; i < LIGHT_COUNT; i++){
vec3 lightVec = invRadius[i] * (lightPos[i] - gl_Vertex.xyz);

#ifdef SHADOWS
shadowVec[i] = -lightVec;
#endif
lVec[i].x = dot(lightVec, tangent);
lVec[i].y = dot(lightVec, binormal);
lVec[i].z = dot(lightVec, normal);
}

for(i = 0.0, f = 1.0; i < ScaleSteps.w; ++i, f *= 0.5)
   RT += OffsetVector * (step(dp_textureGrad(Texture_Normal, RT.xy,
dPdx, dPdy).a, RT.z) * f - 0.5 * f);

for (int l_3 = 0; l_3 < 28; l_3++) {
    vec4 tmpvar_10;
    tmpvar_10.xy = vec2(1.2, 1.2);
    tmpvar_10.zw = DiscKernel[l_3].zz;
    vec4 tmpvar_11;
    tmpvar_11 = (tmpvar_1.xyxy + ((DiscKernel[l_3].xyxy *
poissonScale_5.xyxy) / tmpvar_10));
    vec4 tmpvar_12;
    tmpvar_12 = texture2D (_MainTex, tmpvar_11.xy);
    vec4 tmpvar_13;
    tmpvar_13 = texture2D (_MainTex, tmpvar_11.zw);
    if (((tmpvar_12.w + tmpvar_13.w) > 0.0)) {
      vec2 tmpvar_14;
      tmpvar_14.y = 1.0;
      tmpvar_14.x = (DiscKernel[l_3].z / 1.2);
      vec2 tmpvar_15;
      tmpvar_15.x = tmpvar_12.w;
      tmpvar_15.y = tmpvar_13.w;
      vec2 tmpvar_16;
      vec2 tmpvar_17;
      tmpvar_17 = clamp (((
        (tmpvar_15 - (centerTap_7.ww * tmpvar_14))
       - vec2(-0.265, -0.265)) / vec2(0.265, 0.265)), 0.0, 1.0);
      tmpvar_16 = (tmpvar_17 * (tmpvar_17 * (3.0 -
        (2.0 * tmpvar_17)
      )));
      sum_6 = (sum_6 + ((tmpvar_12 * tmpvar_16.x) + (tmpvar_13 * tmpvar_16.y)));
      sampleCount_4 = (sampleCount_4 + dot (tmpvar_16, vec2(1.0, 1.0)));
    };
  };

Most of the loops are simple; array defines or lookups with a for loop.
No magic at all. Usually only one or two assignments in the loop body.
Then there are the nested loops from dota 2,
and there are also some cases of loops that have large bodies.
I have however not managed to grep up any loops apart from for-loops.
Seems for-loops is the goto loops, at least for my collection of shaders.

Some of the ideas and plans I've made so far:

Split the work into two parts:
First I will write a loop analysis pass. I think it makes sense to separate
this out into its own file, as the information it can gather can be usefull.
Things that come to my mind are loop invariant code motion
and strength reduction passes. Then i will write the actual
unroll pass when the analysis pass is done.

Add some new data to the metadata system.
At the very least an is_invariant variable. This should track if the
ssa-value is loop invariant. To avoid the issue with nested loops a
ssa-variable marked as invariant is only invariant in the loop it is present.
They are invariant if the operands are all in the preceding block,
if the operands are invariant in the same loop as we are in,
or a combination of the two.

There should probably also be an is_induction_variable flag
in the metadata, as this will be shared between loop unrolling
and a possible strength reduction pass (and probably others too).
I need to think a bit more about this though.
There should probably be some kind of relation between basic
induction variables (int i) and derived variables.

On the subject of loop analysis, does anyone happen to know of
a good paper on induction variable detection in ssa?
Loop invariants are simple enough, but induction variables are
a bit harder. Luckily there seems to be only linear induction
variables as far as I can detect. The (int i = 0; i < max; i++)
pattern is recurring a lot, with the random case of variables
that all fit in the form i + i*a, where a is loop invariant.
So I don't think we need a to complicated algorithm.

For the case with nested loops I was thinking we might want to
go for unroll-and-jam. I think I might have managed to somehow
loose an important paper I had about nested loops, but there where
some other possibilities for handling nested loops that might be
worth taking a closer look at. I recall there being a better
performing solution than unroll-and-squash.
But this is a bit up and into the future still.

One more thing;
Is there a limit where the loop body gets so large that we
want to decide that "gah, this sucks, no point in unrolling this"?
I imagine as the loops get large there will be a case of
diminishing returns from the unrolling?

I will start sketching a solution now, so if there is any
objections/suggestions on my planned approach it would
be nice to get those out there before I get to much into it :)

Regards,
Thomas