[Mesa-dev] [PATCH 04/10] i965/vs: Replace brw_vs_emit.c with dumping code into the vec4_visitor.

Fri Oct 26 15:56:50 PDT 2012

Kenneth Graunke <kenneth at whitecape.org> writes:

> Rather than having two separate backends, just create a small layer that
> translates the subset of Mesa IR used for ARB_vertex_program and fixed
> function programs to the Vec4 IR.  This allows us to use the same
> optimization passes, code generator, register allocator as for GLSL.

> +   if (shader) {
> +      move_grf_array_access_to_scratch();
> +      move_uniform_array_access_to_pull_constants();
> +   } else {
> +      /* The ARB_vertex_program frontend emits pull constant loads directly
> +       * rather than using reladdr, so we don't need to walk through all the
> +       * instructions looking for things to move.  There isn't anything.
> +       *
> +       * We do still need to split things to vec4 size.
> +       */
> +      split_uniform_registers();

I think this will blow up if you have a pile of non-constant but
constant-indexed params that is too large to fit in push constants.

> + * Implementation of the compiler for ARB_vertex_program, NV_vertex_program,
> + * and NV_vertex_program1_1 shaders on top of the GLSL compiler backend.
> + */

NV_vertex_program support is gone now, so this comment can drop it.  Woo!

> +      case OPCODE_EXP: {
> +         dst_reg t = dst;
> +         if (vpi->DstReg.WriteMask & WRITEMASK_X) {
> +            src_reg floor = src_reg(this, glsl_type::float_type);
> +            emit(RNDD(dst_reg(floor), src[0]));
> +            t.writemask = WRITEMASK_X;
> +            #warning "EXP code regression: use SHL, not EXP2 for .x"

The #warnings would need to be resolved before this lands.

> +            emit_math(SHADER_OPCODE_EXP2, t, floor);
> +         }
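For reference, the SHL trick the #warning mentions works because EXP's .x result is 2^floor(x), an exact power of two.  For an integer n in the normal range [-126, 127], 2^n is the IEEE-754 single-precision value whose biased exponent field is n + 127 and whose mantissa is zero, so it can be built with an add and a shift instead of an EXP2 math instruction.  A minimal scalar sketch, assuming IEEE-754 floats; the function name is hypothetical and this is not the i965 code:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: computes 2^floor(x) by constructing the float's bit
// pattern directly (bias the exponent, shift it into bits 23..30) rather
// than evaluating EXP2.  Only valid while floor(x) is in [-126, 127].
float exp2_floor_via_shl(float x)
{
   int32_t n = static_cast<int32_t>(std::floor(x));       // RNDD, then to int
   uint32_t bits = static_cast<uint32_t>(n + 127) << 23;  // ADD 127, SHL 23
   float result;
   std::memcpy(&result, &bits, sizeof(result));           // reinterpret as float
   return result;
}
```

For example, floor(2.7) = 2 gives the bit pattern 129 << 23, which is 4.0f.  Outside the normal exponent range the shifted value no longer encodes 2^n, so a real implementation would need to decide how to handle that case.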

> +      case OPCODE_LIT: {
> +         dst_reg result = dst;
> +         /* From the ARB_vertex_program spec:
> +          *
> +          *      tmp = VectorLoad(op0);
> +          *      if (tmp.x < 0) tmp.x = 0;
> +          *      if (tmp.y < 0) tmp.y = 0;
> +          *      if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
> +          *      else if (tmp.w > 128-epsilon) tmp.w = 128-epsilon;
> +          *      result.x = 1.0;
> +          *      result.y = tmp.x;
> +          *      result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0;
> +          *      result.w = 1.0;
> +          *
> +          * Note that we don't do the clamping to +/- 128.  We didn't in
> +          * brw_vs_emit.c either.
> +          */
> +         if (vpi->DstReg.WriteMask & WRITEMASK_XW) {
> +            result.writemask = WRITEMASK_XW;
> +            emit(MOV(result, src_reg(1.0f)));
> +         }
> +         if (vpi->DstReg.WriteMask & WRITEMASK_YZ) {
> +            result.writemask = WRITEMASK_YZ;
> +            emit(MOV(result, src_reg(0.0f)));
> +
> +            src_reg tmp_x = src[0];
> +            tmp_x.swizzle = BRW_SWIZZLE_XXXX;

I think this should reswizzle tmp_x.swizzle rather than overwrite it, since
src[0] may already carry a swizzle of its own.  The same goes for the similar
code below.
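A sketch of what composing the swizzles means, assuming the 2-bit-per-channel packing used by BRW_SWIZZLE4 (channel i's selector in bits 2i..2i+1).  The helpers below are hypothetical stand-ins written for illustration, not the real macros:

```cpp
// Hypothetical stand-ins mirroring the BRW swizzle layout:
// four 2-bit channel selectors packed into 8 bits.
constexpr unsigned swizzle4(unsigned a, unsigned b, unsigned c, unsigned d)
{
   return a | (b << 2) | (c << 4) | (d << 6);
}

// Extract the selector for channel `chan` from a packed swizzle.
constexpr unsigned get_swz(unsigned swz, unsigned chan)
{
   return (swz >> (chan * 2)) & 0x3;
}

// Apply a new swizzle (x, y, z, w) on top of an existing one: channel i of
// the result reads whatever component the old swizzle selected at the
// position named by the new selector.  Assigning the new swizzle directly
// would instead discard the source's own swizzle.
constexpr unsigned reswizzle(unsigned swz, unsigned x, unsigned y,
                             unsigned z, unsigned w)
{
   return swizzle4(get_swz(swz, x), get_swz(swz, y),
                   get_swz(swz, z), get_swz(swz, w));
}
```

If src[0] arrives as .yzwx, reswizzling it with XXXX yields .yyyy, which is what "tmp.x" means in the spec pseudocode; assigning BRW_SWIZZLE_XXXX directly would read the source's .x instead.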

> +      case OPCODE_MAD: {
> +         #warning "code regression: need MAC"
> +         src_reg temp = src_reg(this, glsl_type::vec4_type);
> +         emit(MUL(dst_reg(temp), src[0], src[1]));
> +         emit(ADD(dst, temp, src[2]));
> +         break;
> +      }

I really want to do MAD/MAC generation as a post-visit peephole pass in both
the VS and FS backends, I think.  With that and just another step or two, we'd
be able to drop tree grafting.
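To make the peephole idea concrete, here is a minimal sketch over a toy three-address IR.  The Inst struct, the opcodes, and fuse_mad() are all hypothetical and much simpler than vec4_instruction.  It fuses a MUL into an immediately following ADD when the MUL's destination is read nowhere else.  A real pass would also have to check writemasks, source modifiers, saturate, and whether the fused instruction is numerically acceptable; MAC, which accumulates into the accumulator register, needs more bookkeeping and isn't covered.

```cpp
#include <cstddef>
#include <vector>

enum class Op { MOV, MUL, ADD, MAD };

// Toy instruction: dst = op(src[0], src[1], src[2]).  Registers are plain
// ints and -1 marks an unused source.
struct Inst {
   Op op;
   int dst;
   int src[3];
};

// Count how many times register `reg` is read anywhere in the program.
static int count_reads(const std::vector<Inst> &prog, int reg)
{
   int reads = 0;
   for (const Inst &inst : prog)
      for (int s : inst.src)
         if (s == reg)
            reads++;
   return reads;
}

// Peephole: MUL t, a, b ; ADD d, t, c  ->  MAD d, a, b, c
// when t is read only by that ADD.  Returns true if anything changed.
bool fuse_mad(std::vector<Inst> &prog)
{
   bool progress = false;
   for (std::size_t i = 0; i + 1 < prog.size(); i++) {
      const Inst &mul = prog[i];
      const Inst &add = prog[i + 1];
      if (mul.op != Op::MUL || add.op != Op::ADD)
         continue;
      if (count_reads(prog, mul.dst) != 1)
         continue;

      // ADD is commutative, so the temporary may be either source.
      int other;
      if (add.src[0] == mul.dst)
         other = add.src[1];
      else if (add.src[1] == mul.dst)
         other = add.src[0];
      else
         continue;

      // Build the MAD before touching the vector, then replace the ADD
      // with it and drop the now-dead MUL.
      Inst mad = {Op::MAD, add.dst, {mul.src[0], mul.src[1], other}};
      prog[i + 1] = mad;
      prog.erase(prog.begin() + i);
      progress = true;
   }
   return progress;
}
```

Running it as a separate pass after the visitor means every frontend (GLSL, ARB_vertex_program, fixed function) gets the fusion without emitting MAD itself.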

> +      // probably not true, but I need to update this code when it breaks

This // comment is inconsistent with the rest of our code, which uses /* */.

> +   if (src.Swizzle != SWIZZLE_NOOP || src.Negate) {
> +      unsigned short zeros_mask = 0;
> +      unsigned short ones_mask = 0;
> +      unsigned short src_mask = 0;
> +      unsigned short src_swiz[4];
> +
> +      for (int i = 0; i < 4; i++) {
> +         /* The ZERO, ONE, and Negate options are only used for OPCODE_SWZ,
> +          * but it's simplest to handle it here.
> +          */
> +         int s = GET_SWZ(src.Swizzle, i);
> +         switch (s) {
> +         case SWIZZLE_X:
> +         case SWIZZLE_Y:
> +         case SWIZZLE_Z:
> +         case SWIZZLE_W:
> +            src_mask |= 1 << i;
> +            src_swiz[i] = s;
> +            break;
> +         case SWIZZLE_ZERO:
> +            zeros_mask |= 1 << i;
> +            break;
> +         case SWIZZLE_ONE:
> +            ones_mask |= 1 << i;
> +            break;
> +         }
> +      }
> +
> +      result.swizzle =
> +         BRW_SWIZZLE4(src_swiz[0], src_swiz[1], src_swiz[2], src_swiz[3]);

This reads uninitialized src_swiz[] values from the stack in the ZERO/ONE
cases.  If any of them happened to be greater than 3, I think things would go
badly, since they'd overflow their 2-bit fields in BRW_SWIZZLE4.
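One way to avoid reading garbage is to give every channel a valid selector up front.  Assuming, as the zeros_mask/ones_mask bookkeeping suggests, that the ZERO/ONE channels get filled in separately, any in-range value works there; identity is just a natural choice.  A sketch, assuming Mesa's 3-bit-per-channel prog_instruction swizzle encoding (X..W = 0..3, ZERO = 4, ONE = 5) and the 2-bit BRW layout.  The constants, struct, and functions are hypothetical local stand-ins, not the real macros:

```cpp
// Hypothetical stand-ins for Mesa's prog_instruction swizzle encoding:
// four 3-bit selectors, where 4 and 5 mean "constant 0" and "constant 1".
enum { SWZ_X = 0, SWZ_Y = 1, SWZ_Z = 2, SWZ_W = 3, SWZ_ZERO = 4, SWZ_ONE = 5 };

constexpr unsigned mesa_get_swz(unsigned swz, int i)
{
   return (swz >> (i * 3)) & 0x7;
}

constexpr unsigned mesa_swizzle4(unsigned a, unsigned b, unsigned c, unsigned d)
{
   return a | (b << 3) | (c << 6) | (d << 9);
}

// Stand-in for BRW_SWIZZLE4: four 2-bit selectors in 8 bits.
constexpr unsigned brw_swizzle4(unsigned a, unsigned b, unsigned c, unsigned d)
{
   return a | (b << 2) | (c << 4) | (d << 6);
}

struct SwizzleSplit {
   unsigned brw_swizzle;  // swizzle for the channels that read the source
   unsigned src_mask;     // channels that read the source register
   unsigned zeros_mask;   // channels to be written with 0.0
   unsigned ones_mask;    // channels to be written with 1.0
};

SwizzleSplit split_swizzle(unsigned mesa_swz)
{
   SwizzleSplit out = {0, 0, 0, 0};
   // Start from an identity swizzle so ZERO/ONE channels never leave
   // garbage behind that could overflow a 2-bit field.
   unsigned src_swiz[4] = {SWZ_X, SWZ_Y, SWZ_Z, SWZ_W};

   for (int i = 0; i < 4; i++) {
      unsigned s = mesa_get_swz(mesa_swz, i);
      switch (s) {
      case SWZ_X:
      case SWZ_Y:
      case SWZ_Z:
      case SWZ_W:
         out.src_mask |= 1u << i;
         src_swiz[i] = s;
         break;
      case SWZ_ZERO:
         out.zeros_mask |= 1u << i;
         break;
      case SWZ_ONE:
         out.ones_mask |= 1u << i;
         break;
      }
   }

   out.brw_swizzle =
      brw_swizzle4(src_swiz[0], src_swiz[1], src_swiz[2], src_swiz[3]);
   return out;
}
```

With a source swizzle of .x01w, channels y and z keep their identity selectors, so the packed result stays within 8 bits no matter what was on the stack.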