[Mesa-dev] [PATCH 07/10] i965: Replace brw_wm_* with dumping code into the fs_visitor.

Thu Sep 27 17:27:31 PDT 2012

Kenneth Graunke <kenneth at whitecape.org> writes:

> On 09/22/2012 02:04 PM, Eric Anholt wrote:
>> This makes a giant pile of code newly dead.  It also fixes TXB on newer
>> chipsets, which has been totally broken (I now have a piglit test for that).
>> It passes the same set of Ian's ARB_fragment_program tests.  It also improves
>> high-settings ETQW performance by 3.2 +/- 1.9% (n=3), thanks to better
>> optimization and having 8-wide along with 16-wide shaders.
>> ---
>>  src/mesa/drivers/dri/i965/Makefile.sources   |    1 +
>>  src/mesa/drivers/dri/i965/brw_fs.cpp         |   36 +-
>>  src/mesa/drivers/dri/i965/brw_fs.h           |   30 +-
>>  src/mesa/drivers/dri/i965/brw_fs_emit.cpp    |   22 +-
>>  src/mesa/drivers/dri/i965/brw_fs_fp.cpp      |  781 ++++++++++++++++++++++++++
>>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |    3 +-
>>  src/mesa/drivers/dri/i965/brw_wm.c           |   58 +-
>>  src/mesa/drivers/dri/i965/brw_wm_state.c     |   19 +-
>>  src/mesa/drivers/dri/i965/gen6_wm_state.c    |    8 +-
>>  src/mesa/drivers/dri/i965/gen7_wm_state.c    |    8 +-
>>  10 files changed, 857 insertions(+), 109 deletions(-)
>>  create mode 100644 src/mesa/drivers/dri/i965/brw_fs_fp.cpp
>
> I think the LIT code may be broken (comments inline), and one comment is
> wrong.  Assuming you fix (or refute) those, then patches 1-8 are:
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>
> I haven't read through 9 and 10 yet, but I plan to soon.

>> +void
>> +fs_visitor::emit_fragment_program_code()
>> +{
>> +   setup_fp_regs();
>> +
>> +   fs_reg null = fs_reg(brw_null_reg());
>> +
>> +   /* Keep a reg with 0.0 around, for reuse use by emit_sop so that it can
>
> "Keep a reg with 1.0 around, for reuse by emit_fp_sop"
>                  ^^^ (not 0.0)                 ^^ (function name)

fixed

>> +      case OPCODE_DP2:
>> +      case OPCODE_DP3:
>> +      case OPCODE_DP4:
>> +      case OPCODE_DPH: {
>> +         fs_reg mul = fs_reg(this, glsl_type::float_type);
>> +         fs_reg acc = fs_reg(this, glsl_type::float_type);
>> +         int count;
>> +
>> +         switch (fpi->Opcode) {
>> +         case OPCODE_DP2: count = 2; break;
>> +         case OPCODE_DP3: count = 3; break;
>> +         case OPCODE_DP4: count = 4; break;
>> +         case OPCODE_DPH: count = 3; break;
>> +         default: assert(!"not reached"); count = 0; break;
>> +         }
>> +
>> +         emit(BRW_OPCODE_MUL, acc,
>> +              regoffset(src[0], 0), regoffset(src[1], 0));
>> +         for (int i = 1; i < count; i++) {
>> +            emit(BRW_OPCODE_MUL, mul,
>> +                 regoffset(src[0], i), regoffset(src[1], i));
>> +            emit(BRW_OPCODE_ADD, acc, acc, mul);
>> +         }
>
> Future optimization: MAD would be nice here, but that can be done later.

Yeah, or even MACs.  This is a codegen quality regression from
brw_wm_*.c.

>> +      case OPCODE_LIT:
>> +         /* From the ARB_fragment_program spec:
>> +          *
>> +          *      tmp = VectorLoad(op0);
>> +          *      if (tmp.x < 0) tmp.x = 0;
>> +          *      if (tmp.y < 0) tmp.y = 0;
>> +          *      if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
>> +          *      else if (tmp.w > 128-epsilon) tmp.w = 128-epsilon;
>> +          *      result.x = 1.0;
>> +          *      result.y = tmp.x;
>> +          *      result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0;
>> +          *      result.w = 1.0;
>> +          */
>> +         if (fpi->DstReg.WriteMask & WRITEMASK_X)
>> +            emit(BRW_OPCODE_MOV, regoffset(dst, 0), fs_reg(1.0f));
>> +
>> +         if (fpi->DstReg.WriteMask & WRITEMASK_YZ) {
>> +            fs_inst *inst;
>> +            inst = emit(BRW_OPCODE_CMP, null,
>> +                        regoffset(src[0], 0), fs_reg(0.0f));
>> +            inst->conditional_mod = BRW_CONDITIONAL_LE;
>> +
>> +            if (fpi->DstReg.WriteMask & WRITEMASK_Y) {
>> +               emit(BRW_OPCODE_MOV, regoffset(dst, 1), regoffset(src[0], 0));
>> +               inst = emit(BRW_OPCODE_MOV, regoffset(dst, 1), fs_reg(0.0f));
>> +               inst->predicated = true;
>> +            }
>> +
>> +            if (fpi->DstReg.WriteMask & WRITEMASK_Z) {
>> +               emit_math(SHADER_OPCODE_POW, regoffset(dst, 2),
>> +                         regoffset(src[0], 1), regoffset(src[0], 3));
>> +
>> +               inst = emit(BRW_OPCODE_MOV, regoffset(dst, 2), fs_reg(0.0f));
>> +               inst->predicated = true;
>
> This looks broken...don't you need to handle clamping to (-128, 128)?

Ah, I lifted this code directly from the former backend.  I'll put in a
note that I didn't change behavior.

I don't know of any use for the -128,128 clamping.  I think it's just a
spec artifact from this extension being written for a particular
hardware instruction set.  We should probably fix it for thoroughness,
but I don't expect app behavior to change.  Do you want to see that done
in this series?

>> +      case OPCODE_SCS:
>> +         if (fpi->DstReg.WriteMask & WRITEMASK_X) {
>> +            emit_math(SHADER_OPCODE_COS, regoffset(dst, 0),
>> +                      regoffset(src[0], 0));
>> +         }
>> +
>> +         if (fpi->DstReg.WriteMask & WRITEMASK_Y) {
>> +            emit_math(SHADER_OPCODE_SIN, regoffset(dst, 1),
>> +                      regoffset(src[0], 1));
>> +         }
>> +         break;
>
> Future optimization: we could use the actual SINCOS math instruction
> when asking for WRITEMASK_XY.  But I don't know how common that is.

I haven't seen one yet.  If we did, given that GLSL doesn't have a
sincos, we probably would want to do it as a peephole optimization.

(I have seen sincos-applicable shaders on the vertex side, though.  Not
well-written ones, just examples)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20120927/0a24e18a/attachment.pgp>