[Mesa-dev] [PATCH 07/10] i965: Replace brw_wm_* with dumping code into the fs_visitor.
Eric Anholt
eric at anholt.net
Thu Sep 27 17:27:31 PDT 2012
Kenneth Graunke <kenneth at whitecape.org> writes:
> On 09/22/2012 02:04 PM, Eric Anholt wrote:
>> This makes a giant pile of code newly dead. It also fixes TXB on newer
>> chipsets, which has been totally broken (I now have a piglit test for that).
>> It passes the same set of Ian's ARB_fragment_program tests. It also improves
>> high-settings ETQW performance by 3.2 +/- 1.9% (n=3), thanks to better
>> optimization and having 8-wide along with 16-wide shaders.
>> ---
>> src/mesa/drivers/dri/i965/Makefile.sources | 1 +
>> src/mesa/drivers/dri/i965/brw_fs.cpp | 36 +-
>> src/mesa/drivers/dri/i965/brw_fs.h | 30 +-
>> src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 22 +-
>> src/mesa/drivers/dri/i965/brw_fs_fp.cpp | 781 ++++++++++++++++++++++++++
>> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 +-
>> src/mesa/drivers/dri/i965/brw_wm.c | 58 +-
>> src/mesa/drivers/dri/i965/brw_wm_state.c | 19 +-
>> src/mesa/drivers/dri/i965/gen6_wm_state.c | 8 +-
>> src/mesa/drivers/dri/i965/gen7_wm_state.c | 8 +-
>> 10 files changed, 857 insertions(+), 109 deletions(-)
>> create mode 100644 src/mesa/drivers/dri/i965/brw_fs_fp.cpp
>
> I think the LIT code may be broken (comments inline), and one comment is
> wrong. Assuming you fix (or refute) those, then patches 1-8 are:
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>
> I haven't read through 9 and 10 yet, but I plan to soon.
>> +void
>> +fs_visitor::emit_fragment_program_code()
>> +{
>> + setup_fp_regs();
>> +
>> + fs_reg null = fs_reg(brw_null_reg());
>> +
>> + /* Keep a reg with 0.0 around, for reuse use by emit_sop so that it can
>
> "Keep a reg with 1.0 around, for reuse by emit_fp_sop"
> ^^^ (not 0.0) ^^ (function name)
fixed
>> + case OPCODE_DP2:
>> + case OPCODE_DP3:
>> + case OPCODE_DP4:
>> + case OPCODE_DPH: {
>> + fs_reg mul = fs_reg(this, glsl_type::float_type);
>> + fs_reg acc = fs_reg(this, glsl_type::float_type);
>> + int count;
>> +
>> + switch (fpi->Opcode) {
>> + case OPCODE_DP2: count = 2; break;
>> + case OPCODE_DP3: count = 3; break;
>> + case OPCODE_DP4: count = 4; break;
>> + case OPCODE_DPH: count = 3; break;
>> + default: assert(!"not reached"); count = 0; break;
>> + }
>> +
>> + emit(BRW_OPCODE_MUL, acc,
>> + regoffset(src[0], 0), regoffset(src[1], 0));
>> + for (int i = 1; i < count; i++) {
>> + emit(BRW_OPCODE_MUL, mul,
>> + regoffset(src[0], i), regoffset(src[1], i));
>> + emit(BRW_OPCODE_ADD, acc, acc, mul);
>> + }
>
> Future optimization: MAD would be nice here, but that can be done later.
Yeah, or even MACs. This is a codegen quality regression from
brw_wm_*.c.
>> + case OPCODE_LIT:
>> + /* From the ARB_fragment_program spec:
>> + *
>> + * tmp = VectorLoad(op0);
>> + * if (tmp.x < 0) tmp.x = 0;
>> + * if (tmp.y < 0) tmp.y = 0;
>> + * if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
>> + * else if (tmp.w > 128-epsilon) tmp.w = 128-epsilon;
>> + * result.x = 1.0;
>> + * result.y = tmp.x;
>> + * result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0;
>> + * result.w = 1.0;
>> + */
>> + if (fpi->DstReg.WriteMask & WRITEMASK_X)
>> + emit(BRW_OPCODE_MOV, regoffset(dst, 0), fs_reg(1.0f));
>> +
>> + if (fpi->DstReg.WriteMask & WRITEMASK_YZ) {
>> + fs_inst *inst;
>> + inst = emit(BRW_OPCODE_CMP, null,
>> + regoffset(src[0], 0), fs_reg(0.0f));
>> + inst->conditional_mod = BRW_CONDITIONAL_LE;
>> +
>> + if (fpi->DstReg.WriteMask & WRITEMASK_Y) {
>> + emit(BRW_OPCODE_MOV, regoffset(dst, 1), regoffset(src[0], 0));
>> + inst = emit(BRW_OPCODE_MOV, regoffset(dst, 1), fs_reg(0.0f));
>> + inst->predicated = true;
>> + }
>> +
>> + if (fpi->DstReg.WriteMask & WRITEMASK_Z) {
>> + emit_math(SHADER_OPCODE_POW, regoffset(dst, 2),
>> + regoffset(src[0], 1), regoffset(src[0], 3));
>> +
>> + inst = emit(BRW_OPCODE_MOV, regoffset(dst, 2), fs_reg(0.0f));
>> + inst->predicated = true;
>
> This looks broken...don't you need to handle clamping to (-128, 128)?
Ah, I lifted this code directly from the former backend. I'll put in a
note that I didn't change behavior.
I don't know of any use for the -128,128 clamping. I think it's just a
spec artifact from this extension being written for a particular
hardware instruction set. We should probably fix it for thoroughness,
but I don't expect app behavior to change. Do you want to see that done
in this series?
>> + case OPCODE_SCS:
>> + if (fpi->DstReg.WriteMask & WRITEMASK_X) {
>> + emit_math(SHADER_OPCODE_COS, regoffset(dst, 0),
>> + regoffset(src[0], 0));
>> + }
>> +
>> + if (fpi->DstReg.WriteMask & WRITEMASK_Y) {
>> + emit_math(SHADER_OPCODE_SIN, regoffset(dst, 1),
>> + regoffset(src[0], 1));
>> + }
>> + break;
>
> Future optimization: we could use the actual SINCOS math instruction
> when asking for WRITEMASK_XY. But I don't know how common that is.
I haven't seen one yet. If we did, given that GLSL doesn't have a
sincos, we probably would want to do it as a peephole optimization.
(I have seen sincos-applicable shaders on the vertex side, though. Not
well-written ones, just examples)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20120927/0a24e18a/attachment.pgp>
More information about the mesa-dev
mailing list