[Mesa-dev] [PATCH 01/38] i965/fs: Introduce FS IR builder.
Matt Turner
mattst88 at gmail.com
Thu Jun 4 10:20:29 PDT 2015
On Thu, Jun 4, 2015 at 9:04 AM, Francisco Jerez <currojerez at riseup.net> wrote:
> The purpose of this change is threefold: First, it improves the
> modularity of the compiler back-end by separating the functionality
> required to construct an i965 IR program from the rest of the visitor
> god-object, what in turn will reduce the coupling between other
> components and the visitor allowing a more modular design. This patch
> doesn't yet remove the equivalent functionality from the visitor
> classes, that task will be undertaken by a separate series, as it
> involves major back-end surgery.
>
> Second, it improves consistency between the scalar and vector
> back-ends. The FS and VEC4 builders can both be used to generate
> scalar code with a compatible interface or they can be used to
> generate natural vector width code -- 1 or 4 components respectively.
>
> Third, the approach to IR construction is somewhat different to what
> the visitor classes currently do. All parameters affecting code
> generation (execution size, half control, point in the program where
> new instructions are inserted, etc.) are encapsulated in a stand-alone
> object rather than being quasi-global state (yes, anything defined in
> one of the visitor classes is effectively global due to the tight
> coupling with virtually everything else in the compiler back-end).
> This object is lightweight and can be copied, mutated and passed
> around, making helper IR-building functions more flexible because they
> can now simply take a builder object as argument and will inherit its
> IR generation properties in exactly the same way that a discrete
> instruction would from the same builder object.
>
> The emit_typed_write() function from my image-load-store branch is an
> example that illustrates the usefulness of the latter point: Due to
> hardware limitations the function may have to split the untyped
> surface message in 8-wide chunks. That means that the several
> functions called to help with the construction of the message payload
> are themselves required to set the execution width and half control
> correctly on the instructions they emit, and to allocate all registers
> with half the default width. With the previous approach this would
> require the used helper functions to be aware of the parameters that
> might differ from the default state and explicitly set the instruction
> bits accordingly. With the new approach they would get a modified
> builder object as argument that would influence all instructions
> emitted by the helper function as if it were the default state.
>
> Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD()
> method. It doesn't actually emit any instructions, they are simply
> created and inserted into an exec_list which is returned for the
> caller to emit at some location of the program. This sort of two-step
> emission becomes unnecessary with the builder interface because the
> insertion point is one more of the code generation parameters which
> are part of the builder object. The caller can simply pass
> VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the
> location of the program where the effect of the constant load is
> desired. This two-step emission (which pervades the compiler back-end
> and is in most cases redundant) goes away: E.g. ADD() now actually
> adds two registers rather than just creating an ADD instruction in
> memory, emit(ADD()) is no longer necessary.
>
> v2: Drop scalarizing VEC4 builder.
> v3: Take a backend_shader as constructor argument. Improve handling
> of debug annotations and execution control flags.
> ---
> src/mesa/drivers/dri/i965/Makefile.sources | 1 +
> src/mesa/drivers/dri/i965/brw_fs_builder.h | 659 +++++++++++++++++++++++++++++
> 2 files changed, 660 insertions(+)
> create mode 100644 src/mesa/drivers/dri/i965/brw_fs_builder.h
>
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
> index 3f852cd..93f336e 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -42,6 +42,7 @@ i965_FILES = \
> brw_ff_gs.c \
> brw_ff_gs_emit.c \
> brw_ff_gs.h \
> + brw_fs_builder.h \
> brw_fs_channel_expressions.cpp \
> brw_fs_cmod_propagation.cpp \
> brw_fs_combine_constants.cpp \
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> new file mode 100644
> index 0000000..e528180
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> @@ -0,0 +1,659 @@
> +/* -*- c++ -*- */
> +/*
> + * Copyright © 2010-2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#ifndef BRW_FS_BUILDER_H
> +#define BRW_FS_BUILDER_H
> +
> +#include "brw_ir_fs.h"
> +#include "brw_shader.h"
> +#include "brw_context.h"
> +
> +namespace brw {
> + /**
> + * Toolbox to assemble an FS IR program out of individual instructions.
> + *
> + * This object is meant to have an interface consistent with
> + * brw::vec4_builder. They cannot be fully interchangeable because
> + * brw::fs_builder generates scalar code while brw::vec4_builder generates
> + * vector code.
> + */
> + class fs_builder {
> + public:
> + /** Type used in this IR to represent a source of an instruction. */
> + typedef fs_reg src_reg;
> +
> + /** Type used in this IR to represent the destination of an instruction. */
> + typedef fs_reg dst_reg;
> +
> + /** Type used in this IR to represent an instruction. */
> + typedef fs_inst instruction;
> +
> + /**
> + * Construct an fs_builder appending instructions at the end of
> + * the instruction list of \p shader. \p dispatch_width gives
> + * the native execution width of the program.
> + */
> + fs_builder(backend_shader *shader,
> + unsigned dispatch_width) :
> + shader(shader), block(NULL),
> + cursor((exec_node *)&shader->instructions.tail),
> + _dispatch_width(dispatch_width),
> + _group(0),
> + force_writemask_all(false),
> + annotation()
> + {
> + }
> +
> + /**
> + * Construct an fs_builder that inserts instructions before \p cursor in
> + * basic block \p block, inheriting other code generation parameters
> + * from this.
> + */
> + fs_builder
> + at(bblock_t *block, exec_node *cursor) const
> + {
> + fs_builder bld = *this;
> + bld.block = block;
> + bld.cursor = cursor;
> + return bld;
> + }
> +
> + /**
> + * Construct a builder specifying the default SIMD width and group of
> + * channel enable signals, inheriting other code generation parameters
> + * from this.
> + *
> + * \p n gives the default SIMD width, \p i gives the slot group used for
> + * predication and control flow masking in multiples of \p n channels.
> + */
> + fs_builder
> + group(unsigned n, unsigned i) const
> + {
> + assert(n <= dispatch_width() &&
> + i < dispatch_width() / n);
> + fs_builder bld = *this;
> + bld._dispatch_width = n;
> + bld._group += i * n;
> + return bld;
> + }
> +
> + /**
> + * Alias for group() with width equal to eight.
> + */
> + fs_builder
> + half(unsigned i) const
> + {
> + return group(8, i);
> + }
> +
> + /**
> + * Construct a builder with per-channel control flow execution masking
> + * disabled if \p b is true. If control flow execution masking is
> + * already disabled this has no effect.
> + */
> + fs_builder
> + exec_all(bool b = true) const
> + {
> + fs_builder bld = *this;
> + if (b)
> + bld.force_writemask_all = true;
> + return bld;
> + }
> +
> + /**
> + * Construct a builder with the given debug annotation info.
> + */
> + fs_builder
> + annotate(const char *str, const void *ir = NULL) const
> + {
> + fs_builder bld = *this;
> + bld.annotation.str = str;
> + bld.annotation.ir = ir;
> + return bld;
> + }
> +
> + /**
> + * Get the SIMD width in use.
> + */
> + unsigned
> + dispatch_width() const
> + {
> + return _dispatch_width;
> + }
> +
> + /**
> + * Allocate a virtual register of natural vector size (one for this IR)
> + * and SIMD width. \p n gives the amount of space to allocate in
> + * dispatch_width units (which is just enough space for one logical
> + * component in this IR).
> + */
> + dst_reg
> + vgrf(enum brw_reg_type type, unsigned n = 1) const
> + {
> + return dst_reg(GRF, shader->alloc.allocate(
> + DIV_ROUND_UP(n * type_sz(type) * dispatch_width(),
> + REG_SIZE)),
> + type, dispatch_width());
> + }
> +
> + /**
> + * Create a null register of floating type.
> + */
> + dst_reg
> + null_reg_f() const
> + {
> + return dst_reg(retype(brw_null_vec(dispatch_width()),
> + BRW_REGISTER_TYPE_F));
> + }
> +
> + /**
> + * Create a null register of signed integer type.
> + */
> + dst_reg
> + null_reg_d() const
> + {
> + return dst_reg(retype(brw_null_vec(dispatch_width()),
> + BRW_REGISTER_TYPE_D));
> + }
> +
> + /**
> + * Create a null register of unsigned integer type.
> + */
> + dst_reg
> + null_reg_ud() const
> + {
> + return dst_reg(retype(brw_null_vec(dispatch_width()),
> + BRW_REGISTER_TYPE_UD));
> + }
> +
> + /**
> + * Get the mask of SIMD channels enabled by dispatch and not yet
> + * disabled by discard.
> + */
> + src_reg
> + sample_mask_reg() const
> + {
> + const bool uses_kill =
> + (shader->stage == MESA_SHADER_FRAGMENT &&
> + ((brw_wm_prog_data *)shader->stage_prog_data)->uses_kill);
> + return (shader->stage != MESA_SHADER_FRAGMENT ? src_reg(0xffff) :
> + uses_kill ? brw_flag_reg(0, 1) :
> + retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD));
> + }
> +
> + /**
> + * Insert an instruction into the program.
> + */
> + instruction *
> + emit(const instruction &inst) const
> + {
> + return emit(new(shader->mem_ctx) instruction(inst));
> + }
> +
> + /**
> + * Create and insert a nullary control instruction into the program.
> + */
> + instruction *
> + emit(enum opcode opcode) const
> + {
> + return emit(instruction(opcode, dispatch_width()));
> + }
> +
> + /**
> + * Create and insert a nullary instruction into the program.
> + */
> + instruction *
> + emit(enum opcode opcode, const dst_reg &dst) const
> + {
> + return emit(instruction(opcode, dst));
> + }
> +
> + /**
> + * Create and insert a unary instruction into the program.
> + */
> + instruction *
> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0) const
> + {
> + switch (opcode) {
> + case SHADER_OPCODE_RCP:
> + case SHADER_OPCODE_RSQ:
> + case SHADER_OPCODE_SQRT:
> + case SHADER_OPCODE_EXP2:
> + case SHADER_OPCODE_LOG2:
> + case SHADER_OPCODE_SIN:
> + case SHADER_OPCODE_COS:
> + return fix_math_instruction(
> + emit(instruction(opcode, dst.width, dst,
> + fix_math_operand(src0))));
> +
> + default:
> + return emit(instruction(opcode, dst.width, dst, src0));
> + }
> + }
> +
> + /**
> + * Create and insert a binary instruction into the program.
> + */
> + instruction *
> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
> + const src_reg &src1) const
> + {
> + switch (opcode) {
> + case SHADER_OPCODE_POW:
> + case SHADER_OPCODE_INT_QUOTIENT:
> + case SHADER_OPCODE_INT_REMAINDER:
> + return fix_math_instruction(
> + emit(instruction(opcode, dst.width, dst,
> + fix_math_operand(src0),
> + fix_math_operand(src1))));
> +
> + default:
> + return emit(instruction(opcode, dst.width, dst, src0, src1));
> +
> + }
> + }
> +
> + /**
> + * Create and insert a ternary instruction into the program.
> + */
> + instruction *
> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
> + const src_reg &src1, const src_reg &src2) const
> + {
> + switch (opcode) {
> + case BRW_OPCODE_BFE:
> + case BRW_OPCODE_BFI2:
> + case BRW_OPCODE_MAD:
> + case BRW_OPCODE_LRP:
> + return emit(instruction(opcode, dst.width, dst,
> + fix_3src_operand(src0),
> + fix_3src_operand(src1),
> + fix_3src_operand(src2)));
> +
> + default:
> + return emit(instruction(opcode, dst.width, dst, src0, src1, src2));
> + }
> + }
> +
> + /**
> + * Insert a preallocated instruction into the program.
> + */
> + instruction *
> + emit(instruction *inst) const
> + {
> + assert(inst->exec_size == dispatch_width() ||
> + force_writemask_all);
> + assert(_group == 0 || _group == 8);
> +
> + inst->force_sechalf = (_group == 8);
> + inst->force_writemask_all = force_writemask_all;
> + inst->annotation = annotation.str;
> + inst->ir = annotation.ir;
> +
> + if (block)
> + static_cast<instruction *>(cursor)->insert_before(block, inst);
> + else
> + cursor->insert_before(inst);
> +
> + return inst;
> + }
> +
> + /**
> + * Select \p src0 if the comparison of both sources with the given
> + * conditional mod evaluates to true, otherwise select \p src1.
> + *
> + * Generally useful to get the minimum or maximum of two values.
> + */
> + void
> + emit_minmax(const dst_reg &dst, const src_reg &src0,
> + const src_reg &src1, brw_conditional_mod mod) const
This is unused, after Jason's commit:
commit 78644ffc4d341deb431145108f0b2d377e59b61e
Author: Jason Ekstrand <jason.ekstrand at intel.com>
Date: Wed May 20 10:35:34 2015 -0700
i965/fs: Remove the ir_visitor code
Now that everything is running through NIR, this is all dead.
Reviewed-by: Matt Turner <mattst88 at gmail.com>
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
> + {
> + if (shader->devinfo->gen >= 6) {
> + set_condmod(mod, SEL(dst, fix_unsigned_negate(src0),
> + fix_unsigned_negate(src1)));
> + } else {
> + CMP(null_reg_d(), src0, src1, mod);
> + set_predicate(BRW_PREDICATE_NORMAL,
> + SEL(dst, src0, src1));
> + }
> + }
> +
> + /**
> + * Copy any live channel from \p src to the first channel of \p dst.
> + */
> + void
> + emit_uniformize(const dst_reg &dst, const src_reg &src) const
> + {
> + const fs_builder ubld = exec_all();
> + const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD);
> +
> + ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, component(chan_index, 0));
> + ubld.emit(SHADER_OPCODE_BROADCAST, component(dst, 0),
> + src, component(chan_index, 0));
> + }
> +
> + /**
> + * Assorted arithmetic ops.
> + * @{
> + */
> +#define ALU1(op) \
> + instruction * \
> + op(const dst_reg &dst, const src_reg &src0) const \
> + { \
> + return emit(BRW_OPCODE_##op, dst, src0); \
> + }
> +
> +#define ALU2(op) \
> + instruction * \
> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) const \
> + { \
> + return emit(BRW_OPCODE_##op, dst, src0, src1); \
> + }
> +
> +#define ALU2_ACC(op) \
> + instruction * \
> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) const \
> + { \
> + instruction *inst = emit(BRW_OPCODE_##op, dst, src0, src1); \
> + inst->writes_accumulator = true; \
> + return inst; \
> + }
> +
> +#define ALU3(op) \
> + instruction * \
> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1, \
> + const src_reg &src2) const \
> + { \
> + return emit(BRW_OPCODE_##op, dst, src0, src1, src2); \
> + }
> +
> + ALU2(ADD)
> + ALU2_ACC(ADDC)
> + ALU2(AND)
> + ALU2(ASR)
> + ALU2(AVG)
> + ALU3(BFE)
> + ALU2(BFI1)
> + ALU3(BFI2)
> + ALU1(BFREV)
> + ALU1(CBIT)
> + ALU2(CMPN)
> + ALU3(CSEL)
> + ALU2(DP2)
> + ALU2(DP3)
> + ALU2(DP4)
> + ALU2(DPH)
> + ALU1(F16TO32)
> + ALU1(F32TO16)
> + ALU1(FBH)
> + ALU1(FBL)
> + ALU1(FRC)
> + ALU2(LINE)
> + ALU1(LZD)
> + ALU2(MAC)
> + ALU2_ACC(MACH)
> + ALU3(MAD)
> + ALU1(MOV)
> + ALU2(MUL)
> + ALU1(NOT)
> + ALU2(OR)
> + ALU2(PLN)
> + ALU1(RNDD)
> + ALU1(RNDE)
> + ALU1(RNDU)
> + ALU1(RNDZ)
> + ALU2(SAD2)
> + ALU2_ACC(SADA2)
> + ALU2(SEL)
> + ALU2(SHL)
> + ALU2(SHR)
> + ALU2_ACC(SUBB)
> + ALU2(XOR)
> +
> +#undef ALU3
> +#undef ALU2_ACC
> +#undef ALU2
> +#undef ALU1
> + /** @} */
> +
> + /**
> + * CMP: Sets the low bit of the destination channels with the result
> + * of the comparison, while the upper bits are undefined, and updates
> + * the flag register with the packed 16 bits of the result.
> + */
> + instruction *
> + CMP(const dst_reg &dst, const src_reg &src0, const src_reg &src1,
> + brw_conditional_mod condition) const
> + {
> + /* Take the instruction:
> + *
> + * CMP null<d> src0<f> src1<f>
> + *
> + * Original gen4 does type conversion to the destination type
> + * before comparison, producing garbage results for floating
> + * point comparisons.
> + *
> + * The destination type doesn't matter on newer generations,
> + * so we set the type to match src0 so we can compact the
> + * instruction.
> + */
> + return set_condmod(condition,
> + emit(BRW_OPCODE_CMP, retype(dst, src0.type),
> + fix_unsigned_negate(src0),
> + fix_unsigned_negate(src1)));
> + }
> +
> + /**
> + * Gen4 predicated IF.
> + */
> + instruction *
> + IF(brw_predicate predicate) const
> + {
> + instruction *inst = emit(BRW_OPCODE_IF);
> + return set_predicate(predicate, inst);
> + }
> +
> + /**
> + * Gen6 IF with embedded comparison.
> + */
> + instruction *
> + IF(const src_reg &src0, const src_reg &src1,
> + brw_conditional_mod condition) const
> + {
> + assert(shader->devinfo->gen == 6);
> + return set_condmod(condition,
> + emit(BRW_OPCODE_IF,
> + null_reg_d(),
> + fix_unsigned_negate(src0),
> + fix_unsigned_negate(src1)));
I don't see that we were calling resolve_ud_negate() on
fs_visitor::emit_if_gen6.
In fact, we've removed the function entirely (a known regression in
the switch to NIR). I'd drop this for now. It's unused as well.
> + }
> +
> + /**
> + * Emit a linear interpolation instruction.
> + */
> + instruction *
> + LRP(const dst_reg &dst, const src_reg &x, const src_reg &y,
> + const src_reg &a) const
> + {
> + if (shader->devinfo->gen >= 6) {
> + /* The LRP instruction actually does op1 * op0 + op2 * (1 - op0), so
> + * we need to reorder the operands.
> + */
> + return emit(BRW_OPCODE_LRP, dst, a, y, x);
> +
> + } else {
> + /* We can't use the LRP instruction. Emit x*(1-a) + y*a. */
> + const dst_reg y_times_a = vgrf(dst.type);
> + const dst_reg one_minus_a = vgrf(dst.type);
> + const dst_reg x_times_one_minus_a = vgrf(dst.type);
> +
> + MUL(y_times_a, y, a);
> + ADD(one_minus_a, negate(a), src_reg(1.0f));
> + MUL(x_times_one_minus_a, x, src_reg(one_minus_a));
> + return ADD(dst, src_reg(x_times_one_minus_a), src_reg(y_times_a));
> + }
> + }
> +
> + /**
> + * Collect a number of registers in a contiguous range of registers.
> + */
> + instruction *
> + LOAD_PAYLOAD(const dst_reg &dst, const src_reg *src,
> + unsigned sources, unsigned header_size) const
> + {
> + assert(dst.width % 8 == 0);
> + instruction *inst = emit(instruction(SHADER_OPCODE_LOAD_PAYLOAD,
> + dst.width, dst, src, sources));
> + inst->header_size = header_size;
> +
> + for (unsigned i = 0; i < header_size; i++)
> + assert(src[i].file != GRF ||
> + src[i].width * type_sz(src[i].type) == 32);
> + inst->regs_written = header_size;
> +
> + for (unsigned i = header_size; i < sources; ++i)
> + assert(src[i].file != GRF ||
> + src[i].width == dst.width);
> + inst->regs_written += (sources - header_size) * (dst.width / 8);
> +
> + return inst;
> + }
> +
> + backend_shader *shader;
> +
> + private:
> + /**
> + * Workaround for negation of UD registers. See comment in
> + * fs_generator::generate_code() for more details.
> + */
> + src_reg
> + fix_unsigned_negate(const src_reg &src) const
> + {
> + if (src.type == BRW_REGISTER_TYPE_UD &&
> + src.negate) {
> + dst_reg temp = vgrf(BRW_REGISTER_TYPE_UD);
> + MOV(temp, src);
> + return src_reg(temp);
> + } else {
> + return src;
> + }
> + }
> +
> + /**
> + * Workaround for source register modes not supported by the ternary
> + * instruction encoding.
> + */
> + src_reg
> + fix_3src_operand(const src_reg &src) const
> + {
> + if (src.file == GRF || src.file == UNIFORM || src.stride > 1) {
> + return src;
> + } else {
> + dst_reg expanded = vgrf(src.type);
> + MOV(expanded, src);
> + return expanded;
> + }
> + }
> +
> + /**
> + * Workaround for source register modes not supported by the math
> + * instruction.
> + */
> + src_reg
> + fix_math_operand(const src_reg &src) const
> + {
> + /* Can't do hstride == 0 args on gen6 math, so expand it out. We
> + * might be able to do better by doing execsize = 1 math and then
> + * expanding that result out, but we would need to be careful with
> + * masking.
> + *
> + * Gen6 hardware ignores source modifiers (negate and abs) on math
> + * instructions, so we also move to a temp to set those up.
> + *
> + * Gen7 relaxes most of the above restrictions, but still can't use IMM
> + * operands to math
> + */
> + if ((shader->devinfo->gen == 6 &&
> + (src.file == IMM || src.file == UNIFORM ||
> + src.abs || src.negate)) ||
> + (shader->devinfo->gen == 7 && src.file == IMM)) {
> + const dst_reg tmp = vgrf(src.type);
> + MOV(tmp, src);
> + return tmp;
> + } else {
> + return src;
> + }
> + }
> +
> + /**
> + * Workaround other weirdness of the math instruction.
> + */
> + instruction *
> + fix_math_instruction(instruction *instr) const
Is there a reason this is 'instr' instead of 'inst'? If not, let's
rename it 'inst'.
> + {
> + if (shader->devinfo->gen < 6) {
> + instr->base_mrf = 2;
> + instr->mlen = instr->sources * dispatch_width() / 8;
> +
> + if (instr->sources > 1) {
> + /* From the Ironlake PRM, Volume 4, Part 1, Section 6.1.13
> + * "Message Payload":
> + *
> + * "Operand0[7]. For the INT DIV functions, this operand is the
> + * denominator."
> + * ...
> + * "Operand1[7]. For the INT DIV functions, this operand is the
> + * numerator."
> + */
> + const bool is_int_div = instr->opcode != SHADER_OPCODE_POW;
> + const fs_reg src0 = is_int_div ? instr->src[1] : instr->src[0];
> + const fs_reg src1 = is_int_div ? instr->src[0] : instr->src[1];
> +
> + instr->resize_sources(1);
> + instr->src[0] = src0;
> +
> + at(block, instr).MOV(fs_reg(MRF, instr->base_mrf + 1, src1.type,
> + dispatch_width()), src1);
> + }
> + }
> +
> + return instr;
> + }
More information about the mesa-dev
mailing list