[Mesa-dev] [PATCH 01/38] i965/fs: Introduce FS IR builder.

Francisco Jerez currojerez at riseup.net
Tue Jun 9 03:52:44 PDT 2015


Matt Turner <mattst88 at gmail.com> writes:

> On Fri, Jun 5, 2015 at 4:13 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>> Matt Turner <mattst88 at gmail.com> writes:
>>
>>> On Fri, Jun 5, 2015 at 1:42 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>> Matt Turner <mattst88 at gmail.com> writes:
>>>>
>>>>> On Thu, Jun 4, 2015 at 9:04 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>>>> The purpose of this change is threefold: First, it improves the
>>>>>> modularity of the compiler back-end by separating the functionality
>>>>>> required to construct an i965 IR program from the rest of the visitor
>>>>>> god-object, what in turn will reduce the coupling between other
>>>>>> components and the visitor allowing a more modular design.  This patch
>>>>>> doesn't yet remove the equivalent functionality from the visitor
>>>>>> classes, that task will be undertaken by a separate series, as it
>>>>>> involves major back-end surgery.
>>>>>>
>>>>>> Second, it improves consistency between the scalar and vector
>>>>>> back-ends.  The FS and VEC4 builders can both be used to generate
>>>>>> scalar code with a compatible interface or they can be used to
>>>>>> generate natural vector width code -- 1 or 4 components respectively.
>>>>>>
>>>>>> Third, the approach to IR construction is somewhat different to what
>>>>>> the visitor classes currently do.  All parameters affecting code
>>>>>> generation (execution size, half control, point in the program where
>>>>>> new instructions are inserted, etc.) are encapsulated in a stand-alone
>>>>>> object rather than being quasi-global state (yes, anything defined in
>>>>>> one of the visitor classes is effectively global due to the tight
>>>>>> coupling with virtually everything else in the compiler back-end).
>>>>>> This object is lightweight and can be copied, mutated and passed
>>>>>> around, making helper IR-building functions more flexible because they
>>>>>> can now simply take a builder object as argument and will inherit its
>>>>>> IR generation properties in exactly the same way that a discrete
>>>>>> instruction would from the same builder object.
>>>>>>
>>>>>> The emit_typed_write() function from my image-load-store branch is an
>>>>>> example that illustrates the usefulness of the latter point: Due to
>>>>>> hardware limitations the function may have to split the untyped
>>>>>> surface message in 8-wide chunks.  That means that the several
>>>>>> functions called to help with the construction of the message payload
>>>>>> are themselves required to set the execution width and half control
>>>>>> correctly on the instructions they emit, and to allocate all registers
>>>>>> with half the default width.  With the previous approach this would
>>>>>> require the used helper functions to be aware of the parameters that
>>>>>> might differ from the default state and explicitly set the instruction
>>>>>> bits accordingly.  With the new approach they would get a modified
>>>>>> builder object as argument that would influence all instructions
>>>>>> emitted by the helper function as if it were the default state.
>>>>>>
>>>>>> Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD()
>>>>>> method.  It doesn't actually emit any instructions, they are simply
>>>>>> created and inserted into an exec_list which is returned for the
>>>>>> caller to emit at some location of the program.  This sort of two-step
>>>>>> emission becomes unnecessary with the builder interface because the
>>>>>> insertion point is one more of the code generation parameters which
>>>>>> are part of the builder object.  The caller can simply pass
>>>>>> VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the
>>>>>> location of the program where the effect of the constant load is
>>>>>> desired.  This two-step emission (which pervades the compiler back-end
>>>>>> and is in most cases redundant) goes away: E.g. ADD() now actually
>>>>>> adds two registers rather than just creating an ADD instruction in
>>>>>> memory, emit(ADD()) is no longer necessary.
>>>>>>
>>>>>> v2: Drop scalarizing VEC4 builder.
>>>>>> v3: Take a backend_shader as constructor argument.  Improve handling
>>>>>>     of debug annotations and execution control flags.
>>>>>> ---
>>>>>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>>>>>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 659 +++++++++++++++++++++++++++++
>>>>>>  2 files changed, 660 insertions(+)
>>>>>>  create mode 100644 src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>>>
>>>>>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
>>>>>> index 3f852cd..93f336e 100644
>>>>>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>>>>>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>>>>>> @@ -42,6 +42,7 @@ i965_FILES = \
>>>>>>         brw_ff_gs.c \
>>>>>>         brw_ff_gs_emit.c \
>>>>>>         brw_ff_gs.h \
>>>>>> +       brw_fs_builder.h \
>>>>>>         brw_fs_channel_expressions.cpp \
>>>>>>         brw_fs_cmod_propagation.cpp \
>>>>>>         brw_fs_combine_constants.cpp \
>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>>> new file mode 100644
>>>>>> index 0000000..e528180
>>>>>> --- /dev/null
>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>>> @@ -0,0 +1,659 @@
>>>>>> +/* -*- c++ -*- */
>>>>>> +/*
>>>>>> + * Copyright © 2010-2015 Intel Corporation
>>>>>> + *
>>>>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>>>>> + * copy of this software and associated documentation files (the "Software"),
>>>>>> + * to deal in the Software without restriction, including without limitation
>>>>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>>>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>>>>> + * Software is furnished to do so, subject to the following conditions:
>>>>>> + *
>>>>>> + * The above copyright notice and this permission notice (including the next
>>>>>> + * paragraph) shall be included in all copies or substantial portions of the
>>>>>> + * Software.
>>>>>> + *
>>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>>>>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>>>>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>>>>>> + * IN THE SOFTWARE.
>>>>>> + */
>>>>>> +
>>>>>> +#ifndef BRW_FS_BUILDER_H
>>>>>> +#define BRW_FS_BUILDER_H
>>>>>> +
>>>>>> +#include "brw_ir_fs.h"
>>>>>> +#include "brw_shader.h"
>>>>>> +#include "brw_context.h"
>>>>>> +
>>>>>> +namespace brw {
>>>>>> +   /**
>>>>>> +    * Toolbox to assemble an FS IR program out of individual instructions.
>>>>>> +    *
>>>>>> +    * This object is meant to have an interface consistent with
>>>>>> +    * brw::vec4_builder.  They cannot be fully interchangeable because
>>>>>> +    * brw::fs_builder generates scalar code while brw::vec4_builder generates
>>>>>> +    * vector code.
>>>>>> +    */
>>>>>> +   class fs_builder {
>>>>>> +   public:
>>>>>> +      /** Type used in this IR to represent a source of an instruction. */
>>>>>> +      typedef fs_reg src_reg;
>>>>>> +
>>>>>> +      /** Type used in this IR to represent the destination of an instruction. */
>>>>>> +      typedef fs_reg dst_reg;
>>>>>> +
>>>>>> +      /** Type used in this IR to represent an instruction. */
>>>>>> +      typedef fs_inst instruction;
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Construct an fs_builder appending instructions at the end of
>>>>>> +       * the instruction list of \p shader.  \p dispatch_width gives
>>>>>> +       * the native execution width of the program.
>>>>>> +       */
>>>>>> +      fs_builder(backend_shader *shader,
>>>>>> +                 unsigned dispatch_width) :
>>>>>> +         shader(shader), block(NULL),
>>>>>> +         cursor((exec_node *)&shader->instructions.tail),
>>>>>> +         _dispatch_width(dispatch_width),
>>>>>> +         _group(0),
>>>>>> +         force_writemask_all(false),
>>>>>> +         annotation()
>>>>>> +      {
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Construct an fs_builder that inserts instructions before \p cursor in
>>>>>> +       * basic block \p block, inheriting other code generation parameters
>>>>>> +       * from this.
>>>>>> +       */
>>>>>> +      fs_builder
>>>>>> +      at(bblock_t *block, exec_node *cursor) const
>>>>>> +      {
>>>>>> +         fs_builder bld = *this;
>>>>>> +         bld.block = block;
>>>>>> +         bld.cursor = cursor;
>>>>>> +         return bld;
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Construct a builder specifying the default SIMD width and group of
>>>>>> +       * channel enable signals, inheriting other code generation parameters
>>>>>> +       * from this.
>>>>>> +       *
>>>>>> +       * \p n gives the default SIMD width, \p i gives the slot group used for
>>>>>> +       * predication and control flow masking in multiples of \p n channels.
>>>>>> +       */
>>>>>> +      fs_builder
>>>>>> +      group(unsigned n, unsigned i) const
>>>>>> +      {
>>>>>> +         assert(n <= dispatch_width() &&
>>>>>> +                i < dispatch_width() / n);
>>>>>> +         fs_builder bld = *this;
>>>>>> +         bld._dispatch_width = n;
>>>>>> +         bld._group += i * n;
>>>>>> +         return bld;
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Alias for group() with width equal to eight.
>>>>>> +       */
>>>>>> +      fs_builder
>>>>>> +      half(unsigned i) const
>>>>>> +      {
>>>>>> +         return group(8, i);
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Construct a builder with per-channel control flow execution masking
>>>>>> +       * disabled if \p b is true.  If control flow execution masking is
>>>>>> +       * already disabled this has no effect.
>>>>>> +       */
>>>>>> +      fs_builder
>>>>>> +      exec_all(bool b = true) const
>>>>>> +      {
>>>>>> +         fs_builder bld = *this;
>>>>>> +         if (b)
>>>>>> +            bld.force_writemask_all = true;
>>>>>> +         return bld;
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Construct a builder with the given debug annotation info.
>>>>>> +       */
>>>>>> +      fs_builder
>>>>>> +      annotate(const char *str, const void *ir = NULL) const
>>>>>> +      {
>>>>>> +         fs_builder bld = *this;
>>>>>> +         bld.annotation.str = str;
>>>>>> +         bld.annotation.ir = ir;
>>>>>> +         return bld;
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Get the SIMD width in use.
>>>>>> +       */
>>>>>> +      unsigned
>>>>>> +      dispatch_width() const
>>>>>> +      {
>>>>>> +         return _dispatch_width;
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Allocate a virtual register of natural vector size (one for this IR)
>>>>>> +       * and SIMD width.  \p n gives the amount of space to allocate in
>>>>>> +       * dispatch_width units (which is just enough space for one logical
>>>>>> +       * component in this IR).
>>>>>> +       */
>>>>>> +      dst_reg
>>>>>> +      vgrf(enum brw_reg_type type, unsigned n = 1) const
>>>>>> +      {
>>>>>> +         return dst_reg(GRF, shader->alloc.allocate(
>>>>>> +                           DIV_ROUND_UP(n * type_sz(type) * dispatch_width(),
>>>>>> +                                        REG_SIZE)),
>>>>>> +                        type, dispatch_width());
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Create a null register of floating type.
>>>>>> +       */
>>>>>> +      dst_reg
>>>>>> +      null_reg_f() const
>>>>>> +      {
>>>>>> +         return dst_reg(retype(brw_null_vec(dispatch_width()),
>>>>>> +                               BRW_REGISTER_TYPE_F));
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Create a null register of signed integer type.
>>>>>> +       */
>>>>>> +      dst_reg
>>>>>> +      null_reg_d() const
>>>>>> +      {
>>>>>> +         return dst_reg(retype(brw_null_vec(dispatch_width()),
>>>>>> +                               BRW_REGISTER_TYPE_D));
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Create a null register of unsigned integer type.
>>>>>> +       */
>>>>>> +      dst_reg
>>>>>> +      null_reg_ud() const
>>>>>> +      {
>>>>>> +         return dst_reg(retype(brw_null_vec(dispatch_width()),
>>>>>> +                               BRW_REGISTER_TYPE_UD));
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Get the mask of SIMD channels enabled by dispatch and not yet
>>>>>> +       * disabled by discard.
>>>>>> +       */
>>>>>> +      src_reg
>>>>>> +      sample_mask_reg() const
>>>>>> +      {
>>>>>> +         const bool uses_kill =
>>>>>> +            (shader->stage == MESA_SHADER_FRAGMENT &&
>>>>>> +             ((brw_wm_prog_data *)shader->stage_prog_data)->uses_kill);
>>>>>> +         return (shader->stage != MESA_SHADER_FRAGMENT ? src_reg(0xffff) :
>>>>>> +                 uses_kill ? brw_flag_reg(0, 1) :
>>>>>> +                 retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD));
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Insert an instruction into the program.
>>>>>> +       */
>>>>>> +      instruction *
>>>>>> +      emit(const instruction &inst) const
>>>>>> +      {
>>>>>> +         return emit(new(shader->mem_ctx) instruction(inst));
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Create and insert a nullary control instruction into the program.
>>>>>> +       */
>>>>>> +      instruction *
>>>>>> +      emit(enum opcode opcode) const
>>>>>> +      {
>>>>>> +         return emit(instruction(opcode, dispatch_width()));
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Create and insert a nullary instruction into the program.
>>>>>> +       */
>>>>>> +      instruction *
>>>>>> +      emit(enum opcode opcode, const dst_reg &dst) const
>>>>>> +      {
>>>>>> +         return emit(instruction(opcode, dst));
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Create and insert a unary instruction into the program.
>>>>>> +       */
>>>>>> +      instruction *
>>>>>> +      emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0) const
>>>>>> +      {
>>>>>> +         switch (opcode) {
>>>>>> +         case SHADER_OPCODE_RCP:
>>>>>> +         case SHADER_OPCODE_RSQ:
>>>>>> +         case SHADER_OPCODE_SQRT:
>>>>>> +         case SHADER_OPCODE_EXP2:
>>>>>> +         case SHADER_OPCODE_LOG2:
>>>>>> +         case SHADER_OPCODE_SIN:
>>>>>> +         case SHADER_OPCODE_COS:
>>>>>> +            return fix_math_instruction(
>>>>>> +               emit(instruction(opcode, dst.width, dst,
>>>>>> +                                fix_math_operand(src0))));
>>>>>> +
>>>>>> +         default:
>>>>>> +            return emit(instruction(opcode, dst.width, dst, src0));
>>>>>> +         }
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Create and insert a binary instruction into the program.
>>>>>> +       */
>>>>>> +      instruction *
>>>>>> +      emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
>>>>>> +           const src_reg &src1) const
>>>>>> +      {
>>>>>> +         switch (opcode) {
>>>>>> +         case SHADER_OPCODE_POW:
>>>>>> +         case SHADER_OPCODE_INT_QUOTIENT:
>>>>>> +         case SHADER_OPCODE_INT_REMAINDER:
>>>>>> +            return fix_math_instruction(
>>>>>> +               emit(instruction(opcode, dst.width, dst,
>>>>>> +                                fix_math_operand(src0),
>>>>>> +                                fix_math_operand(src1))));
>>>>>> +
>>>>>> +         default:
>>>>>> +            return emit(instruction(opcode, dst.width, dst, src0, src1));
>>>>>> +
>>>>>> +         }
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Create and insert a ternary instruction into the program.
>>>>>> +       */
>>>>>> +      instruction *
>>>>>> +      emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
>>>>>> +           const src_reg &src1, const src_reg &src2) const
>>>>>> +      {
>>>>>> +         switch (opcode) {
>>>>>> +         case BRW_OPCODE_BFE:
>>>>>> +         case BRW_OPCODE_BFI2:
>>>>>> +         case BRW_OPCODE_MAD:
>>>>>> +         case BRW_OPCODE_LRP:
>>>>>> +            return emit(instruction(opcode, dst.width, dst,
>>>>>> +                                    fix_3src_operand(src0),
>>>>>> +                                    fix_3src_operand(src1),
>>>>>> +                                    fix_3src_operand(src2)));
>>>>>> +
>>>>>> +         default:
>>>>>> +            return emit(instruction(opcode, dst.width, dst, src0, src1, src2));
>>>>>> +         }
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Insert a preallocated instruction into the program.
>>>>>> +       */
>>>>>> +      instruction *
>>>>>> +      emit(instruction *inst) const
>>>>>> +      {
>>>>>> +         assert(inst->exec_size == dispatch_width() ||
>>>>>> +                force_writemask_all);
>>>>>> +         assert(_group == 0 || _group == 8);
>>>>>> +
>>>>>> +         inst->force_sechalf = (_group == 8);
>>>>>> +         inst->force_writemask_all = force_writemask_all;
>>>>>> +         inst->annotation = annotation.str;
>>>>>> +         inst->ir = annotation.ir;
>>>>>> +
>>>>>> +         if (block)
>>>>>> +            static_cast<instruction *>(cursor)->insert_before(block, inst);
>>>>>> +         else
>>>>>> +            cursor->insert_before(inst);
>>>>>> +
>>>>>> +         return inst;
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Select \p src0 if the comparison of both sources with the given
>>>>>> +       * conditional mod evaluates to true, otherwise select \p src1.
>>>>>> +       *
>>>>>> +       * Generally useful to get the minimum or maximum of two values.
>>>>>> +       */
>>>>>> +      void
>>>>>> +      emit_minmax(const dst_reg &dst, const src_reg &src0,
>>>>>> +                  const src_reg &src1, brw_conditional_mod mod) const
>>>>>
>>>>> This is unused, after Jason's commit:
>>>>>
>>>>> commit 78644ffc4d341deb431145108f0b2d377e59b61e
>>>>> Author: Jason Ekstrand <jason.ekstrand at intel.com>
>>>>> Date:   Wed May 20 10:35:34 2015 -0700
>>>>>
>>>>>     i965/fs: Remove the ir_visitor code
>>>>>
>>>>>     Now that everything is running through NIR, this is all dead.
>>>>>
>>>>>     Reviewed-by: Matt Turner <mattst88 at gmail.com>
>>>>>     Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>>>>>
>>>>
>>>> I was planning to use it in my ARB_shader_image_load_store series.  It
>>>> may also make sense to fix brw_fs_nir.cpp to use it, as it would
>>>> probably simplify the code slightly.  I can make the changes as part of
>>>> PATCH 32 if you like.
>>>
>>> Oh, okay. I didn't realize that brw_fs_nir.cpp had it open-coded.
>>> Yeah, it's fine to leave it in here.
>>>
>>>>
>>>>>> +      {
>>>>>> +         if (shader->devinfo->gen >= 6) {
>>>>>> +            set_condmod(mod, SEL(dst, fix_unsigned_negate(src0),
>>>>>> +                                 fix_unsigned_negate(src1)));
>>>>>> +         } else {
>>>>>> +            CMP(null_reg_d(), src0, src1, mod);
>>>>>> +            set_predicate(BRW_PREDICATE_NORMAL,
>>>>>> +                          SEL(dst, src0, src1));
>>>>>> +         }
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Copy any live channel from \p src to the first channel of \p dst.
>>>>>> +       */
>>>>>> +      void
>>>>>> +      emit_uniformize(const dst_reg &dst, const src_reg &src) const
>>>>>> +      {
>>>>>> +         const fs_builder ubld = exec_all();
>>>>>> +         const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD);
>>>>>> +
>>>>>> +         ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, component(chan_index, 0));
>>>>>> +         ubld.emit(SHADER_OPCODE_BROADCAST, component(dst, 0),
>>>>>> +                   src, component(chan_index, 0));
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Assorted arithmetic ops.
>>>>>> +       * @{
>>>>>> +       */
>>>>>> +#define ALU1(op)                                        \
>>>>>> +      instruction *                                     \
>>>>>> +      op(const dst_reg &dst, const src_reg &src0) const \
>>>>>> +      {                                                 \
>>>>>> +         return emit(BRW_OPCODE_##op, dst, src0);       \
>>>>>> +      }
>>>>>> +
>>>>>> +#define ALU2(op)                                                        \
>>>>>> +      instruction *                                                     \
>>>>>> +      op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) const \
>>>>>> +      {                                                                 \
>>>>>> +         return emit(BRW_OPCODE_##op, dst, src0, src1);                 \
>>>>>> +      }
>>>>>> +
>>>>>> +#define ALU2_ACC(op)                                                    \
>>>>>> +      instruction *                                                     \
>>>>>> +      op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) const \
>>>>>> +      {                                                                 \
>>>>>> +         instruction *inst = emit(BRW_OPCODE_##op, dst, src0, src1);    \
>>>>>> +         inst->writes_accumulator = true;                               \
>>>>>> +         return inst;                                                   \
>>>>>> +      }
>>>>>> +
>>>>>> +#define ALU3(op)                                                        \
>>>>>> +      instruction *                                                     \
>>>>>> +      op(const dst_reg &dst, const src_reg &src0, const src_reg &src1,  \
>>>>>> +         const src_reg &src2) const                                     \
>>>>>> +      {                                                                 \
>>>>>> +         return emit(BRW_OPCODE_##op, dst, src0, src1, src2);           \
>>>>>> +      }
>>>>>> +
>>>>>> +      ALU2(ADD)
>>>>>> +      ALU2_ACC(ADDC)
>>>>>> +      ALU2(AND)
>>>>>> +      ALU2(ASR)
>>>>>> +      ALU2(AVG)
>>>>>> +      ALU3(BFE)
>>>>>> +      ALU2(BFI1)
>>>>>> +      ALU3(BFI2)
>>>>>> +      ALU1(BFREV)
>>>>>> +      ALU1(CBIT)
>>>>>> +      ALU2(CMPN)
>>>>>> +      ALU3(CSEL)
>>>>>> +      ALU2(DP2)
>>>>>> +      ALU2(DP3)
>>>>>> +      ALU2(DP4)
>>>>>> +      ALU2(DPH)
>>>>>> +      ALU1(F16TO32)
>>>>>> +      ALU1(F32TO16)
>>>>>> +      ALU1(FBH)
>>>>>> +      ALU1(FBL)
>>>>>> +      ALU1(FRC)
>>>>>> +      ALU2(LINE)
>>>>>> +      ALU1(LZD)
>>>>>> +      ALU2(MAC)
>>>>>> +      ALU2_ACC(MACH)
>>>>>> +      ALU3(MAD)
>>>>>> +      ALU1(MOV)
>>>>>> +      ALU2(MUL)
>>>>>> +      ALU1(NOT)
>>>>>> +      ALU2(OR)
>>>>>> +      ALU2(PLN)
>>>>>> +      ALU1(RNDD)
>>>>>> +      ALU1(RNDE)
>>>>>> +      ALU1(RNDU)
>>>>>> +      ALU1(RNDZ)
>>>>>> +      ALU2(SAD2)
>>>>>> +      ALU2_ACC(SADA2)
>>>>>> +      ALU2(SEL)
>>>>>> +      ALU2(SHL)
>>>>>> +      ALU2(SHR)
>>>>>> +      ALU2_ACC(SUBB)
>>>>>> +      ALU2(XOR)
>>>>>> +
>>>>>> +#undef ALU3
>>>>>> +#undef ALU2_ACC
>>>>>> +#undef ALU2
>>>>>> +#undef ALU1
>>>>>> +      /** @} */
>>>>>> +
>>>>>> +      /**
>>>>>> +       * CMP: Sets the low bit of the destination channels with the result
>>>>>> +       * of the comparison, while the upper bits are undefined, and updates
>>>>>> +       * the flag register with the packed 16 bits of the result.
>>>>>> +       */
>>>>>> +      instruction *
>>>>>> +      CMP(const dst_reg &dst, const src_reg &src0, const src_reg &src1,
>>>>>> +          brw_conditional_mod condition) const
>>>>>> +      {
>>>>>> +         /* Take the instruction:
>>>>>> +          *
>>>>>> +          * CMP null<d> src0<f> src1<f>
>>>>>> +          *
>>>>>> +          * Original gen4 does type conversion to the destination type
>>>>>> +          * before comparison, producing garbage results for floating
>>>>>> +          * point comparisons.
>>>>>> +          *
>>>>>> +          * The destination type doesn't matter on newer generations,
>>>>>> +          * so we set the type to match src0 so we can compact the
>>>>>> +          * instruction.
>>>>>> +          */
>>>>>> +         return set_condmod(condition,
>>>>>> +                            emit(BRW_OPCODE_CMP, retype(dst, src0.type),
>>>>>> +                                 fix_unsigned_negate(src0),
>>>>>> +                                 fix_unsigned_negate(src1)));
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Gen4 predicated IF.
>>>>>> +       */
>>>>>> +      instruction *
>>>>>> +      IF(brw_predicate predicate) const
>>>>>> +      {
>>>>>> +         instruction *inst = emit(BRW_OPCODE_IF);
>>>>>> +         return set_predicate(predicate, inst);
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Gen6 IF with embedded comparison.
>>>>>> +       */
>>>>>> +      instruction *
>>>>>> +      IF(const src_reg &src0, const src_reg &src1,
>>>>>> +         brw_conditional_mod condition) const
>>>>>> +      {
>>>>>> +         assert(shader->devinfo->gen == 6);
>>>>>> +         return set_condmod(condition,
>>>>>> +                            emit(BRW_OPCODE_IF,
>>>>>> +                                 null_reg_d(),
>>>>>> +                                 fix_unsigned_negate(src0),
>>>>>> +                                 fix_unsigned_negate(src1)));
>>>>>
>>>>> I don't see that we were calling resolve_ud_negate() on
>>>>> fs_visitor::emit_if_gen6.
>>>>>
>>>> Right, but shouldn't any instruction comparing unsigned values which are
>>>> potentially negated call resolve_ud_negate()?
>>>
>>> From looking at the Sandybridge PRM, the IF with embedded conditional
>>> cannot do source modifiers, so I think the answer is no.
>>>
>> I suspect that the PRM is wrong, IIRC I checked this on real hardware
>> and the simulator, and the embedded conditional on SNB behaves largely
>> like the normal CMP instruction, source modifiers work as usual (though
>> we didn't actually make use of them AFAIK).
>
> All the more reason that this should be done as a separate patch with
> an explanation.
>
>>>>> In fact, we've removed the function entirely (a known regression in
>>>>> the switch to NIR). I'd drop this for now. It's unused as well.
>>>>>
>>>> If there are still plans to restore this functionality, I would rather
>>>> leave it alone rather than remove it now for somebody else to have to
>>>> rewrite it later on, because it's less work and doesn't seem to hurt.
>>>
>>> But moreover, again I don't like hidden changes like this in what is
>>> sold as a refactor.
>>>
>> It was never my intention to sell this as a refactor.
>
> Are you just arguing semantics?
>
> Call it what you will -- the series is refactoring a bunch of code.
>
>>> Even if it had been a bug in the original code, this patch would *not*
>>> have been the appropriate place to fix it.
>>>
>> Heh, OK, next time I'm rewriting code I'll try to pay more attention to
>> preserve the bugs I notice in the original code.
>
> Drop the snark. I shouldn't have to explain to you that huge
> copy-and-paste blocks shouldn't have hidden functional changes.

Sorry for the snark, I couldn't help to answer humorously to your
apparent anger at my rewrite being more careful than the original.

No, it wasn't just a copy-and-paste block, the one function we've been
arguing about had little resemblance (except maybe in the Doxygen
comment above) to the original, you could have guessed I had rewritten
it fully.

And no, it wasn't a functional change, it matched the behavior of the
original function exactly, for the whole set of inputs for which the
original wouldn't generate broken code.  If we didn't have a bug that
is.

Anyway, I've dropped the definition of this unnecessary function
altogether, as you requested.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150609/728669ae/attachment-0001.sig>


More information about the mesa-dev mailing list