[Mesa-dev] [PATCH 01/38] i965/fs: Introduce FS IR builder.
Matt Turner
mattst88 at gmail.com
Mon Jun 8 09:50:13 PDT 2015
On Fri, Jun 5, 2015 at 4:13 PM, Francisco Jerez <currojerez at riseup.net> wrote:
> Matt Turner <mattst88 at gmail.com> writes:
>
>> On Fri, Jun 5, 2015 at 1:42 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>>> Matt Turner <mattst88 at gmail.com> writes:
>>>
>>>> On Thu, Jun 4, 2015 at 9:04 AM, Francisco Jerez <currojerez at riseup.net> wrote:
>>>>> The purpose of this change is threefold: First, it improves the
>>>>> modularity of the compiler back-end by separating the functionality
>>>>> required to construct an i965 IR program from the rest of the visitor
>>>>> god-object, what in turn will reduce the coupling between other
>>>>> components and the visitor allowing a more modular design. This patch
>>>>> doesn't yet remove the equivalent functionality from the visitor
>>>>> classes, that task will be undertaken by a separate series, as it
>>>>> involves major back-end surgery.
>>>>>
>>>>> Second, it improves consistency between the scalar and vector
>>>>> back-ends. The FS and VEC4 builders can both be used to generate
>>>>> scalar code with a compatible interface or they can be used to
>>>>> generate natural vector width code -- 1 or 4 components respectively.
>>>>>
>>>>> Third, the approach to IR construction is somewhat different to what
>>>>> the visitor classes currently do. All parameters affecting code
>>>>> generation (execution size, half control, point in the program where
>>>>> new instructions are inserted, etc.) are encapsulated in a stand-alone
>>>>> object rather than being quasi-global state (yes, anything defined in
>>>>> one of the visitor classes is effectively global due to the tight
>>>>> coupling with virtually everything else in the compiler back-end).
>>>>> This object is lightweight and can be copied, mutated and passed
>>>>> around, making helper IR-building functions more flexible because they
>>>>> can now simply take a builder object as argument and will inherit its
>>>>> IR generation properties in exactly the same way that a discrete
>>>>> instruction would from the same builder object.
>>>>>
>>>>> The emit_typed_write() function from my image-load-store branch is an
>>>>> example that illustrates the usefulness of the latter point: Due to
>>>>> hardware limitations the function may have to split the untyped
>>>>> surface message in 8-wide chunks. That means that the several
>>>>> functions called to help with the construction of the message payload
>>>>> are themselves required to set the execution width and half control
>>>>> correctly on the instructions they emit, and to allocate all registers
>>>>> with half the default width. With the previous approach this would
>>>>> require the used helper functions to be aware of the parameters that
>>>>> might differ from the default state and explicitly set the instruction
>>>>> bits accordingly. With the new approach they would get a modified
>>>>> builder object as argument that would influence all instructions
>>>>> emitted by the helper function as if it were the default state.
>>>>>
>>>>> Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD()
>>>>> method. It doesn't actually emit any instructions, they are simply
>>>>> created and inserted into an exec_list which is returned for the
>>>>> caller to emit at some location of the program. This sort of two-step
>>>>> emission becomes unnecessary with the builder interface because the
>>>>> insertion point is one more of the code generation parameters which
>>>>> are part of the builder object. The caller can simply pass
>>>>> VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the
>>>>> location of the program where the effect of the constant load is
>>>>> desired. This two-step emission (which pervades the compiler back-end
>>>>> and is in most cases redundant) goes away: E.g. ADD() now actually
>>>>> adds two registers rather than just creating an ADD instruction in
>>>>> memory, emit(ADD()) is no longer necessary.
>>>>>
>>>>> v2: Drop scalarizing VEC4 builder.
>>>>> v3: Take a backend_shader as constructor argument. Improve handling
>>>>> of debug annotations and execution control flags.
>>>>> ---
>>>>> src/mesa/drivers/dri/i965/Makefile.sources | 1 +
>>>>> src/mesa/drivers/dri/i965/brw_fs_builder.h | 659 +++++++++++++++++++++++++++++
>>>>> 2 files changed, 660 insertions(+)
>>>>> create mode 100644 src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>>
>>>>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
>>>>> index 3f852cd..93f336e 100644
>>>>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>>>>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>>>>> @@ -42,6 +42,7 @@ i965_FILES = \
>>>>> brw_ff_gs.c \
>>>>> brw_ff_gs_emit.c \
>>>>> brw_ff_gs.h \
>>>>> + brw_fs_builder.h \
>>>>> brw_fs_channel_expressions.cpp \
>>>>> brw_fs_cmod_propagation.cpp \
>>>>> brw_fs_combine_constants.cpp \
>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>> new file mode 100644
>>>>> index 0000000..e528180
>>>>> --- /dev/null
>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>>>> @@ -0,0 +1,659 @@
>>>>> +/* -*- c++ -*- */
>>>>> +/*
>>>>> + * Copyright © 2010-2015 Intel Corporation
>>>>> + *
>>>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>>>> + * copy of this software and associated documentation files (the "Software"),
>>>>> + * to deal in the Software without restriction, including without limitation
>>>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>>>> + * Software is furnished to do so, subject to the following conditions:
>>>>> + *
>>>>> + * The above copyright notice and this permission notice (including the next
>>>>> + * paragraph) shall be included in all copies or substantial portions of the
>>>>> + * Software.
>>>>> + *
>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>>>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>>>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>>>>> + * IN THE SOFTWARE.
>>>>> + */
>>>>> +
>>>>> +#ifndef BRW_FS_BUILDER_H
>>>>> +#define BRW_FS_BUILDER_H
>>>>> +
>>>>> +#include "brw_ir_fs.h"
>>>>> +#include "brw_shader.h"
>>>>> +#include "brw_context.h"
>>>>> +
>>>>> +namespace brw {
>>>>> + /**
>>>>> + * Toolbox to assemble an FS IR program out of individual instructions.
>>>>> + *
>>>>> + * This object is meant to have an interface consistent with
>>>>> + * brw::vec4_builder. They cannot be fully interchangeable because
>>>>> + * brw::fs_builder generates scalar code while brw::vec4_builder generates
>>>>> + * vector code.
>>>>> + */
>>>>> + class fs_builder {
>>>>> + public:
>>>>> + /** Type used in this IR to represent a source of an instruction. */
>>>>> + typedef fs_reg src_reg;
>>>>> +
>>>>> + /** Type used in this IR to represent the destination of an instruction. */
>>>>> + typedef fs_reg dst_reg;
>>>>> +
>>>>> + /** Type used in this IR to represent an instruction. */
>>>>> + typedef fs_inst instruction;
>>>>> +
>>>>> + /**
>>>>> + * Construct an fs_builder appending instructions at the end of
>>>>> + * the instruction list of \p shader. \p dispatch_width gives
>>>>> + * the native execution width of the program.
>>>>> + */
>>>>> + fs_builder(backend_shader *shader,
>>>>> + unsigned dispatch_width) :
>>>>> + shader(shader), block(NULL),
>>>>> + cursor((exec_node *)&shader->instructions.tail),
>>>>> + _dispatch_width(dispatch_width),
>>>>> + _group(0),
>>>>> + force_writemask_all(false),
>>>>> + annotation()
>>>>> + {
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Construct an fs_builder that inserts instructions before \p cursor in
>>>>> + * basic block \p block, inheriting other code generation parameters
>>>>> + * from this.
>>>>> + */
>>>>> + fs_builder
>>>>> + at(bblock_t *block, exec_node *cursor) const
>>>>> + {
>>>>> + fs_builder bld = *this;
>>>>> + bld.block = block;
>>>>> + bld.cursor = cursor;
>>>>> + return bld;
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Construct a builder specifying the default SIMD width and group of
>>>>> + * channel enable signals, inheriting other code generation parameters
>>>>> + * from this.
>>>>> + *
>>>>> + * \p n gives the default SIMD width, \p i gives the slot group used for
>>>>> + * predication and control flow masking in multiples of \p n channels.
>>>>> + */
>>>>> + fs_builder
>>>>> + group(unsigned n, unsigned i) const
>>>>> + {
>>>>> + assert(n <= dispatch_width() &&
>>>>> + i < dispatch_width() / n);
>>>>> + fs_builder bld = *this;
>>>>> + bld._dispatch_width = n;
>>>>> + bld._group += i * n;
>>>>> + return bld;
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Alias for group() with width equal to eight.
>>>>> + */
>>>>> + fs_builder
>>>>> + half(unsigned i) const
>>>>> + {
>>>>> + return group(8, i);
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Construct a builder with per-channel control flow execution masking
>>>>> + * disabled if \p b is true. If control flow execution masking is
>>>>> + * already disabled this has no effect.
>>>>> + */
>>>>> + fs_builder
>>>>> + exec_all(bool b = true) const
>>>>> + {
>>>>> + fs_builder bld = *this;
>>>>> + if (b)
>>>>> + bld.force_writemask_all = true;
>>>>> + return bld;
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Construct a builder with the given debug annotation info.
>>>>> + */
>>>>> + fs_builder
>>>>> + annotate(const char *str, const void *ir = NULL) const
>>>>> + {
>>>>> + fs_builder bld = *this;
>>>>> + bld.annotation.str = str;
>>>>> + bld.annotation.ir = ir;
>>>>> + return bld;
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Get the SIMD width in use.
>>>>> + */
>>>>> + unsigned
>>>>> + dispatch_width() const
>>>>> + {
>>>>> + return _dispatch_width;
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Allocate a virtual register of natural vector size (one for this IR)
>>>>> + * and SIMD width. \p n gives the amount of space to allocate in
>>>>> + * dispatch_width units (which is just enough space for one logical
>>>>> + * component in this IR).
>>>>> + */
>>>>> + dst_reg
>>>>> + vgrf(enum brw_reg_type type, unsigned n = 1) const
>>>>> + {
>>>>> + return dst_reg(GRF, shader->alloc.allocate(
>>>>> + DIV_ROUND_UP(n * type_sz(type) * dispatch_width(),
>>>>> + REG_SIZE)),
>>>>> + type, dispatch_width());
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Create a null register of floating type.
>>>>> + */
>>>>> + dst_reg
>>>>> + null_reg_f() const
>>>>> + {
>>>>> + return dst_reg(retype(brw_null_vec(dispatch_width()),
>>>>> + BRW_REGISTER_TYPE_F));
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Create a null register of signed integer type.
>>>>> + */
>>>>> + dst_reg
>>>>> + null_reg_d() const
>>>>> + {
>>>>> + return dst_reg(retype(brw_null_vec(dispatch_width()),
>>>>> + BRW_REGISTER_TYPE_D));
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Create a null register of unsigned integer type.
>>>>> + */
>>>>> + dst_reg
>>>>> + null_reg_ud() const
>>>>> + {
>>>>> + return dst_reg(retype(brw_null_vec(dispatch_width()),
>>>>> + BRW_REGISTER_TYPE_UD));
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Get the mask of SIMD channels enabled by dispatch and not yet
>>>>> + * disabled by discard.
>>>>> + */
>>>>> + src_reg
>>>>> + sample_mask_reg() const
>>>>> + {
>>>>> + const bool uses_kill =
>>>>> + (shader->stage == MESA_SHADER_FRAGMENT &&
>>>>> + ((brw_wm_prog_data *)shader->stage_prog_data)->uses_kill);
>>>>> + return (shader->stage != MESA_SHADER_FRAGMENT ? src_reg(0xffff) :
>>>>> + uses_kill ? brw_flag_reg(0, 1) :
>>>>> + retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD));
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Insert an instruction into the program.
>>>>> + */
>>>>> + instruction *
>>>>> + emit(const instruction &inst) const
>>>>> + {
>>>>> + return emit(new(shader->mem_ctx) instruction(inst));
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Create and insert a nullary control instruction into the program.
>>>>> + */
>>>>> + instruction *
>>>>> + emit(enum opcode opcode) const
>>>>> + {
>>>>> + return emit(instruction(opcode, dispatch_width()));
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Create and insert a nullary instruction into the program.
>>>>> + */
>>>>> + instruction *
>>>>> + emit(enum opcode opcode, const dst_reg &dst) const
>>>>> + {
>>>>> + return emit(instruction(opcode, dst));
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Create and insert a unary instruction into the program.
>>>>> + */
>>>>> + instruction *
>>>>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0) const
>>>>> + {
>>>>> + switch (opcode) {
>>>>> + case SHADER_OPCODE_RCP:
>>>>> + case SHADER_OPCODE_RSQ:
>>>>> + case SHADER_OPCODE_SQRT:
>>>>> + case SHADER_OPCODE_EXP2:
>>>>> + case SHADER_OPCODE_LOG2:
>>>>> + case SHADER_OPCODE_SIN:
>>>>> + case SHADER_OPCODE_COS:
>>>>> + return fix_math_instruction(
>>>>> + emit(instruction(opcode, dst.width, dst,
>>>>> + fix_math_operand(src0))));
>>>>> +
>>>>> + default:
>>>>> + return emit(instruction(opcode, dst.width, dst, src0));
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Create and insert a binary instruction into the program.
>>>>> + */
>>>>> + instruction *
>>>>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
>>>>> + const src_reg &src1) const
>>>>> + {
>>>>> + switch (opcode) {
>>>>> + case SHADER_OPCODE_POW:
>>>>> + case SHADER_OPCODE_INT_QUOTIENT:
>>>>> + case SHADER_OPCODE_INT_REMAINDER:
>>>>> + return fix_math_instruction(
>>>>> + emit(instruction(opcode, dst.width, dst,
>>>>> + fix_math_operand(src0),
>>>>> + fix_math_operand(src1))));
>>>>> +
>>>>> + default:
>>>>> + return emit(instruction(opcode, dst.width, dst, src0, src1));
>>>>> +
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Create and insert a ternary instruction into the program.
>>>>> + */
>>>>> + instruction *
>>>>> + emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
>>>>> + const src_reg &src1, const src_reg &src2) const
>>>>> + {
>>>>> + switch (opcode) {
>>>>> + case BRW_OPCODE_BFE:
>>>>> + case BRW_OPCODE_BFI2:
>>>>> + case BRW_OPCODE_MAD:
>>>>> + case BRW_OPCODE_LRP:
>>>>> + return emit(instruction(opcode, dst.width, dst,
>>>>> + fix_3src_operand(src0),
>>>>> + fix_3src_operand(src1),
>>>>> + fix_3src_operand(src2)));
>>>>> +
>>>>> + default:
>>>>> + return emit(instruction(opcode, dst.width, dst, src0, src1, src2));
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Insert a preallocated instruction into the program.
>>>>> + */
>>>>> + instruction *
>>>>> + emit(instruction *inst) const
>>>>> + {
>>>>> + assert(inst->exec_size == dispatch_width() ||
>>>>> + force_writemask_all);
>>>>> + assert(_group == 0 || _group == 8);
>>>>> +
>>>>> + inst->force_sechalf = (_group == 8);
>>>>> + inst->force_writemask_all = force_writemask_all;
>>>>> + inst->annotation = annotation.str;
>>>>> + inst->ir = annotation.ir;
>>>>> +
>>>>> + if (block)
>>>>> + static_cast<instruction *>(cursor)->insert_before(block, inst);
>>>>> + else
>>>>> + cursor->insert_before(inst);
>>>>> +
>>>>> + return inst;
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Select \p src0 if the comparison of both sources with the given
>>>>> + * conditional mod evaluates to true, otherwise select \p src1.
>>>>> + *
>>>>> + * Generally useful to get the minimum or maximum of two values.
>>>>> + */
>>>>> + void
>>>>> + emit_minmax(const dst_reg &dst, const src_reg &src0,
>>>>> + const src_reg &src1, brw_conditional_mod mod) const
>>>>
>>>> This is unused, after Jason's commit:
>>>>
>>>> commit 78644ffc4d341deb431145108f0b2d377e59b61e
>>>> Author: Jason Ekstrand <jason.ekstrand at intel.com>
>>>> Date: Wed May 20 10:35:34 2015 -0700
>>>>
>>>> i965/fs: Remove the ir_visitor code
>>>>
>>>> Now that everything is running through NIR, this is all dead.
>>>>
>>>> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>>>> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>>>>
>>>
>>> I was planning to use it in my ARB_shader_image_load_store series. It
>>> may also make sense to fix brw_fs_nir.cpp to use it, as it would
>>> probably simplify the code slightly. I can make the changes as part of
>>> PATCH 32 if you like.
>>
>> Oh, okay. I didn't realize that brw_fs_nir.cpp had it open-coded.
>> Yeah, it's fine to leave it in here.
>>
>>>
>>>>> + {
>>>>> + if (shader->devinfo->gen >= 6) {
>>>>> + set_condmod(mod, SEL(dst, fix_unsigned_negate(src0),
>>>>> + fix_unsigned_negate(src1)));
>>>>> + } else {
>>>>> + CMP(null_reg_d(), src0, src1, mod);
>>>>> + set_predicate(BRW_PREDICATE_NORMAL,
>>>>> + SEL(dst, src0, src1));
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Copy any live channel from \p src to the first channel of \p dst.
>>>>> + */
>>>>> + void
>>>>> + emit_uniformize(const dst_reg &dst, const src_reg &src) const
>>>>> + {
>>>>> + const fs_builder ubld = exec_all();
>>>>> + const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD);
>>>>> +
>>>>> + ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, component(chan_index, 0));
>>>>> + ubld.emit(SHADER_OPCODE_BROADCAST, component(dst, 0),
>>>>> + src, component(chan_index, 0));
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Assorted arithmetic ops.
>>>>> + * @{
>>>>> + */
>>>>> +#define ALU1(op) \
>>>>> + instruction * \
>>>>> + op(const dst_reg &dst, const src_reg &src0) const \
>>>>> + { \
>>>>> + return emit(BRW_OPCODE_##op, dst, src0); \
>>>>> + }
>>>>> +
>>>>> +#define ALU2(op) \
>>>>> + instruction * \
>>>>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) const \
>>>>> + { \
>>>>> + return emit(BRW_OPCODE_##op, dst, src0, src1); \
>>>>> + }
>>>>> +
>>>>> +#define ALU2_ACC(op) \
>>>>> + instruction * \
>>>>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1) const \
>>>>> + { \
>>>>> + instruction *inst = emit(BRW_OPCODE_##op, dst, src0, src1); \
>>>>> + inst->writes_accumulator = true; \
>>>>> + return inst; \
>>>>> + }
>>>>> +
>>>>> +#define ALU3(op) \
>>>>> + instruction * \
>>>>> + op(const dst_reg &dst, const src_reg &src0, const src_reg &src1, \
>>>>> + const src_reg &src2) const \
>>>>> + { \
>>>>> + return emit(BRW_OPCODE_##op, dst, src0, src1, src2); \
>>>>> + }
>>>>> +
>>>>> + ALU2(ADD)
>>>>> + ALU2_ACC(ADDC)
>>>>> + ALU2(AND)
>>>>> + ALU2(ASR)
>>>>> + ALU2(AVG)
>>>>> + ALU3(BFE)
>>>>> + ALU2(BFI1)
>>>>> + ALU3(BFI2)
>>>>> + ALU1(BFREV)
>>>>> + ALU1(CBIT)
>>>>> + ALU2(CMPN)
>>>>> + ALU3(CSEL)
>>>>> + ALU2(DP2)
>>>>> + ALU2(DP3)
>>>>> + ALU2(DP4)
>>>>> + ALU2(DPH)
>>>>> + ALU1(F16TO32)
>>>>> + ALU1(F32TO16)
>>>>> + ALU1(FBH)
>>>>> + ALU1(FBL)
>>>>> + ALU1(FRC)
>>>>> + ALU2(LINE)
>>>>> + ALU1(LZD)
>>>>> + ALU2(MAC)
>>>>> + ALU2_ACC(MACH)
>>>>> + ALU3(MAD)
>>>>> + ALU1(MOV)
>>>>> + ALU2(MUL)
>>>>> + ALU1(NOT)
>>>>> + ALU2(OR)
>>>>> + ALU2(PLN)
>>>>> + ALU1(RNDD)
>>>>> + ALU1(RNDE)
>>>>> + ALU1(RNDU)
>>>>> + ALU1(RNDZ)
>>>>> + ALU2(SAD2)
>>>>> + ALU2_ACC(SADA2)
>>>>> + ALU2(SEL)
>>>>> + ALU2(SHL)
>>>>> + ALU2(SHR)
>>>>> + ALU2_ACC(SUBB)
>>>>> + ALU2(XOR)
>>>>> +
>>>>> +#undef ALU3
>>>>> +#undef ALU2_ACC
>>>>> +#undef ALU2
>>>>> +#undef ALU1
>>>>> + /** @} */
>>>>> +
>>>>> + /**
>>>>> + * CMP: Sets the low bit of the destination channels with the result
>>>>> + * of the comparison, while the upper bits are undefined, and updates
>>>>> + * the flag register with the packed 16 bits of the result.
>>>>> + */
>>>>> + instruction *
>>>>> + CMP(const dst_reg &dst, const src_reg &src0, const src_reg &src1,
>>>>> + brw_conditional_mod condition) const
>>>>> + {
>>>>> + /* Take the instruction:
>>>>> + *
>>>>> + * CMP null<d> src0<f> src1<f>
>>>>> + *
>>>>> + * Original gen4 does type conversion to the destination type
>>>>> + * before comparison, producing garbage results for floating
>>>>> + * point comparisons.
>>>>> + *
>>>>> + * The destination type doesn't matter on newer generations,
>>>>> + * so we set the type to match src0 so we can compact the
>>>>> + * instruction.
>>>>> + */
>>>>> + return set_condmod(condition,
>>>>> + emit(BRW_OPCODE_CMP, retype(dst, src0.type),
>>>>> + fix_unsigned_negate(src0),
>>>>> + fix_unsigned_negate(src1)));
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Gen4 predicated IF.
>>>>> + */
>>>>> + instruction *
>>>>> + IF(brw_predicate predicate) const
>>>>> + {
>>>>> + instruction *inst = emit(BRW_OPCODE_IF);
>>>>> + return set_predicate(predicate, inst);
>>>>> + }
>>>>> +
>>>>> + /**
>>>>> + * Gen6 IF with embedded comparison.
>>>>> + */
>>>>> + instruction *
>>>>> + IF(const src_reg &src0, const src_reg &src1,
>>>>> + brw_conditional_mod condition) const
>>>>> + {
>>>>> + assert(shader->devinfo->gen == 6);
>>>>> + return set_condmod(condition,
>>>>> + emit(BRW_OPCODE_IF,
>>>>> + null_reg_d(),
>>>>> + fix_unsigned_negate(src0),
>>>>> + fix_unsigned_negate(src1)));
>>>>
>>>> I don't see that we were calling resolve_ud_negate() on
>>>> fs_visitor::emit_if_gen6.
>>>>
>>> Right, but shouldn't any instruction comparing unsigned values which are
>>> potentially negated call resolve_ud_negate()?
>>
>> From looking at the Sandybridge PRM, the IF with embedded conditional
>> cannot do source modifiers, so I think the answer is no.
>>
> I suspect that the PRM is wrong, IIRC I checked this on real hardware
> and the simulator, and the embedded conditional on SNB behaves largely
> like the normal CMP instruction, source modifiers work as usual (though
> we didn't actually make use of them AFAIK).
All the more reason that this should be done as a separate patch with
an explanation.
>>>> In fact, we've removed the function entirely (a known regression in
>>>> the switch to NIR). I'd drop this for now. It's unused as well.
>>>>
>>> If there are still plans to restore this functionality, I would rather
>>> leave it alone rather than remove it now for somebody else to have to
>>> rewrite it later on, because it's less work and doesn't seem to hurt.
>>
>> But moreover, again I don't like hidden changes like this in what is
>> sold as a refactor.
>>
> It was never my intention to sell this as a refactor.
Are you just arguing semantics?
Call it what you will -- the series is refactoring a bunch of code.
>> Even if it had been a bug in the original code, this patch would *not*
>> have been the appropriate place to fix it.
>>
> Heh, OK, next time I'm rewriting code I'll try to pay more attention to
> preserve the bugs I notice in the original code.
Drop the snark. I shouldn't have to explain to you that huge
copy-and-paste blocks shouldn't have hidden functional changes.
More information about the mesa-dev
mailing list