[Mesa-dev] [PATCHv2 07/10] intel/fs: Introduce regioning lowering pass.
Iago Toral
itoral at igalia.com
Tue Jan 8 07:25:06 UTC 2019
On Mon, 2019-01-07 at 12:02 -0800, Francisco Jerez wrote:
> Iago Toral <itoral at igalia.com> writes:
>
> > On Sat, 2019-01-05 at 14:03 -0800, Francisco Jerez wrote:
> > > This legalization pass is meant to handle situations where the
> > > source
> > > or destination regioning controls of an instruction are
> > > unsupported
> > > by
> > > the hardware and need to be lowered away into separate
> > > instructions.
> > > This should be more reliable and future-proof than the current
> > > approach of handling CHV/BXT restrictions manually all over the
> > > visitor. The same mechanism is leveraged to lower unsupported
> > > type
> > > conversions easily, which obsoletes the lower_conversions pass.
> > >
> > > v2: Give conditional modifiers the same treatment as predicates
> > > for SEL instructions in lower_dst_modifiers() (Iago).
> > > Special-case a couple of other instructions with inconsistent
> > > conditional mod semantics in lower_dst_modifiers() (Curro).
> > > ---
> > > src/intel/Makefile.sources | 1 +
> > > src/intel/compiler/brw_fs.cpp | 5 +-
> > > src/intel/compiler/brw_fs.h | 21 +-
> > > src/intel/compiler/brw_fs_lower_regioning.cpp | 399
> > > ++++++++++++++++++
> > > src/intel/compiler/brw_ir_fs.h | 10 +
> > > src/intel/compiler/meson.build | 1 +
> > > 6 files changed, 418 insertions(+), 19 deletions(-)
> > > create mode 100644 src/intel/compiler/brw_fs_lower_regioning.cpp
> > >
> > > diff --git a/src/intel/Makefile.sources
> > > b/src/intel/Makefile.sources
> > > index 5e7d32293b7..6b9874d2b80 100644
> > > --- a/src/intel/Makefile.sources
> > > +++ b/src/intel/Makefile.sources
> > > @@ -64,6 +64,7 @@ COMPILER_FILES = \
> > > compiler/brw_fs_live_variables.h \
> > > compiler/brw_fs_lower_conversions.cpp \
> > > compiler/brw_fs_lower_pack.cpp \
> > > + compiler/brw_fs_lower_regioning.cpp \
> > > compiler/brw_fs_nir.cpp \
> > > compiler/brw_fs_reg_allocate.cpp \
> > > compiler/brw_fs_register_coalesce.cpp \
> > > diff --git a/src/intel/compiler/brw_fs.cpp
> > > b/src/intel/compiler/brw_fs.cpp
> > > index 889509badab..caa7a798332 100644
> > > --- a/src/intel/compiler/brw_fs.cpp
> > > +++ b/src/intel/compiler/brw_fs.cpp
> > > @@ -6471,7 +6471,10 @@ fs_visitor::optimize()
> > > OPT(dead_code_eliminate);
> > > }
> > >
> > > - if (OPT(lower_conversions)) {
> > > + progress = false;
> > > + OPT(lower_conversions);
> > > + OPT(lower_regioning);
> > > + if (progress) {
> >
> > This is a small nitpick but since this makes lower_conversions
> > redundant, maybe it makes more sense to just remove the call to it
> > here
> > already in this patch so you don't have to reset the progress
> > variable
> > and simply do:
> >
> > if (OPT(lower_regioning)) {
> > ...
> > }
> >
>
> The main reason for this is that in the event of a regression this
> will
> allow identifying from the bisection result whether the reason for
> the
> failure is the lack of a condition in the lower_regioning pass which
> was
> previously handled by lower_conversions, or whether it's a bug in the
> lowering code of lower_regioning itself.
That's actually a good idea.
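
For reference, the "reset progress, run both passes, test once" shape of the hunk above can be sketched in isolation. This is only a toy model: Pass and run_passes() are hypothetical names invented for the illustration and are not part of the Mesa code base; the real code uses the OPT() macro inside fs_visitor::optimize().

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a lowering pass: a name plus a callback that
// returns true when it changed the program.
struct Pass {
   std::string name;
   std::function<bool()> run;
};

// Run every pass unconditionally and OR the results together, mirroring
//   progress = false; OPT(lower_conversions); OPT(lower_regioning);
// The names of the passes that reported progress are appended to `changed`.
bool
run_passes(const std::vector<Pass> &passes, std::vector<std::string> &changed)
{
   bool progress = false;

   for (const Pass &p : passes) {
      if (p.run()) {
         changed.push_back(p.name);
         progress = true;
      }
   }

   return progress;
}
```

Because both passes stay in the pipeline for now, dropping lower_conversions can be a separate commit later, so a bisection lands either on the commit that introduces the new lowering code or on the one that removes the old pass.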
> > > OPT(opt_copy_propagation);
> > > OPT(dead_code_eliminate);
> > > OPT(lower_simd_width);
> > > diff --git a/src/intel/compiler/brw_fs.h
> > > b/src/intel/compiler/brw_fs.h
> > > index dc36ecc21ac..36825754931 100644
> > > --- a/src/intel/compiler/brw_fs.h
> > > +++ b/src/intel/compiler/brw_fs.h
> > > @@ -164,6 +164,7 @@ public:
> > > void lower_uniform_pull_constant_loads();
> > > bool lower_load_payload();
> > > bool lower_pack();
> > > + bool lower_regioning();
> > > bool lower_conversions();
> > > bool lower_logical_sends();
> > > bool lower_integer_multiplication();
> > > @@ -536,24 +537,8 @@ namespace brw {
> > > }
> > > }
> > >
> > > - /**
> > > - * Remove any modifiers from the \p i-th source region of the
> > > instruction,
> > > - * including negate, abs and any implicit type conversion to
> > > the
> > > execution
> > > - * type. Instead any source modifiers will be implemented as
> > > a
> > > separate
> > > - * MOV instruction prior to the original instruction.
> > > - */
> > > - inline bool
> > > - lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst
> > > *inst, unsigned i)
> > > - {
> > > - assert(inst->components_read(i) == 1);
> > > - const fs_builder ibld(v, block, inst);
> > > - const fs_reg tmp = ibld.vgrf(get_exec_type(inst));
> > > -
> > > - ibld.MOV(tmp, inst->src[i]);
> > > - inst->src[i] = tmp;
> > > -
> > > - return true;
> > > - }
> > > + bool
> > > + lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst
> > > *inst, unsigned i);
> > > }
> > >
> > > void shuffle_from_32bit_read(const brw::fs_builder &bld,
> > > diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp
> > > b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > > new file mode 100644
> > > index 00000000000..d7c97e1442a
> > > --- /dev/null
> > > +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > > @@ -0,0 +1,399 @@
> > > +/*
> > > + * Copyright © 2018 Intel Corporation
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person
> > > obtaining a
> > > + * copy of this software and associated documentation files (the
> > > "Software"),
> > > + * to deal in the Software without restriction, including
> > > without
> > > limitation
> > > + * the rights to use, copy, modify, merge, publish, distribute,
> > > sublicense,
> > > + * and/or sell copies of the Software, and to permit persons to
> > > whom
> > > the
> > > + * Software is furnished to do so, subject to the following
> > > conditions:
> > > + *
> > > + * The above copyright notice and this permission notice
> > > (including
> > > the next
> > > + * paragraph) shall be included in all copies or substantial
> > > portions of the
> > > + * Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
> > > KIND,
> > > EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > > MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
> > > EVENT SHALL
> > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
> > > DAMAGES
> > > OR OTHER
> > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> > > OTHERWISE,
> > > ARISING
> > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> > > OTHER DEALINGS
> > > + * IN THE SOFTWARE.
> > > + */
> > > +
> > > +#include "brw_fs.h"
> > > +#include "brw_cfg.h"
> > > +#include "brw_fs_builder.h"
> > > +
> > > +using namespace brw;
> > > +
> > > +namespace {
> > > + /* From the SKL PRM Vol 2a, "Move":
> > > + *
> > > + * "A mov with the same source and destination type, no
> > > source
> > > modifier,
> > > + * and no saturation is a raw move. A packed byte
> > > destination
> > > region (B
> > > + * or UB type with HorzStride == 1 and ExecSize > 1) can
> > > only be
> > > written
> > > + * using raw move."
> > > + */
> > > + bool
> > > + is_byte_raw_mov(const fs_inst *inst)
> > > + {
> > > + return type_sz(inst->dst.type) == 1 &&
> > > + inst->opcode == BRW_OPCODE_MOV &&
> > > + inst->src[0].type == inst->dst.type &&
> > > + !inst->saturate &&
> > > + !inst->src[0].negate &&
> > > + !inst->src[0].abs;
> > > + }
> > > +
> > > + /*
> > > + * Return an acceptable byte stride for the destination of an
> > > instruction
> > > + * that requires it to have some particular alignment.
> > > + */
> > > + unsigned
> > > + required_dst_byte_stride(const fs_inst *inst)
> > > + {
> > > + if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
> > > + !is_byte_raw_mov(inst)) {
> > > + return get_exec_type_size(inst);
> > > + } else {
> > > + unsigned stride = inst->dst.stride * type_sz(inst->dst.type);
> > > +
> > > + for (unsigned i = 0; i < inst->sources; i++) {
> > > + if (!is_uniform(inst->src[i]))
> > > + stride = MAX2(stride, inst->src[i].stride *
> > > + type_sz(inst->src[i].type));
> > > + }
> > > +
> > > + return stride;
> > > + }
> > > + }
> > > +
> > > + /*
> > > + * Return an acceptable byte sub-register offset for the
> > > destination of an
> > > + * instruction that requires it to be aligned to the sub-
> > > register
> > > offset of
> > > + * the sources.
> > > + */
> > > + unsigned
> > > + required_dst_byte_offset(const fs_inst *inst)
> > > + {
> > > + for (unsigned i = 0; i < inst->sources; i++) {
> > > + if (!is_uniform(inst->src[i]))
> > > + if (reg_offset(inst->src[i]) % REG_SIZE !=
> > > + reg_offset(inst->dst) % REG_SIZE)
> > > + return 0;
> > > + }
> > > +
> > > + return reg_offset(inst->dst) % REG_SIZE;
> > > + }
> > > +
> > > + /*
> > > + * Return whether the instruction has an unsupported channel
> > > bit
> > > layout
> > > + * specified for the i-th source region.
> > > + */
> > > + bool
> > > + has_invalid_src_region(const gen_device_info *devinfo, const
> > > fs_inst *inst,
> > > + unsigned i)
> > > + {
> > > + if (is_unordered(inst)) {
> > > + return false;
> > > + } else {
> > > + const unsigned dst_byte_stride = inst->dst.stride *
> > > type_sz(inst->dst.type);
> > > + const unsigned src_byte_stride = inst->src[i].stride *
> > > + type_sz(inst->src[i].type);
> > > + const unsigned dst_byte_offset = reg_offset(inst->dst) % REG_SIZE;
> > > + const unsigned src_byte_offset = reg_offset(inst->src[i]) % REG_SIZE;
> > > +
> > > + return has_dst_aligned_region_restriction(devinfo,
> > > inst) &&
> > > + !is_uniform(inst->src[i]) &&
> > > + (src_byte_stride != dst_byte_stride ||
> > > + src_byte_offset != dst_byte_offset);
> > > + }
> > > + }
> > > +
> > > + /*
> > > + * Return whether the instruction has an unsupported channel
> > > bit
> > > layout
> > > + * specified for the destination region.
> > > + */
> > > + bool
> > > + has_invalid_dst_region(const gen_device_info *devinfo,
> > > + const fs_inst *inst)
> > > + {
> > > + if (is_unordered(inst)) {
> > > + return false;
> > > + } else {
> > > + const brw_reg_type exec_type = get_exec_type(inst);
> > > + const unsigned dst_byte_offset = reg_offset(inst->dst)
> > > %
> > > REG_SIZE;
> > > + const unsigned dst_byte_stride = inst->dst.stride *
> > > type_sz(inst->dst.type);
> > > + const bool is_narrowing_conversion =
> > > !is_byte_raw_mov(inst)
> > > &&
> > > + type_sz(inst->dst.type) < type_sz(exec_type);
> > > +
> > > + return (has_dst_aligned_region_restriction(devinfo, inst) &&
> > > + (required_dst_byte_stride(inst) != dst_byte_stride ||
> > > + required_dst_byte_offset(inst) != dst_byte_offset)) ||
> > > + (is_narrowing_conversion &&
> > > + required_dst_byte_stride(inst) != dst_byte_stride);
> > > + }
> > > + }
> > > +
> > > + /*
> > > + * Return whether the instruction has unsupported source
> > > modifiers
> > > + * specified for the i-th source region.
> > > + */
> > > + bool
> > > + has_invalid_src_modifiers(const gen_device_info *devinfo,
> > > const
> > > fs_inst *inst,
> > > + unsigned i)
> > > + {
> > > + return !inst->can_do_source_mods(devinfo) &&
> > > + (inst->src[i].negate || inst->src[i].abs);
> > > + }
> > > +
> > > + /*
> > > + * Return whether the instruction has an unsupported type
> > > conversion
> > > + * specified for the destination.
> > > + */
> > > + bool
> > > + has_invalid_conversion(const gen_device_info *devinfo, const
> > > fs_inst *inst)
> > > + {
> > > + switch (inst->opcode) {
> > > + case BRW_OPCODE_MOV:
> > > + return false;
> > > + case BRW_OPCODE_SEL:
> > > + return inst->dst.type != get_exec_type(inst);
> > > + case SHADER_OPCODE_BROADCAST:
> > > + case SHADER_OPCODE_MOV_INDIRECT:
> > > + /* The source and destination types of these may be
> > > hard-
> > > coded to
> > > + * integer at codegen time due to hardware limitations
> > > of
> > > 64-bit
> > > + * types.
> > > + */
> > > + return ((devinfo->gen == 7 && !devinfo->is_haswell) ||
> > > + devinfo->is_cherryview ||
> > > gen_device_info_is_9lp(devinfo)) &&
> > > + type_sz(inst->src[0].type) > 4 &&
> > > + inst->dst.type != inst->src[0].type;
> > > + default:
> > > + /* FIXME: We assume the opcodes not explicitly mentioned
> > > + * before just work fine with arbitrary conversions.
> > > + */
> > > + return false;
> > > + }
> > > + }
> > > +
> > > + bool
> > > + lower_instruction(fs_visitor *v, bblock_t *block, fs_inst
> > > *inst);
> > > +}
> > > +
> > > +namespace brw {
> >
> > Maybe move this namespace to the top of the file? That way we can
> > have
> > a single anonymous namespace definition for what comes before and
> > after
> > this.
> >
>
> The reason I put it here was to have all the lower_src_*/lower_dst_*
> helpers close together for readability. It won't really save any
> code
> to move lower_src_modifiers() to the top because its implementation
> relies on lower_instruction(), which is part of the anonymous
> namespace,
> so I'll have to open and close it before anyway.
Right, let's keep it here then.
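
For anyone reading along, the layout being discussed (declare in an anonymous namespace, use from a named namespace, define in a second anonymous namespace block) can be sketched with a toy example. The demo namespace and the int-based signatures are placeholders made up for the illustration; only the shape matches the patch.

```cpp
namespace {
   // Forward declaration only; the definition lives in the second
   // anonymous namespace block further down.
   int lower_instruction(int inst);
}

namespace demo {
   // Externally visible helper that depends on the file-local function.
   int
   lower_src_modifiers(int inst)
   {
      return lower_instruction(inst) + 1;
   }
}

namespace {
   // Reopening the anonymous namespace in the same translation unit
   // extends the same namespace, so this defines the function declared
   // at the top of the file.
   int
   lower_instruction(int inst)
   {
      return inst * 2;
   }
}
```

Moving the named namespace to the top would still need the forward declaration ahead of it, so the anonymous namespace gets opened twice either way.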
> > > + /**
> > > + * Remove any modifiers from the \p i-th source region of the
> > > instruction,
> > > + * including negate, abs and any implicit type conversion to
> > > the
> > > execution
> > > + * type. Instead any source modifiers will be implemented as
> > > a
> > > separate
> > > + * MOV instruction prior to the original instruction.
> > > + */
> > > + bool
> > > + lower_src_modifiers(fs_visitor *v, bblock_t *block, fs_inst
> > > *inst, unsigned i)
> > > + {
> > > + assert(inst->components_read(i) == 1);
> > > + const fs_builder ibld(v, block, inst);
> > > + const fs_reg tmp = ibld.vgrf(get_exec_type(inst));
> > > +
> > > + lower_instruction(v, block, ibld.MOV(tmp, inst->src[i]));
> > > + inst->src[i] = tmp;
> > > +
> > > + return true;
> > > + }
> > > +}
> > > +
> > > +namespace {
> > > + /**
> > > + * Return whether the instruction has non-standard semantics
> > > for
> > > the
> > > + * conditional mod which don't cause the flag register to be
> > > updated with
> > > + * the comparison result.
> > > + */
> > > + bool
> > > + has_inconsistent_cmod(const fs_inst *inst)
> > > + {
> > > + return inst->opcode == BRW_OPCODE_SEL ||
> > > + inst->opcode == BRW_OPCODE_CSEL ||
> > > + inst->opcode == BRW_OPCODE_IF ||
> > > + inst->opcode == BRW_OPCODE_WHILE;
> > > + }
> > > +
> > > + /**
> > > + * Remove any modifiers from the destination region of the
> > > instruction,
> > > + * including saturate, conditional mod and any implicit type
> > > conversion
> > > + * from the execution type. Instead any destination
> > > modifiers
> > > will be
> > > + * implemented as a separate MOV instruction after the
> > > original
> > > + * instruction.
> > > + */
> > > + bool
> > > + lower_dst_modifiers(fs_visitor *v, bblock_t *block, fs_inst
> > > *inst)
> > > + {
> > > + const fs_builder ibld(v, block, inst);
> > > + const brw_reg_type type = get_exec_type(inst);
> > > + /* Not strictly necessary, but if possible use a temporary
> > > with the same
> > > + * channel alignment as the current destination in order
> > > to
> > > avoid
> > > + * violating the restrictions enforced later on by
> > > lower_src_region()
> > > + * and lower_dst_region(), which would introduce
> > > additional
> > > copy
> > > + * instructions into the program unnecessarily.
> > > + */
> > > + const unsigned stride =
> > > + type_sz(inst->dst.type) * inst->dst.stride <=
> > > type_sz(type)
> > > ? 1 :
> > > + type_sz(inst->dst.type) * inst->dst.stride /
> > > type_sz(type);
> > > + const fs_reg tmp = horiz_stride(ibld.vgrf(type, stride),
> > > stride);
> > > +
> > > + /* Emit a MOV taking care of all the destination
> > > modifiers. */
> > > + fs_inst *mov = ibld.at(block, inst->next).MOV(inst->dst,
> > > tmp);
> > > + mov->saturate = inst->saturate;
> > > + if (!has_inconsistent_cmod(inst))
> > > + mov->conditional_mod = inst->conditional_mod;
> > > + if (inst->opcode != BRW_OPCODE_SEL) {
> > > + mov->predicate = inst->predicate;
> > > + mov->predicate_inverse = inst->predicate_inverse;
> > > + }
> > > + mov->flag_subreg = inst->flag_subreg;
> >
> > Shouldn't we also copy the force_writemask_all flag?
> >
>
> No need, fs_builder(v, block, inst) gives you a builder pre-
> initialized
> with execution controls matching the original instruction, including
> force_writemask_all and the channel group.
Right.
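
A toy model of that behaviour, for the record. Inst and Builder below are simplified, hypothetical stand-ins and not the real fs_inst / fs_builder classes; the sketch only assumes what is stated above, namely that a builder constructed from an instruction hands its execution controls to whatever it emits.

```cpp
// Simplified instruction carrying only the execution controls of interest.
struct Inst {
   unsigned exec_size = 8;
   unsigned group = 0;
   bool force_writemask_all = false;
};

class Builder {
public:
   // Capture the execution controls of the instruction the builder is
   // positioned at.
   explicit Builder(const Inst &at)
      : exec_size(at.exec_size), group(at.group),
        force_writemask_all(at.force_writemask_all)
   {
   }

   // Every instruction emitted through the builder inherits the captured
   // controls, so callers never copy them by hand.
   Inst
   MOV() const
   {
      Inst mov;
      mov.exec_size = exec_size;
      mov.group = group;
      mov.force_writemask_all = force_writemask_all;
      return mov;
   }

private:
   unsigned exec_size;
   unsigned group;
   bool force_writemask_all;
};
```

Under that assumption the MOV emitted in lower_dst_modifiers() only needs the fields that are not execution controls (saturate, conditional mod, predicate, flag subregister) set explicitly.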
> > > + lower_instruction(v, block, mov);
> > > +
> > > + /* Point the original instruction at the temporary, and
> > > clean
> > > up any
> > > + * destination modifiers.
> > > + */
> > > + assert(inst->size_written ==
> > > + inst->dst.component_size(inst->exec_size));
> > > + inst->dst = tmp;
> > > + inst->size_written = inst->dst.component_size(inst->exec_size);
> > > + inst->saturate = false;
> > > + if (!has_inconsistent_cmod(inst))
> > > + inst->conditional_mod = BRW_CONDITIONAL_NONE;
> > > +
> > > + assert(!inst->flags_written() || !mov->predicate);
> > > + return true;
> > > + }
> > > +
> > > + /**
> > > + * Remove any non-trivial shuffling of data from the \p i-th
> > > source region
> > > + * of the instruction. Instead implement the region as a
> > > series
> > > of integer
> > > + * copies into a temporary with the same channel layout as
> > > the
> > > destination.
> > > + */
> > > + bool
> > > + lower_src_region(fs_visitor *v, bblock_t *block, fs_inst
> > > *inst,
> > > unsigned i)
> > > + {
> > > + assert(inst->components_read(i) == 1);
> > > + const fs_builder ibld(v, block, inst);
> > > + const unsigned stride = type_sz(inst->dst.type) * inst->dst.stride /
> > > + type_sz(inst->src[i].type);
> > > + assert(stride > 0);
> > > + const fs_reg tmp = horiz_stride(ibld.vgrf(inst->src[i].type, stride),
> > > + stride);
> > > +
> > > + /* Emit a series of 32-bit integer copies with any source
> > > modifiers
> > > + * cleaned up (because their semantics are dependent on
> > > the
> > > type).
> > > + */
> > > + const brw_reg_type raw_type =
> > > brw_int_type(MIN2(type_sz(tmp.type), 4),
> > > + false);
> > > + const unsigned n = type_sz(tmp.type) / type_sz(raw_type);
> > > + fs_reg raw_src = inst->src[i];
> > > + raw_src.negate = false;
> > > + raw_src.abs = false;
> > > +
> > > + for (unsigned j = 0; j < n; j++)
> > > + ibld.MOV(subscript(tmp, raw_type, j),
> > > subscript(raw_src,
> > > raw_type, j));
> > > +
> > > + /* Point the original instruction at the temporary, making
> > > sure to keep
> > > + * any source modifiers in the instruction.
> > > + */
> > > + fs_reg lower_src = tmp;
> > > + lower_src.negate = inst->src[i].negate;
> > > + lower_src.abs = inst->src[i].abs;
> > > + inst->src[i] = lower_src;
> > > +
> > > + return true;
> > > + }
> > > +
> > > + /**
> > > + * Remove any non-trivial shuffling of data from the
> > > destination
> > > region of
> > > + * the instruction. Instead implement the region as a series
> > > of
> > > integer
> > > + * copies from a temporary with a channel layout compatible
> > > with
> > > the
> > > + * sources.
> > > + */
> > > + bool
> > > + lower_dst_region(fs_visitor *v, bblock_t *block, fs_inst
> > > *inst)
> > > + {
> > > + const fs_builder ibld(v, block, inst);
> > > + const unsigned stride = required_dst_byte_stride(inst) /
> > > + type_sz(inst->dst.type);
> > > + assert(stride > 0);
> > > + const fs_reg tmp = horiz_stride(ibld.vgrf(inst->dst.type,
> > > stride),
> > > + stride);
> > > +
> > > + /* Emit a series of 32-bit integer copies from the
> > > temporary
> > > into the
> > > + * original destination.
> > > + */
> > > + const brw_reg_type raw_type =
> > > brw_int_type(MIN2(type_sz(tmp.type), 4),
> > > + false);
> > > + const unsigned n = type_sz(tmp.type) / type_sz(raw_type);
> > > +
> > > + if (inst->predicate && inst->opcode != BRW_OPCODE_SEL) {
> > > + /* Note that in general we cannot simply predicate the
> > > copies on the
> > > + * same flag register as the original instruction,
> > > since it
> > > may have
> > > + * been overwritten by the instruction itself. Instead
> > > initialize
> > > + * the temporary with the previous contents of the
> > > destination
> > > + * register.
> > > + */
> > > + for (unsigned j = 0; j < n; j++)
> > > + ibld.MOV(subscript(tmp, raw_type, j),
> > > + subscript(inst->dst, raw_type, j));
> > > + }
> > > +
> > > + for (unsigned j = 0; j < n; j++)
> > > + ibld.at(block, inst->next).MOV(subscript(inst->dst,
> > > raw_type, j),
> > > + subscript(tmp, raw_type,
> > > j));
> > > + /* Point the original instruction at the temporary, making
> > > sure to keep
> > > + * any destination modifiers in the instruction.
> > > + */
> > > + assert(inst->size_written ==
> > > + inst->dst.component_size(inst->exec_size));
> > > + inst->dst = tmp;
> > > + inst->size_written = inst->dst.component_size(inst->exec_size);
> > > +
> > > + return true;
> > > + }
> > > +
> > > + /**
> > > + * Legalize the source and destination regioning controls of
> > > the
> > > specified
> > > + * instruction.
> > > + */
> > > + bool
> > > + lower_instruction(fs_visitor *v, bblock_t *block, fs_inst
> > > *inst)
> > > + {
> > > + const gen_device_info *devinfo = v->devinfo;
> > > + bool progress = false;
> > > +
> > > + if (has_invalid_conversion(devinfo, inst))
> > > + progress |= lower_dst_modifiers(v, block, inst);
> > > +
> > > + if (has_invalid_dst_region(devinfo, inst))
> > > + progress |= lower_dst_region(v, block, inst);
> > > +
> > > + for (unsigned i = 0; i < inst->sources; i++) {
> > > + if (has_invalid_src_modifiers(devinfo, inst, i))
> > > + progress |= lower_src_modifiers(v, block, inst, i);
> > > +
> > > + if (has_invalid_src_region(devinfo, inst, i))
> > > + progress |= lower_src_region(v, block, inst, i);
> > > + }
> > > +
> > > + return progress;
> > > + }
> > > +}
> > > +
> > > +bool
> > > +fs_visitor::lower_regioning()
> > > +{
> > > + bool progress = false;
> > > +
> > > + foreach_block_and_inst(block, fs_inst, inst, cfg)
> > > + progress |= lower_instruction(this, block, inst);
> > > +
> > > + if (progress)
> > > + invalidate_live_intervals();
> > > +
> > > + return progress;
> > > +}
> > > diff --git a/src/intel/compiler/brw_ir_fs.h
> > > b/src/intel/compiler/brw_ir_fs.h
> > > index 5bb92e4cc86..3c23fb375e4 100644
> > > --- a/src/intel/compiler/brw_ir_fs.h
> > > +++ b/src/intel/compiler/brw_ir_fs.h
> > > @@ -486,6 +486,16 @@ get_exec_type_size(const fs_inst *inst)
> > > return type_sz(get_exec_type(inst));
> > > }
> > >
> > > +/**
> > > + * Return whether the instruction isn't an ALU instruction and
> > > cannot be
> > > + * assumed to complete in-order.
> > > + */
> > > +static inline bool
> > > +is_unordered(const fs_inst *inst)
> > > +{
> > > + return inst->mlen || inst->is_send_from_grf() || inst->is_math();
> > > +}
> > > +
> > > /**
> > > * Return whether the following regioning restriction applies to
> > > the
> > > specified
> > > * instruction. From the Cherryview PRM Vol 7. "Register Region
> > > diff --git a/src/intel/compiler/meson.build
> > > b/src/intel/compiler/meson.build
> > > index 69ce2eab4cf..4af134b418e 100644
> > > --- a/src/intel/compiler/meson.build
> > > +++ b/src/intel/compiler/meson.build
> > > @@ -57,6 +57,7 @@ libintel_compiler_files = files(
> > > 'brw_fs_live_variables.h',
> > > 'brw_fs_lower_conversions.cpp',
> > > 'brw_fs_lower_pack.cpp',
> > > + 'brw_fs_lower_regioning.cpp',
> > > 'brw_fs_nir.cpp',
> > > 'brw_fs_reg_allocate.cpp',
> > > 'brw_fs_register_coalesce.cpp',