[Mesa-dev] [PATCH v3 25/42] intel/compiler: workaround for SIMD8 half-float MAD in gen8

Jason Ekstrand jason at jlekstrand.net
Fri Jan 18 17:51:18 UTC 2019


On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga <itoral at igalia.com>
wrote:

> Broadwell hardware has a bug that manifests in SIMD8 executions of
> 16-bit MAD instructions when any of the sources is a Y or W component.
> We pack these components in the same SIMD register as components X and
> Z respectively, but starting at offset 16B (so they live in the second
> half of the register). The problem does not exist in SKL or later.
>
> We work around this issue by moving any such sources to a temporary
> starting at offset 0B. We want to do this after the main optimization loop
> to prevent copy-propagation and friends to undo the fix.
>
> Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
> ---
>  src/intel/compiler/brw_fs.cpp | 48 +++++++++++++++++++++++++++++++++++
>  src/intel/compiler/brw_fs.h   |  1 +
>  2 files changed, 49 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 0b3ec94e2d2..d6096cd667d 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -6540,6 +6540,48 @@ fs_visitor::optimize()
>     validate();
>  }
>
> +/**
> + * Broadwell hardware has a bug that manifests in SIMD8 executions of
> 16-bit
> + * MAD instructions when any of the sources is a Y or W component. We pack
> + * these components in the same SIMD register as components X and Z
> + * respectively, but starting at offset 16B (so they live in the second
> half
> + * of the register).
>

What exactly do you mean by a Y or W component?  Is this for the case where
you have a scalar that happens to land at certain offsets?  Or does it
apply to regular stride == 1 MADs?  If it applied in the stride == 1 case,
then I really don't see what this is doing to fix it.  It might help if you
provided some before and after assembly example.

Also, this seems like something that should go in the new region
restrictions pass as a special case in has_invalid_src_region.

--Jason


> + *
> + * We work around this issue by moving any such sources to a temporary
> + * starting at offset 0B. We want to do this after the main optimization
> loop
> + * to prevent copy-propagation and friends to undo the fix.
> + */
> +void
> +fs_visitor::fixup_hf_mad()
> +{
> +   if (devinfo->gen != 8)
> +      return;
> +
> +   bool progress = false;
> +
> +   foreach_block_and_inst_safe (block, fs_inst, inst, cfg) {
> +      if (inst->opcode != BRW_OPCODE_MAD ||
> +          inst->dst.type != BRW_REGISTER_TYPE_HF ||
> +          inst->exec_size > 8)
> +         continue;
> +
> +      for (int i = 0; i < 3; i++) {
> +         if (inst->src[i].offset > 0) {
> +            assert(inst->src[i].type == BRW_REGISTER_TYPE_HF);
> +            const fs_builder ibld =
> +               bld.at(block, inst).exec_all().group(inst->exec_size, 0);
> +            fs_reg tmp = ibld.vgrf(inst->src[i].type);
> +            ibld.MOV(tmp, inst->src[i]);
> +            inst->src[i] = tmp;
> +            progress = true;
> +         }
> +      }
> +   }
> +
> +   if (progress)
> +      invalidate_live_intervals();
> +}
> +
>  /**
>   * Three source instruction must have a GRF/MRF destination register.
>   * ARF NULL is not allowed.  Fix that up by allocating a temporary GRF.
> @@ -6698,6 +6740,7 @@ fs_visitor::run_vs()
>     assign_curb_setup();
>     assign_vs_urb_setup();
>
> +   fixup_hf_mad();
>     fixup_3src_null_dest();
>     allocate_registers(8, true);
>
> @@ -6782,6 +6825,7 @@ fs_visitor::run_tcs_single_patch()
>     assign_curb_setup();
>     assign_tcs_single_patch_urb_setup();
>
> +   fixup_hf_mad();
>     fixup_3src_null_dest();
>     allocate_registers(8, true);
>
> @@ -6816,6 +6860,7 @@ fs_visitor::run_tes()
>     assign_curb_setup();
>     assign_tes_urb_setup();
>
> +   fixup_hf_mad();
>     fixup_3src_null_dest();
>     allocate_registers(8, true);
>
> @@ -6865,6 +6910,7 @@ fs_visitor::run_gs()
>     assign_curb_setup();
>     assign_gs_urb_setup();
>
> +   fixup_hf_mad();
>     fixup_3src_null_dest();
>     allocate_registers(8, true);
>
> @@ -6965,6 +7011,7 @@ fs_visitor::run_fs(bool allow_spilling, bool
> do_rep_send)
>
>        assign_urb_setup();
>
> +      fixup_hf_mad();
>        fixup_3src_null_dest();
>        allocate_registers(8, allow_spilling);
>
> @@ -7009,6 +7056,7 @@ fs_visitor::run_cs(unsigned min_dispatch_width)
>
>     assign_curb_setup();
>
> +   fixup_hf_mad();
>     fixup_3src_null_dest();
>     allocate_registers(min_dispatch_width, true);
>
> diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
> index 68287bcdcea..1879d4bc7f7 100644
> --- a/src/intel/compiler/brw_fs.h
> +++ b/src/intel/compiler/brw_fs.h
> @@ -103,6 +103,7 @@ public:
>     void setup_vs_payload();
>     void setup_gs_payload();
>     void setup_cs_payload();
> +   void fixup_hf_mad();
>     void fixup_3src_null_dest();
>     void assign_curb_setup();
>     void calculate_urb_setup();
> --
> 2.17.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20190118/f86d5f2d/attachment.html>


More information about the mesa-dev mailing list