[Mesa-dev] [PATCH 37/59] i965/fs: generalize SIMD16 interference workaround
Kenneth Graunke
kenneth at whitecape.org
Tue May 3 18:52:17 UTC 2016
On Friday, April 29, 2016 1:29:34 PM PDT Samuel Iglesias Gonsálvez wrote:
> From: Connor Abbott <connor.w.abbott at intel.com>
>
> Work based on registers read/written instead of dispatch_width, so that
> the interferences are added for 64-bit sources/destinations as well.
> ---
> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 67 +++++++++++++++
+-------
> 1 file changed, 48 insertions(+), 19 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/
drivers/dri/i965/brw_fs_reg_allocate.cpp
> index 2347cd5..f0e96f9 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> @@ -541,6 +541,34 @@ setup_mrf_hack_interference(fs_visitor *v, struct
ra_graph *g,
> }
> }
>
> +static unsigned
> +get_reg_node(const fs_reg& reg, unsigned first_payload_node)
> +{
> + switch (reg.file) {
> + case VGRF:
> + return reg.nr;
> + case ARF:
> + case FIXED_GRF:
> + return first_payload_node + reg.nr;
This can't possibly be right for ARFs. I think it's OK for FIXED_GRF.
I imagine we can just drop ARF.
> + case MRF:
> + default:
> + unreachable("unhandled register file");
> + }
> +
> + return 0;
> +}
> +
> +static void
> +add_reg_interference(struct ra_graph *g, const fs_reg& reg1,
> + const fs_reg& reg2, unsigned first_payload_node)
> +{
> + if ((reg1.file == VGRF || reg1.file == ARF || reg1.file == FIXED_GRF) &&
> + (reg2.file == VGRF || reg2.file == ARF || reg2.file == FIXED_GRF)) {
> + ra_add_node_interference(g, get_reg_node(reg1, first_payload_node),
> + get_reg_node(reg2, first_payload_node));
> + }
> +}
> +
> bool
> fs_visitor::assign_regs(bool allow_spilling)
> {
> @@ -643,26 +671,27 @@ fs_visitor::assign_regs(bool allow_spilling)
> }
> }
>
> - if (dispatch_width > 8) {
> - /* In 16-wide dispatch we have an issue where a compressed
> - * instruction is actually two instructions executed simultaneiously.
> - * It's actually ok to have the source and destination registers be
> - * the same. In this case, each instruction over-writes its own
> - * source and there's no problem. The real problem here is if the
> - * source and destination registers are off by one. Then you can end
> - * up in a scenario where the first instruction over-writes the
> - * source of the second instruction. Since the compiler doesn't know
> - * about this level of granularity, we simply make the source and
> - * destination interfere.
> - */
> - foreach_block_and_inst(block, fs_inst, inst, cfg) {
> - if (inst->dst.file != VGRF)
> - continue;
> + /* When instructions both read/write more than a single SIMD8 register,
we
> + * have an issue where an instruction is actually two instructions
executed
> + * simultaneiously. It's actually ok to have the source and destination
> + * registers be the same. In this case, each instruction over-writes
its
> + * own source and there's no problem. The real problem here is if the
> + * source and destination registers are off by one. Then you can end up
in
> + * a scenario where the first instruction over-writes the source of the
> + * second instruction. Since the compiler doesn't know about this level
of
> + * granularity, we simply make the source and destination interfere.
> + */
> + foreach_block_and_inst(block, fs_inst, inst, cfg) {
> + if (inst->dst.file != VGRF &&
> + inst->dst.file != ARF && inst->dst.file != FIXED_GRF)
> + continue;
>
> - for (int i = 0; i < inst->sources; ++i) {
> - if (inst->src[i].file == VGRF) {
> - ra_add_node_interference(g, inst->dst.nr, inst->src[i].nr);
> - }
> + if (inst->regs_written <= 1)
> + continue;
> +
> + for (int i = 0; i < inst->sources; ++i) {
> + if (inst->regs_read(i) > 1) {
> + add_reg_interference(g, inst->dst, inst->src[i],
first_payload_node);
> }
> }
> }
>
This seems wrong to me. The point of the original code is that SIMD16
instructions internally decode as two SIMD8 instructions, which operate
sequentially. So, the first runs, and clobbers the source for the
second.
I can't imagine that this happens for double floats - I imagine the
hardware reads and writes entire DFs in one instruction.
I'm not sure why this patch is needed at all. Can we drop it, or do
things break?
If you have virtual opcodes that are passed all the way through to the
generator, and then become multiple instructions writing to the
destination, then you can add those opcodes to the
fs_inst::has_source_and_destination_hazard() method to prevent this kind
of trouble. But I don't think you have any of those...they're all
lowered away prior to register allocation...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160503/77477d41/attachment-0001.sig>
More information about the mesa-dev
mailing list