[Mesa-dev] [PATCH 37/59] i965/fs: generalize SIMD16 interference workaround
Connor Abbott
cwabbott0 at gmail.com
Tue May 3 19:13:26 UTC 2016
On Tue, May 3, 2016 at 2:52 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On Friday, April 29, 2016 1:29:34 PM PDT Samuel Iglesias Gonsálvez wrote:
>> From: Connor Abbott <connor.w.abbott at intel.com>
>>
>> Work based on registers read/written instead of dispatch_width, so that
>> the interferences are added for 64-bit sources/destinations as well.
>> ---
>> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 67 +++++++++++++++
> +-------
>> 1 file changed, 48 insertions(+), 19 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/
> drivers/dri/i965/brw_fs_reg_allocate.cpp
>> index 2347cd5..f0e96f9 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> @@ -541,6 +541,34 @@ setup_mrf_hack_interference(fs_visitor *v, struct
> ra_graph *g,
>> }
>> }
>>
>> +static unsigned
>> +get_reg_node(const fs_reg& reg, unsigned first_payload_node)
>> +{
>> + switch (reg.file) {
>> + case VGRF:
>> + return reg.nr;
>> + case ARF:
>> + case FIXED_GRF:
>> + return first_payload_node + reg.nr;
>
> This can't possibly be right for ARFs. I think it's OK for FIXED_GRF.
>
> I imagine we can just drop ARF.
>
>> + case MRF:
>> + default:
>> + unreachable("unhandled register file");
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void
>> +add_reg_interference(struct ra_graph *g, const fs_reg& reg1,
>> + const fs_reg& reg2, unsigned first_payload_node)
>> +{
>> + if ((reg1.file == VGRF || reg1.file == ARF || reg1.file == FIXED_GRF) &&
>> + (reg2.file == VGRF || reg2.file == ARF || reg2.file == FIXED_GRF)) {
>> + ra_add_node_interference(g, get_reg_node(reg1, first_payload_node),
>> + get_reg_node(reg2, first_payload_node));
>> + }
>> +}
>> +
>> bool
>> fs_visitor::assign_regs(bool allow_spilling)
>> {
>> @@ -643,26 +671,27 @@ fs_visitor::assign_regs(bool allow_spilling)
>> }
>> }
>>
>> - if (dispatch_width > 8) {
>> - /* In 16-wide dispatch we have an issue where a compressed
>> - * instruction is actually two instructions executed simultaneiously.
>> - * It's actually ok to have the source and destination registers be
>> - * the same. In this case, each instruction over-writes its own
>> - * source and there's no problem. The real problem here is if the
>> - * source and destination registers are off by one. Then you can end
>> - * up in a scenario where the first instruction over-writes the
>> - * source of the second instruction. Since the compiler doesn't know
>> - * about this level of granularity, we simply make the source and
>> - * destination interfere.
>> - */
>> - foreach_block_and_inst(block, fs_inst, inst, cfg) {
>> - if (inst->dst.file != VGRF)
>> - continue;
>> + /* When instructions both read/write more than a single SIMD8 register,
> we
>> + * have an issue where an instruction is actually two instructions
> executed
>> + * simultaneiously. It's actually ok to have the source and destination
>> + * registers be the same. In this case, each instruction over-writes
> its
>> + * own source and there's no problem. The real problem here is if the
>> + * source and destination registers are off by one. Then you can end up
> in
>> + * a scenario where the first instruction over-writes the source of the
>> + * second instruction. Since the compiler doesn't know about this level
> of
>> + * granularity, we simply make the source and destination interfere.
>> + */
>> + foreach_block_and_inst(block, fs_inst, inst, cfg) {
>> + if (inst->dst.file != VGRF &&
>> + inst->dst.file != ARF && inst->dst.file != FIXED_GRF)
>> + continue;
>>
>> - for (int i = 0; i < inst->sources; ++i) {
>> - if (inst->src[i].file == VGRF) {
>> - ra_add_node_interference(g, inst->dst.nr, inst->src[i].nr);
>> - }
>> + if (inst->regs_written <= 1)
>> + continue;
>> +
>> + for (int i = 0; i < inst->sources; ++i) {
>> + if (inst->regs_read(i) > 1) {
>> + add_reg_interference(g, inst->dst, inst->src[i],
> first_payload_node);
>> }
>> }
>> }
>>
>
> This seems wrong to me. The point of the original code is that SIMD16
> instructions internally decode as two SIMD8 instructions, which operate
> sequentially. So, the first runs, and clobbers the source for the
> second.
>
> I can't imagine that this happens for double floats - I imagine the
> hardware reads and writes entire DFs in one instruction.
>
> I'm not sure why this patch is needed at all. Can we drop it, or do
> things break?
AFAICT, that's not how things work. Essentially, the HW can't
read/write more than one SIMD8 register at a time, so even for SIMD8
it has to break double reads/writes into two. If you think about it,
this makes the restriction on horizontal strides not crossing a
register make a lot more sense -- the HW just uses the horizontal
stride inside a single read/write, and the vertical stride to figure
out how to break up an instruction into multiple instructions. To me,
this theory seemed to be confirmed by the fact that this patch did fix
things, at least when I wrote it. You could try reverting it and
seeing if it breaks things now, though.
>
> If you have virtual opcodes that are passed all the way through to the
> generator, and then become multiple instructions writing to the
> destination, then you can add those opcodes to the
> fs_inst::has_source_and_destination_hazard() method to prevent this kind
> of trouble. But I don't think you have any of those...they're all
> lowered away prior to register allocation...
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
More information about the mesa-dev
mailing list