[Mesa-dev] [PATCH 2/9] i965/fs: Register allocator shoudn't use grf127 for sends dest
Chema Casanova
jmcasanova at igalia.com
Wed Jul 11 13:50:02 UTC 2018
Including mesa-dev in my previous reply.
El 11/07/18 a las 01:08, Caio Marcelo de Oliveira Filho escribió:
>> Since Gen8+ Intel PRM states that "r127 must not be used for return
>> address when there is a src and dest overlap in send instruction."
>
> The previous patch, that verifies the condition above is causing
>
> tests/spec/arb_compute_shader/linker/bug-93840.shader_test
>
> to crash with
>
> shader_runner: ../src/intel/compiler/brw_fs_generator.cpp:2455: int fs_generator::generate_code(const cfg_t*, int): Assertion `validated' failed.
>
> I also could reproduce the crash locally. It happens even with this
> patch (which adds the hack) applied.
I've seen it in Jenkins, but couldn't reproduce it so I thought it
wasn't related. Now I've realized that I was using a release build at
that moment.
The good thing is that the validator rule has detected that the
generated instruction was incorrect.
>> This patch implements this restriction creating new grf127_send_hack_node
>> at the register allocator. This node has a fixed assignation to grf127.
>>
>> For vgrf that are used as destination of send messages we create node
>> interfereces with the grf127_send_hack_node. So the register allocator
>> will never assign to these vgrf a register that involves grf127.
>>
>> If dispatch_width > 8 we don't create these interferences to the because
>> all instructions have node interferences between sources and destination.
>> That is enough to avoid the r127 restriction.
>
> I think for both widths will not be enough. The instruction that fails
> the validation is:
>
> mov(8) g126<1>UD g0<8,8,1>UD { align1 WE_all 1Q };
> mov(1) g126.2<1>UD 0x00000090UD { align1 WE_all 1N };
> send(16) g126<1>UW g126<8,8,1>UD
> data ( DC OWORD block read, 253, 3) mlen 1 rlen 2 { align1 WE_all 1H };
> ERROR: r127 must not be used for return address when there is a src and dest overlap
>
> Which if I understood correctly comes from the scratch reading being
> created by the spilling logic. In brw_oword_block_read_scratch() we
> see
>
> if (p->devinfo->gen >= 7) {
> /* On gen 7 and above, we no longer have message registers and we can
> * send from any register we want. By using the destination register
> * for the message, we guarantee that the implied message write won't
> * accidentally overwrite anything. This has been a problem because
> * the MRF registers and source for the final FB write are both fixed
> * and may overlap.
> */
> mrf = retype(dest, BRW_REGISTER_TYPE_UD);
> } else {
> mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
> }
> dest = retype(dest, BRW_REGISTER_TYPE_UW);
>
> It seems to me we'll have to handle r127 there as well.
Yes, as in this case source and destination are coded to be the same
vgrf, we don't have a source/destination interference on SIMD16.
I'm doing some extra testing but something like next code at
assigns_regs seems to fix the issue:
if (spilled_any_registers) {
foreach_block_and_inst(block, fs_inst, inst, cfg) {
if (inst->opcode == SHADER_OPCODE_GEN7_SCRATCH_READ ||
inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ) {
ra_add_node_interference(g, inst->dst.nr,
grf127_send_hack_node);
}
}
Thanks Caio for digging into the problem. I'm sending today a patch to
deal with this case.
Chema
>>
>> This fixes CTS tests that raised this issue as they were executed as SIMD8:
>>
>> dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom
>>
>> Shader-db results on Skylake:
>> total instructions in shared programs: 7686798 -> 7686797 (<.01%)
>> instructions in affected programs: 301 -> 300 (-0.33%)
>> helped: 1
>> HURT: 0
>>
>> total cycles in shared programs: 337092322 -> 337091919 (<.01%)
>> cycles in affected programs: 22420415 -> 22420012 (<.01%)
>> helped: 712
>> HURT: 588
>>
>> Shader-db results on Broadwell:
>>
>> total instructions in shared programs: 7658574 -> 7658625 (<.01%)
>> instructions in affected programs: 19610 -> 19661 (0.26%)
>> helped: 3
>> HURT: 4
>>
>> total cycles in shared programs: 340694553 -> 340676378 (<.01%)
>> cycles in affected programs: 24724915 -> 24706740 (-0.07%)
>> helped: 998
>> HURT: 916
>>
>> total spills in shared programs: 4300 -> 4311 (0.26%)
>> spills in affected programs: 333 -> 344 (3.30%)
>> helped: 1
>> HURT: 3
>>
>> total fills in shared programs: 5370 -> 5378 (0.15%)
>> fills in affected programs: 274 -> 282 (2.92%)
>> helped: 1
>> HURT: 3
>>
>> v2: Avoid duplicating register classes without grf127. Let's use a node
>> with a fixed assignation to grf127 and create interferences to send
>> message vgrf destinations. (Eric Anholt)
>> v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
>> (Jose Maria Casanova)
>> ---
>> src/intel/compiler/brw_fs_reg_allocate.cpp | 25 ++++++++++++++++++++++
>> 1 file changed, 25 insertions(+)
>>
>> diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp b/src/intel/compiler/brw_fs_reg_allocate.cpp
>> index ec8e116cb38..59e047483c0 100644
>> --- a/src/intel/compiler/brw_fs_reg_allocate.cpp
>> +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
>> @@ -548,6 +548,9 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
>> int first_mrf_hack_node = node_count;
>> if (devinfo->gen >= 7)
>> node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START;
>> + int grf127_send_hack_node = node_count;
>> + if (devinfo->gen >= 8 && dispatch_width == 8)
>> + node_count ++;
>> struct ra_graph *g =
>> ra_alloc_interference_graph(compiler->fs_reg_sets[rsi].regs, node_count);
>>
>> @@ -653,6 +656,28 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
>> }
>> }
>>
>> + if (devinfo->gen >= 8 && dispatch_width == 8) {
>> + /* At Intel Broadwell PRM, vol 07, section "Instruction Set Reference",
>> + * subsection "EUISA Instructions", Send Message (page 990):
>> + *
>> + * "r127 must not be used for return address when there is a src and
>> + * dest overlap in send instruction."
>> + *
>> + * We are avoiding using grf127 as part of the destination of send
>> + * messages adding a node interference to the grf127_send_hack_node.
>> + * This node has a fixed asignment to grf127.
>> + *
>> + * We don't apply it to SIMD16 because previous code avoids any register
>> + * overlap between sources and destination.
>> + */
>> + ra_set_node_reg(g, grf127_send_hack_node, 127);
>> + foreach_block_and_inst(block, fs_inst, inst, cfg) {
>> + if (inst->is_send_from_grf() && inst->dst.file == VGRF) {
>> + ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
>> + }
>> + }
>> + }
>> +
>> /* Debug of register spilling: Go spill everything. */
>> if (unlikely(spill_all)) {
>> int reg = choose_spill_reg(g);
>> --
>> 2.17.1
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
More information about the mesa-dev
mailing list