[Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT
Francisco Jerez
currojerez at riseup.net
Mon Feb 20 20:31:18 UTC 2017
Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> On Mon, 2017-02-20 at 08:58 +0100, Samuel Iglesias Gonsálvez wrote:
>> On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote:
>> > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> >
>> > > The lowered BSW/BXT indirect move instructions had incorrect
>> > > source types, which luckily wasn't causing incorrect assembly to
>> > > be
>> > > generated due to the bug fixed in the next patch, but would have
>> > > confused the remaining back-end IR infrastructure due to the
>> > > mismatch
>> > > between the IR source types and the emitted machine code.
>> > >
>> > > v2:
>> > > - Improve commit log (Curro)
>> > > - Fix read_size (Curro)
>> > > - Fix DF uniform array detection in assign_constant_locations()
>> > > when
>> > > it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.
>> > >
>> > > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
>> > > Cc: "17.0" <mesa-stable at lists.freedesktop.org>
>> > > ---
>> > > src/mesa/drivers/dri/i965/brw_fs.cpp | 11 ++++++++-
>> > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 ++++++++++++++++--
>> > > --------------
>> > > 2 files changed, 30 insertions(+), 22 deletions(-)
>> > >
>> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > index c348bc7138d..93ab84b5845 100644
>> > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > > @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations()
>> > > unsigned last = constant_nr + (inst->src[2].ud / 4)
>> > > -
>> > > 1;
>> > > assert(last < uniforms);
>> > >
>> > > + bool supports_64bit_indirects =
>> > > + !devinfo->is_cherryview && !devinfo->is_broxton;
>> > > + /* Detect if this is as a result 64-bit MOV
>> > > INDIRECT.
>> > > In case
>> > > + * of BSW/BXT, we substitute each DF MOV INDIRECT by
>> > > two 32-bit MOV
>> > > + * INDIRECT.
>> > > + */
>> > > + bool mov_indirect_64bit = (type_sz(inst-
>> > > >src[i].type)
>> > > == 8) ||
>> > > + (!supports_64bit_indirects && inst->dst.type ==
>> > > BRW_REGISTER_TYPE_UD &&
>> > > + inst->src[0].type == BRW_REGISTER_TYPE_UD &&
>> > > inst-
>> > > > dst.stride == 2);
>> >
>> > This seems kind of fragile, I don't think the optimizer gives you
>> > any
>> > guarantees that the stride of a lowered 64-bit indirect move will
>> > remain
>> > equal to two, or that the destination stride of an actual 32-bit
>> > indirect uniform load will never end up being two as well. That
>> > said,
>> > because you access these with 32-bit indirect moves, I don't see
>> > why
>> > you'd need to treat them as 64-bit uniforms here, the usual
>> > alignment
>> > requirements for 64-bit uniforms no longer apply, so you can treat
>> > them
>> > as regular 32-bit uniforms AFAICT. Why did you add this hunk?
>> >
>>
>> I added it because of this case: if we access to one DF uniform array
>> element with a normal MOV and the rest with MOV INDIRECT, we will
>> mark
>> the former as a live 64bit variable. Then we have the array scattered
>> as part of it is uploaded as a 64-bits uniform and the other as 32-
>> bits. Even if we avoid this by uploading everything together as 32-
>> bits, then the access to that DF could not be aligned to 64-bits.
>>
>> So my idea was to find a way to identify somehow those MOV INDIRECT
>> in
>> BSW to mark all the array as a 64-bit one.
>>
>
> Mmm, maybe I can fix this case without the hack I did. I can add the
> following code after marking all live variables accessed by the
> instructions.
>
> It is very similar to the one to detect live variables but it is fixing
> the case where any MOV INDIRECT in BSW is accessing to an uniform array
> of DF elements where one of these elements is directly accessed by
> another instruction.
>
> What do you think?
>
Looks somewhat better, but I don't think this is correct if you have
multiple overlapping indirect loads of the same uniform array and only
one of them overlaps with a direct 64-bit load. Apparently
assign_constant_locations() already has code to attempt to push
indirectly accessed uniform sections contiguously, but the logic seems
broken in multiple ways, even on platforms other than CHV/BXT... The
first is_live_64bit location assignment loop doesn't consider non-64bit
uniforms even if they're marked as contiguous with a 64-bit uniform,
because the contiguous block tracking is done from within
set_push_pull_constant_loc() which may not be called at all if the
continue condition evaluates to true at the top of the loop. Also
AFAICT this could potentially lead to a very large amount of data being
considered as contiguous if you have a contiguous 64-bit block, then
random unrelated 32-bit data, and then another contiguous 64-bit block,
because the very last element of a MOV_INDIRECT block doesn't seem to be
marked is_live_64bit, which is also the one that would terminate the
64-bit contiguous block and actually cause it to be assigned a location.
I don't think the current code would even guarantee proper alignment for
subsequent 64-blocks in that case because it may be pushing unrelated
32-bit data in between. Worse, if the UNIFORM space ends with a 64-bit
contiguous uniform block it may be considered contiguous with whatever
unrelated data is handled first by the next 32-bit uniform assignment
pass.
Sigh... Seems like this was only scratching the surface...
> Sam
>
> + /* Detect if we accessed to an array of DF uniforms in both direct
> and
> + * indirect ways on BSW/BXT. If this is the case, then we need
> + * to take into account in order to push all the elements together.
> + */
> + if (devinfo->is_cherryview || devinfo->is_broxton) {
> + foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
> + for (int i = 0 ; i < inst->sources; i++) {
> + if (inst->src[i].file != UNIFORM)
> + continue;
> +
> + int constant_nr = inst->src[i].nr + inst->src[i].offset /
> 4;
> +
> + if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0)
> {
> + assert(inst->src[2].ud % 4 == 0);
> + unsigned last = constant_nr + (inst->src[2].ud / 4) -
> 1;
> + assert(last < uniforms);
> +
> + bool is_df_array = false;
> + for (unsigned j = constant_nr; j < last; j++) {
> + if (is_live_64bit[j]) {
> + is_df_array = true;
> + break;
> + }
> + }
> +
> + if (is_df_array) {
> + /* If part of it is accessed with direct addressing
> as a DF
> + * then we mark the whole array as DF.
> + */
> + for (unsigned j = constant_nr; j < last; j++)
> + is_live_64bit[j] = true;
> + }
> + }
> + }
> + }
> + }
> +
>
>> > Other than that patch looks good to me.
>> >
>>
>> OK, thanks.
>>
>> Sam
>>
>> > > for (unsigned j = constant_nr; j < last; j++) {
>> > > is_live[j] = true;
>> > > contiguous[j] = true;
>> > > - if (type_sz(inst->src[i].type) == 8) {
>> > > + if (mov_indirect_64bit) {
>> > > is_live_64bit[j] = true;
>> > > }
>> > > }
>> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > > index 991c20fad62..4aaf84964e9 100644
>> > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > > @@ -3878,31 +3878,30 @@ fs_visitor::nir_emit_intrinsic(const
>> > > fs_builder &bld, nir_intrinsic_instr *instr
>> > > unsigned read_size = instr->const_index[1] -
>> > > (instr->num_components - 1) * type_sz(dest.type);
>> > >
>> > > - fs_reg indirect_chv_high_32bit;
>> > > - bool is_chv_bxt_64bit =
>> > > - (devinfo->is_cherryview || devinfo->is_broxton) &&
>> > > - type_sz(dest.type) == 8;
>> > > - if (is_chv_bxt_64bit) {
>> > > - indirect_chv_high_32bit =
>> > > vgrf(glsl_type::uint_type);
>> > > - /* Calculate indirect address to read high 32 bits
>> > > */
>> > > - bld.ADD(indirect_chv_high_32bit, indirect,
>> > > brw_imm_ud(4));
>> > > - }
>> > > + bool supports_64bit_indirects =
>> > > + !devinfo->is_cherryview && !devinfo->is_broxton;
>> > >
>> > > - for (unsigned j = 0; j < instr->num_components; j++) {
>> > > - if (!is_chv_bxt_64bit) {
>> > > + if (type_sz(dest.type) != 8 ||
>> > > supports_64bit_indirects)
>> > > {
>> > > + for (unsigned j = 0; j < instr->num_components; j++)
>> > > {
>> > > bld.emit(SHADER_OPCODE_MOV_INDIRECT,
>> > > offset(dest, bld, j), offset(src, bld,
>> > > j),
>> > > indirect, brw_imm_ud(read_size));
>> > > - } else {
>> > > - bld.emit(SHADER_OPCODE_MOV_INDIRECT,
>> > > - subscript(offset(dest, bld, j),
>> > > BRW_REGISTER_TYPE_UD, 0),
>> > > - offset(src, bld, j),
>> > > - indirect, brw_imm_ud(read_size));
>> > > -
>> > > - bld.emit(SHADER_OPCODE_MOV_INDIRECT,
>> > > - subscript(offset(dest, bld, j),
>> > > BRW_REGISTER_TYPE_UD, 1),
>> > > - offset(src, bld, j),
>> > > - indirect_chv_high_32bit,
>> > > brw_imm_ud(read_size));
>> > > + }
>> > > + } else {
>> > > + const unsigned num_mov_indirects =
>> > > + type_sz(dest.type) /
>> > > type_sz(BRW_REGISTER_TYPE_UD);
>> > > + /* We read a little bit less per MOV INDIRECT, as
>> > > they
>> > > are now
>> > > + * 32-bits ones instead of 64-bit. Fix read_size
>> > > then.
>> > > + */
>> > > + const unsigned read_size_32bit = read_size -
>> > > + (num_mov_indirects - 1) *
>> > > type_sz(BRW_REGISTER_TYPE_UD);
>> > > + for (unsigned j = 0; j < instr->num_components; j++)
>> > > {
>> > > + for (unsigned i = 0; i < num_mov_indirects; i++)
>> > > {
>> > > + bld.emit(SHADER_OPCODE_MOV_INDIRECT,
>> > > + subscript(offset(dest, bld, j),
>> > > BRW_REGISTER_TYPE_UD, i),
>> > > + subscript(offset(src, bld, j),
>> > > BRW_REGISTER_TYPE_UD, i),
>> > > + indirect,
>> > > brw_imm_ud(read_size_32bit));
>> > > + }
>> > > }
>> > > }
>> > > }
>> > > --
>> > > 2.11.0
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170220/fc10d1fd/attachment-0001.sig>
More information about the mesa-dev
mailing list