[Mesa-dev] [PATCH 1/2] i965/vec4: set swizzle when loading an uniform

Tue Apr 25 23:22:01 UTC 2017

Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:

> On Mon, 2017-04-24 at 11:22 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> 
>> > On Fri, 2017-04-21 at 10:23 -0700, Francisco Jerez wrote:
>> > > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> > > 
>> > > > On Thu, 2017-04-20 at 10:26 -0700, Francisco Jerez wrote:
>> > > > > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> > > > > 
>> > > > > > It was setting XYWZ swizzle to all uniforms, no matter if
>> > > > > > they
>> > > > > > were
>> > > > > > a vector or not.
>> > > > > > 
>> > > > > > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.
>> > > > > > com>
>> > > > > > Cc: currojerez at riseup.net
>> > > > > 
>> > > > > Don't you need to CC mesa-stable here and in the next patch?
>> > > > > 
>> > > > 
>> > > > I considered it but I has doubts about which tag use "17.1.0-
>> > > > rc1"
>> > > > or
>> > > > just "17.1.0" or whatever. So my plan is to notify Emil once
>> > > > they
>> > > > are
>> > > > merged (and add Cc to stable in the commit log before pushing
>> > > > it to
>> > > > master).
>> > > > 
>> > > > If you are more comfortable with Cc mesa-stable, I will do it
>> > > > next
>> > > > time
>> > > > (or if I need to send v2 of this series).
>> > > > 
>> > > 
>> > > I believe Emil will notice them even if you don't put a version
>> > > tag.  I
>> > > don't really care how you nominate it for stable as long as it
>> > > hits
>> > > the
>> > > 17.1 branch before the end of the release cycle.  ;)
>> > > 
>> > > > > > ---
>> > > > > >  src/intel/compiler/brw_vec4_nir.cpp | 1 +
>> > > > > >  1 file changed, 1 insertion(+)
>> > > > > > 
>> > > > > > diff --git a/src/intel/compiler/brw_vec4_nir.cpp
>> > > > > > b/src/intel/compiler/brw_vec4_nir.cpp
>> > > > > > index a82d52088a8..5f4488c7e86 100644
>> > > > > > --- a/src/intel/compiler/brw_vec4_nir.cpp
>> > > > > > +++ b/src/intel/compiler/brw_vec4_nir.cpp
>> > > > > > @@ -863,6 +863,7 @@
>> > > > > > vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr
>> > > > > > *instr)
>> > > > > >           unsigned offset = const_offset->u32[0] + shift *
>> > > > > > 4;
>> > > > > >           src.offset = ROUND_DOWN_TO(offset, 16);
>> > > > > >           shift = (offset % 16) / 4;
>> > > > > > +         src.swizzle = brw_swizzle_for_size(instr-
>> > > > > > > num_components);
>> > > > > 
>> > > > > What about the indirect case a few lines below?  Isn't the
>> > > > > swizzle
>> > > > > passed
>> > > > > to the mov indirect instruction still bogus?
>> > > > > 
>> > > > 
>> > > > This is different. It is expecting to have a swizzle of XYZW
>> > > > because
>> > > > MOV_INDIRECT will copy all the contents. See assert in
>> > > > move_uniform_array_access_to_pull_constants()
>> > > 
>> > > I believe the ultimate problem here is that the MOV_INDIRECT gets
>> > > a
>> > > writemask of XYZW even if you're reading a scalar, so setting the
>> > > minimal swizzle will lead to a situation in which the resulting
>> > > swizzle
>> > > is not the identity so you will run into trouble to turn it into
>> > > scratch
>> > > access.  That brings me to the following question which I don't
>> > > think
>> > > I
>> > > can answer by looking at this patch alone: If having an XYZW
>> > > swizzle
>> > > is
>> > > a problem for direct moves (I assume this patch is fixing
>> > > something?),
>> > 
>> > The bug is to properly identify DFs and dvecs, instead of
>> > considering
>> > all of them as dvec4 when aligning them for the push constant
>> > buffer,
>> > which is done in next patch.
>> > 
>> 
>> Sorry, I'm not following what you mean with the last paragraph.  What
>> does this have to do with identifying DFs?
>> 
>
> By the register type, I know it is a DF, but I need then to know how
> many components are used in order to know if they are dvec3 or dvec4,
> so I push them first (so they are aligned to dvec4), or just dvec2 or a
> double scalar, so I push them later. This is whay I meant when I said
> identifying DF, sorry for the misunderstanding.
>

Sounds like the problem is not specific to DFs in that case?

>> > > how come it is not a problem for indirect moves?
>> > > 
>> > 
>> > It is not a problem because the uniforms under indirect moves are
>> > moved
>> > to pull constant buffer in
>> > move_uniform_array_access_to_pull_constants().
>> > 
>> 
>> Is that really always the case?
>
> For GL, yes.
>
>>   Even on Vulkan?
>> 
>
> Good question! Looking at the code, Vulkan driver doesn't support pull
> constants other than UBOs so everything has to be pushed regardless.
>
> In the push constant buffer code, it marks every register touched by
> MOV INDIRECT as fully used, ignoring its swizzle because it is just
> interested on the number of read vec4 slots and consider we fully read
> them, which is also mentioned in the quote I pasted in a previous
> email:
>
> /* We just mark every register touched by a MOV_INDIRECT as being
>  * fully used.  This ensures that it doesn't broken up piecewise by
>  * the next part of our packing algorithm.
>  */
>
> When the swizzle is used then? The swizzle is taking into account in
> the generator to add it as an offset when the indirect offset is not an
> immediate, or add it to the register swizzle if the indirect offset is
> an immediate. As we considered the uniform arrays as fully accessed
> (swizzle and writemasks are XYZW), we use the number of read registers
> and they are aligned to vec4 (see below), I think there is no problem
> with them and we don't need to update the swizzles.
>
>> > Those DF pull constants are read with 2 messages, so they don't
>> > need to
>> > be dvec4-aligned like in the case of direct moves on the push
>> > constant
>> > buffer, and the swizzle is actually ignored when loading it to pull
>> > constant buffer.
>> > 
>> 
>> Is the swizzle actually meant to be ignored or is that a bug of the
>> move_uniform_array_access_to_pull_constants() pass?  If it is meant
>> to
>> be ignored why do we set a non-identity swizzle below for shift != 0
>> in
>> the indirect path a couple of lines below?
>> 
>
> 'shift' indicates if the uniform is aligned to vec4 (shift == 0) and,
> if not, which is the offset in components of 32-bits, so we can add it
> to the swizzle. However, uniforms follow std140 rule (as they are part
> of the default uniform block), which says the following for arrays.
> From OpenGL 4.5 spec:
>
> "4. If the member is an array of scalars or vectors, the base alignment
> and array stride are set to match the base alignment of a single array
> element, according to rules (1), (2), and (3), and rounded up to the
> base alignment of a vec4."
>
> So an uniform array is aligned to vec4, then shift == 0 and the swizzle
> is not changed by 'shift' in this case. I.e. shift doesn't change
> anything.
>
> I guess 'shift' is used, together with the assert in
> move_uniform_array_access_to_pull_constants(), to detect uniform arrays
> not aligned to vec4 size. This check can be replaced with assert(shift
> == 0) here.
>

Please do, this path is almost guaranteed not to do what you'd expect
unless shift == 0.

>> > However, if you prefer to keep consistency between both cases, I
>> > can
>> > apply the shuffle globally and remove the NOOP assert in
>> > move_uniform_array_access_to_pull_constants(). This change doesn't
>> > add
>> > any regression on piglit.
>> > 
>> > What do you think?
>> > 
>> 
>> I'm not really that worried about consistency, just trying to
>> understand
>> what exactly is going on in this patch in order to convince myself
>> that
>> this is a complete fix.
>> 
>
> The idea with this fix is to help next patch with does the alignment
> for dvec4s. Without this fix, I see an XYZW swizzle in all uniforms, so
> I don't know who is a genuinely dvec4 and who is not. On top of that,
> without this patch, pack_uniform_registers() would be wrong for DF
> scalar and dvec2 as it would mark 2 vec4 as used, when it's actually
> one, because it sees swizzle and writemask of XYWZ for both.
>
> While writing this, I found an error in this patch. I focused on
> setting the swizzle but actually we need to set both the swizzle and
> the writemask so we don't iterate more than needed when setting the
> used channels in pack_uniform_registers(). However I don't think we
> need to change anything for the indirect case, as in MOV INDIRECT the
> writemask is explicitly marked as XYZW because we write the whole
> thing... then we don't need to change anything (nor swizzle nor
> writemask).
>
> What do you think?
>

Yeah, that definitely smells fishy, would you mind fixing the writemask
in addition?

Another thing I noticed while digging into this is that the 'src.swizzle
+= BRW_SWIZZLE4(shift, shift, shift, shift)' line on both the direct and
indirect paths will lead to an overflow and corruption of some of your
swizzle components unless the original swizzle is minimal or shift == 0.
That means that we probably need to apply this change even if fixing the
writemask alone would avoid the bug you noticed.  We probably want to
fix both.

On top of that the swizzle calculation doesn't even look correct for DF
in the first place...  It's based on 'shift' which is in dword units
(e.g. 'shift = (offset % 16) / 4'), and the nir_intrinsic_load_uniform
implementation uses it as offset for the source register swizzle which
is in component units...  Makes me feel even more nervous about any of
the vec4 back-end code being used in production... :P

> I am going to do more tests and try some ideas I have now. I will
> contact you afterwards. I think we should discuss this privately (email
> or VoIP/IRC) to avoid adding more noise to the mailing list.
>

Sure, feel free to ping me via IRC or e-mail whenever you have something
to talk about.

> Sam
>
>> > > > and the comment in pack_uniform_registers():
>> > > > 
>> > > > /* We just mark every register touched by a MOV_INDIRECT as
>> > > > being
>> > > >  * fully used.  This ensures that it doesn't broken up
>> > > > piecewise by
>> > > >  * the next part of our packing algorithm.
>> > > >  */
>> > > > 
>> > > 
>> > > Not sure why this is relevant to the discussion?  The same will
>> > > be
>> > > the
>> > > case if you set the same swizzle as for the direct move.
>> > > 
>> > 
>> > Right, I was wrong. Sorry for the noise,
>> > 
>> > Sam
>> > 
>> > > > Sam
>> > > > 
>> > > > > >           src.swizzle += BRW_SWIZZLE4(shift, shift, shift,
>> > > > > > shift);
>> > > > > >  
>> > > > > >           emit(MOV(dest, src));
>> > > > > > -- 
>> > > > > > 2.11.0
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170425/0f652275/attachment.sig>