[Mesa-dev] [PATCH 39/78] i965/nir/vec4: Add swizzle utility method for vector ops

Fri Jul 3 10:23:59 PDT 2015

Hi Jason,

On mar, 2015-06-30 at 14:18 -0700, Jason Ekstrand wrote:
> On Fri, Jun 26, 2015 at 1:06 AM, Eduardo Lima Mitev <elima at igalia.com> wrote:
> > From: Antia Puentes <apuentes at igalia.com>
> >
> > For operations that have a predefined operand size > 0, defined in
> > glsl/nir/nir_opcodes.c, NIR returns a swizzle containing zeros in the
> > components from outside the source vector. However, the driver
> > expects those components to have a swizzle value equal to the swizzle
> > of the component in (number of vector elements - 1). For example,
> >
> >    for a vec2 operation with an identity swizzle (.xy):
> >       - (NIR -> XYXX, driver ->XYYYY)
> >
> >    for a vec3 operation with swizzle (.zxy)
> >       - (NIR-> ZXYX, driver -> ZXYY)
> 
> Why is this needed.  Was there some bug you ran into regarding this or
> are you just trying to make the generated code match?  If the
> component isn't used, then it seems to me like the swizzle shouldn't
> matter.  NIR may give you bogus swizzles outside of the enabled
> channels but it may do that regardless.  I'm also not seeing how
> composing swizzles fixes anything.

There were several bugs in our NIR->vec4 code path related to NIR
returning bogus swizzles outside the components of a vector. When doing
an "any", "equal" or "notEqual" operation on vectors, the result of the
instruction "cmp" is predicated using "all4h" or "any4h" depending on
the operation. The results were incorrect for vectors with less than 4
elements because: (1) I was not setting the writemask of the "cmp"
operation according to the vector size and (2) NIR returns bogus
swizzles for components outside the vector. As a consequence, we had
spurious component-wise comparisons that affected the result of the
predication.

As composing the operand swizzle with the identity swizzle-per-size
means to repeat the value of the swizzle of the last vector-element on
the components outside the vector, the error mentioned above does not
happen because to repeat the last (inside-boundaries) component-wise
comparison for the components outside the vector does not affect the
result of the predication.

Other way to fix the bugs above is to set the writemask of the "cmp"
operation to the appropriate mask according to the vector size, thus
forcing the "cmp" instruction to compare only the components inside the
vector boundaries. As the vec4_visitor does not set the writemask of the
"cmp" instruction because it relies on the operand having the correct
swizzle outside the vector boundaries, for consistency with the existing
behaviour, I decided to do the same and compose the swizzle (notice that
the composing is done in the vec4_visitor when visiting an ir_swizzle,
while in NIR we only set the swizzle for the elements inside the
vector). I can remove this function if you prefer setting the writemask
over composing the swizzles. 

I remember another regression related to bogus swizzles in
unpackHalf2x16, for that function the operand is an uint, however in the
vec4_visitor::emit_unpack_half_2x16 function we have:
        tmp_dst.writemask = WRITEMASK_Y;
        emit(SHR(tmp_dst, src0, src_reg(16u)));
This was not buggy in the old vec4_visitor because the src0's swizzle
was correct for the component Y (src0's swizzle would be either XXXX,
the identity swizzle for size == 1, or the result of composing the
swizzle with XXXX in case the parent IR node was an ir_swizzle). In the
case of NIR, as the swizzle can be bogus for all the components except
the first one (size == 1), we need to compose it.