[Mesa-dev] [PATCH] draw: handle nan clipdistance

Thu Aug 15 13:32:05 PDT 2013

Am 15.08.2013 20:27, schrieb Zack Rusin:
>> I realize this function isn't used but it looks unnecessarily 
>> complicated - two constants one AND plus one comparison when you
>> could simply do a single comparison (compare x with x with
>> unordered not equal). This is actually doubly bad with AVX because
>> the int comparison is going to use 4 instructions instead of 1
>> (extract/2 cmp/1 insert), well if this runs 8-wide at least.
> 
> I'm going to kill that function, we already have lp_build_isnan that
> does the correct thing.
Ah right. It actually uses OEQ then does a not, I think it would be
easier to understand if it would just use UNE.
I think at some point we might want to optimize these functions a bit,
even if it's really llvm's fault.
Consider this (a trivial "select 1.0 if input is a nan otherwise 0.0")

define <4 x float> @une(<4 x float> %in)
{
entry:
  %0 = fcmp une <4 x float> %in, %in
  %1 = sext <4 x i1> %0 to <4 x i32>
  %2 = and <4 x i32> %1, <i32 1065353216,i32 1065353216,i32
1065353216,i32 1065353216>
  %3 = bitcast <4 x i32> %2 to <4 x float>
  ret <4 x float> %3
}
This generates some neat code:
        vcmpunordps     %xmm0, %xmm0, %xmm0
        vandps  .LCPI0_0(%rip), %xmm0, %xmm0
        ret

(FWIW the OEQ version using a xor generates much the same just vcmpordps
and vandnotps instead).

But now do it with a 1-vec (which we will generally never do, just for
illustration):

define <1 x float> @une(<1 x float> %in)
{
entry:
  %0 = fcmp une <1 x float> %in, %in
  %1 = sext <1 x i1> %0 to <1 x i32>
  %2 = and <1 x i32> %1, <i32 1065353216>
  %3 = bitcast <1 x i32> %2 to <1 x float>
  ret <1 x float> %3
}
Generates not-so-hot-code:
        vucomiss        %xmm0, %xmm0
        setp    %al
        movzbl  %al, %eax
        shll    $31, %eax
        sarl    $31, %eax
        andl    $1065353216, %eax       # imm = 0x3F800000
        vmovd   %eax, %xmm0
        ret

We generally use true scalars instead:
define float @une(float %in)
{
entry:
  %0 = fcmp une float %in, %in
  %1 = sext i1 %0 to i32
  %2 = and i32 %1, 1065353216
  %3 = bitcast i32 %2 to float
  ret float %3
}
Which generates this abomination (don't ask me why <1 x float> vs. float
makes a difference in generated code):
        vucomiss        %xmm0, %xmm0
        setp    %cl
        setne   %al
        orb     %cl, %al
        movl    $-1, %ecx
        testb   %al, %al
        movl    $0, %eax
        cmovnel %ecx, %eax
        andl    $1065353216, %eax       # imm = 0x3F800000
        vmovd   %eax, %xmm0
        ret

So, it looks like for the scalar case using int comparisons would be
definitely better as it should avoid most of that ugliness (but I
couldn't be bothered to write the test case for that).
Now granted my guess is llvm simply should have used the simd unit no
matter which version is used in the first place - nothing forbids it to
use vector instructions for scalars. But no luck unfortunately...
(It is possible newer llvm versions handle this better, didn't check,
but I suspect shit like that could potentially make a performance
difference if these functions are used enough.)

> 
>> Otherwise looks good. Though I'm not sure you really need to kill
>> the prims if the clip distances are infinite?
> 
> The d3d10 spec says "Coordinates coming in to clipping with infinites
> at x, y, z may or may not result in a discarded primitive.". I liked
> handling them the same way as nan, otherwise we're just generating
> pointless primitives. I don't have a strong opinion though, wlk
> doesn't seem to test infinites.

Ok whatever works. I just thought the test may be simpler but maybe not
(given the above...) so that's just fine.

Roland