[Mesa-dev] llvmpipe broken on Skylake Pentium (LP_NATIVE_VECTOR_WIDTH=128)

Mon Oct 12 12:27:27 PDT 2015

I'm having some difficulty getting llvmpipe working on a Skylake
Pentium, which has the charming property of not having AVX support at
all (Skylake Cores have AVX2, and Xeons have AVX512, but Pentium seems
to be the new way of spelling Celeron).  Currently I'm trying this with
LLVM 3.6.2 and Mesa 10.6.5, but LLVM 3.7 doesn't seem to be any better.

The error I'm getting is:

$ DISPLAY=:2 LIBGL_DRIVERS_PATH=`pwd`/lib64/gallium LP_NATIVE_VECTOR_WIDTH=128 /usr/lib64/mesa/gloss
LLVM ERROR: Cannot select: intrinsic %llvm.x86.sse41.pblendvb

(Setting LP_NATIVE_VECTOR_WIDTH like that seems to be effective at
triggering this on Skylake Core, but in the name of paranoia I'm
emulating a Pentium by patching kvm to mask off the AVX bits of
cpuflags: https://ajax.fedorapeople.org/qemu-pseudoskl.patch )
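
To double-check from inside the guest that the masking took effect, one
option is to look at the "flags" line of /proc/cpuinfo.  A minimal sketch
of the token check follows; has_cpu_flag() is a made-up helper, not
something from Mesa or qemu, and it assumes the caller has already read
the flags line into a string:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if `flag` appears as a whole
 * whitespace-separated token in `flags` (e.g. the text after "flags :"
 * in /proc/cpuinfo).  A plain strstr() is not enough, because "avx"
 * is a prefix of "avx2". */
static bool has_cpu_flag(const char *flags, const char *flag)
{
    assert(flags != NULL && flag != NULL);

    size_t n = strlen(flag);
    if (n == 0)
        return false;

    for (const char *p = strstr(flags, flag); p != NULL;
         p = strstr(p + 1, flag)) {
        /* The match must start at the beginning or after whitespace... */
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* ...and must end at the end of the string or before whitespace. */
        bool ends = p[n] == '\0' || p[n] == ' ' ||
                    p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return true;
    }
    return false;
}
```

On the emulated Pentium I'd expect has_cpu_flag(flags, "sse4_1") to be
true and has_cpu_flag(flags, "avx") to be false, assuming the usual Linux
flag spellings.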

That does indeed seem to be the pblendvb intrinsic emitted by
lp_build_select(), called from lp_build_depth_stencil_test(), and at that
point I get (against Xvfb, giving me a z32 depth format):

(gdb) p bld->type
$1 = {floating = 0, fixed = 0, sign = 0, norm = 0, width = 32, length = 4}

There are several other paths through lp_build_select() that look like
they could work, but don't.  If I turn on the if (0)'d vector select
path, I get something like:

LLVM ERROR: Cannot select: 0xc23df0: v4i32 = X86ISD::SMAX 0xc1caf0, 0xc25020 [ORD=103] [ID=189]
  0xc1caf0: v4i32 = X86ISD::VSRL 0xc3a810, 0xc269e0 [ORD=102] [ID=179]
    0xc3a810: v4i32 = bitcast 0xc096b0 [ORD=95] [ID=150]
      0xc096b0: v2i64,ch = X86ISD::VZEXT_LOAD 0xc28160, 0xc1f750<LD8[%sunkaddr145](align=4)> [ORD=95] [ID=140]
        0xc1f750: i64 = add 0xc22560, 0xc21220 [ORD=83] [ID=102]
          0xc22560: i64,ch = CopyFromReg 0xbca1c0, 0xc21880 [ORD=82] [ID=76]
            0xc21880: i64 = Register %vreg26 [ID=28]
          0xc21220: i64 = Constant<232> [ID=29]
    0xc269e0: v4i32 = X86ISD::VZEXT_MOVL 0xc09c00 [ORD=102] [ID=169]
      0xc09c00: v4i32 = scalar_to_vector 0xc24be0 [ORD=102] [ID=160]
        0xc24be0: i32 = truncate 0xc22de0 [ORD=99] [ID=151]
          0xc22de0: i64,ch = load 0xc28160, 0xc22450, 0xc0a9f0<LD4[%sunkaddr154], sext from i32> [ORD=104] [ID=141]
            0xc22450: i64 = add 0xc22560, 0xc09f30 [ORD=97] [ID=100]
              0xc22560: i64,ch = CopyFromReg 0xbca1c0, 0xc21880 [ORD=82] [ID=76]
                0xc21880: i64 = Register %vreg26 [ID=28]
              0xc09f30: i64 = Constant<244> [ID=31]
            0xc0a9f0: i64 = undef [ID=4]
  0xc25020: v4i32 = bitcast 0xc06ec0 [ORD=9] [ID=119]
    0xc06ec0: v2i64,ch = load 0xbca1c0, 0xc05fc0, 0xc0a9f0<LD16[ConstantPool]> [ORD=9] [ID=104]
      0xc05fc0: i64 = X86ISD::Wrapper 0xc21550 [ID=78]
        0xc21550: i64 = TargetConstantPool<<4 x i32> <i32 1, i32 1, i32 1, i32 1>> 0 [ID=47]
      0xc0a9f0: i64 = undef [ID=4]
In function: fs57_variant0_partial

I get the same result for either the BuildTrunc or BuildICmp paths
through the if (0) at the top, and I also get the same result if I just
fall through to lp_build_select_bitwise().

This doesn't seem to be the only breakage.  lp_test_format dies with:

LLVM ERROR: Cannot select: 0x231e090: v4i32 = X86ISD::UMIN 0x2346b70, 0x231bf80 [ORD=5] [ID=30]
  0x2346b70: v4i32 = bitcast 0x2346840 [ORD=3] [ID=29]
    0x2346840: v2i64 = scalar_to_vector 0x2346c80 [ORD=3] [ID=27]
      0x2346c80: i64,ch = load 0x236abd0, 0x231b3d0, 0x231ba30<LD8[%4](align=4)> [ORD=3] [ID=24]
        0x231b3d0: i64,ch = CopyFromReg 0x236abd0, 0x231b2c0 [ORD=1] [ID=20]
          0x231b2c0: i64 = Register %vreg1 [ID=2]
        0x231ba30: i64 = undef [ID=4]
  0x231bf80: v4i32 = bitcast 0x231be70 [ORD=5] [ID=28]
    0x231be70: v2i64,ch = load 0x236abd0, 0x231d920, 0x231ba30<LD16[ConstantPool]> [ORD=5] [ID=25]
      0x231d920: i64 = X86ISD::Wrapper 0x231e3c0 [ID=22]
        0x231e3c0: i64 = TargetConstantPool<<4 x i32> <i32 1, i32 1, i32 1, i32 1>> 0 [ID=14]
      0x231ba30: i64 = undef [ID=4]
In function: fetch_r32g32_uscaled_unorm8

lp_test_arit dies with:

LLVM ERROR: Cannot select: intrinsic %llvm.x86.sse41.round.ps

lp_test_conv dies with:

LLVM ERROR: Cannot select: 0xd79a20: v4i32 = X86ISD::SMIN 0xd796f0, 0xd794d0 [ORD=8] [ID=25]
  0xd796f0: v4i32 = X86ISD::SMAX 0xd79f70, 0xd696b0 [ORD=6] [ID=23]
    0xd79f70: v4i32 = bitcast 0xd698d0 [ORD=4] [ID=21]
      0xd698d0: v2i64,ch = load 0xd43920, 0xd69380, 0xd69050<LD16[%3]> [ORD=4] [ID=17]
        0xd69380: i64 = add 0xd68c10, 0xd69270 [ORD=3] [ID=13]
          0xd68c10: i64,ch = CopyFromReg 0xd43920, 0xd68b00 [ORD=1] [ID=8]
            0xd68b00: i64 = Register %vreg0 [ID=1]
          0xd69270: i64 = Constant<16> [ID=4]
        0xd69050: i64 = undef [ID=3]
    0xd696b0: v4i32 = bitcast 0xd695a0 [ORD=5] [ID=18]
      0xd695a0: v2i64,ch = load 0xd43920, 0xd793c0, 0xd69050<LD16[ConstantPool]> [ORD=5] [ID=14]
        0xd793c0: i64 = X86ISD::Wrapper 0xd7a2a0 [ID=10]
          0xd7a2a0: i64 = TargetConstantPool<<4 x i32> <i32 -32768, i32 -32768, i32 -32768, i32 -32768>> 0 [ID=6]
        0xd69050: i64 = undef [ID=3]
  0xd794d0: v4i32 = bitcast 0xd795e0 [ORD=7] [ID=19]
    0xd795e0: v2i64,ch = load 0xd43920, 0xd79910, 0xd69050<LD16[ConstantPool]> [ORD=7] [ID=15]
      0xd79910: i64 = X86ISD::Wrapper 0xd7a190 [ID=11]
        0xd7a190: i64 = TargetConstantPool<<4 x i32> <i32 32767, i32 32767, i32 32767, i32 32767>> 0 [ID=7]
      0xd69050: i64 = undef [ID=3]
In function: test

All of the above lp_test_* failures can be triggered by setting
LP_NATIVE_VECTOR_WIDTH=128 when running make check, so I don't think my
kvm patch is to blame.

I'm a little out of my depth trying to track this down.  Any ideas?

- ajax