[Mesa-dev] [Bug 52209] lp_test_format regression

Tue Jul 17 23:25:56 PDT 2012

https://bugs.freedesktop.org/show_bug.cgi?id=52209

--- Comment #7 from Roland Scheidegger <sroland at vmware.com> 2012-07-17 23:25:56 PDT ---
Since the test doesn't use any vectors whose size depends on cpu caps,
LP_NATIVE_VECTOR_WIDTH shouldn't affect anything.
Here's the IR of a test which fails:
define void @fetch_r32g32_sscaled_unorm8(<4 x i8>*, i8*, i32, i32) {
entry:
  %4 = bitcast i8* %1 to <2 x i32>*
  %5 = load <2 x i32>* %4, align 4
  %6 = shufflevector <2 x i32> %5, <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
  %7 = call <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32> %6, <4 x i32> zeroinitializer)
  %8 = call <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32> %7, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
  %9 = ashr <4 x i32> %8, <i32 -1, i32 -1, i32 -1, i32 -1>
  %10 = sub <4 x i32> %8, %9
  %11 = extractelement <4 x i32> %10, i32 0
  %12 = extractelement <4 x i32> %10, i32 1
  %13 = extractelement <4 x i32> %10, i32 2
  %14 = extractelement <4 x i32> %10, i32 3
  %15 = bitcast i32 %11 to <2 x i16>
  %16 = bitcast i32 %12 to <2 x i16>
  %17 = shufflevector <2 x i16> %15, <2 x i16> %16, <2 x i32> <i32 0, i32 2>
  %18 = bitcast i32 %13 to <2 x i16>
  %19 = bitcast i32 %14 to <2 x i16>
  %20 = shufflevector <2 x i16> %18, <2 x i16> %19, <2 x i32> <i32 0, i32 2>
  %21 = bitcast <2 x i16> %17 to <4 x i8>
  %22 = bitcast <2 x i16> %20 to <4 x i8>
  %23 = shufflevector <4 x i8> %21, <4 x i8> %22, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %24 = shl <4 x i8> %23, <i8 8, i8 8, i8 8, i8 8>
  %25 = sub <4 x i8> %24, %23
  %26 = bitcast <4 x i8> %25 to i32
  %27 = and i32 %26, 65535
  %28 = or i32 bitcast (<4 x i8> <i8 0, i8 0, i8 0, i8 -1> to i32), %27
  %29 = bitcast i32 %28 to <4 x i8>
  store <4 x i8> %29, <4 x i8>* %0
  ret void
}
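
For reference, here is roughly what I read the IR above as trying to compute, written as plain scalar C. This is only a sketch of my reading, not Mesa code: the names clamp01 and ref_fetch_r32g32_sscaled_unorm8 are made up for illustration, and it assumes a little-endian layout with the missing blue channel set to 0 and alpha forced to 0xff (which is what the final and/or pair in the IR does).

```c
#include <stdint.h>

/* Clamp a signed 32-bit channel to [0, 1], as the pmaxsd/pminsd pair does. */
static int32_t clamp01(int32_t v)
{
    if (v < 0)
        return 0;
    if (v > 1)
        return 1;
    return v;
}

/*
 * Hypothetical scalar reference (not Mesa code): convert one R32G32_SSCALED
 * texel to four unorm8 channels packed into a little-endian 32-bit word.
 * The scaling is done in 32-bit arithmetic here.
 */
static uint32_t ref_fetch_r32g32_sscaled_unorm8(const int32_t src[2])
{
    uint32_t r = (uint32_t)clamp01(src[0]) * 255u;  /* 0 -> 0x00, 1 -> 0xff */
    uint32_t g = (uint32_t)clamp01(src[1]) * 255u;
    uint32_t b = 0u;                                /* channel not in source */
    uint32_t a = 255u;                              /* alpha forced to 1.0 */

    return r | (g << 8) | (b << 16) | (a << 24);
}
```

Feeding it the pair {5, -3} clamps to {1, 0}, so only the red byte and the alpha byte end up set.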

With llvm 3.1 it passes, but not with 2.9/3.0.
But there's more to it: with 2.9 AND a cpu which isn't sse41-capable it also
passes (and on top of that the generated code is way _better_, even though it
can't use the pminsd/pmaxsd intrinsics, but those aren't the issue).
So with an sse41- or avx-capable cpu, llvm 3.1 generates correct but crappy
code, whereas with 2.9/3.0 it is crappy and wrong. Only if you have a cpu
which isn't sse41-capable does it produce correct and good code...
I believe the issue here is use of the non-native vectors toward the end (2x16,
4x8) since llvm uses padded vector elements for them (a 4xi8 vector looks like
4xi32) so it has to do lots of weird shuffles (those harmless looking bitcasts
cause lots of unpacks, shuffles etc.). Well that's the explanation for the
crappy code (probably some optimization wasn't available without sse41 which
turned out to be much better in the end). Fortunately it shouldn't happen with
llvmpipe since we don't generally use such vectors (we always fetch multiples
of 4 values). This doesn't explain why it isn't correct though. Maybe we're
relying somewhere on some properties of those values when resizing which don't
hold true if the vector elements aren't packed but padded.
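
To illustrate the packed vs. padded point, here is a small C sketch. It is only a model of the two layouts, not what llvm does internally: the function names are hypothetical, little-endian byte order is assumed, and the padded case assumes each 8-bit element lives in the low byte of its own 32-bit lane with unspecified upper bits.

```c
#include <stdint.h>

/*
 * Packed layout: the four 8-bit elements already occupy one 32-bit word,
 * so on little-endian hardware a bitcast to i32 is just a plain 32-bit load.
 * It is spelled out with shifts here to stay independent of host endianness.
 */
static uint32_t bitcast_packed(const uint8_t v[4])
{
    return (uint32_t)v[0] | ((uint32_t)v[1] << 8) |
           ((uint32_t)v[2] << 16) | ((uint32_t)v[3] << 24);
}

/*
 * Padded layout (assumed for illustration): each element sits in the low
 * byte of a 32-bit lane. The same bitcast now has to mask every lane and
 * repack the bytes, which is where the extra unpacks/shuffles come from.
 */
static uint32_t bitcast_padded(const uint32_t lanes[4])
{
    uint32_t out = 0;

    for (int i = 0; i < 4; i++)
        out |= (lanes[i] & 0xffu) << (8 * i);
    return out;
}
```

Both functions return the same word for the same element values, but any code that forgets the masking step in the padded case picks up garbage from the upper lane bits.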
There's another issue with this code, which may or may not be related to this
bug:
  %9 = ashr <4 x i32> %8, <i32 -1, i32 -1, i32 -1, i32 -1>
(the uscaled formats will have a lshr instead).
This shift is illegal since shifts by a count larger than or equal to the
element bit width (which this is) are undefined in llvm (ok, not illegal, just
the result is undefined). However, llvm itself doesn't care and with sse2 it
just happily issues the psrad 255 instruction, which has defined (and
reasonable) behavior (in the non-vector domain the hardware will just use the
low bits of the count, which would still work). This comes from
lp_build_conv(), line 594 (since src_shift is zero, and src_offset is 0 and
dst_offset is 1). So something seems wrong with this calculation; maybe we'd
need to do something different if the destination is a normalized format
instead.
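
To show why the out-of-range shift count happens to work on the hardware, here is a small C model of the two behaviors. It is a sketch under assumptions: the function names are hypothetical, it models psrad as filling with the sign bit for counts above 31 and the scalar shift as using only the low 5 bits of the count, and it says nothing about what llvm is allowed to do with the undefined IR.

```c
#include <stdint.h>

/* psrad-style semantics: counts above 31 fill every bit with the sign bit. */
static int32_t sar_vector_style(int32_t x, uint32_t count)
{
    if (count > 31)
        return x < 0 ? -1 : 0;
    /* Avoid relying on implementation-defined >> of negative values. */
    if (x < 0)
        return (int32_t)~(~(uint32_t)x >> count);
    return (int32_t)((uint32_t)x >> count);
}

/* Scalar x86 sar semantics: only the low 5 bits of the count are used. */
static int32_t sar_scalar_style(int32_t x, uint32_t count)
{
    return sar_vector_style(x, count & 31u);
}
```

For the values that reach the shift in the IR (already clamped to 0 or 1) and a count of 255, both variants return 0, so the following sub leaves the value unchanged. For other counts the two can differ, e.g. a count of 33 applied to 4 gives 0 with the vector semantics but 2 with the scalar ones.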
