[Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.
Roland Scheidegger
sroland at vmware.com
Tue Dec 6 16:34:29 UTC 2016
Interesting. Can you show the IR / assembly? I don't get any failures here.
I'm wondering whether it's trying to use XOP and there's a bug there (or
we're relying on undefined behavior that just doesn't happen to work
with it). That said, since there isn't actually any conversion involved
in this case (one float channel expanded to four float channels), the
assembly here looks trivial and I can't see how it could go wrong.
I get (with a couple-of-days-old LLVM):
define void @fetch_r32_float_float(<4 x float>*, i8*, i32, i32, { [2048 x i32], [128 x i64] }*) {
entry:
  %5 = getelementptr i8, i8* %1, i32 0
  %6 = bitcast i8* %5 to i32*
  %7 = load i32, i32* %6
  %8 = zext i32 %7 to i128
  %9 = bitcast i128 %8 to <4 x float>
  %10 = shufflevector <4 x float> %9, <4 x float> <float 0.000000e+00, float 1.000000e+00, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 4, i32 5>
  store <4 x float> %10, <4 x float>* %0
  ret void
}
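To spell out what the IR above should compute, here is a minimal plain-C sketch of shufflevector semantics: mask indices 0..3 pick from the first operand, 4..7 from the second. The function names are hypothetical. The sketch assumes a little-endian host, so the zero-extended i32 lands in lane 0 after the bitcast, and it models the constant's undef lanes as 0.0f, which the mask <0, 4, 4, 5> never reads anyway.

```c
#include <assert.h>

/* Models LLVM's shufflevector on two <4 x float> operands:
 * mask index 0..3 selects a[idx], 4..7 selects b[idx - 4]. */
static void shufflevector4(const float a[4], const float b[4],
                           const int mask[4], float out[4])
{
    for (int i = 0; i < 4; i++) {
        assert(mask[i] >= 0 && mask[i] < 8);
        out[i] = mask[i] < 4 ? a[mask[i]] : b[mask[i] - 4];
    }
}

/* Hypothetical model of fetch_r32_float_float for one channel value x. */
static void fetch_r32_float_model(float x, float out[4])
{
    /* %9: zext i32 -> i128 + bitcast puts x in lane 0, lanes 1-3 are zero
     * (little-endian assumption). */
    const float v[4] = {x, 0.0f, 0.0f, 0.0f};
    /* Second operand <0.0, 1.0, undef, undef>; undef modeled as 0.0f. */
    const float c[4] = {0.0f, 1.0f, 0.0f, 0.0f};
    const int mask[4] = {0, 4, 4, 5};
    shufflevector4(v, c, mask, out);
}
```

For x = -1.0 this yields (-1, 0, 0, 1), which is the "expected" line from lp_test_format.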
fetch_r32_float_float:
0: pushq %rbp
1: movq %rsp, %rbp
4: movl (%rsi), %eax
6: vmovq %rax, %xmm0
11: movabsq $140375561531392, %rax
21: vmovaps (%rax), %xmm1
25: vshufps $0, %xmm1, %xmm0, %xmm0
30: vshufps $72, %xmm1, %xmm0, %xmm0
35: vmovaps %xmm0, (%rdi)
39: popq %rbp
40: retq
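As a sanity check on the listing, the two vshufps can be replayed in plain C. This is a sketch under stated assumptions: the helper names are hypothetical, and the constant loaded via movabsq is assumed to be the IR's <0.0, 1.0, undef, undef>, with the undef lanes modeled as 0.0f. AT&T "vshufps $imm, %xmm1, %xmm0, %xmm0" is Intel "vshufps xmm0, xmm0, xmm1, imm". Result lanes 0-1 therefore come from xmm0 and lanes 2-3 from xmm1, each selected by a 2-bit field of the immediate.

```c
#include <assert.h>

/* Plain-C model of SHUFPS in Intel operand order (dst, src1, src2, imm8). */
static void shufps_model(const float src1[4], const float src2[4],
                         unsigned imm, float dst[4])
{
    float tmp[4];
    assert(imm <= 0xff);
    tmp[0] = src1[imm & 3];
    tmp[1] = src1[(imm >> 2) & 3];
    tmp[2] = src2[(imm >> 4) & 3];
    tmp[3] = src2[(imm >> 6) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];            /* via tmp so dst may alias src1 */
}

/* Hypothetical replay of the listing for a loaded channel value x. */
static void replay_listing(float x, float out[4])
{
    /* movl + vmovq: x in lane 0, upper lanes zeroed. */
    float xmm0[4] = {x, 0.0f, 0.0f, 0.0f};
    /* Assumed constant; undef lanes modeled as 0.0f. */
    const float xmm1[4] = {0.0f, 1.0f, 0.0f, 0.0f};
    shufps_model(xmm0, xmm1, 0x00, xmm0);   /* $0            -> {x, x, 0, 0} */
    shufps_model(xmm0, xmm1, 0x48, xmm0);   /* $72 (= 0x48)  -> {x, 0, 0, 1} */
    for (int i = 0; i < 4; i++)
        out[i] = xmm0[i];
}
```

Under these assumptions the machine code matches the IR, which is why the sequence looks hard to get wrong.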
The only thing I can think of is that the load/zext in combination with
the shuffle goes wrong somewhere; the shuffle combiner in LLVM has a
couple of XOP-specific cases.
FWIW, the printing of the values is a bit suboptimal: the "packed" bytes
00 00 80 bf really are the float 0xbf800000 (-1.0, stored little-endian),
and you don't see the other channels at all, although in this case there
aren't any.
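The byte-to-float reading can be checked with a small C sketch. The helper names are hypothetical. It assumes the "Packed:" line lists bytes in memory order and that float is IEEE-754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reassembles the 4 bytes from the "Packed:" line as a little-endian
 * 32-bit word, independent of the host's byte order. */
static uint32_t packed_to_bits(const uint8_t b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Reinterprets the word as a float without violating aliasing rules. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For 00 00 80 bf this gives 0xbf800000: sign 1, biased exponent 127, mantissa 0, i.e. -1.0.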
Roland
On 06.12.2016 at 07:27, Michel Dänzer wrote:
> On 06/12/16 02:39 AM, Tim Rowley wrote:
>> Use the LLVM-provided API based on cpuid rather than our own
>> manually maintained list of mattr enabling/disabling.
>
> This change broke the llvmpipe unit test lp_test_format for me:
>
> Testing PIPE_FORMAT_R32_FLOAT (float) ...
> FAILED
> Packed: 00 00 00 00
> Unpacked (0,0): 1 0 0 1 obtained
>                 0 0 0 1 expected
> FAILED
> Packed: 00 00 80 bf
> Unpacked (0,0): 1 0 0 1 obtained
>                 -1 0 0 1 expected
>
>
> This is on:
>
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 21
> model : 48
> model name : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
> stepping : 1
> microcode : 0x6003106
> cpu MHz : 4100.000
> cache size : 2048 KB
> physical id : 0
> siblings : 4
> core id : 0
> cpu cores : 2
> apicid : 16
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 13
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
> bugs : fxsave_leak sysret_ss_attrs null_seg
> bogomips : 8200.42
> TLB size : 1536 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]