[Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.
Roland Scheidegger
sroland at vmware.com
Tue Dec 6 18:00:32 UTC 2016
Ok, here is the bug:
https://llvm.org/bugs/show_bug.cgi?id=31296
Roland
On 06.12.2016 at 18:47, Roland Scheidegger wrote:
> Actually I've verified this quickly with llc.
> With -mattr=xop, it produces
>
> fetch_r32_float_float: # @fetch_r32_float_float
> .cfi_startproc
> # BB#0: # %entry
> vpermilps $65, .LCPI0_0(%rip), %xmm0 # xmm0 = mem[1,0,0,1]
> vmovaps %xmm0, (%rdi)
> retq
>
> which is very obviously garbage (it even managed to optimize out the
> actual load; only the constants are left...). So this is indeed an LLVM
> bug with XOP. I'm going to file a bug, but in the interim I don't know
> what Mesa should do. This is one reason why we didn't want to enable
> features which we hadn't actually tested previously (that said, if we
> don't enable them, the LLVM bugs we hit will probably never get
> fixed...). We could of course force-disable XOP (although in theory it's
> nice: we really can make use of that damn missing vector shift, which
> otherwise requires AVX2).
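As a rough check of the claim that only the constants survive, the sketch below emulates the immediate form of VPERMILPS on one 128-bit lane (result lane i takes source lane `(imm >> 2*i) & 3`). It assumes `.LCPI0_0` holds the shuffle constant `<0.0, 1.0, undef, undef>` from the IR quoted further down (the thread does not show the constant pool), with the undef lanes filled with 0; `vpermilps_imm` and `xop_listing_result` are hypothetical names. With `$65` the result is (1, 0, 0, 1) no matter what value was loaded, which matches the "obtained" rows of the lp_test_format failure.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Immediate form of VPERMILPS on one 128-bit lane: result lane i takes
// source lane (imm >> (2 * i)) & 3.
Vec4 vpermilps_imm(const Vec4& src, unsigned imm) {
    assert(imm < 256);  // the immediate is an 8-bit field
    Vec4 dst{};
    for (unsigned i = 0; i < 4; ++i)
        dst[i] = src[(imm >> (2 * i)) & 3u];
    return dst;
}

// Hypothetical replay of the -mattr=xop listing: its only input is the
// constant pool entry .LCPI0_0, assumed here to be <0.0, 1.0, undef, undef>
// with the undef lanes shown as 0. The loaded R32 value never appears.
Vec4 xop_listing_result() {
    const Vec4 lcpi0_0 = {0.0f, 1.0f, 0.0f, 0.0f};
    return vpermilps_imm(lcpi0_0, 65);  // 65 = 0b01000001 -> lanes [1,0,0,1]
}
```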
>
> Roland
>
>
> On 06.12.2016 at 17:34, Roland Scheidegger wrote:
>> Interesting. Can you show the IR / assembly? I don't get any failures here.
>> I'm wondering if it's trying to use XOP and there's some bug there (or
>> we're relying on undefined behavior which doesn't happen to work with
>> it). Although, since there's not actually any conversion involved in this
>> case (1-channel float -> 4-channel float), the assembly here looks
>> trivial and I can't see how it could go wrong.
>>
>> I get (with a couple-of-days-old LLVM):
>> define void @fetch_r32_float_float(<4 x float>*, i8*, i32, i32, { [2048 x i32], [128 x i64] }*) {
>> entry:
>> %5 = getelementptr i8, i8* %1, i32 0
>> %6 = bitcast i8* %5 to i32*
>> %7 = load i32, i32* %6
>> %8 = zext i32 %7 to i128
>> %9 = bitcast i128 %8 to <4 x float>
>> %10 = shufflevector <4 x float> %9, <4 x float> <float 0.000000e+00, float 1.000000e+00, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 4, i32 5>
>> store <4 x float> %10, <4 x float>* %0
>> ret void
>> }
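For reference, the IR above only moves data around. The sketch below emulates LLVM's `shufflevector` semantics for two `<4 x float>` operands (mask indices 0-3 pick from the first operand, 4-7 from the second) and replays `fetch_r32_float_float`. It assumes a little-endian target, so the `zext`/`bitcast` pair puts the loaded value in lane 0 and zeros in the other lanes, and it takes the value as a `float` rather than raw i32 bits for brevity; `shufflevector4` and `fetch_r32_float` are hypothetical names. The mask `<0, 4, 4, 5>` yields (x, 0, 0, 1), i.e. the "expected" rows of the test output.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Emulates LLVM's shufflevector on two <4 x float> operands:
// mask indices 0-3 select from `a`, 4-7 select from `b`.
Vec4 shufflevector4(const Vec4& a, const Vec4& b, const std::array<int, 4>& mask) {
    Vec4 out{};
    for (int i = 0; i < 4; ++i) {
        assert(mask[i] >= 0 && mask[i] < 8);
        out[i] = mask[i] < 4 ? a[mask[i]] : b[mask[i] - 4];
    }
    return out;
}

// What fetch_r32_float_float computes. %9 holds the loaded value in lane 0
// and zeros elsewhere (zext to i128, then bitcast, on a little-endian target).
// The constant's undef lanes (2 and 3) are never selected, so they are shown as 0.
Vec4 fetch_r32_float(float r) {
    const Vec4 v9 = {r, 0.0f, 0.0f, 0.0f};
    const Vec4 k = {0.0f, 1.0f, 0.0f, 0.0f};
    return shufflevector4(v9, k, {0, 4, 4, 5});
}
```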
>>
>> fetch_r32_float_float:
>> 0: pushq %rbp
>> 1: movq %rsp, %rbp
>> 4: movl (%rsi), %eax
>> 6: vmovq %rax, %xmm0
>> 11: movabsq $140375561531392, %rax
>> 21: vmovaps (%rax), %xmm1
>> 25: vshufps $0, %xmm1, %xmm0, %xmm0
>> 30: vshufps $72, %xmm1, %xmm0, %xmm0
>> 35: vmovaps %xmm0, (%rdi)
>> 39: popq %rbp
>> 40: retq
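The two `vshufps` instructions in this (working) listing compute the same result as the IR. In AT&T operand order, `vshufps $imm, %src2, %src1, %dst` fills result lanes 0-1 from `src1` and lanes 2-3 from `src2`, each chosen by a 2-bit field of the immediate. The sketch below replays the sequence; it assumes the constant loaded into `%xmm1` is the IR's `<0.0, 1.0, undef, undef>` (undef lanes shown as 0), and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Emulates (V)SHUFPS with an 8-bit immediate. In AT&T syntax
// `vshufps $imm, %src2, %src1, %dst`: lanes 0-1 come from src1,
// lanes 2-3 from src2, each picked by a 2-bit field of imm.
Vec4 shufps(const Vec4& src1, const Vec4& src2, unsigned imm) {
    assert(imm < 256);
    return {src1[imm & 3u], src1[(imm >> 2) & 3u],
            src2[(imm >> 4) & 3u], src2[(imm >> 6) & 3u]};
}

// Hypothetical replay of the non-XOP listing. The constant in %xmm1 is
// assumed to be <0.0, 1.0, undef, undef>, with undef lanes shown as 0.
Vec4 fetch_r32_float_sse(float r) {
    Vec4 xmm0 = {r, 0.0f, 0.0f, 0.0f};           // movl + vmovq: value in lane 0
    const Vec4 xmm1 = {0.0f, 1.0f, 0.0f, 0.0f};  // vmovaps of the constant
    xmm0 = shufps(xmm0, xmm1, 0);   // -> (r, r, 0, 0)
    xmm0 = shufps(xmm0, xmm1, 72);  // 72 = 0b01001000 -> (r, 0, 0, 1)
    return xmm0;
}
```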
>>
>> The only thing I can think of is maybe the load/zext in combination with
>> the shuffle going wrong; the shuffle combiner in LLVM has a couple of
>> XOP-specific cases.
>>
>> FWIW, the printing of the values is a bit suboptimal: the "packed"
>> 00 00 80 bf value really is the float 0xbf800000 (-1.0), and you don't
>> see the other channels at all, although in this case there aren't any...
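The reading of the packed bytes can be checked mechanically: taking the printed bytes in memory order as a little-endian word gives 0xbf800000, whose sign bit is set, biased exponent is 127 (2^0) and mantissa is 0, i.e. -1.0f, the value the test expects in channel 0. The helpers below are hypothetical and assume x86 byte order and 32-bit IEEE-754 floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reassembles the bytes printed by lp_test_format ("Packed: 00 00 80 bf")
// into one 32-bit word. The first printed byte is the least significant,
// matching little-endian x86 memory order.
uint32_t packed_to_word(const uint8_t* bytes, size_t n) {
    assert(bytes != nullptr && n == 4);
    uint32_t word = 0;
    for (size_t i = 0; i < n; ++i)
        word |= uint32_t(bytes[i]) << (8 * i);
    return word;
}

// Reinterprets the word's bits as an IEEE-754 single-precision float.
float word_to_float(uint32_t bits) {
    static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);  // bit-cast without aliasing issues
    return f;
}
```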
>>
>> Roland
>>
>> On 06.12.2016 at 07:27, Michel Dänzer wrote:
>>> On 06/12/16 02:39 AM, Tim Rowley wrote:
>>>> Use the LLVM-provided API based on cpuid rather than our own
>>>> manually maintained list of mattr enabling/disabling.
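The patch replaces the hand-maintained list with the feature map LLVM derives from cpuid. The sketch below shows only the shape of that translation, under stated assumptions: a `std::map<std::string, bool>` stands in for the `llvm::StringMap<bool>` that `llvm::sys::getHostCPUFeatures()` fills in, each entry becomes a `+feature` or `-feature` string, and an optional deny-list forces features off, as with the force-disable of XOP suggested earlier in the thread. `build_mattrs` is a hypothetical helper, not the code from the patch; in real code the strings would be handed to the JIT, e.g. via `EngineBuilder::setMAttrs()`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Stand-in for the llvm::StringMap<bool> filled by getHostCPUFeatures():
// feature name -> whether the host CPU reports it.
using FeatureMap = std::map<std::string, bool>;

// Hypothetical helper: turns the host feature map into -mattr style strings.
// Names in `force_disable` (e.g. {"xop"}) are emitted as "-name" even when the
// host supports them. Only names present in `host` are emitted.
std::vector<std::string> build_mattrs(const FeatureMap& host,
                                      const std::set<std::string>& force_disable) {
    std::vector<std::string> mattrs;
    for (const auto& entry : host) {
        const std::string& name = entry.first;
        assert(!name.empty());
        const bool enable = entry.second && force_disable.count(name) == 0;
        mattrs.push_back((enable ? "+" : "-") + name);
    }
    return mattrs;  // std::map iterates in key order, so the output is sorted
}
```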
>>>
>>> This change broke the llvmpipe unit test lp_test_format for me:
>>>
>>> Testing PIPE_FORMAT_R32_FLOAT (float) ...
>>> FAILED
>>> Packed: 00 00 00 00
>>> Unpacked (0,0): 1 0 0 1 obtained
>>> 0 0 0 1 expected
>>> FAILED
>>> Packed: 00 00 80 bf
>>> Unpacked (0,0): 1 0 0 1 obtained
>>> -1 0 0 1 expected
>>>
>>>
>>> This is on:
>>>
>>> processor : 0
>>> vendor_id : AuthenticAMD
>>> cpu family : 21
>>> model : 48
>>> model name : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
>>> stepping : 1
>>> microcode : 0x6003106
>>> cpu MHz : 4100.000
>>> cache size : 2048 KB
>>> physical id : 0
>>> siblings : 4
>>> core id : 0
>>> cpu cores : 2
>>> apicid : 16
>>> initial apicid : 0
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 13
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
>>> bugs : fxsave_leak sysret_ss_attrs null_seg
>>> bogomips : 8200.42
>>> TLB size : 1536 4K pages
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 48 bits physical, 48 bits virtual
>>> power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
>>>
>>>
>>>
>>
>