[Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.
Roland Scheidegger
sroland at vmware.com
Wed Dec 7 15:02:27 UTC 2016
The bug in llvm has been fixed. Can you confirm that lp_test_format passes again?
Roland
On 06.12.2016 at 19:00, Roland Scheidegger wrote:
> Ok, here is the bug:
> https://llvm.org/bugs/show_bug.cgi?id=31296
>
> Roland
>
> On 06.12.2016 at 18:47, Roland Scheidegger wrote:
>> Actually I've verified this quickly with llc.
>> With -mattr=xop, it produces
>>
>> fetch_r32_float_float: # @fetch_r32_float_float
>> .cfi_startproc
>> # BB#0: # %entry
>> vpermilps $65, .LCPI0_0(%rip), %xmm0 # xmm0 = mem[1,0,0,1]
>> vmovaps %xmm0, (%rdi)
>> retq
>>
>> which is very obviously garbage (it even managed to optimize out the
>> actual load, just the constants are left...). So this is indeed an llvm
>> bug with xop. I'm going to file a bug, but in the interim I don't know
>> what mesa should do - this is one reason why we didn't want to enable
>> features which we didn't actually test previously (that said, if we
>> don't enable them, the llvm bugs we hit will probably never get
>> fixed...). We could of course force-disable xop (albeit in theory it's
>> nice - we really can make use of that damn missing vector shift which
>> otherwise requires avx2).
>>
>> Roland
>>
>>
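The connection between the llc output above and the failing test can be checked by hand. The following is a minimal sketch, not anything from Mesa or LLVM: `vpermilps_imm` is a hypothetical helper that models the 128-bit immediate form of `vpermilps` (two selector bits per destination lane), and the contents of `.LCPI0_0` are an assumption, taken to be the shuffle constant `<0.0, 1.0, undef, undef>` with the undef lanes filled with 0.0.

```python
def vpermilps_imm(src, imm8):
    """Model 128-bit vpermilps with an immediate operand.

    Destination lane i takes src[selector_i], where selector_i is
    bits [2*i+1 : 2*i] of imm8.
    """
    selectors = [(imm8 >> (2 * lane)) & 0b11 for lane in range(4)]
    return selectors, [src[s] for s in selectors]


# Assumed contents of .LCPI0_0 (not shown in the mail): the constant
# operand of the shufflevector, with undef lanes modelled as 0.0.
constant_pool = [0.0, 1.0, 0.0, 0.0]

selectors, result = vpermilps_imm(constant_pool, 65)
print(selectors)  # lane selectors decoded from $65
print(result)     # vector that gets stored to (%rdi)
```

Under that assumption the decoded selectors are `[1, 0, 0, 1]`, matching the `mem[1,0,0,1]` comment in the assembly, and the stored vector is `1 0 0 1` regardless of the input, which is what lp_test_format reported as "obtained".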
>> On 06.12.2016 at 17:34, Roland Scheidegger wrote:
>>> Interesting. Can you show the IR / assembly? I don't get any failures here.
>>> I'm wondering if it's trying to use XOP and there's some bug there (or
>>> we're relying on undefined behavior which doesn't happen to work with
>>> it). Albeit since there's not actually any conversion involved in this
>>> case (float 1 channel -> float 4 channel) the assembly here looks
>>> trivial and I can't see how it could go wrong.
>>>
>>> I get (with a couple days old llvm):
>>> define void @fetch_r32_float_float(<4 x float>*, i8*, i32, i32, { [2048 x i32], [128 x i64] }*) {
>>> entry:
>>>   %5 = getelementptr i8, i8* %1, i32 0
>>>   %6 = bitcast i8* %5 to i32*
>>>   %7 = load i32, i32* %6
>>>   %8 = zext i32 %7 to i128
>>>   %9 = bitcast i128 %8 to <4 x float>
>>>   %10 = shufflevector <4 x float> %9, <4 x float> <float 0.000000e+00, float 1.000000e+00, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 4, i32 5>
>>>   store <4 x float> %10, <4 x float>* %0
>>>   ret void
>>> }
>>>
>>> fetch_r32_float_float:
>>> 0: pushq %rbp
>>> 1: movq %rsp, %rbp
>>> 4: movl (%rsi), %eax
>>> 6: vmovq %rax, %xmm0
>>> 11: movabsq $140375561531392, %rax
>>> 21: vmovaps (%rax), %xmm1
>>> 25: vshufps $0, %xmm1, %xmm0, %xmm0
>>> 30: vshufps $72, %xmm1, %xmm0, %xmm0
>>> 35: vmovaps %xmm0, (%rdi)
>>> 39: popq %rbp
>>> 40: retq
>>>
>>> The only thing I can think of is maybe the load/zext in combination with
>>> the shuffle going wrong - the shuffle combiner in llvm has a couple xop
>>> cases.
>>>
>>> fwiw the printing of the values is a bit suboptimal: the "packed" 00 00
>>> 80 bf value really is the float 0xbf800000 (-1.0), and you don't see the
>>> other channels at all, albeit in this case there aren't any...
>>>
>>> Roland
>>>
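What the IR above is supposed to compute can be modelled in a few lines. This is a rough sketch under stated assumptions: `shufflevector` and `fetch_r32_float` are hypothetical Python helpers written for illustration, `None` stands in for `undef`, and a little-endian host (as on x86) is assumed, so the loaded 32 bits end up in lane 0 after the zext and bitcast.

```python
import struct


def shufflevector(a, b, mask):
    """Model LLVM's shufflevector: indices 0..n-1 select from a,
    indices n..2n-1 select from b."""
    combined = list(a) + list(b)
    return [combined[i] for i in mask]


def fetch_r32_float(packed):
    """Model of the IR: load 32 bits, zero-extend to 128 bits,
    reinterpret as <4 x float>, then shuffle in the constants 0 and 1."""
    (x,) = struct.unpack("<f", packed)   # little-endian float load
    loaded = [x, 0.0, 0.0, 0.0]          # zext i32 -> i128, bitcast to 4 floats
    consts = [0.0, 1.0, None, None]      # None stands in for undef
    return shufflevector(loaded, consts, [0, 4, 4, 5])


print(fetch_r32_float(bytes([0x00, 0x00, 0x80, 0xBF])))  # packed 00 00 80 bf
print(fetch_r32_float(bytes(4)))                         # packed 00 00 00 00
```

For the two packed inputs from the test report this model yields `-1 0 0 1` and `0 0 0 1`, the values lp_test_format lists as "expected".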
>>> On 06.12.2016 at 07:27, Michel Dänzer wrote:
>>>> On 06/12/16 02:39 AM, Tim Rowley wrote:
>>>>> Use the llvm-provided API based on cpuid rather than our own
>>>>> manually maintained list of mattr enabling/disabling.
>>>>
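The idea in the patch description, turning a host feature map into `+feature` / `-feature` mattr strings, can be sketched as follows. This is an illustration only, not the gallivm code: `build_mattrs` and its `force_disabled` parameter are hypothetical names, and the feature map is a hand-written stand-in for whatever the host query returns. The `force_disabled` set models the "force-disable xop" workaround discussed elsewhere in this thread.

```python
def build_mattrs(host_features, force_disabled=()):
    """Turn a {feature_name: enabled} map into a list of mattr strings.

    Features listed in force_disabled are emitted as "-name" even if
    the host reports them as available.
    """
    attrs = []
    for name, enabled in sorted(host_features.items()):
        if name in force_disabled:
            enabled = False
        attrs.append(("+" if enabled else "-") + name)
    return attrs


# Hand-written stand-in for a host feature query (illustrative values).
host = {"avx": True, "xop": True, "avx2": False}

print(build_mattrs(host))
print(build_mattrs(host, force_disabled={"xop"}))
```

The trade-off raised in the thread applies to any such override list: masking a feature avoids a known code generation bug, but it also means the feature never gets exercised.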
>>>> This change broke the llvmpipe unit test lp_test_format for me:
>>>>
>>>> Testing PIPE_FORMAT_R32_FLOAT (float) ...
>>>> FAILED
>>>> Packed: 00 00 00 00
>>>> Unpacked (0,0): 1 0 0 1 obtained
>>>> 0 0 0 1 expected
>>>> FAILED
>>>> Packed: 00 00 80 bf
>>>> Unpacked (0,0): 1 0 0 1 obtained
>>>> -1 0 0 1 expected
>>>>
>>>>
>>>> This is on:
>>>>
>>>> processor : 0
>>>> vendor_id : AuthenticAMD
>>>> cpu family : 21
>>>> model : 48
>>>> model name : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
>>>> stepping : 1
>>>> microcode : 0x6003106
>>>> cpu MHz : 4100.000
>>>> cache size : 2048 KB
>>>> physical id : 0
>>>> siblings : 4
>>>> core id : 0
>>>> cpu cores : 2
>>>> apicid : 16
>>>> initial apicid : 0
>>>> fpu : yes
>>>> fpu_exception : yes
>>>> cpuid level : 13
>>>> wp : yes
>>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
>>>> bugs : fxsave_leak sysret_ss_attrs null_seg
>>>> bogomips : 8200.42
>>>> TLB size : 1536 4K pages
>>>> clflush size : 64
>>>> cache_alignment : 64
>>>> address sizes : 48 bits physical, 48 bits virtual
>>>> power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
>>>>
>>>>
>>>>
>>>
>>
>