[Mesa-dev] [PATCH] gallivm: disable f16c when not using AVX

Roland Scheidegger sroland at vmware.com
Mon Oct 26 07:58:10 PDT 2015

Am 26.10.2015 um 10:02 schrieb Jose Fonseca:
> On 23/10/15 22:26, sroland at vmware.com wrote:
>> From: Roland Scheidegger <sroland at vmware.com>
>> The f16c intrinsic can only be emitted when AVX is used, so when we
>> disable AVX due to forcing 128bit vectors we must not use this
>> intrinsic. (Depending on the llvm version this worked previously,
>> because llvm used AVX even when we didn't tell it to; however, I've
>> seen this fail with llvm 3.3 since
>> 718249843b915decf8fccec92e466ac1a6219934, which seems to have the
>> side effect of disabling avx in llvm albeit it only touches sse
>> flags really.)
> Good catch.
>> Possibly one day should actually try to use avx even with 128bit
>> vectors...
> In the past we needed to override util_cpu_caps.has_avx on AVX capable
> machines when using old-JIT.  But that's no longer the case: the
> minimum supported LLVM version is 3.3, which supports AVX with both
> MCJIT and old-JIT.
> So the only point of this code is to enable a developer to test SSE2
> code paths on an AVX capable machine.
> There's no other reason for someone to go out of his way to override
> the LP_NATIVE_VECTOR_WIDTH of 256 with 128.
> So maybe it's worth making this comment clear: the sole point is to
> enable SSE2 testing on AVX machines, and all avx flags, and flags
> which depend on avx, need to be masked out.

Well, that's not quite true. Forcing 128bit wide vectors will get you
faster shader compiles and lower llvm memory usage, and in some odd
cases the compiled shaders aren't even slower. Disabling AVX on top of
that doesn't really change much there, though things might be minimally
slower (of course, if you hit the things which actually depend on avx,
like f16c, that's a different story).
Though it's not really exposed much as a feature, I find it at least as
interesting for development to figure out whether shaders using 256bit
vectors scale appropriately compared to 128bit ones, rather than
compared to SSE2 emulation. And for the former case it makes sense to
leave avx enabled. You are right though that the initial idea was
essentially to make llvm compile the shader as if it were built on a
less capable machine (albeit since we're still setting the cpu type,
instruction scheduling will still be different).

> BTW the "For simulating less capable machines" code needs to be updated
> too (it's missing has_avx2=0).
Ok I'll add that (plus sse42)...
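For illustration, the flag masking being discussed can be sketched as a
standalone C example. The struct loosely mirrors the relevant fields of
Mesa's util_cpu_caps, but the struct and function names here are
hypothetical, not the actual Mesa code:

```c
#include <assert.h>

/* Hypothetical mirror of the relevant util_cpu_caps flags (sketch only). */
struct cpu_caps {
   int has_sse4_2;
   int has_avx;
   int has_avx2;
   int has_f16c;
};

/*
 * When the native vector width is forced down to 128 bits and AVX is
 * disabled, every flag that depends on AVX must be masked too: AVX2,
 * and also F16C, whose intrinsics can only be emitted with AVX enabled.
 */
static void
mask_avx_dependent_caps(struct cpu_caps *caps, unsigned vector_width)
{
   if (vector_width <= 128) {
      caps->has_avx = 0;
      caps->has_avx2 = 0;
      caps->has_f16c = 0;   /* f16c depends on avx */
   }
}

/* For simulating a less capable machine, mask sse4.2 as well. */
static void
mask_for_sse2_testing(struct cpu_caps *caps)
{
   caps->has_sse4_2 = 0;
   mask_avx_dependent_caps(caps, 128);
}
```

The point of routing everything through one helper is that a new
AVX-dependent feature flag only has to be masked in one place, which is
exactly the kind of omission the patch fixes.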


>> ---
>>   src/gallium/auxiliary/gallivm/lp_bld_init.c | 1 +
>>   1 file changed, 1 insertion(+)
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c
>> b/src/gallium/auxiliary/gallivm/lp_bld_init.c
>> index 017d075..e6eede8 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
>> @@ -427,6 +427,7 @@ lp_build_init(void)
>>          */
>>         util_cpu_caps.has_avx = 0;
>>         util_cpu_caps.has_avx2 = 0;
>> +      util_cpu_caps.has_f16c = 0;
>>      }
>>   #ifdef PIPE_ARCH_PPC_64
> Reviewed-by: Jose Fonseca <jfonseca at vmware.com>
> Jose
