[Mesa-dev] [PATCH] gallivm: disable f16c when not using AVX

Mon Oct 26 08:33:22 PDT 2015

On 26/10/15 14:58, Roland Scheidegger wrote:
> Am 26.10.2015 um 10:02 schrieb Jose Fonseca:
>> On 23/10/15 22:26, sroland at vmware.com wrote:
>>> From: Roland Scheidegger <sroland at vmware.com>
>>>
>>> f16c intrinsic can only be emitted when AVX is used. So when we disable
>>> AVX due to forcing 128bit vectors we must not use this intrinsic
>>> (depending on llvm version, this worked previously because llvm used AVX
>>> even when we didn't tell it to, however I've seen this fail with llvm 3.3
>>> since 718249843b915decf8fccec92e466ac1a6219934 which seems to have the
>>> side effect of disabling avx in llvm albeit it only touches sse flags
>>> really).
>>
>> Good catch.
>>
>>> Possibly one day should actually try to use avx even with 128bit vectors...
>>
>> In the past we needed to override util_cpu_caps.has_avx on machines that
>> were AVX capable but where the old-JIT code was in use.  But that's no
>> longer the case: the min supported LLVM version is 3.3, which supports
>> AVX both with MCJIT and old-JIT.
>>
>>
>> Therefore, the only point of this code is to enable a developer to test
>> SSE2 code paths on an AVX capable machine.
>>
>> There's no other reason for someone to go out of his way to override
>> LP_NATIVE_VECTOR_WIDTH of 256 with 128.
>>
>>
>> So maybe it's worth making this comment clear: the sole point is to
>> enable SSE2 testing on AVX machines, and all avx flags, and flags which
>> depend on avx, need to be masked out.
>
> Well that's not quite true. Forcing 128bit wide vectors will get you
> faster shader compiles and less llvm memory usage. And in some odd cases
> the compiled shaders aren't even slower. Disabling AVX on top of that
> doesn't really change much there though things might be minimally slower
> (of course, if you hit the things which actually depend on avx, like
> f16c, that's a different story).
>
> Though it's not really exposed much as a feature, I find it's at least
> as interesting for development to figure out why shaders using 256bit
> vectors are scaling appropriately or not compared to 128bit rather than
> SSE2 emulation. And for the former case it makes sense to leave avx
> enabled. You are right though that the initial idea was to essentially
> force llvm for the compiled shader to look like it was compiled on a
> less capable machine (albeit since we're setting the cpu type,
> instruction scheduling will still be different).

So it sounds like LP_NATIVE_VECTOR_WIDTH is not expressive enough for all
development test cases.  We probably want another env var to fake SSE2
machines etc, and let LP_NATIVE_VECTOR_WIDTH be something orthogonal
(that will however default to 128/256 based on the machine features).
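
To make the idea concrete, here is a rough sketch of how the two knobs
could be kept orthogonal.  It is only an illustration under stated
assumptions, not the actual gallivm code: struct cpu_caps is a
hypothetical stand-in for util_cpu_caps, the fake_sse2 flag stands for a
hypothetical new env var (no such option exists today), and both helper
functions are invented names.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for util_cpu_caps; only the features
 * discussed in this thread are modelled. */
struct cpu_caps {
   bool has_sse2;
   bool has_avx;
   bool has_avx2;
   bool has_f16c;
};

/* Hypothetical knob: pretend to be an SSE2-only machine by masking
 * avx and every flag that depends on avx (f16c included). */
static void
apply_fake_sse2(struct cpu_caps *caps, bool fake_sse2)
{
   if (fake_sse2) {
      caps->has_avx = false;
      caps->has_avx2 = false;
      caps->has_f16c = false;
   }
}

/* Vector width is chosen independently of the masking above.  It
 * defaults to 256 with avx and 128 without, and the override string
 * (e.g. the value of LP_NATIVE_VECTOR_WIDTH) is only honoured when
 * the remaining features can support the requested width. */
static unsigned
choose_vector_width(const struct cpu_caps *caps, const char *override)
{
   unsigned width = caps->has_avx ? 256 : 128;

   if (override) {
      unsigned requested = (unsigned) strtoul(override, NULL, 10);
      if (requested == 128 || (requested == 256 && caps->has_avx))
         width = requested;
   }
   return width;
}
```

With this split, asking for 128bit vectors alone leaves avx and f16c
enabled (the scaling comparison case Roland describes), while the
fake-SSE2 knob alone masks avx, avx2 and f16c and thereby pulls the
default width down to 128.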

Jose