[Mesa-dev] [PATCH] gallivm: disable f16c when not using AVX
sroland at vmware.com
Mon Oct 26 08:38:00 PDT 2015
On 26.10.2015 at 16:33, Jose Fonseca wrote:
> On 26/10/15 14:58, Roland Scheidegger wrote:
>> On 26.10.2015 at 10:02, Jose Fonseca wrote:
>>> On 23/10/15 22:26, sroland at vmware.com wrote:
>>>> From: Roland Scheidegger <sroland at vmware.com>
>>>> The f16c intrinsic can only be emitted when AVX is used. So when we
>>>> disable AVX due to forcing 128bit vectors we must not use this
>>>> intrinsic (depending on the llvm version, this worked previously
>>>> because llvm used AVX even when we didn't tell it to; however, I've
>>>> seen this fail with llvm 3.3 since
>>>> 718249843b915decf8fccec92e466ac1a6219934, which seems to have the
>>>> side effect of disabling avx in llvm, albeit it really only touches
>>>> sse flags).
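For reference, the fix amounts to masking out the f16c cap whenever AVX is
not usable. A minimal sketch of that kind of guard, assuming the
util_cpu_caps flags from u_cpu_detect.h (the exact condition and placement
in the actual patch may differ):

   #include "util/u_cpu_detect.h"

   /*
    * f16c instructions are VEX-encoded, so they can only be emitted
    * when AVX is actually enabled. If AVX is hidden (e.g. because
    * 128bit vectors are forced), mask out the cap as well so gallivm
    * falls back to the generic half-float conversion code.
    */
   if (!util_cpu_caps.has_avx) {
      util_cpu_caps.has_f16c = 0;
   }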
>>> Good catch.
>>>> Possibly one day we should actually try to use avx even with 128bit
>>>> vectors.
>>> In the past we needed to override util_cpu_caps.has_avx on AVX capable
>>> machines when using the old-JIT code. But that's no longer the case:
>>> the min supported LLVM version is 3.3, which supports AVX both with
>>> MCJIT and the old JIT.
>>> Therefore, the only point of this code is to enable a developer to
>>> test SSE2 code paths on an AVX capable machine.
>>> There's no other reason for someone to go out of his way to override
>>> LP_NATIVE_VECTOR_WIDTH from 256 to 128.
>>> So maybe it's worth making this comment clear: the sole point is to
>>> enable SSE2 testing on AVX machines, and all avx flags, and flags
>>> which depend on avx, need to be masked out.
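For context, the override in question is a plain env var which gallivm
reads with the usual debug helper. A rough sketch, assuming the setup code
in lp_bld_init.c looks roughly like this:

   /* Allow overriding the native vector width for testing, e.g.
    * LP_NATIVE_VECTOR_WIDTH=128 to exercise the SSE2-sized code paths
    * on an AVX-capable machine. */
   lp_native_vector_width = debug_get_num_option("LP_NATIVE_VECTOR_WIDTH",
                                                 lp_native_vector_width);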
>> Well that's not quite true. Forcing 128bit wide vectors will get you
>> faster shader compiles and less llvm memory usage. And in some odd cases
>> the compiled shaders aren't even slower. Disabling AVX on top of that
>> doesn't really change much there, though things might be minimally slower
>> (of course, if you hit the things which actually depend on avx, like
>> f16c, that's a different story).
>> Though it's not really exposed much as a feature, I find it at least
>> as interesting for development to figure out whether shaders using
>> 256bit vectors scale appropriately compared to 128bit ones, rather
>> than against SSE2 emulation. And for the former case it makes sense
>> to leave avx
>> enabled. You are right though that the initial idea was to essentially
>> force llvm for the compiled shader to look like it was compiled on a
>> less capable machine (albeit since we're setting the cpu type,
>> instruction scheduling will still be different).
> So it sounds like LP_NATIVE_VECTOR_WIDTH is not expressive enough for
> all development test cases. We probably want another env var to fake
> SSE2 machines etc, and let LP_NATIVE_VECTOR_WIDTH be something
> orthogonal (which will, however, default to 128/256 based on the
> machine features).
Yes, ideally it would work like that. That said, since it's really meant
for development mostly, hacking things around as needed seemed to work
well enough until now...
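If we do add such a switch one day, here's a purely illustrative sketch of
what it could look like (LP_FORCE_SSE2 is a made-up name, not an existing
option; debug_get_bool_option and the has_* flags are assumed from
u_debug.h and u_cpu_detect.h):

   #include "util/u_cpu_detect.h"
   #include "util/u_debug.h"

   /*
    * Hypothetical: pretend to be a plain SSE2 machine, independently of
    * the vector width override. All avx flags, and the flags which
    * depend on avx (like f16c), get masked out.
    */
   if (debug_get_bool_option("LP_FORCE_SSE2", FALSE)) {
      util_cpu_caps.has_avx = 0;
      util_cpu_caps.has_avx2 = 0;
      util_cpu_caps.has_f16c = 0;
   }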