[Mesa-dev] [PATCH 1/5] configure.ac: Detect if running on POWER8 arch
Roland Scheidegger
sroland at vmware.com
Sun Jan 3 11:24:38 PST 2016
Am 03.01.2016 um 19:20 schrieb Oded Gabbay:
> On Sun, Jan 3, 2016 at 8:14 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> Am 03.01.2016 um 13:50 schrieb Oded Gabbay:
>>> On Thu, Dec 31, 2015 at 8:48 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>>> Am 31.12.2015 um 10:30 schrieb Oded Gabbay:
>>>>> On Wed, Dec 30, 2015 at 5:41 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>>>>> Am 30.12.2015 um 10:46 schrieb Oded Gabbay:
>>>>>>> On Wed, Dec 30, 2015 at 1:11 AM, Roland Scheidegger <sroland at vmware.com> wrote:
>>>>>>>>
>>>>>>>> So, if I see that right, you will automatically generate binaries using
>>>>>>>> power8 instructions if compiled on power8 capable box, which then won't
>>>>>>>> run on boxes not supporting power8? Is that really what you want?
>>>>>>>> Maybe some runtime detection would be a good idea (though I don't know
>>>>>>>> if anyone cares about power7)?
>>>>>>>
>>>>>>> The problem is I don't think I can eliminate the build time check
>>>>>>> (although I would very much like to) because I need:
>>>>>>> 1. To pass a special flag to the GCC compiler: -mpower8-vector
>>>>>>> 2. To define _ARCH_PWR8 so GCC will include the newer intrinsic
>>>>>>>
>>>>>>> Without those two things, I won't be able to use vec_vgbbd which I
>>>>>>> need to implement the _mm_movemask_epi8 efficiently, and without that,
>>>>>>> all this patch series can be thrown out the window. The emulation of
>>>>>>> _mm_movemask_epi8 using regular instructions is just horrible.
>>>>>>>
>>>>>>> You are correct that once you build a binary with this flag on power8
>>>>>>> machine, that binary won't run on power7 machine. You get "cannot
>>>>>>> execute binary file"
>>>>>>>
>>>>>>> Unfortunately, I don't see a way around this because even if I
>>>>>>> condition the use of vec_vgbbd on a runtime check/define, the library
>>>>>>> still won't be executable because it was built with -mpower8-vector.
>>>>>>>
>>>>>>> Having said that, because I *assume* IBM right now mostly cares about
>>>>>>> Linux running on POWER8 with little-endian, I think it is a fair
>>>>>>> compromise.
>>>>>>
>>>>>> Note I don't have anything against a build time check. My concern here
>>>>>> is something along the lines of unsuspecting distros shipping binaries
>>>>>> which won't work, as it looks to me like this will get picked up
>>>>>> automatically. That is different to how for instance sse41 is handled.
>>>>>> That is I believe this should only get enabled if someone has specified
>>>>>> some -mcpu=power8 or whatever flag explicitly somewhere already.
>>>>>>
>>>>>> Roland
>>>>>
>>>>> I understand and I share your concern. Maybe we should add
>>>>> "--disable-pwr8-inst" to mesa's configure ? if that flag is given to
>>>>> configure, it would disable the optimization code (won't add
>>>>> _ARCH_PWR8 to defines and won't add -mpower8-vector to gcc flags).
>>>>>
>>>>> What do you think ?
>>>> If the generated code with all automatically picked up compile options
>>>> really doesn't run on power7 just because of this, I think it would be
>>>> nicer if this were an explicit enable.
>>>>
>>>> Roland
>>>>
>>>
>>> So the problem is a bit worse then that and requires a harsher solution.
>>> Apparently, when that compiler flag (power8-vector) is given to GCC,
>>> GCC uses POWER8-only instructions in other places as well! What I have
>>> seen so far, is that it uses such instructions in the implementation
>>> of exp2() and/or log2() (in f.cpp) and also saw it in
>>> __glXInitializeVisualConfigFromTags(). The instructions used are not
>>> vector instructions, but floating point instructions, which were added
>>> only in PowerISA 2.07
>>>
>>> Therefore, I think that for now, I will limit the entire optimization
>>> code to POWER8 *and* Little-Endian. Because ppc64le packages can
>>> *only* run on POWER8 systems, and because you can't transfer binaries
>>> between LE and BE machines, this workaround eliminates the danger of
>>> crashing on "illegal instruction". In addition, there is no more need
>>> for runtime checks.
>>>
>>> I hope you agree that with this change, it is better to enable the
>>> power8-vector by default when building on POWER8 machine installed
>>> with Linux LE. For all other archs it will be disabled by default.
>>
>> Yes, that looks reasonable.
>
> Thanks.
>
>>
>>> I will try to contact IBM GCC devs to see how we can overcome this
>>> problem (or if they even care) so I can expand these optimizations to
>>> BE as well.
>> IIRC this is problematic for things like sse41 etc. as well, it is just
>> how gcc works. I think the typical workaround is to move code using
>> intrinsics which need special compile flags to their own file (or rather
>> compile unit), which you then can compile with those flags. (This
>> implies of course separate functions, that is you can't have any
>> run-time check plus the assembly in the same file or function.)
>> This is how mesa handles _mesa_streaming_load_memcpy for the i965 driver.
>>
>> Roland
>>
> Yeah, I imagined as much, but I didn't know this technique was already
> in use in mesa.
>
> I hate this fragmentation of code, and I think a better solution, at
> the compiler level, is to have some kind of flag which tells the
> compiler he can accept power8/sse41 inline assembly and/or intrinsics
> but can't generate those on his own (I'm not a compiler expert but I
> think that's doable).
Yes, this is what would be nice. Works that way in some other compilers
as well. I suppose there's technical reasons why that doesn't work with gcc.
The other solution is to use straight assembly instead of intrinsics
(the _mm_shuffle_epi8 definition in u_sse.h does just that), but that
might not really be feasible here...
> As I said, I'll try to dig around and see what can be done.
Well if you don't care about power8 BE you could just leave it at that.
Or just pick it up if something added the power8 compile flag already.
That way you don't need a runtime test neither.
Roland
>
> Oded
>
>>
>>>
>>> I will send the revised patches shortly.
>>>
>>> Oded
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Oded
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Oded
>>>>>>>
>>>>>>>> So far we didn't bother with that for SSE
>>>>>>>> but it has to be said SSE2 is a really low bar (and the manual assembly
>>>>>>>> stuff doesn't use anything more advanced, even though clearly things
>>>>>>>> like the emulated mm_mullo_epi32 are suboptimal if your cpu supports
>>>>>>>> sse41). And even then on non-x86 you actually might not get
>>>>>>>> PIPE_ARCH_SSE if you didn't set gcc's compile flags accordingly.
>>>>>>>>
>>>>>>>> Roland
>>>>>>>>
>>>>>>>>
>>>>>>>> Am 29.12.2015 um 17:12 schrieb Oded Gabbay:
>>>>>>>>> To determine if we could use special POWER8 assembly directives, we first
>>>>>>>>> need to detect whether we are running on POWER8 architecture. This patch
>>>>>>>>> adds this detection to configure.ac and adds the necessary compilation
>>>>>>>>> flags accordingly.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Oded Gabbay <oded.gabbay at gmail.com>
>>>>>>>>> ---
>>>>>>>>> configure.ac | 30 ++++++++++++++++++++++++++++++
>>>>>>>>> 1 file changed, 30 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/configure.ac b/configure.ac
>>>>>>>>> index f8a70be..1acd47e 100644
>>>>>>>>> --- a/configure.ac
>>>>>>>>> +++ b/configure.ac
>>>>>>>>> @@ -396,6 +396,36 @@ fi
>>>>>>>>> AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
>>>>>>>>> AC_SUBST([SSE41_CFLAGS], $SSE41_CFLAGS)
>>>>>>>>>
>>>>>>>>> +dnl Check for POWER8 Architecture
>>>>>>>>> +PWR8_CFLAGS="-mpower8-vector"
>>>>>>>>> +have_pwr8_intrinsics=no
>>>>>>>>> +AC_MSG_CHECKING(whether we are running on POWER8 Architecture)
>>>>>>>>> +save_CFLAGS=$CFLAGS
>>>>>>>>> +CFLAGS="$PWR8_CFLAGS $CFLAGS"
>>>>>>>>> +AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
>>>>>>>>> +#if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 8))
>>>>>>>>> +#error "Need GCC >= 4.8 for sane POWER8 support"
>>>>>>>>> +#endif
>>>>>>>>> +#include <altivec.h>
>>>>>>>>> +int main () {
>>>>>>>>> + vector unsigned char r;
>>>>>>>>> + vector unsigned int v = vec_splat_u32 (1);
>>>>>>>>> + r = __builtin_vec_vgbbd ((vector unsigned char) v);
>>>>>>>>> + return 0;
>>>>>>>>> +}]])], have_pwr8_intrinsics=yes)
>>>>>>>>> +CFLAGS=$save_CFLAGS
>>>>>>>>> +
>>>>>>>>> +if test $have_pwr8_intrinsics = yes ; then
>>>>>>>>> + DEFINES="$DEFINES -D_ARCH_PWR8"
>>>>>>>>> + CXXFLAGS="$CXXFLAGS $PWR8_CFLAGS"
>>>>>>>>> + CFLAGS="$CFLAGS $PWR8_CFLAGS"
>>>>>>>>> +else
>>>>>>>>> + PWR8_CFLAGS=
>>>>>>>>> +fi
>>>>>>>>> +
>>>>>>>>> +AC_MSG_RESULT($have_pwr8_intrinsics)
>>>>>>>>> +AC_SUBST([PWR8_CFLAGS], $PWR8_CFLAGS)
>>>>>>>>> +
>>>>>>>>> dnl Can't have static and shared libraries, default to static if user
>>>>>>>>> dnl explicitly requested. If both disabled, set to static since shared
>>>>>>>>> dnl was explicitly requested.
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
More information about the mesa-dev
mailing list