[Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.
mattst88 at gmail.com
Tue Nov 3 21:45:43 PST 2015
On Sun, Oct 25, 2015 at 5:10 PM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
> On Sun, 25 Oct 2015 13:13:09 -0700
> Matt Turner <mattst88 at gmail.com> wrote:
>> On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> > We had lots of hacks to handle the inability to include xmmintrin.h
>> > without compiling with -msse (lest SSE instructions be used in
>> > pixman-mmx.c). Some recent version of gcc relaxed this restriction.
>> > Change configure.ac to test that xmmintrin.h can be included and that we
>> > can use some intrinsics from it, and remove the work-around code from
>> > pixman-mmx.c.
>> > Evidently this allows gcc 4.9.3 to optimize better as well:
>> >    text    data     bss     dec     hex filename
>> >  657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
>> >  656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
>> > Signed-off-by: Matt Turner <mattst88 at gmail.com>
>> > ---
>> Ugh. This is apparently not sufficient...
>> GCC allows you to *include* xmmintrin.h without enabling SSE, but it
>> still doesn't allow you to use any of the functions:
>> conftest.c: In function ‘main’:
>> error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’:
>> target specific option mismatch
>> _mm_mulhi_pu16 (__m64 __A, __m64 __B)
>> conftest.c:12:7: error: called from here
>> w = _mm_mulhi_pu16(w, w);
> Oh, looks like the restriction used to be relaxed for a while, but then
> GCC 4.9 started to be strict again:
>> I'm not sure what to do except to revert.
> The real problem is that GCC does not provide a separate option for
> MMX2 (a common subset of 3DNOW and SSE). We usually solve compiler
> problems by reporting bugs to compiler developers. This particular
> case had not been handled according to the usual rule, and now
> we have a nice practical demonstration of the consequences ;-)
> BTW, we can still report a bug to GCC. Better late than never.
Yeah, I suppose. The disappointing thing is that Google says an
-m3dnowext flag existed at one point...
>> The MMX but no SSE case is important, at least it was in the past
>> because of OLPC's XO-1.
> I'm not sure how many OLPC XO-1 laptops might be still remaining in
> real use in the hands of real people:
>> Suggestions besides reverting this?
> Because OLPC XO-1 is using the AMD Geode processor, we could probably
> treat the code in pixman-mmx.c as 3dnow optimizations on x86 hardware?
The problem is that -m3dnow isn't sufficient. The instructions we want
to use are a subset of SSE that AMD implemented in the Athlon. We need
an -m3dnowext flag.
We can't pass -march=athlon in MMX_CFLAGS either, since the user is
likely to have specified a -march= value of their own.
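
For reference, a rough scalar sketch of what one of the instructions in
question computes. _mm_mulhi_pu16 (pmulhuw, the intrinsic named in the
error above) multiplies four unsigned 16-bit lanes and keeps the high 16
bits of each product. The helper names mulhi_u16, mulhi_u16x4 and
mulhi_selfcheck are made up for illustration and are not part of pixman;
this is plain C, not the code in pixman-mmx.c.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of pmulhuw / _mm_mulhi_pu16:
 * the upper 16 bits of the unsigned 16x16 -> 32-bit product.
 * (Hypothetical helper, for illustration only.) */
uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* The same operation applied to the four 16-bit lanes that a
 * 64-bit MMX register holds. */
void mulhi_u16x4(const uint16_t a[4], const uint16_t b[4], uint16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = mulhi_u16(a[i], b[i]);
}

/* Small self-check of the model. */
void mulhi_selfcheck(void)
{
    const uint16_t a[4] = { 0xFFFF, 0x8000, 255, 0 };
    const uint16_t b[4] = { 0xFFFF, 2, 255, 0xFFFF };
    uint16_t out[4];

    mulhi_u16x4(a, b, out);
    assert(out[0] == 0xFFFE); /* 0xFFFF * 0xFFFF = 0xFFFE0001 */
    assert(out[1] == 1);      /* 0x8000 * 2      = 0x00010000 */
    assert(out[2] == 0);      /* 255 * 255 fits in the low 16 bits */
    assert(out[3] == 0);
}
```

Without the instruction, each lane needs an unpack, a 32-bit multiply
and a shift, which is roughly the kind of work-around the patch was
trying to drop.
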
> Another option is to start using assembly instead of intrinsics.
> Unless a miracle happens and somebody decides to pay for this job,
> we definitely don't have resources to do a high quality assembly
> implementation for MMX/MMX2. But we still can take the assembly
> output of GCC and tweak it a bit. This is ugly and not very
> maintainable though. Been there, done that with ARMv6.
> Or we could simply do nothing and finally retire MMX support on x86.
> If OLPC XO-1 users still do exist, they can always contact us.
I don't care so much about XO-1, but I do want to retain the ability
to test the MMX code on x86. iwMMXt/loongson systems are slow, and
most development can be done on a fast desktop this way.