[Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

Matt Turner mattst88 at gmail.com
Tue Nov 3 21:45:43 PST 2015

On Sun, Oct 25, 2015 at 5:10 PM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
> On Sun, 25 Oct 2015 13:13:09 -0700
> Matt Turner <mattst88 at gmail.com> wrote:
>> On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> > We had lots of hacks to handle the inability to include xmmintrin.h
>> > without compiling with -msse (lest SSE instructions be used in
>> > pixman-mmx.c). Some recent version of gcc relaxed this restriction.
>> >
>> > Change configure.ac to test that xmmintrin.h can be included and that we
>> > can use some intrinsics from it, and remove the work-around code from
>> > pixman-mmx.c.
>> >
>> > Evidently allows gcc 4.9.3 to optimize better as well:
>> >
>> >    text    data     bss     dec     hex filename
>> >  657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
>> >  656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
>> >
>> > Signed-off-by: Matt Turner <mattst88 at gmail.com>
>> > ---
>> Ugh. This is apparently not sufficient...
>> https://bugs.gentoo.org/show_bug.cgi?id=564024
>> GCC allows you to *include* xmmintrin.h without enabling SSE, but it
>> still doesn't allow you to use any of the functions:
>> conftest.c: In function ‘main’:
>> /usr/lib/gcc/x86_64-pc-linux-gnu/5.1.0/include/xmmintrin.h:1124:1:
>> error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’:
>> target specific option mismatch
>>  _mm_mulhi_pu16 (__m64 __A, __m64 __B)
>>  ^
>> conftest.c:12:7: error: called from here
>>      w = _mm_mulhi_pu16(w, w);
> Oh, looks like the restriction used to be relaxed for a while, but then
> GCC 4.9 started to be strict again:
>     https://bugzilla.redhat.com/show_bug.cgi?id=1092991#c1
>> I'm not sure what to do except to revert.
> The real problem is that GCC does not provide a separate option for
> MMX2 (a common subset of 3DNOW and SSE). We usually solve compiler
> problems by reporting bugs to compiler developers. This particular
> case had not been handled according to the usual rule, and now
> we have a nice practical demonstration of the consequences ;-)
> BTW, we can still report a bug to GCC. Better late than never.

Yeah, I suppose. The disappointing thing is that Google says an
-m3dnowext flag existed at one point...

>> The MMX but no SSE case is important, at least it was in the past
>> because of OLPC's XO-1.
> I'm not sure how many OLPC XO-1 laptops might be still remaining in
> real use in the hands of real people:
>     http://www.olpcnews.com/about_olpc_news/goodbye_one_laptop_per_child.html
>> Suggestions besides reverting this?
> Because OLPC XO-1 is using the AMD Geode processor, we could probably
> treat the code in pixman-mmx.c as 3dnow optimizations on x86 hardware?

The problem is that -m3dnow isn't sufficient. The instructions we want
to use are a subset of SSE that AMD implemented in the Athlon. We need
an -m3dnowext flag.
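
For reference, here is a rough scalar sketch of what one of those instructions computes: pmulhuw, i.e. the _mm_mulhi_pu16 intrinsic from the failing conftest quoted above. It keeps the high 16 bits of each unsigned 16x16 product across four lanes. The helper names are made up for illustration and are not pixman code.

```c
#include <stdint.h>

/* Scalar model of a single pmulhuw lane: the high 16 bits of the
 * unsigned 16x16 -> 32-bit product. Illustrative name, not from pixman. */
static uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* Hypothetical helper: apply the lane operation to four 16-bit lanes,
 * mirroring what _mm_mulhi_pu16 does on a 64-bit MMX register. */
static void mulhi_pu16_scalar(const uint16_t a[4], const uint16_t b[4],
                              uint16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = mulhi_u16(a[i], b[i]);
}
```

This only models the arithmetic. It says nothing about which compiler flags are needed to get the real instruction emitted, which is the actual problem here.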

We can't pass -march=athlon in MMX_CFLAGS either, since the user is
likely to have specified a -march= value of their own.

> Another option is to start using assembly instead of intrinsics.
> Unless a miracle happens and somebody decides to pay for this job,
> we definitely don't have resources to do a high quality assembly
> implementation for MMX/MMX2. But we still can take the assembly
> output of GCC and tweak it a bit. This is ugly and not very
> maintainable though. Been there, done that with ARMv6.

Not interested.

> Or we could simply do nothing and finally retire MMX support on x86.
> If OLPC XO-1 users still do exist, they can always contact us.

I don't care so much about XO-1, but I do want to retain the ability
to test the MMX code on x86. iwMMXt/loongson systems are slow, and
most development can be done on a fast desktop this way.
