[Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.
siarhei.siamashka at gmail.com
Sun Oct 25 17:41:48 PDT 2015
On Mon, 26 Oct 2015 02:10:39 +0200
Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:
> On Sun, 25 Oct 2015 13:13:09 -0700
> Matt Turner <mattst88 at gmail.com> wrote:
> > On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner <mattst88 at gmail.com> wrote:
> > > We had lots of hacks to handle the inability to include xmmintrin.h
> > > without compiling with -msse (lest SSE instructions be used in
> > > pixman-mmx.c). Some recent version of gcc relaxed this restriction.
> > >
> > > Change configure.ac to test that xmmintrin.h can be included and that we
> > > can use some intrinsics from it, and remove the work-around code from
> > > pixman-mmx.c.
> > >
> > > Evidently allows gcc 4.9.3 to optimize better as well:
> > >
> > >    text    data     bss     dec     hex filename
> > >  657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
> > >  656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
> > >
> > > Signed-off-by: Matt Turner <mattst88 at gmail.com>
> > > ---
> > Ugh. This is apparently not sufficient...
> > https://bugs.gentoo.org/show_bug.cgi?id=564024
> > GCC allows you to *include* xmmintrin.h without enabling SSE, but it
> > still doesn't allow you to use any of the functions:
> > conftest.c: In function ‘main’:
> > /usr/lib/gcc/x86_64-pc-linux-gnu/5.1.0/include/xmmintrin.h:1124:1:
> > error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’:
> > target specific option mismatch
> > _mm_mulhi_pu16 (__m64 __A, __m64 __B)
> > ^
> > conftest.c:12:7: error: called from here
> > w = _mm_mulhi_pu16(w, w);
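For reference, `_mm_mulhi_pu16` (the PMULHUW instruction, one of the MMX2 extensions discussed in this thread) multiplies four unsigned 16-bit lanes and keeps the high 16 bits of each 32-bit product. Below is a portable scalar model of those semantics, offered only as an illustration. The name `mulhi_pu16_ref` is hypothetical and does not come from pixman, and the model needs no `-msse` or other target flags.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_mulhi_pu16 on a 64-bit __m64 value:
 * four unsigned 16-bit lanes (lane 0 = least significant bits) are
 * multiplied lane by lane, and only the high 16 bits of each 32-bit
 * product are kept. Illustration only, not pixman code. */
static uint64_t mulhi_pu16_ref(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (uint32_t)((a >> (16 * lane)) & 0xFFFFu);
        uint32_t y = (uint32_t)((b >> (16 * lane)) & 0xFFFFu);
        uint32_t product = x * y;  /* max 0xFFFE0001, fits in 32 bits */
        result |= (uint64_t)(product >> 16) << (16 * lane);
    }
    return result;
}
```

For example, `mulhi_pu16_ref(0x0100, 0x0100)` returns 1, because 256 * 256 = 0x10000 and its high 16 bits are 0x0001.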
> Oh, looks like the restriction used to be relaxed for a while, but then
> GCC 4.9 started to be strict again:
> > I'm not sure what to do except to revert.
> The real problem is that GCC does not provide a separate option for
> MMX2 (a common subset of 3DNOW and SSE). We usually solve compiler
> problems by reporting bugs to compiler developers. This particular
> case had not been handled according to the usual rule, and now
> we have a nice practical demonstration of the consequences ;-)
> BTW, we can still report a bug to GCC. Better late than never.
> > The MMX but no SSE case is important, at least it was in the past
> > because of OLPC's XO-1.
> I'm not sure how many OLPC XO-1 laptops might still remain in
> real use in the hands of real people:
> > Suggestions besides reverting this?
> Because OLPC XO-1 is using the AMD Geode processor, we could probably
> treat the code in pixman-mmx.c as 3dnow optimizations on x86 hardware?
A variation of this would be to transform 'pixman-mmx.c' into a common
header file with the ability to add custom function name prefixes
with the help of the C preprocessor. Then introduce 'pixman-3dnow.c'
and 'pixman-sse1.c' files, which would just define different prefixes
and include this common template. The former could be compiled with
'-m3dnow' and the latter with '-msse'.
The obvious drawback would be an increase in the pixman library size.
But GCC would be happy, and other compilers, such as Clang, should
be happy too.
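The prefix mechanism in the template proposal above can be sketched with plain C preprocessor token pasting. This is a minimal illustration under stated assumptions: the names `FN`, `FN_PREFIX`, `TEMPLATE_BODY`, `demo_3dnow_` and `demo_sse1_` are hypothetical, and the shared body is written as a macro only so that the sketch fits in one file. In the actual proposal it would be a common header included from 'pixman-3dnow.c' and 'pixman-sse1.c', each compiled as a separate translation unit with its own '-m3dnow' or '-msse' flag, which a single file cannot show.

```c
#include <stdint.h>

/* Two-level paste so that FN_PREFIX is macro-expanded before ## joins it. */
#define PASTE_(a, b) a##b
#define PASTE(a, b)  PASTE_(a, b)

/* FN(name) prefixes a function name with whatever FN_PREFIX is defined
 * at the point of expansion, as each per-ISA .c file would arrange
 * before including the shared template. */
#define FN(name) PASTE(FN_PREFIX, name)

/* Hypothetical stand-in for the shared template body. FN() is resolved
 * each time TEMPLATE_BODY is expanded, not when it is defined. */
#define TEMPLATE_BODY                                                  \
    static uint8_t FN(add_saturate_u8)(uint8_t a, uint8_t b)           \
    {                                                                  \
        unsigned sum = (unsigned)a + b;                                \
        return (uint8_t)(sum > 255u ? 255u : sum);                     \
    }

/* Stand-in for pixman-3dnow.c (would be compiled with -m3dnow). */
#define FN_PREFIX demo_3dnow_
TEMPLATE_BODY
#undef FN_PREFIX

/* Stand-in for pixman-sse1.c (would be compiled with -msse). */
#define FN_PREFIX demo_sse1_
TEMPLATE_BODY
#undef FN_PREFIX
```

Both `demo_3dnow_add_saturate_u8(200, 100)` (which saturates to 255) and `demo_sse1_add_saturate_u8(10, 20)` (which returns 30) come from the same source text, which is also why the library would grow: every templated function is emitted once per prefix.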
> Another option is to start using assembly instead of intrinsics.
> Unless a miracle happens and somebody decides to pay for this job,
> we definitely don't have resources to do a high quality assembly
> implementation for MMX/MMX2. But we still can take the assembly
> output of GCC and tweak it a bit. This is ugly and not very
> maintainable though. Been there, done that with ARMv6.
> Or we could simply do nothing and finally retire MMX support on x86.
> If OLPC XO-1 users still do exist, they can always contact us.