[xorg-bugzilla-noise] [Bug 1067] New: improve fbmmx

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Aug 13 00:07:14 PDT 2004


Please do not reply to this email: if you want to comment on the bug, go to
the URL shown below and enter your comments there.
 
https://freedesktop.org/bugzilla/show_bug.cgi?id=1067        
   
           Summary: improve fbmmx
           Product: xorg
           Version: CVS_head
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Server/general
        AssignedTo: xorg-bugzilla-noise at freedesktop.org
        ReportedBy: nmiell at comcast.net
                CC: ajax at nwnk.net,sandmann at daimi.au.dk


Attached is a patch that does the following to the fbmmx implementation:

1. It replaces gcc-specific __builtin_ia32_* functions with the standard MMX and
SSE intrinsic types and functions from mmintrin.h and xmmintrin.h. This makes
the code easier to understand and allows for the possible use of MMX with other
x86/x86-64 compilers that don't support gcc's builtin functions.

2. It replaces all inline assembly (except the cpuid use in fbHaveMMX) with
intrinsics. This is easier to understand, compatible with compilers other than
gcc, and (in some cases) results in better code generation. (In other cases,
specifically the use of _mm_cvtsi64_si32 and _mm_cvtsi32_si64, extra
instructions are generated. I expect this to improve with future gcc releases.)

3. Support for fbmmx on AMD64 systems is added. fbHaveMMX() is defined to TRUE
on AMD64 systems. SSE support (in the form of the MMX pshufw instruction) is
also unconditionally enabled on AMD64. SSE remains disabled on i386.

4. The USE_GCC34_MMX macro is renamed to the more general USE_MMX, under the
assumption that compilers other than gcc may want to use MMX. A USE_SSE macro
is introduced, which unconditionally enables (i.e. no cpuid test) the
above-mentioned pshufw instruction.

5. Access to constants in the MMX data structure is now done using the MC()
macro, which hides away those messy casts. All references to the zero constant
are replaced with _mm_setzero_si64(), which generates "pxor %mm, %mm" and is
recommended instead of constant loads in the AMD optimization manual.

6. The shift function is replaced by open-coded instances of _mm_srli_pi16 and
_mm_slli_pi16.

7. Some comments were added, updated, or reformatted for 80 columns.
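
To illustrate the style described in items 1, 5, and 6, here is a minimal
sketch that is not taken from the attached patch. It scales the four 8-bit
channels of one 32-bit pixel by an alpha value using only mmintrin.h
intrinsics, with open-coded _mm_srli_pi16 shifts and _mm_setzero_si64() for
the zero constant. The function name scale_pixel, the __MMX__ guard, and the
scalar fallback are assumptions made for this example; the rounding sequence
is a common divide-by-255 approximation and is assumed here rather than quoted
from fbmmx.c.

```c
#include <stdint.h>

#if defined(__MMX__)
#include <mmintrin.h>   /* standard MMX intrinsics: __m64 and _mm_* */

/* Hypothetical helper, not from the patch: scale each 8-bit channel of a
 * 32-bit pixel by alpha/255, with rounding. */
static uint32_t scale_pixel(uint32_t pixel, uint8_t alpha)
{
    /* Zero register via intrinsic instead of loading a zero constant. */
    __m64 zero = _mm_setzero_si64();
    /* Widen the four 8-bit channels to four 16-bit lanes. */
    __m64 p = _mm_unpacklo_pi8(_mm_cvtsi32_si64((int)pixel), zero);
    __m64 a = _mm_set1_pi16((short)alpha);
    /* t = channel * alpha + 0x80 in every lane (always fits in 16 bits). */
    __m64 t = _mm_adds_pu16(_mm_mullo_pi16(p, a), _mm_set1_pi16(0x0080));
    /* (t + (t >> 8)) >> 8 gives the rounded quotient by 255. */
    t = _mm_adds_pu16(t, _mm_srli_pi16(t, 8));
    t = _mm_srli_pi16(t, 8);
    /* Narrow back to 8-bit channels and move the low 32 bits out. */
    uint32_t result = (uint32_t)_mm_cvtsi64_si32(_mm_packs_pu16(t, zero));
    _mm_empty();        /* clear MMX state so later x87 code is safe */
    return result;
}
#else
/* Scalar fallback with the same arithmetic, for targets without MMX. */
static uint32_t scale_pixel(uint32_t pixel, uint8_t alpha)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t t = ((pixel >> shift) & 0xFFu) * alpha + 0x80u;
        result |= (((t + (t >> 8)) >> 8) & 0xFFu) << shift;
    }
    return result;
}
#endif
```

With this arithmetic, scale_pixel(0xFFFFFFFF, 128) gives 0x80808080, and an
alpha of 255 returns the pixel unchanged. The two conversion intrinsics used
at the ends are the ones item 2 notes may cost extra instructions with current
gcc.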

This patch has been tested on AMD64 against mharris's xorg-x11-6.7.99.1
packages. rendercheck passes without error and the benchmark from bug #839
reports a 1.2 second overall improvement compared to 6.7.0.        
   
   

