[xorg-bugzilla-noise] [Bug 1067] New: improve fbmmx
bugzilla-daemon at freedesktop.org
Fri Aug 13 00:07:14 PDT 2004
Please do not reply to this email: if you want to comment on the bug, go to
the URL shown below and enter your comments there.
https://freedesktop.org/bugzilla/show_bug.cgi?id=1067
Summary: improve fbmmx
Product: xorg
Version: CVS_head
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Server/general
AssignedTo: xorg-bugzilla-noise at freedesktop.org
ReportedBy: nmiell at comcast.net
CC: ajax at nwnk.net,sandmann at daimi.au.dk
Attached is a patch that does the following to the fbmmx implementation:
1. It replaces gcc-specific __builtin_ia32_* functions with the standard MMX and
SSE intrinsic types and functions from mmintrin.h and xmmintrin.h. This makes
the code easier to understand and allows for the possible use of MMX by other
x86/x86-64 compilers that don't support gcc's builtin functions.
2. It replaces all inline assembly (except the cpuid use in fbHaveMMX) with
intrinsics. This is easier to understand, compatible with compilers other than
gcc, and (in some cases) results in better code generation. (In other cases,
specifically the use of _mm_cvtsi64_si32 and _mm_cvtsi32_si64, extra
instructions are generated. I expect this to improve with future gcc releases.)
3. Support for fbmmx on AMD64 systems is added. fbHaveMMX() is defined to TRUE
on AMD64 systems. SSE support (in the form of the MMX pshufw instruction) is
also unconditionally enabled on AMD64. SSE remains disabled on i386.
4. The USE_GCC34_MMX macro is renamed to the more general USE_MMX, under the
assumption that compilers other than gcc may want to use MMX. A USE_SSE macro
is introduced, which unconditionally enables (i.e. no cpuid test) the
above-mentioned pshufw instruction.
5. Access to constants in the MMX data structure is now done using the MC()
macro, which hides away those messy casts. All references to the zero constant
are replaced with _mm_setzero_si64(), which generates "pxor %mm, %mm" and is
recommended instead of constant loads in the AMD optimization manual.
6. The shift function is replaced by open-coded instances of _mm_srli_pi16 and
_mm_slli_pi16.
7. Some comments were added, updated, or reformatted for 80 columns.
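
For illustration only (this is not code from the attached patch), here is a
minimal sketch of the style that items 1, 2, 3 and 6 describe: intrinsics from
mmintrin.h and xmmintrin.h instead of __builtin_ia32_* calls or inline
assembly, _mm_setzero_si64() for the zero constant, open-coded _mm_srli_pi16
shifts, and the pshufw instruction reached through _mm_shuffle_pi16. The
function names scale_pixel and broadcast_alpha are hypothetical, the pixel is
assumed to keep its alpha in the top byte, and the sketch keys off the
compiler-defined __MMX__ and __SSE__ macros rather than the patch's USE_MMX
and USE_SSE. A scalar fallback is included so that it also builds where MMX is
unavailable.

```c
#include <stdint.h>

/* Assumption: use the compiler-defined __MMX__/__SSE__ macros here;
 * the patch itself uses its own USE_MMX and USE_SSE macros. */
#if defined(__MMX__)
#include <mmintrin.h>
#endif
#if defined(__MMX__) && defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: scale each 8-bit channel of a 32-bit pixel by
 * alpha/255, rounded to nearest. */
uint32_t scale_pixel(uint32_t pixel, uint8_t alpha)
{
#if defined(__MMX__)
    __m64 zero = _mm_setzero_si64();              /* zero via pxor, no load */
    /* Widen the four bytes into four 16-bit lanes. */
    __m64 p = _mm_unpacklo_pi8(_mm_cvtsi32_si64((int)pixel), zero);
    __m64 t = _mm_mullo_pi16(p, _mm_set1_pi16((short)alpha));
    uint32_t out;

    t = _mm_adds_pu16(t, _mm_set1_pi16(0x0080));  /* + 128 for rounding */
    t = _mm_adds_pu16(t, _mm_srli_pi16(t, 8));    /* t += t >> 8 */
    t = _mm_srli_pi16(t, 8);                      /* t >>= 8 */
    out = (uint32_t)_mm_cvtsi64_si32(_mm_packs_pu16(t, zero));
    _mm_empty();                                  /* leave MMX state clean */
    return out;
#else
    uint32_t out = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        uint32_t t = ((pixel >> shift) & 0xffu) * alpha + 0x80u;
        t = (t + (t >> 8)) >> 8;
        out |= t << shift;
    }
    return out;
#endif
}

/* Hypothetical helper: copy the alpha byte (top byte) into every channel.
 * With SSE available this uses pshufw via _mm_shuffle_pi16. */
uint32_t broadcast_alpha(uint32_t pixel)
{
#if defined(__MMX__) && defined(__SSE__)
    __m64 zero = _mm_setzero_si64();
    __m64 p = _mm_unpacklo_pi8(_mm_cvtsi32_si64((int)pixel), zero);
    /* Select 16-bit lane 3 (alpha) for all four lanes. */
    __m64 a = _mm_shuffle_pi16(p, _MM_SHUFFLE(3, 3, 3, 3));
    uint32_t out = (uint32_t)_mm_cvtsi64_si32(_mm_packs_pu16(a, zero));
    _mm_empty();
    return out;
#else
    return (pixel >> 24) * 0x01010101u;
#endif
}
```

With these definitions, scale_pixel(0x80808080, 128) yields 0x40404040 and
broadcast_alpha(0x80123456) yields 0x80808080 on either code path.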
This patch has been tested on AMD64 against mharris's xorg-x11-6.7.99.1
packages. rendercheck passes without error and the benchmark from bug #839
reports a 1.2 second overall improvement compared to 6.7.0.
--
Configure bugmail: https://freedesktop.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.