[xorg-bugzilla-noise] [Bug 839] New: Speeding up render with gcc 3.4 and MMX intrinsics

bugzilla-daemon at pdx.freedesktop.org bugzilla-daemon at pdx.freedesktop.org
Thu Jul 8 09:45:22 PDT 2004


Please do not reply to this email: if you want to comment on the bug, go to     
the URL shown below and enter your comments there.  
  
http://freedesktop.org/bugzilla/show_bug.cgi?id=839   
   
           Summary: Speeding up render with gcc 3.4 and MMX intrinsics
           Product: xorg
           Version: CVS_head
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Server/general
        AssignedTo: xorg-bugzilla-noise at freedesktop.org
        ReportedBy: sandmann at daimi.au.dk


Attached is a patch that speeds up some operations in the Render
extension by using gcc 3.4 MMX intrinsics. A benchmark rendering a
paragraph of component alpha text to a pixmap gave these results on a
1200 MHz laptop with an i830 chip running Fedora Core 1:

Unmodified X server and the pixmap in system RAM:

        [ssp at localhost x]$ ./a.out
        total time: 41.394618
        average rect time: 0.683200
        worst rect: 9
        average glyph time: 3.550500

with the MMX optimizations:

        [ssp at localhost x]$ ./a.out
        total time: 22.972553
        average rect time: 0.677900
        worst rect: 9
        average glyph time: 1.692000

I.e., text rendering is more than twice as fast. The 'average glyph
time' here is the time it takes to render the entire paragraph of
text.

With the pixmap in video RAM, the speedup is not quite as
spectacular:

Unmodified X server:

        [ssp at localhost x]$ ./a.out
        total time: 95.900768
        average rect time: 0.003300
        worst rect: 1
        average glyph time: 9.693500

With MMXified compositing:

        total time: 66.559287
        average rect time: 0.015100
        worst rect: 6
        average glyph time: 6.720500

Still, that is a nice improvement. The patch includes improved code for
these cases:

Subpixel text:
- (constant color) in (component alpha mask) over 565 destination
- (constant color) in (component alpha mask) over 32bit destination
- (32 bit component alpha) Saturate (32 bit destination)

Regular antialiased text:
- (8 bit alpha) Saturate (8 bit destination)
- (constant color) in (8 bit alpha mask) over 565 destination
- (constant color) in (8 bit alpha mask) over 32bit destination

GdkPixbuf:
- (reversed, non-premultiplied source) over 32bit destination
- (reversed, non-premultiplied source) over 565 destination

Alpha rectangle (e.g., Nautilus selection rectangle):
- (constant color) over 32bit destination
- (constant color) over 565 destination

Solid fill:
- solid fill of 32 bit drawable
- solid fill of 16 bit drawable
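
To make the notation in the lists above concrete, here is a plain C
sketch of "(constant color) in (8 bit alpha mask) over" for a 32 bit
and a 565 destination. It is not code from the patch, which uses MMX
intrinsics; it is only a scalar illustration of the arithmetic such
code would vectorize. The function names are made up for this example,
and the source color is assumed to be premultiplied ARGB.

```c
#include <stdint.h>

/* Multiply two 8-bit values and divide by 255, with rounding. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* (src IN mask) OVER dst for one ARGB32 pixel.
 * Assumption: src and dst are premultiplied; m is the 8-bit mask value. */
static uint32_t in_over_8888(uint32_t src, uint8_t m, uint32_t dst)
{
    /* Alpha of (src IN mask), and its complement for the OVER step. */
    uint8_t sa = mul_un8((uint8_t)(src >> 24), m);
    uint8_t ia = (uint8_t)(255 - sa);
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        unsigned s = mul_un8((uint8_t)(src >> shift), m);
        unsigned d = mul_un8((uint8_t)(dst >> shift), ia);
        unsigned sum = s + d;
        if (sum > 255)          /* only possible for non-premultiplied input */
            sum = 255;
        result |= (uint32_t)sum << shift;
    }
    return result;
}

/* Expand r5g6b5 to 8 bits per channel by replicating the high bits. */
static uint32_t expand_565(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f;
    uint32_t g = (p >> 5) & 0x3f;
    uint32_t b = p & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return (r << 16) | (g << 8) | b;
}

/* Pack 8 bits per channel back to r5g6b5 by dropping the low bits. */
static uint16_t pack_565(uint32_t p)
{
    return (uint16_t)(((p >> 8) & 0xf800) |
                      ((p >> 5) & 0x07e0) |
                      ((p >> 3) & 0x001f));
}

/* Same operation on a 565 destination, treated as fully opaque. */
static uint16_t in_over_0565(uint32_t src, uint8_t m, uint16_t dst)
{
    uint32_t d = expand_565(dst) | 0xff000000u;
    return pack_565(in_over_8888(src, m, d));
}
```

With a mask value of 0 the destination is returned unchanged, and with
a mask value of 255 and an opaque source the source color replaces it.
An MMX version would process all four channels of a pixel in one
register instead of looping over them.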

The code can optionally be compiled to use the pshufw instruction, which
is only available on Pentium III and later processors (it was introduced
together with SSE).
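
For readers unfamiliar with pshufw, the following scalar C model shows
what the instruction computes: each 16-bit word of the result is
selected from the source by a 2-bit field of an 8-bit immediate. One
plausible use in compositing code is to broadcast a pixel's alpha into
all four channel lanes in a single step; whether the patch uses it
exactly this way is an assumption here, and the function names are
invented for the illustration.

```c
#include <stdint.h>

/* Scalar model of pshufw on a 64-bit value holding four 16-bit words.
 * Result word i is source word ((imm >> (2 * i)) & 3). */
static uint64_t pshufw_model(uint64_t src, uint8_t imm)
{
    uint64_t result = 0;

    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm >> (2 * i)) & 3;
        uint64_t word = (src >> (16 * sel)) & 0xffff;
        result |= word << (16 * i);
    }
    return result;
}

/* Broadcast word 3 (alpha, for an unpacked ARGB pixel) to all lanes.
 * Immediate 0xff selects word 3 for every result word. */
static uint64_t expand_alpha_model(uint64_t pixel)
{
    return pshufw_model(pixel, 0xff);
}
```

For example, expand_alpha_model(0x00AA00BB00CC00DD) yields
0x00AA00AA00AA00AA. Without pshufw, the same broadcast has to be built
from a short sequence of shifts and ORs, which is presumably why the
instruction is offered as a compile-time option.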


One question: The patch has a bad hack where it redefines
DefaultCCOptions for all of the framebuffer code. How should this be
done properly? The problem with the existing DefaultCCOptions is that
they include -pedantic, which doesn't work with the MMX intrinsics.

Attaching the patch, two new files and the benchmark.   
   