[Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

Fri Nov 11 07:05:34 PST 2011

On Fri, 2011-11-11 at 06:52 -0800, Jose Fonseca wrote:
> 
> ----- Original Message -----
> > 
> > On Friday, 11 November 2011 14:33 CET, Michel Dänzer
> > <michel at daenzer.net> wrote:
> >  
> > > On Fri, 2011-11-11 at 14:15 +0100, Theiss, Ingo wrote:
> > > > 
> > > > Here are the compiler flags used.
> > > > 
> > > > 32-bit:
> > > > 
> > > > CFLAGS: -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10
> > > > -fno-omit-frame-pointer -Wall -Wmissing-prototypes -std=c99
> > > > -ffast-math -fno-strict-aliasing -fno-builtin-memcmp -m32 -O2
> > > > -Wall -g -m32 -march=amdfam10 -mtune=amdfam10
> > > > -fno-omit-frame-pointer -fPIC -m32
> > > 
> > > Have you tried adding -mfpmath=sse to the 32-bit CFLAGS? According
> > > to my gcc documentation, that option is enabled by default in
> > > 64-bit mode but disabled in 32-bit mode.
> > > 
> > > Anyway, I guess there's room for optimization in glReadPixels...
> > 
> > OK, I have added -mfpmath=sse to the 32-bit CFLAGS, and the readback
> > performance increased from 30.44 Mpixels/sec to 48.92 Mpixels/sec.
> > We are getting closer to the 64-bit performance.
> 
> Hmm, you should try -msse2 too. It's implied on 64-bit, and I'm not
> sure whether -march/-mfpmath=sse by itself will enable the intrinsics.

From my reading of the gcc docs, -msse2 is implied by -march=amdfam10.
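
One way to check this instead of relying on the docs is to look at the
macros gcc predefines for a given set of flags. The following is only a
minimal sketch: the function names are made up for illustration and are
not part of Mesa. It relies on gcc defining __SSE2__ when SSE2 code
generation is enabled and __SSE_MATH__ when scalar float math goes
through SSE registers (-mfpmath=sse).

```c
#include <stdio.h>

/* Hypothetical helper: 1 if the compiler was allowed to emit SSE2
 * instructions (-msse2 given, or implied by the -march setting). */
int sse2_enabled(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if scalar single-precision math is done in
 * SSE registers rather than on the x87 FPU (-mfpmath=sse). */
int sse_math_enabled(void)
{
#ifdef __SSE_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Write a one-line summary into buf and return snprintf's result. */
int describe_sse(char *buf, size_t len)
{
    return snprintf(buf, len,
                    "SSE2 code generation: %s, SSE float math: %s",
                    sse2_enabled() ? "yes" : "no",
                    sse_math_enabled() ? "yes" : "no");
}
```

Building this once with the 32-bit CFLAGS quoted above and once with
-mfpmath=sse added, then printing the string from describe_sse() in a
small main(), should show which of the two settings each build actually
has. The same macros can also be listed without compiling anything:

  gcc -m32 -march=amdfam10 -dM -E - < /dev/null | grep SSE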

-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer