[Mesa-dev] [PATCH] Reorder LLVM passes, running mem2reg earlier.

Mon May 3 09:37:56 PDT 2010

On Mon, 2010-05-03 at 09:20 -0700, Török Edwin wrote:
> On 2010-05-03 19:00, José Fonseca wrote:
> > Török,
> > 
> > Thanks.
> > 
> > I didn't see as much improvement (most of the stuff I've been playing
> > with actually has simple shaders), but I saw no regression so I've
> > committed it. We have more benchmarks running continuously from git so
> > once the commit goes through them we should have more data.
> 
> Thanks.
> 
> > 
> > Also, do you know of any good documentation describing a good
> > ordering of passes, or is it just trial and error?
> 
> Look in include/llvm/Support/StandardPasses.h for the default ordering
> of -O1, -O2, -O3

Great. Thanks.

> Mem2reg is good to run first because it reduces the number of
> alloca+load+store sequences, creating a new LLVM (SSA) value for each.
> It also enables further transforms to be smarter (most can't see across
> a load/store).
> 
> Now those orderings are for normal C programs, most of those
> optimizations may be of little benefit to shaders.
> Finding out which optimizers help shaders is trial and error, I think.
> 
> Unfortunately LLVM doesn't have any -ffast-math-like optimizers, I think
> no CSE/reassoc is done on FP operations by default because that can lead
> to wrong results (due to rounding).
> 
> Do the shaders need strict IEEE 754 math? 

I doubt it. We already use -ffast-math when compiling Mesa.

-ffast-math typically optimizes by leaving intermediate values in the
x86 FP stack, which is wider, so it is non-conformant due to the extra
precision. But we do almost everything with SIMD operations, so there is
little to be gained, I think. Also, on 64-bit the default is to use the
partial SIMD instructions for scalars.

> Or can c+b+a be rewritten as
> a+b+c for example?

I don't know about OpenCL/CUDA/D3D Compute, but I'm positive that it
cannot matter for GL/D3D. I know people are successfully doing GPGPU on
graphics hardware, but the graphics APIs aren't that precise. And even
when they are, drivers/hardware often cut some corners.

Jose