[Mesa-dev] [PATCH] Better GPU program optimization in Mesa front end

Mon Jul 26 17:14:46 PDT 2010

So basically,

"mov dst.xz src.yw" is impossible with Mesa code?

Ben

________________________________________
From: mesa-dev-bounces+benjamin.segovia=intel.com at lists.freedesktop.org [mesa-dev-bounces+benjamin.segovia=intel.com at lists.freedesktop.org] On Behalf Of Segovia, Benjamin [benjamin.segovia at intel.com]
Sent: Monday, July 26, 2010 6:13 PM
To: Eric Anholt; mesa-dev at lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] Better GPU program optimization in Mesa front end

- Hmm, the first bug is clearly something I misunderstood. You say that:

in "mov dst.xz src", src will use .xz and not .xy? But the swizzle is still .xyzw, or am I missing something?

- I am not sure, but I think the other tests were skipped with my version of piglit + Mesa. I have to check that again.

- For bit_scan and bit_count, what would you prefer?

Cheers,
Ben

________________________________________
From: Eric Anholt [eric at anholt.net]
Sent: Monday, July 26, 2010 6:03 PM
To: Segovia, Benjamin; mesa-dev at lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] Better GPU program optimization in Mesa front end

On Mon, 19 Jul 2010 16:01:02 -0700, Benjamin Segovia <benjamin.segovia at intel.com> wrote:
> - Improved optimization of GPU programs. Now, swizzling is taken into account
>   and swizzles are properly transformed while removing mov instructions.
>   Removals of mov instructions are now much more effective
>
> - Analysis of control flow is still very primitive and far too
>   conservative. Shaders using a lot of branches will be less optimized than
>   straightforward ones
>
> - The main things to do next are:
>    * instruction merging, for example merging:
>        mul a.x b.x c.x
>        mul a.y b.y c.y
>        mul a.z b.z c.z
>      into
>        mul a.xyz b.xyz c.xyz
>    * register renaming to avoid some still-unnecessary movs
>
> - Tested with piglit. I ran all the shaders and compared the output of the
>   new version with the old one. I also ran openarena, nexuiz and warsow. All
>   games run perfectly and the GPU code is clearly improved. Note that I only
>   use my Intel Gen GPU for the backend, so everything was tested using
>   classic Mesa with the Intel i965 driver.
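
The instruction merging proposed in the quoted to-do list can be sketched roughly as below. This is only an illustration under stated assumptions: `struct inst`, `enum opcode`, `is_componentwise` and `try_merge` are hypothetical names, not Mesa's actual IR; there is a single register file; the two instructions are adjacent with no control flow between them; and only component-wise opcodes are considered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified IR: NOT Mesa's struct prog_instruction. */
enum opcode { OP_ADD, OP_MUL, OP_DP3 };

struct inst {
    enum opcode op;
    int dst_reg;              /* destination register index */
    unsigned writemask;       /* bit c set => channel c (x,y,z,w) is written */
    int src_reg[2];           /* source register indices */
    unsigned char swz[2][4];  /* swz[s][c]: channel of source s read for dst channel c */
};

/* Assumption: only these opcodes compute each channel independently. */
static bool is_componentwise(enum opcode op)
{
    return op == OP_ADD || op == OP_MUL;
}

/* Try to merge two adjacent instructions a; b into *out.
 * Returns false when the merge is not obviously safe. */
static bool try_merge(const struct inst *a, const struct inst *b,
                      struct inst *out)
{
    assert(a && b && out);

    if (a->op != b->op || !is_componentwise(a->op))
        return false;
    if (a->dst_reg != b->dst_reg)
        return false;
    if (a->writemask & b->writemask)      /* masks must be disjoint */
        return false;
    for (int s = 0; s < 2; s++) {
        if (a->src_reg[s] != b->src_reg[s])
            return false;
        if (a->src_reg[s] == a->dst_reg)  /* b could read what a wrote */
            return false;
    }

    *out = *a;
    out->writemask = a->writemask | b->writemask;
    /* Each written channel keeps the swizzle of the instruction that wrote it. */
    for (int s = 0; s < 2; s++)
        for (int c = 0; c < 4; c++)
            if (b->writemask & (1u << c))
                out->swz[s][c] = b->swz[s][c];
    return true;
}
```

In this sketch, "mul a.x b.x c.x" followed by "mul a.y b.y c.y" merges into one mul whose write mask covers x and y and whose swizzle reads x for channel x and y for channel y. The check on source registers is deliberately conservative.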

Added two new testcases to piglit, glsl-fs-add-masked and
glsl-fs-mov-masked, to catch bugs in this that I found while reviewing
the code after my GLSL demo started failing.  Patch included for
glsl-fs-add-masked (MOV dst.xz, src uses the X and Z channels of src,
not X and Y), feel free to squash it into a fixed patch.
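
To make the write-mask point concrete, here is a minimal sketch of the semantics described above. It assumes the usual model in which the source always carries a four-component swizzle and the write mask only selects which destination channels are stored. The names (`masked_mov`, `MASK_X`, `CH_X`, ...) are hypothetical and are not taken from Mesa.

```c
#include <assert.h>

/* Hypothetical channel indices and write-mask bits (not Mesa's names). */
enum { CH_X = 0, CH_Y = 1, CH_Z = 2, CH_W = 3 };
#define MASK_X (1u << CH_X)
#define MASK_Y (1u << CH_Y)
#define MASK_Z (1u << CH_Z)
#define MASK_W (1u << CH_W)

/* dst.<writemask> = src.<swizzle>
 * swizzle[c] names the source channel that feeds destination channel c.
 * Channels not selected by the write mask are left untouched. */
static void masked_mov(float dst[4], unsigned writemask,
                       const float src[4], const int swizzle[4])
{
    float tmp[4];

    /* Apply the swizzle first, so dst and src may alias safely. */
    for (int c = 0; c < 4; c++) {
        assert(swizzle[c] >= 0 && swizzle[c] < 4);
        tmp[c] = src[swizzle[c]];
    }
    /* Then store only the channels enabled in the write mask. */
    for (int c = 0; c < 4; c++)
        if (writemask & (1u << c))
            dst[c] = tmp[c];
}
```

With the identity swizzle {CH_X, CH_Y, CH_Z, CH_W} and the mask MASK_X | MASK_Z, dst.x receives src.x and dst.z receives src.z, which is the behavior the new test checks. Under the same model, sending src.y to dst.x and src.w to dst.z would use a swizzle with y in slot 0 and w in slot 2, for instance {CH_Y, CH_Y, CH_W, CH_W}, together with the same xz mask.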

glsl-vs-arrays, glsl-vs-arrays-2, and glsl-vs-mov-after-deref also
started failing for me after applying your change.  Lots of
important code seems to have been optimized out.  I'm guessing RelAddr
handling is broken.  Everywhere else that bit_count is used looks bogus
to me, but I haven't constructed tests for them yet.

I'm not a fan of bit_clear[], bit_scan[], or expand_one[] either.  They
obfuscate the code to me.