[Mesa-dev] [PATCH] Better GPU program optimization in Mesa front end

Eric Anholt eric at anholt.net
Mon Jul 26 17:03:12 PDT 2010


On Mon, 19 Jul 2010 16:01:02 -0700, Benjamin Segovia <benjamin.segovia at intel.com> wrote:
> - Improved optimization of GPU programs. Now, swizzling is taken into account
>   and swizzles are properly transformed while removing mov instructions.
>   Removals of mov instructions are now much more effective
> 
> - Control-flow analysis is still very primitive and far too
>   conservative. Shaders using a lot of branches will be less optimized than
>   straight-line ones.
> 
> - The main things to do next are:
>    * instruction merging like for example merging:
>        mul a.x b.x c.x
>        mul a.y b.y c.y
>        mul a.z b.z c.z
>      into
>        mul a.xyz b.xyz c.xyz
>    * register renaming to avoid some still-unnecessary movs
> 
> - Tested with piglit. I ran all the shaders and compared the output of the
>   new version with the old one. I also ran openarena, nexuiz and warsow. All
>   games run perfectly and the GPU code is clearly improved. Note that I only
>   used my Intel Gen GPU for the backend, so everything was tested using
>   classic Mesa with the Intel i965 driver.

I added two new testcases to piglit, glsl-fs-add-masked and
glsl-fs-mov-masked, to catch bugs that I found while reviewing
this code after my GLSL demo started failing.  A patch is included for
glsl-fs-add-masked (MOV dst.xz, src reads the X and Z channels of src,
not X and Y); feel free to squash it into a fixed patch.
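
To make the MOV dst.xz, src case concrete, here is a minimal standalone
sketch of the two ways of computing the source read mask.  The macros are
local stand-ins for Mesa's swizzle and writemask encoding so the snippet
compiles on its own, and the helper names read_mask_by_count and
read_mask_by_channel are hypothetical, not functions in prog_optimize.c.
It only models the component-wise opcode case and folds dst_mask into
write_mask.

```c
/* Local stand-ins for Mesa's encoding: 3 bits per swizzle slot,
 * one bit per writemask channel.  Defined here only so the sketch
 * is self-contained. */
#define SWIZZLE_X 0
#define SWIZZLE_Y 1
#define SWIZZLE_Z 2
#define SWIZZLE_W 3
#define SWIZZLE_ZERO 4

#define WRITEMASK_X 0x1
#define WRITEMASK_Y 0x2
#define WRITEMASK_Z 0x4
#define WRITEMASK_W 0x8

#define MAKE_SWIZZLE4(a, b, c, d) ((a) | ((b) << 3) | ((c) << 6) | ((d) << 9))
#define GET_SWZ(swz, idx) (((swz) >> ((idx) * 3)) & 0x7)

/* Hypothetical helper modelling the count-based logic: count the written
 * channels, then look only at the first N swizzle slots.  This loses
 * track of which channels are written. */
static unsigned
read_mask_by_count(unsigned swizzle, unsigned write_mask)
{
   unsigned read_mask = 0;
   int read_count = 0;
   int comp;

   for (comp = 0; comp < 4; ++comp) {
      if (write_mask & (1u << comp))
         ++read_count;
   }
   for (comp = 0; comp < read_count; ++comp) {
      const unsigned coord = GET_SWZ(swizzle, comp);
      if (coord <= SWIZZLE_W)
         read_mask |= 1u << coord;
   }
   return read_mask;
}

/* Hypothetical helper modelling the per-channel logic: visit all four
 * slots, but only follow the swizzle for channels that are written. */
static unsigned
read_mask_by_channel(unsigned swizzle, unsigned write_mask)
{
   unsigned read_mask = 0;
   int comp;

   for (comp = 0; comp < 4; ++comp) {
      const unsigned coord = GET_SWZ(swizzle, comp);
      if ((write_mask & (1u << comp)) && coord <= SWIZZLE_W)
         read_mask |= 1u << coord;
   }
   return read_mask;
}
```

With an identity swizzle and a write mask of XZ, read_mask_by_count
returns 0x3 (X and Y) while read_mask_by_channel returns 0x5 (X and Z),
which is the difference the testcase is meant to catch.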

glsl-vs-arrays, glsl-vs-arrays-2, and glsl-vs-mov-after-deref also
started failing for me after applying your change.  Lots of
important code seems to have been optimized out.  I'm guessing RelAddr
handling is broken.  Every other use of bit_count looks bogus to me
too, but I haven't constructed tests for those yet.

I'm not a fan of bit_clear[], bit_scan[], or expand_one[] either.  To
me, they obfuscate the code.

From 9c7fb7b5b6f299b19d27245f2b68853521499afa Mon Sep 17 00:00:00 2001
From: Eric Anholt <eric at anholt.net>
Date: Mon, 26 Jul 2010 16:20:11 -0700
Subject: [PATCH] mesa: Fix bug in handling of read channel masks in get_src_arg_mask.

Fixes piglit glsl-fs-add-masked, and my GLSL demo.
---
 src/mesa/program/prog_optimize.c |   14 ++++----------
 1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/src/mesa/program/prog_optimize.c b/src/mesa/program/prog_optimize.c
index 9119b10..c853330 100644
--- a/src/mesa/program/prog_optimize.c
+++ b/src/mesa/program/prog_optimize.c
@@ -51,12 +51,11 @@ static const int expand_one[5] = {0,1,3,7,15};
 static GLuint
 get_src_arg_mask(const struct prog_instruction *inst, int arg, int dst_mask)
 {
-   int read_mask, channel_mask, read_count;
+   int read_mask, channel_mask;
    int comp;
 
    /* Form the dst register, find the written channels */
    if (inst->CondUpdate) {
-      read_count = 4;
       channel_mask = WRITEMASK_XYZW;
    }
    else {
@@ -67,8 +66,7 @@ get_src_arg_mask(const struct prog_instruction *inst, int arg, int dst_mask)
       case OPCODE_MAD:
       case OPCODE_MUL:
       case OPCODE_SUB:
-         read_count = bit_count[inst->DstReg.WriteMask & dst_mask];
-         channel_mask = expand_one[read_count];
+         channel_mask = inst->DstReg.WriteMask & dst_mask;
          break;
       case OPCODE_RCP:
       case OPCODE_SIN:
@@ -76,20 +74,16 @@ get_src_arg_mask(const struct prog_instruction *inst, int arg, int dst_mask)
       case OPCODE_RSQ:
       case OPCODE_POW:
       case OPCODE_EX2:
-         read_count = 1;
          channel_mask = WRITEMASK_X;
          break;
       case OPCODE_DP2:
-         read_count = 2;
          channel_mask = WRITEMASK_XY;
          break;
       case OPCODE_DP3:
       case OPCODE_XPD:
-         read_count = 3;
          channel_mask = WRITEMASK_XYZ;
          break;
       default:
-         read_count = 4;
          channel_mask = WRITEMASK_XYZW;
          break;
       }
@@ -99,9 +93,9 @@ get_src_arg_mask(const struct prog_instruction *inst, int arg, int dst_mask)
     * components are actually read
     */
    read_mask = 0;
-   for (comp = 0; comp < read_count; ++comp) {
+   for (comp = 0; comp < 4; ++comp) {
       const int coord = GET_SWZ(inst->SrcReg[arg].Swizzle, comp);
-      if (coord <= SWIZZLE_W)
+      if (channel_mask & (1 << comp) && coord <= SWIZZLE_W)
          read_mask |= 1 << coord;
    }
 
-- 
1.7.1
