[Mesa-dev] i965: Uniform loading via samplers with discard

Cody Northrop cody at lunarg.com
Thu Jun 12 12:35:36 PDT 2014


Commit 17c7ead7 exposed a bug in how uniform loading happens in the
presence of discard.  It manifested itself in an application as randomly
incorrect pixels on the borders of conditional areas.

We believe it is due to how discards jump to the end of the shader
incorrectly for some channels.  The current implementation checks each 2x2
subspan to preserve derivatives.  When uniform loading via samplers was
turned on, it uses a full execution mask, as stated in
lower_uniform_pull_constant_loads(), and only populates four channels of
the destination (see generate_uniform_pull_constant_load_gen7()).  We think
that is happening incorrectly when the first subspan has been jumped over.

A possible fix is to only jump to the end of the shader if all relevant
channels are disabled, i.e. all 8 or 16, depending on dispatch.  This
preserves the original speedup noted in commit beafced2.  There are other
more heavyweight fixes (i.e. don't use sampler for uniforms if discard in
use), but this seems like a good fix.  I've appended it below as a patch.
It changes the shader accordingly:

before    : 0x000000d0: (-f0.1.any4h) halt(8) 17 2
null                            { align1 WE_all 1Q };
after(8)  : 0x000000d0: (-f0.1.any8h) halt(8) 17 2
null                            { align1 WE_all 1Q };
after(16) : 0x00000250: (-f0.1.any16h) halt(16) 17 2
null                            { align1 WE_all 1H };

I attached a test case to the bugzilla entry below.

Thanks for any review or feedback.  Curious if anyone sees a better fix.

-C


>From 871671c4ab8ff00e85b434865e8855fc356efa8f Mon Sep 17 00:00:00 2001
From: Cody Northrop <cody at lunarg.com>
Date: Thu, 12 Jun 2014 09:07:18 -0600
Subject: [PATCH] i965/fs: Update discard jump to preserve uniform loads via
 sampler.

The series that implemented this optimization was done before
the changes to use samplers for uniform loads.  Uniform sampler
loads use special execution masks and only populate four
channels, so we can't jump over those or corruption ensues.
Use a more conservative jump mask which only jumps to the end
if all relevant channels are disabled.

No change was observed in GLbenchmark 2.7, so the optimization
is preserved.

Signed-off-by: Cody Northrop <cody at lunarg.com>
Reviewed-by: Mike Stroyan <mike at lunarg.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 8858852..fe05715 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1907,7 +1907,15 @@ fs_visitor::visit(ir_discard *ir)
        */
       fs_inst *discard_jump = emit(FS_OPCODE_DISCARD_JUMP);
       discard_jump->flag_subreg = 1;
-      discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H;
+
+      /* Uniforms are now loaded using samplers with a routine that has
+       * its own execution mask, so we can only jump if all relevant
+       * channels are dead.  This is more conservative than the previous
+       * four channel checking, but still preserves speedups.
+       */
+      discard_jump->predicate = (8 == dispatch_width)
+                                ? BRW_PREDICATE_ALIGN1_ANY8H
+                                : BRW_PREDICATE_ALIGN1_ANY16H;
       discard_jump->predicate_inverse = true;
    }
 }
-- 
1.8.3.2
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140612/679fa46d/attachment.html>


More information about the mesa-dev mailing list