[Mesa-dev] i965: Uniform loading via samplers with discard
Cody Northrop
cody at lunarg.com
Tue Jun 24 13:09:05 PDT 2014
On Tue, Jun 24, 2014 at 10:23 AM, Kenneth Graunke <kenneth at whitecape.org>
wrote:
> On Tuesday, June 24, 2014 08:17:23 AM Ian Romanick wrote:
> > Send patches with git-send-email and no other method. Save this message
> > and apply it to origin/master using git-am to see why.
> >
> > Hopefully Matt or Ken can review this...
> >
> > On 06/12/2014 12:35 PM, Cody Northrop wrote:
> > > Commit 17c7ead7 exposed a bug in how uniform loading happens in the
> > > presence of discard. It manifested itself in an application as
> randomly
> > > incorrect pixels on the borders of conditional areas.
> > >
> > > We believe it is due to how discards jump to the end of the shader
> > > incorrectly for some channels. The current implementation checks each
> > > 2x2 subspan to preserve derivatives. When uniform loading via samplers
> > > was turned on, it uses a full execution mask, as stated in
> > > lower_uniform_pull_constant_loads(), and only populates four channels
> of
> > > the destination (see generate_uniform_pull_constant_load_gen7()). We
> > > think that is happening incorrectly when the first subspan has been
> > > jumped over.
> > >
> > > A possible fix is to only jump to the end of the shader if all relevant
> > > channels are disabled, i.e. all 8 or 16, depending on dispatch. This
> > > preserves the original speedup noted in commit beafced2. There are
> > > other more heavyweight fixes (i.e. don't use sampler for uniforms if
> > > discard in use), but this seems like a good fix. I've appended it
> below
> > > as a patch. It changes the shader accordingly:
> > >
> > > before : 0x000000d0: (-f0.1.any4h) halt(8) 17 2
> > > null { align1 WE_all 1Q };
> > > after(8) : 0x000000d0: (-f0.1.any8h) halt(8) 17 2
> > > null { align1 WE_all 1Q };
> > > after(16) : 0x00000250: (-f0.1.any16h) halt(16) 17 2
> > > null { align1 WE_all 1H };
> >
> > All of this information should go in the commit message.
> >
> > > I attached a test case to the bugzilla entry below.
> >
> > Was the test also sent to the piglit mailing list for inclusion in the
> > the test suite?
> >
> > > Thanks for any review or feedback. Curious if anyone sees a better
> fix.
> > >
> > > -C
> > >
> > >
> > > From 871671c4ab8ff00e85b434865e8855fc356efa8f Mon Sep 17 00:00:00 2001
> > > From: Cody Northrop <cody at lunarg.com <mailto:cody at lunarg.com>>
> > > Date: Thu, 12 Jun 2014 09:07:18 -0600
> > > Subject: [PATCH] i965/fs: Update discard jump to preserve uniform loads
> via
> > > sampler.
> > >
> > > The series that implemented this optimization was done before
> > > the changes to use samplers for uniform loads. Uniform sampler
> > > loads use special execution masks and only populate four
> > > channels, so we can't jump over those or corruption ensues.
> > > Use a more conservative jump mask which only jumps to the end
> > > if all relevant channels are disabled.
> > >
> > > No change was observed in GLbenchmark 2.7, so the optimization
> > > is preserved.
> > >
> > > Signed-off-by: Cody Northrop <cody at lunarg.com <mailto:cody at lunarg.com
> >>
> > > Reviewed-by: Mike Stroyan <mike at lunarg.com <mailto:mike at lunarg.com>>
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
> > > ---
> > > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 10 +++++++++-
> > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > > index 8858852..fe05715 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > > @@ -1907,7 +1907,15 @@ fs_visitor::visit(ir_discard *ir)
> > > */
> > > fs_inst *discard_jump = emit(FS_OPCODE_DISCARD_JUMP);
> > > discard_jump->flag_subreg = 1;
> > > - discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H;
> > > +
> > > + /* Uniforms are now loaded using samplers with a routine that
> has
> > > + * its own execution mask, so we can only jump if all relevant
> > > + * channels are dead. This is more conservative than the
> previous
> > > + * four channel checking, but still preserves speedups.
>
> This comment won't make much sense when reading the code, since "the
> previous
> four channel checking" no longer exists and can't be observed anywhere
> other
> than in git history. Personally, I'd just drop the comment.
>
>
Sure, I'll clean up the comments.
> As far as I can tell, this is just broken and should have always been
> ANY8H/ANY16H. Good catch, and thanks for fixing it!
>
> > > + */
> > > + discard_jump->predicate = (8 == dispatch_width)
>
> Please drop the Yoda-conditional and use (dispatch_width == 8).
>
Since dispatch_width is constant, it can go on the left. '8' was more
read-only to my eyes. I try to avoid common typo mistakes (= instead of
==).
>
> With those two changes, this would get:
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>
> If you resend a v2 using git-send-email, I'll be happy to push it.
>
Thanks, I'll fix all of this and send again correctly.
>
> > > + ? BRW_PREDICATE_ALIGN1_ANY8H
> > > + : BRW_PREDICATE_ALIGN1_ANY16H;
> > > discard_jump->predicate_inverse = true;
> > > }
> > > }
>
--
Cody Northrop
Graphics Software Engineer
LunarG, Inc.- 3D Driver Innovations
Email: cody at lunarg.com
Website: http://www.lunarg.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140624/f257a19f/attachment.html>
More information about the mesa-dev
mailing list