[Mesa-dev] [PATCH] i965: Compute dFdy() correctly for FBOs.
Paul Berry
stereotype441 at gmail.com
Thu Jun 21 06:44:15 PDT 2012
On 20 June 2012 23:35, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On 06/20/2012 04:08 PM, Paul Berry wrote:
> > On i965, dFdx() and dFdy() are computed by taking advantage of the
> > fact that each consecutive set of 4 pixels dispatched to the fragment
> > shader always constitutes a contiguous 2x2 block of pixels in a fixed
> > arrangement known as a "sub-span". So we calculate dFdx() by taking
> > the difference between the values computed for the left and right
> > halves of the sub-span, and we calculate dFdy() by taking the
> > difference between the values computed for the top and bottom halves
> > of the sub-span.
> >
> > However, there's a subtlety when FBOs are in use: since FBOs use a
> > coordinate system where the origin is at the upper left, and window
> > system framebuffers use a coordinate system where the origin is at the
> > lower left, the computation of dFdy() needs to be negated for FBOs.
>
> Nice catch.
>
Thanks. I was fortunate to find it--it cropped up while writing piglit
tests for MSAA centroid behaviour, since those tests run in FBOs and
involve checking that derivatives work properly.
>
> > This patch modifies the fragment shader back-ends to negate the value
> > of dFdy() when an FBO is in use. It also modifies the code that
> > populates the program key (brw_wm_populate_key() and
> > brw_fs_precompile()) so that they always record in the program key
> > whether we are rendering to an FBO or to a window system framebuffer;
> > this ensures that the fragment shader will get recompiled when
> > switching between FBO and non-FBO use.
> >
> > This will result in unnecessary recompiles of fragment shaders that
> > don't use dFdy(). To fix that, we will need to adapt the GLSL and
> > NV_fragment_program front-ends to record whether or not a given shader
> > uses dFdy(). I plan to implement this in a future patch series; I've
> > left FIXME comments in the code as a reminder.
>
> Yeah, it looks like that kind of infrastructure is becoming increasingly
> important.
>
> > Fixes Piglit test "fbo-deriv".
> > ---
> > src/mesa/drivers/dri/i965/brw_fs.cpp | 10 ++++++++++
> > src/mesa/drivers/dri/i965/brw_fs.h | 3 ++-
> > src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 14 +++++++++++---
> > src/mesa/drivers/dri/i965/brw_wm.c | 10 ++++++++++
> > src/mesa/drivers/dri/i965/brw_wm.h | 3 ++-
> > src/mesa/drivers/dri/i965/brw_wm_emit.c | 15 +++++++++++----
> > 6 files changed, 46 insertions(+), 9 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 313e720..43efd68 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -1848,6 +1848,13 @@ brw_fs_precompile(struct gl_context *ctx, struct
> gl_shader_program *prog)
> > struct brw_context *brw = brw_context(ctx);
> > struct brw_wm_prog_key key;
> >
> > + /* As a temporary measure we assume that all programs use dFdy()
> (and hence
> > + * need to be compiled differently depending on whether we're
> rendering to
> > + * an FBO). FIXME: set this bool correctly based on the contents of
> the
> > + * program.
> > + */
> > + bool program_uses_dfdy = true;
> > +
>
> I would just leave this as false. Almost all fragment shaders that I've
> seen don't use dPdy, so arguably false is the common case (which is what
> precompile is arguably aiming for). Though, it doesn't matter
> particularly much---the state settings selected here are fairly
> arbitrary, and at the end of the day, the actual compilation process
> will do the right thing regardless.
>
> Also, note that the precompile isn't used by default (unless you set the
> driconf option shader_precompile = true), so you most likely won't hit
> this at all.
>
> If you'd rather leave it, that's fine. It's not a big deal either way.
>
Hmm, thanks for the explanation. I didn't previously understand the
purpose of precompiling. I have to admit I'm not thrilled about the code
duplication between brw_fs_precompile() and brw_wm_populate_key(). In my
experience, code duplication between an often-used and a rarely-used
function tends to lead to bugs, because when people modify the often-used
function they forget to modify the rarely-used function. (In fact it looks
like your recent patch "i965: Don't set brw_wm_prog_key::iz_lookup on
Gen6+." just modified brw_wm_populate_key()). I don't really have a good
suggestion about how to eliminate the duplication, though.
>
> > if (!prog->_LinkedShaders[MESA_SHADER_FRAGMENT])
> > return true;
> >
> > @@ -1892,6 +1899,9 @@ brw_fs_precompile(struct gl_context *ctx, struct
> gl_shader_program *prog)
> >
> > if (fp->Base.InputsRead & FRAG_BIT_WPOS) {
> > key.drawable_height = ctx->DrawBuffer->Height;
> > + }
> > +
> > + if ((fp->Base.InputsRead & FRAG_BIT_WPOS) || program_uses_dfdy) {
> > key.render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer);
> > }
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h
> b/src/mesa/drivers/dri/i965/brw_fs.h
> > index 0c6c3e4..2c2cb6c 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.h
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> > @@ -534,7 +534,8 @@ public:
> > struct brw_reg src);
> > void generate_discard(fs_inst *inst);
> > void generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg
> src);
> > - void generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg
> src);
> > + void generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg
> src,
> > + bool negate_value);
> > void generate_spill(fs_inst *inst, struct brw_reg src);
> > void generate_unspill(fs_inst *inst, struct brw_reg dst);
> > void generate_pull_constant_load(fs_inst *inst, struct brw_reg dst);
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
> > index 522123c..0881ad7 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
> > @@ -441,8 +441,13 @@ fs_visitor::generate_ddx(fs_inst *inst, struct
> brw_reg dst, struct brw_reg src)
> > brw_ADD(p, dst, src0, negate(src1));
> > }
> >
> > +/* The negate_value boolean is used to negate the derivative
> computation for
> > + * FBOs, since they place the origin at the upper left instead of the
> lower
> > + * left.
> > + */
> > void
> > -fs_visitor::generate_ddy(fs_inst *inst, struct brw_reg dst, struct
> brw_reg src)
> > +fs_visitor::generate_ddy(fs_inst *inst, struct brw_reg dst, struct
> brw_reg src,
> > + bool negate_value)
> > {
> > struct brw_reg src0 = brw_reg(src.file, src.nr, 0,
> > BRW_REGISTER_TYPE_F,
> > @@ -456,7 +461,10 @@ fs_visitor::generate_ddy(fs_inst *inst, struct
> brw_reg dst, struct brw_reg src)
> > BRW_WIDTH_4,
> > BRW_HORIZONTAL_STRIDE_0,
> > BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
> > - brw_ADD(p, dst, src0, negate(src1));
> > + if (negate_value)
> > + brw_ADD(p, dst, src1, negate(src0));
> > + else
> > + brw_ADD(p, dst, src0, negate(src1));
> > }
> >
> > void
> > @@ -902,7 +910,7 @@ fs_visitor::generate_code()
> > generate_ddx(inst, dst, src[0]);
> > break;
> > case FS_OPCODE_DDY:
> > - generate_ddy(inst, dst, src[0]);
> > + generate_ddy(inst, dst, src[0], c->key.render_to_fbo);
> > break;
> >
> > case FS_OPCODE_SPILL:
> > diff --git a/src/mesa/drivers/dri/i965/brw_wm.c
> b/src/mesa/drivers/dri/i965/brw_wm.c
> > index e919e5a..300e3bb 100644
> > --- a/src/mesa/drivers/dri/i965/brw_wm.c
> > +++ b/src/mesa/drivers/dri/i965/brw_wm.c
> > @@ -420,6 +420,13 @@ static void brw_wm_populate_key( struct brw_context
> *brw,
> > GLuint line_aa;
> > GLuint i;
> >
> > + /* As a temporary measure we assume that all programs use dFdy()
> (and hence
> > + * need to be compiled differently depending on whether we're
> rendering to
> > + * an FBO). FIXME: set this bool correctly based on the contents of
> the
> > + * program.
> > + */
> > + bool program_uses_dfdy = true;
> > +
> > memset(key, 0, sizeof(*key));
> >
> > /* Build the index for table lookup
> > @@ -519,6 +526,9 @@ static void brw_wm_populate_key( struct brw_context
> *brw,
> > */
> > if (fp->program.Base.InputsRead & FRAG_BIT_WPOS) {
> > key->drawable_height = ctx->DrawBuffer->Height;
> > + }
> > +
> > + if ((fp->program.Base.InputsRead & FRAG_BIT_WPOS) ||
> program_uses_dfdy) {
> > key->render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer);
> > }
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_wm.h
> b/src/mesa/drivers/dri/i965/brw_wm.h
> > index 8f1cb8c..2cde2a0 100644
> > --- a/src/mesa/drivers/dri/i965/brw_wm.h
> > +++ b/src/mesa/drivers/dri/i965/brw_wm.h
> > @@ -346,7 +346,8 @@ void emit_ddxy(struct brw_compile *p,
> > const struct brw_reg *dst,
> > GLuint mask,
> > bool is_ddx,
> > - const struct brw_reg *arg0);
> > + const struct brw_reg *arg0,
> > + bool negate_value);
> > void emit_delta_xy(struct brw_compile *p,
> > const struct brw_reg *dst,
> > GLuint mask,
> > diff --git a/src/mesa/drivers/dri/i965/brw_wm_emit.c
> b/src/mesa/drivers/dri/i965/brw_wm_emit.c
> > index e27ff35..1258efe 100644
> > --- a/src/mesa/drivers/dri/i965/brw_wm_emit.c
> > +++ b/src/mesa/drivers/dri/i965/brw_wm_emit.c
> > @@ -457,12 +457,16 @@ void emit_frontfacing(struct brw_compile *p,
> > * between each other. We could probably do it like ddx and swizzle
> the right
> > * order later, but bail for now and just produce
> > * ((ss0.tl - ss0.bl)x4 (ss1.tl - ss1.bl)x4)
> > + *
> > + * The negate_value boolean is used to negate the d/dy computation for
> FBOs,
> > + * since they place the origin at the upper left instead of the lower
> left.
> > */
> > void emit_ddxy(struct brw_compile *p,
> > const struct brw_reg *dst,
> > GLuint mask,
> > bool is_ddx,
> > - const struct brw_reg *arg0)
> > + const struct brw_reg *arg0,
> > + bool negate_value)
> > {
> > int i;
> > struct brw_reg src0, src1;
> > @@ -498,7 +502,10 @@ void emit_ddxy(struct brw_compile *p,
> > BRW_HORIZONTAL_STRIDE_0,
> > BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
> > }
> > - brw_ADD(p, dst[i], src0, negate(src1));
> > + if (negate_value)
> > + brw_ADD(p, dst[i], src1, negate(src0));
> > + else
> > + brw_ADD(p, dst[i], src0, negate(src1));
> > }
> > }
> > if (mask & SATURATE)
> > @@ -1746,11 +1753,11 @@ void brw_wm_emit( struct brw_wm_compile *c )
> > break;
> >
> > case OPCODE_DDX:
> > - emit_ddxy(p, dst, dst_flags, true, args[0]);
> > + emit_ddxy(p, dst, dst_flags, true, args[0], false);
> > break;
> >
> > case OPCODE_DDY:
> > - emit_ddxy(p, dst, dst_flags, false, args[0]);
> > + emit_ddxy(p, dst, dst_flags, false, args[0],
> c->key.render_to_fbo);
> > break;
> >
> > case OPCODE_DP2:
> >
>
> This seems like a sensible solution to the problem at hand.
>
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>
> It would be nice to not have to set this in the key universally, though,
> as this could have a pretty big impact on actual applications (twice the
> compiles, blowing out the program cache, etc.). Hate to pull out the
> big hammer for a minor bug. Of course, without measuring the impact on
> applications, I'm just wildly guessing :) Perhaps I can check a few
> apps tomorrow.
>
I will be curious to hear what you find--I suspect that it's rare for apps
to re-use the same fragment shader on both FBOs and the main window (and
when they do, I suspect they only do this for small shaders that are
inexpensive to recompile). But of course I'm guessing too :). If you do
find a slowdown, that will be further motivation to get cracking on the
follow-up patch, so that we only do the recompile if the shader actually
uses dFdy().
>
> I wonder, actually, whether compiling two (four) entirely separate
> copies of the shader is really the best solution to this problem. It
> might be best to simply add an implicit "uniform bool render_to_fbo" to
> any shaders that use dFdy() and emit two conditional ADDs, one for the
> FBO case and one for non-FBO. This would add pretty minimal overhead to
> the shader, but be a lot nicer in terms of compile time and program
> cache usage.
> Of course, that's probably harder, so I'm not suggesting it in the short
> term...more an idea to think about down the road.
>
Yeah, I've been thinking about this too. It seems like a good case to
pilot test this idea with would be key.drawable_height, since that one
could in pathological cases lead to an awful lot of recompiles.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20120621/0f6de938/attachment-0001.html>
More information about the mesa-dev
mailing list