[Mesa-dev] [PATCH V3 1/2] i965/blorp: Add bilinear filtering of samples for multisample scaled blits

Wed Jun 26 20:56:30 PDT 2013

On 26 June 2013 19:41, Anuj Phogat <anuj.phogat at gmail.com> wrote:

> On Tue, Jun 25, 2013 at 10:27 AM, Paul Berry <stereotype441 at gmail.com>
> wrote:
> >
> > On 19 June 2013 19:45, Anuj Phogat <anuj.phogat at gmail.com> wrote:
> >>
> >> Current implementation of ext_framebuffer_multisample_blit_scaled in
> >> i965/blorp uses nearest filtering for multisample scaled blits. Using
> >> nearest filtering produces blocky artifacts and negates the benefits
> >> of MSAA. That is the reason why extension was not enabled on i965.
> >>
> >> This patch implements the bilinear filtering of samples in blorp engine.
> >> Images generated with this patch are free from blocky artifacts and show
> >> big improvement in visual quality.
> >>
> >> Observed no piglit and gles3 regressions.
> >>
> >> V3:
> >> - Algorithm used for filtering assumes a rectangular grid of samples
> >>   roughly corresponding to sample locations.
> >> - Test the boundary conditions on the edges of texture.
> >>
> >> Signed-off-by: Anuj Phogat <anuj.phogat at gmail.com>
> >
> >
> > Thanks for all your effort on this, Anuj.  I think we have an algorithm
> that's going to get us good quality.  I have a number of minor comments
> below, but I think the basic approach is good.
> >
> >>
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_blorp.h        |  11 ++
> >>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 257
> +++++++++++++++++++++++++--
> >>  2 files changed, 258 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
> b/src/mesa/drivers/dri/i965/brw_blorp.h
> >> index ffc27cc..0a15b89 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_blorp.h
> >> +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
> >> @@ -319,6 +319,17 @@ struct brw_blorp_blit_prog_key
> >>      * than one sample per pixel.
> >>      */
> >>     bool persample_msaa_dispatch;
> >> +
> >> +   /* True for scaled blitting. */
> >> +   bool blit_scaled;
> >> +
> >> +   /* Source rectangle dimensions. Used to test boundary conditions in
> shader
> >> +    * program.
> >> +    */
> >> +   float src_x0;
> >> +   float src_y0;
> >> +   float src_x1;
> >> +   float src_y1;
> >
> >
> > Two comments about this:
> >
> > (1) Rather than clamp to the boundary of the source rectangle, I have a
> minor preference for clamping to the boundary of the source image.  This is
> what I've observed in nVidia's blitter, and it's what's implemented in
> Mesa's meta path for scaled blits.  Also, it would have the benefit of
> simplifying some of the logic below, since the lower clamp will always be 0.
> >
> > (2) We can't put these numbers in the program key, since they're going
> to change frequently, and changing the program key causes the program to be
> recompiled.  They need to go in push constants, like we do for the
> destination rectangle coordinates and the multipliers and offsets.
> >
> >
> >>
> >>  };
> >>
> >>  class brw_blorp_blit_params : public brw_blorp_params
> >> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> >> index 8694128..e99c9df 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> >> @@ -622,7 +622,8 @@ private:
> >>     void kill_if_outside_dst_rect();
> >>     void translate_dst_to_src();
> >>     void single_to_blend();
> >> -   void manual_blend(unsigned num_samples);
> >> +   void manual_blend_average(unsigned num_samples);
> >> +   void manual_blend_linear(unsigned num_samples);
> >
> >
> > Minor nit pick: can we call this function manual_blend_bilinear()?
>  "linear" is such a generic term that it's hard to guess what it does, but
> "bilinear" makes it obvious.
> >
> >>
> >>     void sample(struct brw_reg dst);
> >>     void texel_fetch(struct brw_reg dst);
> >>     void mcs_fetch();
> >> @@ -676,6 +677,16 @@ private:
> >>      */
> >>     struct brw_reg y_coords[2];
> >>
> >> +   /* X, Y coordinates of the pixel from which we need to fetch the
> specific
> >> +    *  sample. These are used for multisample scaled blitting.
> >> +    */
> >> +   struct brw_reg x_sample_coords;
> >> +   struct brw_reg y_sample_coords;
> >> +
> >> +   /* Store the values to use to interpolate in x and y directions */
> >> +   struct brw_reg x_lerp;
> >> +   struct brw_reg y_lerp;
> >> +
> >
> >
> > I had trouble guessing what the "lerp" variables are used for.  How
> about something like this instead?
> >
> > /* Fractional parts of the x and y coordinates, used as bilinear
> interpolation coefficients */
> > struct brw_reg x_frac;
> > struct brw_reg y_frac;
> >
> >>
> >>     /* Which element of x_coords and y_coords is currently in use.
> >>      */
> >>     int xy_coord_index;
> >> @@ -814,15 +825,17 @@ brw_blorp_blit_program::compile(struct
> brw_context *brw,
> >>      * that we want to texture from.  Exception: if we are blending,
> then S is
> >>      * irrelevant, because we are going to fetch all samples.
> >>      */
> >> -   if (key->blend) {
> >> +   if (key->blend && !key->blit_scaled) {
> >>        if (brw->intel.gen == 6) {
> >>           /* Gen6 hardware an automatically blend using the SAMPLE
> message */
> >>           single_to_blend();
> >>           sample(texture_data[0]);
> >>        } else {
> >>           /* Gen7+ hardware doesn't automaticaly blend. */
> >> -         manual_blend(key->src_samples);
> >> +         manual_blend_average(key->src_samples);
> >>        }
> >> +   } else if(key->blend && key->blit_scaled) {
> >> +      manual_blend_linear(key->src_samples);
> >>     } else {
> >>        /* We aren't blending, which means we just want to fetch a
> single sample
> >>         * from the source surface.  The address that we want to fetch
> from is
> >> @@ -913,6 +926,18 @@ brw_blorp_blit_program::alloc_regs()
> >>           = retype(brw_vec8_grf(reg, 0), BRW_REGISTER_TYPE_UD);
> >>        reg += 2;
> >>     }
> >> +
> >> +   if (key->blit_scaled && key->blend) {
> >> +      this->x_sample_coords = brw_vec8_grf(reg, 0);
> >> +      reg += 2;
> >> +      this->y_sample_coords = brw_vec8_grf(reg, 0);
> >> +      reg += 2;
> >> +      this->x_lerp = brw_vec8_grf(reg, 0);
> >> +      reg += 2;
> >> +      this->y_lerp = brw_vec8_grf(reg, 0);
> >> +      reg += 2;
> >> +   }
> >> +
> >>     this->xy_coord_index = 0;
> >>     this->sample_index
> >>        = retype(brw_vec8_grf(reg, 0), BRW_REGISTER_TYPE_UD);
> >> @@ -1368,11 +1393,82 @@ brw_blorp_blit_program::translate_dst_to_src()
> >>     brw_MUL(&func, Y_f, Yp_f, y_transform.multiplier);
> >>     brw_ADD(&func, X_f, X_f, x_transform.offset);
> >>     brw_ADD(&func, Y_f, Y_f, y_transform.offset);
> >> -   /* Round the float coordinates down to nearest integer by moving to
> >> -    * UD registers.
> >> -    */
> >> -   brw_MOV(&func, Xp, X_f);
> >> -   brw_MOV(&func, Yp, Y_f);
> >> +   if (key->blit_scaled && key->blend) {
> >> +      float x_scale = 2.0;
> >> +      float y_scale = key->src_samples / 2.0;
> >
> >
> > It would be nice to have a comment above x_scale and y_scale explaining
> that they are the scale factor between the pixel grid and the grid of
> samples that we're texturing from.
> >
> > Also, I'd recommend moving them to someplace more global (either the
> class definition or the program key) so that they can be accessed from
> manual_blend_linear().
> >
> >>
> >> +      /* Translate coordinates to lay out the samples in a rectangular
>  grid
> >> +       * roughly corresponding to sample locations.
> >>
> >> +       */
> >> +      brw_ADD(&func, X_f, X_f, brw_imm_f(-0.25));
> >> +      brw_ADD(&func, Y_f, Y_f, brw_imm_f(-1.0 / key->src_samples));
> >> +      brw_MUL(&func, X_f, X_f, brw_imm_f(x_scale));
> >> +      brw_MUL(&func, Y_f, Y_f, brw_imm_f(y_scale));
> >
> >
> > The additive constants -0.25 and -1.0/key->src_samples are a bit
> difficult to understand.  I believe this is equivalent, and would be easier
> to follow:
> >
> > /* Translate coordinates to lay out the samples in a rectangular grid
> >  * roughly corresponding to sample locations.
> >  */
> > brw_MUL(&func, X_f, X_f, brw_imm_f(x_scale));
> > brw_MUL(&func, Y_f, Y_f, brw_imm_f(y_scale));
> >
> > /* Adjust coordinates so that integers represent pixel centers rather
> >  * than pixel edges.
> >  */
> > brw_ADD(&func, X_f, X_f, brw_imm_f(-0.5));
> > brw_ADD(&func, Y_f, Y_f, brw_imm_f(-0.5));
> >
> >>
> >> +
> >> +      /* Test boundary conditions to properly handle the sampling of
> texels on
> >> +       * texture edges.
> >> +       */
> >> +      brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_L,
> >> +              X_f, brw_imm_f(2.0  * key->src_x0));
> >> +      brw_IF(&func, BRW_EXECUTE_16);
> >> +      {
> >> +         /* Left Edge in X */
> >> +         brw_MOV(&func, x_lerp, brw_imm_f(1.0));
> >> +      }
> >> +      brw_ELSE(&func);
> >> +      {
> >> +         brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_GE,
> >> +                 X_f, brw_imm_f(2.0 * (key->src_x1 - 0.50)));
> >> +         brw_IF(&func, BRW_EXECUTE_16);
> >> +         {
> >> +            /* Right Edge in X */
> >> +            brw_MOV(&func, x_lerp, brw_imm_f(0.0));
> >> +         }
> >> +         brw_ELSE(&func);
> >> +         {
> >> +            /* Store the fractional part to be used as color
> interpolator */
> >> +            brw_FRC(&func, x_lerp, X_f);
> >> +         }
> >> +         brw_ENDIF(&func);
> >> +      }
> >> +      brw_ENDIF(&func);
> >>
> >> +
> >> +      brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_L,
> >> +              Y_f, brw_imm_f((y_scale) * key->src_y0));
> >> +      brw_IF(&func, BRW_EXECUTE_16);
> >> +      {
> >> +         /* Bottom Edge in Y */
> >> +         brw_MOV(&func, y_lerp, brw_imm_f(1.0));
> >> +      }
> >> +      brw_ELSE(&func);
> >> +      {
> >> +         brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_GE, Y_f,
> >> +                 brw_imm_f((y_scale) * (key->src_y1 - 1 / y_scale)));
> >> +         brw_IF(&func, BRW_EXECUTE_16);
> >> +         {
> >> +            /* Top Edge in Y */
> >> +            brw_MOV(&func, y_lerp, brw_imm_f(0.0));
> >> +         }
> >> +         brw_ELSE(&func);
> >> +         {
> >> +            /* Store the fractional part to be used as color
> interpolator */
> >> +            brw_FRC(&func, y_lerp, Y_f);
> >> +         }
> >> +         brw_ENDIF(&func);
> >> +      }
> >> +      brw_ENDIF(&func);
> >
> >
> > A few thoughts on the above conditional logic:
> >
> > (1) I think the math would be easier to follow if the following
> expressions were changed to equivalent forms:
> >
> > "2.0 * key->src_x0" => "x_scale * key->src_x0"
> > "2.0 * (key->src_x1 - 0.50)" => "x_scale * key->src_x1 - 1.0"
> > "(y_scale) * (key->src_y1 - 1 / y_scale)" => "y_scale * key->src_y1 -
> 1.0"
> >
> > This will also help in the future when we want to support hardware with
> 16x MSAA, because in that case we'll want x_scale = y_scale = 4.
> >
> > (2) Effectively what this code is doing is clamping the x and y
> coordinates to the boundary of the source rectangle.  But since it's
> accomplishing the clamping by adjusting x_lerp to 0.0 or 1.0, can only
> clamp things that aren't too far outside the source rectangle to begin with
> (i.e. no more than 1.0 outside the boundary).  I *think* that will work,
> but it seems like an unnecessary risk (and an unnecessary source of
> confusion).  Why not just clamp X_f to the range [x_scale * key->src_x0,
> x_scale * key->src_x1 - 1.0] (and similar for Y_f), and then afterward
> compute x_lerp = FRC(X_f)?  That way there will be no doubt that the
> coordinate is properly in range.
> >
> > (3) You can probably save a lot of instructions if you use conditional
> moves to do the clamping, e.g.
> >
> > brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_L, X_f,
> brw_imm_f(x_scale * key->src_x0));
> > brw_MOV(&func, X_f, brw_imm_f(x_scale * key->src_x0));
> > brw_set_predicate_control(&func, BRW_PREDICATE_NONE);
> > ...etc.
> >
> >>
> >> +
> >> +     /* Round the float coordinates down to nearest integer */
> >> +     brw_RNDD(&func, Xp_f, X_f);
> >> +     brw_RNDD(&func, Yp_f, Y_f);
> >> +     brw_MUL(&func, X_f, Xp_f, brw_imm_f(0.5));
> >> +     brw_MUL(&func, Y_f, Yp_f, brw_imm_f(1 / y_scale));
> >> +   } else {
> >> +      /* Round the float coordinates down to nearest integer by moving
> to
> >> +       * UD registers.
> >> +       */
> >> +      brw_MOV(&func, Xp, X_f);
> >> +      brw_MOV(&func, Yp, Y_f);
> >> +   }
> >>     SWAP_XY_AND_XPYP();
> >>     brw_set_compression_control(&func, BRW_COMPRESSION_NONE);
> >>  }
> >> @@ -1418,7 +1514,7 @@ inline int count_trailing_one_bits(unsigned value)
> >>
> >>
> >>  void
> >> -brw_blorp_blit_program::manual_blend(unsigned num_samples)
> >> +brw_blorp_blit_program::manual_blend_average(unsigned num_samples)
> >>  {
> >>     if (key->tex_layout == INTEL_MSAA_LAYOUT_CMS)
> >>        mcs_fetch();
> >> @@ -1523,6 +1619,135 @@ brw_blorp_blit_program::manual_blend(unsigned
> num_samples)
> >>        brw_ENDIF(&func);
> >>  }
> >>
> >> +void
> >> +brw_blorp_blit_program::manual_blend_linear(unsigned num_samples)
> >> +{
> >> +   if (key->tex_layout == INTEL_MSAA_LAYOUT_CMS)
> >> +      mcs_fetch();
> >
> >
> > This won't work.  The MCS value we fetch has to match up with the pixel
> that we're sampling from.  Since this function samples from different
> pixels in each iteration of the "for" loop below, the call to mcs_fetch()
> needs to go inside the loop, and it needs to happen after storing the
> coordinates in X and Y.
> >
> I think MCS value fetch will not be required anymore as we are anyway
> getting rid of optimization which compares mcs value to zero.
>

MCS fetch is still needed, since the MCS value needs to be passed into the
ld2dms message that is used to read the samples from the surface.

>
> >>
> >> +
> >> +   /* We do this computation by performing the following operations:
> >> +    *
> >> +    * In case of 4x, 8x MSAA:
> >> +    * - Compute the pixel coordinates and sample numbers (a, b, c, d)
> >> +    *   which are later used for interpolation
> >> +    * - linearly interpolate samples a and b in X
> >> +    * - linearly interpolate samples c and d in X
> >> +    * - linearly interpolate the results of last two operations in Y
> >> +    *
> >> +    *   result = lrp(lrp(a + b) + lrp(c + d))
> >> +    */
> >> +   struct brw_reg Xp_f = retype(Xp, BRW_REGISTER_TYPE_F);
> >> +   struct brw_reg Yp_f = retype(Yp, BRW_REGISTER_TYPE_F);
> >> +   struct brw_reg t1_f = retype(t1, BRW_REGISTER_TYPE_F);
> >> +   struct brw_reg t2_f = retype(t2, BRW_REGISTER_TYPE_F);
> >> +
> >> +   for (unsigned i = 0; i < 4; ++i) {
> >> +      assert(i < ARRAY_SIZE(texture_data));
> >> +      s_is_zero = false;
> >> +
> >> +      /* Compute pixel coordinates */
> >> +      brw_ADD(&func, vec16(x_sample_coords), Xp_f,
> >> +              brw_imm_f((float)(i & 0x1) * 0.5));
> >> +      brw_ADD(&func, vec16(y_sample_coords), Yp_f,
> >> +              brw_imm_f((float)(i & 0x2) * (1.0 / num_samples)));
> >> +      brw_MOV(&func, vec16(X), x_sample_coords);
> >> +      brw_MOV(&func, vec16(Y), y_sample_coords);
> >> +
> >> +      /* Compute sample index. In case of 4x MSAA, sample index is
> >> +       * equal to sample number.
> >> +       */
> >
> >
> > This comment should explain what you mean by "sample index" vs "sample
> number".
> >
> >>
> >> +      brw_FRC(&func, vec16(t1_f), x_sample_coords);
> >> +      brw_FRC(&func, vec16(t2_f), y_sample_coords);
> >> +      brw_MUL(&func, vec16(t1_f), t1_f, brw_imm_f(2.0));
> >> +      brw_MUL(&func, vec16(t2_f), t2_f, brw_imm_f(num_samples));
> >
> >
> > How about this instead?  It's a little more future proof and it
> clarifies that what we're doing is computing our position within the grid
> of samples corresponding to a single pixel:
> >
> > brw_MUL(&func, vec16(t1_f), t1_f, brw_imm_f(x_scale));
> > brw_MUL(&func, vec16(t2_f), t2_f, brw_imm_f(x_scale * y_scale));
> >
> >>
> >> +      brw_ADD(&func, vec16(t1_f), t1_f, t2_f);
> >> +      brw_MOV(&func, vec16(S), t1_f);
> >> +
> >> +      if (num_samples == 8) {
> >> +         /* Map the sample index to a sample number */
> >> +         brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_L,
> >> +                 S, brw_imm_d(4));
> >> +         brw_IF(&func, BRW_EXECUTE_16);
> >> +         {
> >> +            brw_MOV(&func, vec16(t2), brw_imm_d(5));
> >> +            brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_EQ,
> >> +                    S, brw_imm_d(1));
> >> +            brw_MOV(&func, vec16(t2), brw_imm_d(2));
> >> +            brw_set_predicate_control(&func, BRW_PREDICATE_NONE);
> >> +            brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_EQ,
> >> +                    S, brw_imm_d(2));
> >> +            brw_MOV(&func, vec16(t2), brw_imm_d(4));
> >> +            brw_set_predicate_control(&func, BRW_PREDICATE_NONE);
> >> +            brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_EQ,
> >> +                    S, brw_imm_d(3));
> >> +            brw_MOV(&func, vec16(t2), brw_imm_d(6));
> >> +            brw_set_predicate_control(&func, BRW_PREDICATE_NONE);
> >> +         }
> >> +         brw_ELSE(&func);
> >> +         {
> >> +            brw_MOV(&func, vec16(t2), brw_imm_d(0));
> >> +            brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_EQ,
> >> +                    S, brw_imm_d(5));
> >> +            brw_MOV(&func, vec16(t2), brw_imm_d(3));
> >> +            brw_set_predicate_control(&func, BRW_PREDICATE_NONE);
> >> +            brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_EQ,
> >> +                    S, brw_imm_d(6));
> >> +            brw_MOV(&func, vec16(t2), brw_imm_d(7));
> >> +            brw_set_predicate_control(&func, BRW_PREDICATE_NONE);
> >> +            brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_EQ,
> >> +                    S, brw_imm_d(7));
> >> +            brw_MOV(&func, vec16(t2), brw_imm_d(1));
> >> +            brw_set_predicate_control(&func, BRW_PREDICATE_NONE);
> >> +         }
> >> +         brw_ENDIF(&func);
> >> +         brw_MOV(&func, vec16(S), t2);
> >
> >
> > This seems like a lot of work to accomplish what is effectively a lookup
> table.  If this winds up becoming a performance bottleneck, you might want
> to consider passing the table in via a push constant, and using indirect
> addressing to convert sample index to sample number.
> >
> >>
> >> +      }
> >> +      texel_fetch(texture_data[i]);
> >> +
> >> +      if (i == 0 && key->tex_layout == INTEL_MSAA_LAYOUT_CMS) {
> >> +         /* The Ivy Bridge PRM, Vol4 Part1 p27 (Multisample Control
> Surface)
> >> +          * suggests an optimization:
> >> +          *
> >> +          *     "A simple optimization with probable large return in
> >> +          *     performance is to compare the MCS value to zero
> (indicating
> >> +          *     all samples are on sample slice 0), and sample only
> from
> >> +          *     sample slice 0 using ld2dss if MCS is zero."
> >> +          *
> >> +          * Note that in the case where the MCS value is zero,
> sampling from
> >> +          * sample slice 0 using ld2dss and sampling from sample 0
> using
> >> +          * ld2dms are equivalent (since all samples are on sample
> slice 0).
> >> +          * Since we have already sampled from sample 0, all we need
> to do is
> >> +          * skip the remaining fetches and averaging if MCS is zero.
> >> +          */
> >> +         brw_CMP(&func, vec16(brw_null_reg()), BRW_CONDITIONAL_NZ,
> >> +                 mcs_data, brw_imm_ud(0));
> >> +         brw_IF(&func, BRW_EXECUTE_16);
> >> +      }
> >
> >
> > We can't do this optimization for scaled multisample blits, because it
> relies on the assumption that all the samples we're blending together
> belong to the same pixel.  I'd recommend just pulling this code out.
> >
> >>
> >> +   }
> >> +
> >> +#define SAMPLE(x, y) offset(texture_data[x], y)
> >> +   brw_set_access_mode(&func, BRW_ALIGN_16);
> >> +   for (int index = 3; index > 0; ) {
> >> +      for (int k = 0; k < 8; ++k)
> >> +         brw_LRP(&func,
> >> +                 vec8(SAMPLE(index - 1, k)),
> >> +                 offset(x_lerp, k & 1),
> >> +                 SAMPLE(index, k),
> >> +                 SAMPLE(index - 1, k));
> >> +      index -= 2;
> >> +   }
> >> +   for (int k = 0; k < 8; ++k)
> >> +      brw_LRP(&func,
> >> +              vec8(SAMPLE(0, k)),
> >> +              offset(y_lerp, k & 1),
> >> +              vec8(SAMPLE(2, k)),
> >> +              vec8(SAMPLE(0, k)));
> >> +   brw_set_access_mode(&func, BRW_ALIGN_1);
> >
> >
> > I'm confused why we need loops from 0 to 7 here.  It looks like you're
> trying to interpolate each component of the SIMD8 register separately.
>  That shouldn't be necessary.
> >
> >>
> >> +#undef SAMPLE
> >> +   if (key->tex_layout == INTEL_MSAA_LAYOUT_CMS)
> >> +      brw_ENDIF(&func);
> >> +}
> >> +
> >>  /**
> >>   * Emit code to look up a value in the texture using the SAMPLE
> message (which
> >>   * does blending of MSAA surfaces).
> >> @@ -1535,7 +1760,8 @@ brw_blorp_blit_program::sample(struct brw_reg dst)
> >>        SAMPLER_MESSAGE_ARG_V_FLOAT
> >>     };
> >>
> >> -   texture_lookup(dst, GEN5_SAMPLER_MESSAGE_SAMPLE, args,
> ARRAY_SIZE(args));
> >> +   texture_lookup(dst, GEN5_SAMPLER_MESSAGE_SAMPLE, args,
> >> +                  ARRAY_SIZE(args));
> >>  }
> >>
> >>  /**
> >> @@ -1872,6 +2098,17 @@
> brw_blorp_blit_params::brw_blorp_blit_params(struct brw_context *brw,
> >>        wm_prog_key.persample_msaa_dispatch = true;
> >>     }
> >>
> >> +   /* Scaled blitting or not. */
> >> +   wm_prog_key.blit_scaled =
> >> +      ((dst_x1 - dst_x0) == (src_x1 - src_x0) &&
> >> +       (dst_y1 - dst_y0) == (src_y1 - src_y0)) ? false : true;
> >> +
> >> +   /* Source rectangle dimensions */
> >> +   wm_prog_key.src_x0 = src_x0;
> >> +   wm_prog_key.src_y0 = src_y0;
> >> +   wm_prog_key.src_x1 = src_x1;
> >> +   wm_prog_key.src_y1 = src_y1;
> >> +
> >>     /* The render path must be configured to use the same number of
> samples as
> >>      * the destination buffer.
> >>      */
> >> --
> >> 1.8.1.4
> >>
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20130626/4adc6a40/attachment-0001.html>