[Mesa-dev] [v2 5/8] i965/meta: Stencil blits

Pohjolainen, Topi topi.pohjolainen at intel.com
Thu May 15 01:22:21 PDT 2014


On Thu, May 15, 2014 at 12:57:42AM -0700, Kenneth Graunke wrote:
> On 05/12/2014 08:42 AM, Topi Pohjolainen wrote:
> > v2: Create the intel renderbuffer with level hardcoded to zero instead
> >     of overriding it in the surface state configuration. Also moved the
> >     dimension adjustments for tiling, mip level, msaa into the render
> >     buffer creation. Finally prepares for another blit path needed for
> >     miptree updownsampling.
> > 
> > Signed-off-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > ---
> >  src/mesa/drivers/dri/i965/Makefile.sources        |   1 +
> >  src/mesa/drivers/dri/i965/brw_context.h           |   6 +
> >  src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c | 492 ++++++++++++++++++++++
> >  3 files changed, 499 insertions(+)
> >  create mode 100644 src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c
> > 
> > diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
> > index 9379fd2..5fc90b5 100644
> > --- a/src/mesa/drivers/dri/i965/Makefile.sources
> > +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> > @@ -76,6 +76,7 @@ i965_FILES = \
> >  	brw_lower_texture_gradients.cpp \
> >  	brw_lower_unnormalized_offset.cpp \
> >  	brw_meta_updownsample.c \
> > +	brw_meta_stencil_blit.c \
> >  	brw_meta_util.c \
> >  	brw_misc_state.c \
> >  	brw_object_purgeable.c \
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> > index 6d4a5b5..d6b7f16 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -1482,6 +1482,12 @@ GLuint brw_get_rb_for_slice(struct brw_context *brw,
> >  void brw_meta_updownsample(struct brw_context *brw,
> >                             struct intel_mipmap_tree *src,
> >                             struct intel_mipmap_tree *dst);
> > +
> > +void brw_meta_fbo_stencil_blit(struct brw_context *brw,
> > +                               GLfloat srcX0, GLfloat srcY0,
> > +                               GLfloat srcX1, GLfloat srcY1,
> > +                               GLfloat dstX0, GLfloat dstY0,
> > +                               GLfloat dstX1, GLfloat dstY1);
> >  /*======================================================================
> >   * brw_misc_state.c
> >   */
> > diff --git a/src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c b/src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c
> > new file mode 100644
> > index 0000000..c38987e
> > --- /dev/null
> > +++ b/src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c
> > @@ -0,0 +1,492 @@
> > +/*
> > + * Copyright © 2014 Intel Corporation
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > + * copy of this software and associated documentation files (the "Software"),
> > + * to deal in the Software without restriction, including without limitation
> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including the next
> > + * paragraph) shall be included in all copies or substantial portions of the
> > + * Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> > + * IN THE SOFTWARE.
> > + */
> > +
> > +/**
> > + * @file brw_meta_stencil_blit.c
> > + *
> > + * Implements upsampling, downsampling and scaling of stencil miptrees. The
> > + * logic can be originally found in brw_blorp_blit.c.
> > + * Implementation creates a temporary draw framebuffer object and attaches the
> > + * destination stencil buffer attachment as color attachment. Source attachment
> > + * is in turn treated as a stencil texture and the glsl program used for the
> > + * blitting samples it using stencil-indexing.
> > + *
> > + * Unfortunately as the data port does not support interleaved msaa-surfaces
> > + * (stencil is always IMS), the glsl program needs to handle the writing of
> > + * individual samples manually. Surface is configured as if it were single
> > + * sampled (with adjusted dimensions) and the glsl program extracts the
> > + * sample indices from the input coordinates for correct texturing.
> > + *
> > + * Target surface is also configured as Y-tiled instead of W-tiled in order
> > + * to support generations 6-7. Later hardware supports W-tiled as render target
> > + * and the logic here could be simplified for those.
> > + */
> > +
> > +#include "brw_context.h"
> > +#include "intel_batchbuffer.h"
> > +#include "intel_fbo.h"
> > +
> > +#include "main/blit.h"
> > +#include "main/buffers.h"
> > +#include "main/fbobject.h"
> > +#include "main/uniforms.h"
> > +#include "main/texparam.h"
> > +#include "main/texobj.h"
> > +#include "main/viewport.h"
> > +#include "main/enable.h"
> > +#include "main/blend.h"
> > +#include "main/varray.h"
> > +#include "main/shaderapi.h"
> > +#include "glsl/ralloc.h"
> > +
> > +#include "drivers/common/meta.h"
> > +#include "brw_meta_util.h"
> > +
> > +#define FILE_DEBUG_FLAG DEBUG_FBO
> > +
> > +struct blit_dims {
> > +   int src_x0, src_y0, src_x1, src_y1;
> > +   int dst_x0, dst_y0, dst_x1, dst_y1;
> > +   bool mirror_x, mirror_y;
> > +};
> > +
> > +static const char *vs_source =
> > +   "#version 130\n"
> > +   "in vec2 position;\n"
> > +   "out vec2 tex_coords;\n"
> > +   "void main()\n"
> > +   "{\n"
> > +   "   tex_coords = (position + 1.0) / 2.0;\n"
> > +   "   gl_Position = vec4(position, 0.0, 1.0);\n"
> > +   "}\n";
> > +
> > +static const struct sampler_and_fetch {
> > +  const char *sampler;
> > +  const char *fetch;
> > +} samplers[] = {
> > +   { "uniform usampler2D texSampler;\n",
> > +     "   out_color = texelFetch(texSampler, txl_coords, 0)" },
> > +   { "#extension GL_ARB_texture_multisample : enable\n"
> > +     "uniform usampler2DMS texSampler;\n",
> > +     "   out_color = texelFetch(texSampler, txl_coords, sample_index)" }
> > +};
> > +
> > +/**
> > + * Translating Y-tiled to W-tiled:
> > + *
> > + *  X' = (X & ~0b1011) >> 1 | (Y & 0b1) << 2 | X & 0b1
> > + *  Y' = (Y & ~0b1) << 1 | (X & 0b1000) >> 2 | (X & 0b10) >> 1
> > + */
> > +static const char *fs_tmpl =
> > +   "#version 130\n"
> > +   "%s"
> > +   "uniform float src_x_scale;\n"
> > +   "uniform float src_y_scale;\n"
> > +   "uniform float src_x_off;\n" /* Top right coordinates of the source */
> > +   "uniform float src_y_off;\n" /* rectangle in W-tiled space. */
> > +   "uniform float dst_x_off;\n" /* Top right coordinates of the target */
> > +   "uniform float dst_y_off;\n" /* rectangle in Y-tiled space. */
> > +   "uniform float draw_rect_w;\n" /* This is the unnormalized size of the */
> > +   "uniform float draw_rect_h;\n" /* drawing rectangle in Y-tiled space. */
> > +   "uniform int dst_x0;\n" /* This is the bounding rectangle in the W-tiled */
> > +   "uniform int dst_x1;\n" /* space that will be used to skip pixels lying */
> > +   "uniform int dst_y0;\n" /* outside. In some cases the Y-tiled rectangle */
> > +   "uniform int dst_y1;\n" /* is larger. */
> > +   "uniform int dst_num_samples;\n"
> > +   "in vec2 tex_coords;\n"
> > +   "ivec2 txl_coords;\n"
> > +   "int sample_index;\n"
> > +   "out uvec4 out_color;\n"
> > +   "\n"
> > +   "void get_unorm_target_coords()\n"
> > +   "{\n"
> > +   "   txl_coords.x = int(tex_coords.x * draw_rect_w + dst_x_off);\n"
> > +   "   txl_coords.y = int(tex_coords.y * draw_rect_h + dst_y_off);\n"
> > +   "}\n"
> > +   "\n"
> > +   "void translate_dst_to_src()\n"
> > +   "{\n"
> > +   "   txl_coords.x = int(float(txl_coords.x) * src_x_scale + src_x_off);\n"
> > +   "   txl_coords.y = int(float(txl_coords.y) * src_y_scale + src_y_off);\n"
> > +   "}\n"
> > +   "\n"
> > +   "void translate_y_to_w_tiling()\n"
> > +   "{\n"
> > +   "   int X = txl_coords.x;\n"
> > +   "   int Y = txl_coords.y;\n"
> > +   "   txl_coords.x = (X & int(0xfff4)) >> 1;\n"
> > +   "   txl_coords.x |= ((Y & int(0x1)) << 2);\n"
> > +   "   txl_coords.x |= (X & int(0x1));\n"
> > +   "   txl_coords.y = (Y & int(0xfffe)) << 1;\n"
> > +   "   txl_coords.y |= ((X & int(0x8)) >> 2);\n"
> > +   "   txl_coords.y |= ((X & int(0x2)) >> 1);\n"
> > +   "}\n"
> > +   "\n"
> > +   "void decode_msaa()\n"
> > +   "{\n"
> > +   "   int X = txl_coords.x;\n"
> > +   "   int Y = txl_coords.y;\n"
> > +   "   switch (dst_num_samples) {\n"
> > +   "   case 0:\n"
> > +   "      sample_index = 0;\n"
> > +   "      break;\n"
> > +   "   case 2:\n"
> > +   "      txl_coords.x = ((X & int(0xfffc)) >> 1) | (X & int(0x1));\n"
> > +   "      sample_index = (X & 0x2) >> 1;\n"
> > +   "      break;\n"
> > +   "   case 4:\n"
> > +   "      txl_coords.x = ((X & int(0xfffc)) >> 1) | (X & int(0x1));\n"
> > +   "      txl_coords.y = ((Y & int(0xfffc)) >> 1) | (Y & int(0x1));\n"
> > +   "      sample_index = (Y & 0x2) | ((X & 0x2) >> 1);\n"
> > +   "      break;\n"
> > +   "   case 8:\n"
> > +   "      txl_coords.x = ((X & int(0xfff8)) >> 2) | (X & int(0x1));\n"
> > +   "      txl_coords.y = ((Y & int(0xfffc)) >> 1) | (Y & int(0x1));\n"
> > +   "      sample_index = (X & 0x4) | (Y & 0x2) | ((X & 0x2) >> 1);\n"
> > +   "   }\n"
> > +   "}\n"
> > +   "\n"
> > +   "void discard_outside_bounding_rect()\n"
> > +   "{\n"
> > +   "   int X = txl_coords.x;\n"
> > +   "   int Y = txl_coords.y;\n"
> > +   "   if (X >= dst_x1 || X < dst_x0 || Y >= dst_y1 || Y < dst_y0)\n"
> > +   "      discard;\n"
> > +   "}\n"
> > +   "\n"
> > +   "void main()\n"
> > +   "{\n"
> > +   "   get_unorm_target_coords();\n"
> > +   "   translate_y_to_w_tiling();\n"
> > +   "   decode_msaa();"
> > +   "   discard_outside_bounding_rect();\n"
> > +   "   translate_dst_to_src();\n"
> > +   "   %s;\n"
> > +   "}\n";
> > +
> > +/**
> > + * Setup uniforms telling the coordinates of the destination rectangle in the
> > + * native w-tiled space. These are needed to ignore pixels that lie outside.
> > + * The destination is drawn as Y-tiled and in some cases the Y-tiled drawing
> > + * rectangle is larger than the original (for example 1x4 w-tiled requires
> > + * 16x2 y-tiled).
> > + */
> > +static void
> > +setup_bounding_rect(GLuint prog, const struct blit_dims *dims)
> > +{
> > +   _mesa_Uniform1i(_mesa_GetUniformLocation(prog, "dst_x0"), dims->dst_x0);
> > +   _mesa_Uniform1i(_mesa_GetUniformLocation(prog, "dst_x1"), dims->dst_x1);
> > +   _mesa_Uniform1i(_mesa_GetUniformLocation(prog, "dst_y0"), dims->dst_y0);
> > +   _mesa_Uniform1i(_mesa_GetUniformLocation(prog, "dst_y1"), dims->dst_y1);
> > +}
> > +
> > +/**
> > + * Setup uniforms telling the destination width, height and the offset. These
> > + * are needed to unnoormalize the input coordinates and to correctly translate
> > + * between destination and source that may have differing offsets.
> > + */
> > +static void
> > +setup_drawing_rect(GLuint prog, const struct blit_dims *dims)
> > +{
> > +   _mesa_Uniform1f(_mesa_GetUniformLocation(prog, "draw_rect_w"),
> > +                   dims->dst_x1 - dims->dst_x0);
> > +   _mesa_Uniform1f(_mesa_GetUniformLocation(prog, "draw_rect_h"),
> > +                   dims->dst_y1 - dims->dst_y0);
> > +   _mesa_Uniform1f(_mesa_GetUniformLocation(prog, "dst_x_off"), dims->dst_x0);
> > +   _mesa_Uniform1f(_mesa_GetUniformLocation(prog, "dst_y_off"), dims->dst_y0);
> > +}
> > +
> > +/**
> > + * When not mirroring a coordinate (say, X), we need:
> > + *   src_x - src_x0 = (dst_x - dst_x0 + 0.5) * scale
> > + * Therefore:
> > + *   src_x = src_x0 + (dst_x - dst_x0 + 0.5) * scale
> > + *
> > + * The program uses "round toward zero" to convert the transformed floating
> > + * point coordinates to integer coordinates, whereas the behaviour we actually
> > + * want is "round to nearest", so 0.5 provides the necessary correction.
> > + *
> > + * When mirroring X we need:
> > + *   src_x - src_x0 = dst_x1 - dst_x - 0.5
> > + * Therefore:
> > + *   src_x = src_x0 + (dst_x1 -dst_x - 0.5) * scale
> > + */
> > +static void
> > +setup_coord_coeff(GLuint prog, GLuint multiplier, GLuint offset,
> > +                  int src_0, int src_1, int dst_0, int dst_1, bool mirror)
> > +{
> > +   const float scale = ((float)(src_1 - src_0)) / (dst_1 - dst_0);
> > +
> > +   if (mirror) {
> > +      _mesa_Uniform1f(multiplier, -scale);
> > +      _mesa_Uniform1f(offset, src_0 + (dst_1 - 0.5) * scale);
> > +   } else {
> > +      _mesa_Uniform1f(multiplier, scale);
> > +      _mesa_Uniform1f(offset, src_0 + (-dst_0 + 0.5) * scale);
> > +   }
> > +}
> > +
> > +/**
> > + * Setup uniforms providing relation between source and destination surfaces.
> > + * Destination coordinates are in Y-tiling layout while texelFetch() expects
> > + * W-tiled coordinates. Once the destination coordinates are re-interpreted by
> > + * the program into the original W-tiled layout, the program needs to know the
> > + * offset and scaling factors between the destination and source.
> > + * Note that these are calculated in the original W-tiled space before the
> > + * destination rectangle is adjusted for possible msaa and Y-tiling.
> > + */
> > +static void
> > +setup_coord_transform(GLuint prog, const struct blit_dims *dims)
> > +{
> > +   setup_coord_coeff(prog,
> > +                     _mesa_GetUniformLocation(prog, "src_x_scale"),
> > +                     _mesa_GetUniformLocation(prog, "src_x_off"),
> > +                     dims->src_x0, dims->src_x1, dims->dst_x0, dims->dst_x1,
> > +                     dims->mirror_x);
> > +
> > +   setup_coord_coeff(prog,
> > +                     _mesa_GetUniformLocation(prog, "src_y_scale"),
> > +                     _mesa_GetUniformLocation(prog, "src_y_off"),
> > +                     dims->src_y0, dims->src_y1, dims->dst_y0, dims->dst_y1,
> > +                     dims->mirror_y);
> > +}
> > +
> > +static GLuint
> > +setup_program(struct gl_context *ctx, bool msaa_tex)
> > +{
> > +   struct blit_state *blit = &ctx->Meta->Blit;
> > +   static GLuint prog_cache[] = { 0, 0 };
> 
> I wonder about using static storage here, vs. storing it in a context.
> I feel like we need to store it in the context, but I can't concretely
> think of why this wouldn't work.  We can always fix this later if it's a
> problem.

Hmm, I think you have point here. While the resulted programs do not actually
depend on the state of the context yet (even thought the ctx->Meta->Blit is
referenced) that may change anytime. Also I don't exactly how programs are
scoped (can they be shared between different contexts). Hence even if it
creates possibly many copies of identical programs for multiple contexts I
would prefer it.
How about I follow it up with another patch? It will introduce a little more
dynamic and I would like to test it thoroughly before pushing.

> 
> > +   void *mem_ctx;
> > +   const char *fs_source;
> > +   const struct sampler_and_fetch *sampler = &samplers[msaa_tex];
> > +
> > +   _mesa_meta_setup_vertex_objects(&blit->VAO, &blit->VBO, true, 2, 2, 0);
> > +
> > +   if (prog_cache[msaa_tex]) {
> > +      _mesa_UseProgram(prog_cache[msaa_tex]);
> > +      return prog_cache[msaa_tex];
> > +   }
> > +  
> > +   mem_ctx = ralloc_context(NULL);
> > +   fs_source = ralloc_asprintf(mem_ctx, fs_tmpl, sampler->sampler,
> > +                               sampler->fetch);
> > +   _mesa_meta_compile_and_link_program(ctx, vs_source, fs_source,
> > +                                       "i965 stencil blit",
> > +                                       &prog_cache[msaa_tex]);
> > +   ralloc_free(mem_ctx);
> 
> You can actually skip the temporary mem_ctx here and just do:
> 
> fs_source = ralloc_asprintf(NULL, fs_tmpl, sampler->sampler,
> sampler->fetch);
> _mesa_meta_compile...
> ralloc_free(fs_source);

Oh, nice. I'll revise.

> 
> You're also welcome to use C99 mixed declarations and code in the i965
> directory, if you like.  (Either way is fine.)
> 
> I see no reason to hold up this landing, so for the series:
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
> 
> My review is pretty weak on the details, but it looks like you're doing
> the right sorts of things, in a pretty reasonable manner.  The real test
> is Piglit. :)  We can always fix more things or polish things in
> follow-up patches.
> 
> You should go ahead and push this.  Thanks, Topi!
> 


More information about the mesa-dev mailing list