[Mesa-dev] [PATCH] i965/fs: Implement SIMD16 dual source blending.

Iago Toral itoral at igalia.com
Mon Mar 9 00:13:39 PDT 2015


On Sat, 2015-03-07 at 13:35 -0800, Jason Ekstrand wrote:
> 
> 
> On Thu, Mar 5, 2015 at 9:39 PM, Jason Ekstrand <jason at jlekstrand.net>
> wrote:
>         This looks fine to me.  I just kicked off a build on our test
>         farm and, assuming that looks good (I'll send another e-mail
>         in the morning if it does),
>         
>         
>         Reviewed-by: Jason Ekstrand <jason.ekstrand at intel.com>
>         
> 
> 
> Jenkins results look god so feel free to apply the R-B above and push
> it.
> 
> Don't worry about the shader-db number given that, as ken pointed out,
> shader-db is kind of useless for this.  I wish we knew how many SIMD16
> programs this gave us in practice, but short of doing lots of
> shader-db work, we can't know at the moment so don't worry about it.

Ok, I will push now.
Thanks for the review and the testing!

Iago

>         
>         I ran shader-db on the change and I was kind of surprised to
>         see that it doesn't really do anything.
>         
>         GAINED: shaders/dolphin/smg.1.shader_test FS SIMD16
>         
>         total instructions in shared programs: 5769629 -> 5769629
>         (0.00%)
>         instructions in affected programs:     0 -> 0
>         helped:                                0
>         HURT:                                  0
>         GAINED:                                1
>         LOST:                                  0
>         
>         
>         Perhaps shader-db doesn't account for some other GL state
>         required for dual-source because I doubt only one shader uses
>         it.  Ken?
>         
>         
>         
>         --Jason
>         
>         
>         On Thu, Mar 5, 2015 at 3:21 AM, Iago Toral Quiroga
>         <itoral at igalia.com> wrote:
>                 From the SNB PRM, volume 4, part 1, page 193:
>                 
>                 "The dual source render target messages only have
>                 SIMD8 forms due to
>                  maximum message length limitations. SIMD16 pixel
>                 shaders must send two of
>                  these messages to cover all of the pixels. Each
>                 message contains two colors
>                  (4 channels each) for each pixel in the message
>                 payload."
>                 
>                 Bugzilla:
>                 https://bugs.freedesktop.org/show_bug.cgi?id=82831
>                 ---
>                 I sent this patch for review some months ago, but it
>                 was bad timing because
>                 it was when Jason was doing a large rewrite of the
>                 visitor code handling
>                 FB writes, so the patch became immediately obsolete.
>                 This is the up-to-date
>                 version.
>                 
>                 If anyone wants to test this, I sent this patch to
>                 piglit with a test that
>                 can be used to check for correct SIMD16 implementation
>                 specifically:
>                 http://lists.freedesktop.org/archives/piglit/2015-March/015015.html
>                 
>                  src/mesa/drivers/dri/i965/brw_eu.h             |  1 +
>                  src/mesa/drivers/dri/i965/brw_eu_emit.c        |  3
>                 +-
>                  src/mesa/drivers/dri/i965/brw_fs.h             |  6
>                 +-
>                  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 15
>                 ++++-
>                  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 77
>                 +++++++++++++++++++++-----
>                  5 files changed, 83 insertions(+), 19 deletions(-)
>                 
>                 diff --git a/src/mesa/drivers/dri/i965/brw_eu.h
>                 b/src/mesa/drivers/dri/i965/brw_eu.h
>                 index 736c54b..d9ad5bd 100644
>                 --- a/src/mesa/drivers/dri/i965/brw_eu.h
>                 +++ b/src/mesa/drivers/dri/i965/brw_eu.h
>                 @@ -266,6 +266,7 @@ void brw_fb_WRITE(struct
>                 brw_compile *p,
>                                    unsigned msg_length,
>                                    unsigned response_length,
>                                    bool eot,
>                 +                  bool last_render_target,
>                                    bool header_present);
>                 
>                  void brw_SAMPLE(struct brw_compile *p,
>                 diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c
>                 b/src/mesa/drivers/dri/i965/brw_eu_emit.c
>                 index 1d6fd67..74cf138 100644
>                 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
>                 +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
>                 @@ -2292,6 +2292,7 @@ void brw_fb_WRITE(struct
>                 brw_compile *p,
>                                    unsigned msg_length,
>                                    unsigned response_length,
>                                    bool eot,
>                 +                  bool last_render_target,
>                                    bool header_present)
>                  {
>                     struct brw_context *brw = p->brw;
>                 @@ -2333,7 +2334,7 @@ void brw_fb_WRITE(struct
>                 brw_compile *p,
>                                             msg_type,
>                                             msg_length,
>                                             header_present,
>                 -                           eot, /* last render target
>                 write */
>                 +                           last_render_target,
>                                             response_length,
>                                             eot,
>                                             0 /* send_commit_msg */);
>                 diff --git a/src/mesa/drivers/dri/i965/brw_fs.h
>                 b/src/mesa/drivers/dri/i965/brw_fs.h
>                 index 70098d8..5a4f66c 100644
>                 --- a/src/mesa/drivers/dri/i965/brw_fs.h
>                 +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>                 @@ -369,10 +369,12 @@ public:
>                     bool optimize_frontfacing_ternary(nir_alu_instr
>                 *instr,
>                                                       const fs_reg
>                 &result);
>                 
>                 -   int setup_color_payload(fs_reg *dst, fs_reg color,
>                 unsigned components);
>                 +   int setup_color_payload(fs_reg *dst, fs_reg color,
>                 unsigned components,
>                 +                           bool use_2nd_half);
>                     void emit_alpha_test();
>                     fs_inst *emit_single_fb_write(fs_reg color1,
>                 fs_reg color2,
>                 -                                 fs_reg src0_alpha,
>                 unsigned components);
>                 +                                 fs_reg src0_alpha,
>                 unsigned components,
>                 +                                 bool use_2nd_half =
>                 false);
>                     void emit_fb_writes();
>                     void emit_urb_writes();
>                 
>                 diff --git
>                 a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>                 b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>                 index cbe6191..db70df5 100644
>                 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>                 +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>                 @@ -217,9 +217,12 @@
>                 fs_generator::fire_fb_write(fs_inst *inst,
>                 
>                     if (inst->opcode == FS_OPCODE_REP_FB_WRITE)
>                        msg_control =
>                 BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD16_SINGLE_SOURCE_REPLICATED;
>                 -   else if (prog_data->dual_src_blend)
>                 -      msg_control =
>                 BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD8_DUAL_SOURCE_SUBSPAN01;
>                 -   else if (dispatch_width == 16)
>                 +   else if (prog_data->dual_src_blend) {
>                 +      if (dispatch_width == 8 || !inst->eot)
>                 +         msg_control =
>                 BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD8_DUAL_SOURCE_SUBSPAN01;
>                 +      else
>                 +         msg_control =
>                 BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD8_DUAL_SOURCE_SUBSPAN23;
>                 +   } else if (dispatch_width == 16)
>                        msg_control =
>                 BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD16_SINGLE_SOURCE;
>                     else
>                        msg_control =
>                 BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD8_SINGLE_SOURCE_SUBSPAN01;
>                 @@ -227,6 +230,10 @@
>                 fs_generator::fire_fb_write(fs_inst *inst,
>                     uint32_t surf_index =
>                        prog_data->binding_table.render_target_start +
>                 inst->target;
>                 
>                 +   bool last_render_target = inst->eot ||
>                 +
>                  (prog_data->dual_src_blend && dispatch_width == 16);
>                 +
>                 +
>                     brw_fb_WRITE(p,
>                                  dispatch_width,
>                                  payload,
>                 @@ -236,6 +243,7 @@
>                 fs_generator::fire_fb_write(fs_inst *inst,
>                                  nr,
>                                  0,
>                                  inst->eot,
>                 +                last_render_target,
>                                  inst->header_present);
>                 
>                     brw_mark_surface_used(&prog_data->base,
>                 surf_index);
>                 @@ -373,6 +381,7 @@
>                 fs_generator::generate_blorp_fb_write(fs_inst *inst)
>                                  inst->mlen,
>                                  0,
>                                  true,
>                 +                true,
>                                  inst->header_present);
>                  }
>                 
>                 diff --git
>                 a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>                 b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>                 index 57c4d66..d3b8345 100644
>                 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>                 +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>                 @@ -3366,7 +3366,8 @@
>                 fs_visitor::emit_interpolation_setup_gen6()
>                  }
>                 
>                  int
>                 -fs_visitor::setup_color_payload(fs_reg *dst, fs_reg
>                 color, unsigned components)
>                 +fs_visitor::setup_color_payload(fs_reg *dst, fs_reg
>                 color, unsigned components,
>                 +                                bool use_2nd_half)
>                  {
>                     brw_wm_prog_key *key = (brw_wm_prog_key*)
>                 this->key;
>                     fs_inst *inst;
>                 @@ -3384,7 +3385,7 @@
>                 fs_visitor::setup_color_payload(fs_reg *dst, fs_reg
>                 color, unsigned components)
>                        colors_enabled = (1 << components) - 1;
>                     }
>                 
>                 -   if (dispatch_width == 8 || brw->gen >= 6) {
>                 +   if (dispatch_width == 8 || (brw->gen >= 6 && !
>                 do_dual_src)) {
>                        /* SIMD8 write looks like:
>                         * m + 0: r0
>                         * m + 1: r1
>                 @@ -3415,6 +3416,33 @@
>                 fs_visitor::setup_color_payload(fs_reg *dst, fs_reg
>                 color, unsigned components)
>                           len++;
>                        }
>                        return len;
>                 +   } else if (brw->gen >= 6 && do_dual_src) {
>                 +      /* SIMD16 dual source blending for gen6+.
>                 +       *
>                 +       * From the SNB PRM, volume 4, part 1, page
>                 193:
>                 +       *
>                 +       * "The dual source render target messages only
>                 have SIMD8 forms due to
>                 +       *  maximum message length limitations. SIMD16
>                 pixel shaders must send two
>                 +       *  of these messages to cover all of the
>                 pixels. Each message contains
>                 +       *  two colors (4 channels each) for each pixel
>                 in the message payload."
>                 +       *
>                 +       * So in SIMD16 dual source blending we will
>                 send 2 SIMD8 messages,
>                 +       * each one will call this function twice (one
>                 for each color involved),
>                 +       * so in each pass we only write 4 registers.
>                 Notice that the second
>                 +       * SIMD8 message needs to read color data from
>                 the 2nd half of the color
>                 +       * registers, so it needs to call this with
>                 use_2nd_half = true.
>                 +       */
>                 +      for (unsigned i = 0; i < 4; ++i) {
>                 +         if (colors_enabled & (1 << i)) {
>                 +            dst[i] = fs_reg(GRF, alloc.allocate(1),
>                 color.type);
>                 +            inst = emit(MOV(dst[i],
>                 half(offset(color, i),
>                 +
>                  use_2nd_half ? 1 : 0)));
>                 +            inst->saturate =
>                 key->clamp_fragment_color;
>                 +            if (use_2nd_half)
>                 +               inst->force_sechalf = true;
>                 +         }
>                 +      }
>                 +      return 4;
>                     } else {
>                        /* pre-gen6 SIMD16 single source DP write looks
>                 like:
>                         * m + 0: r0
>                 @@ -3498,7 +3526,8 @@ fs_visitor::emit_alpha_test()
>                 
>                  fs_inst *
>                  fs_visitor::emit_single_fb_write(fs_reg color0,
>                 fs_reg color1,
>                 -                                 fs_reg src0_alpha,
>                 unsigned components)
>                 +                                 fs_reg src0_alpha,
>                 unsigned components,
>                 +                                 bool use_2nd_half)
>                  {
>                     assert(stage == MESA_SHADER_FRAGMENT);
>                     brw_wm_prog_data *prog_data = (brw_wm_prog_data*)
>                 this->prog_data;
>                 @@ -3558,7 +3587,8 @@
>                 fs_visitor::emit_single_fb_write(fs_reg color0, fs_reg
>                 color1,
>                         * alpha out the pipeline to our null
>                 renderbuffer to support
>                         * alpha-testing, alpha-to-coverage, and so on.
>                         */
>                 -      length += setup_color_payload(sources + length,
>                 this->outputs[0], 0);
>                 +      length += setup_color_payload(sources + length,
>                 this->outputs[0], 0,
>                 +                                    false);
>                     } else if (color1.file == BAD_FILE) {
>                        if (src0_alpha.file != BAD_FILE) {
>                           sources[length] = fs_reg(GRF,
>                 alloc.allocate(reg_size),
>                 @@ -3568,10 +3598,13 @@
>                 fs_visitor::emit_single_fb_write(fs_reg color0, fs_reg
>                 color1,
>                           length++;
>                        }
>                 
>                 -      length += setup_color_payload(sources + length,
>                 color0, components);
>                 +      length += setup_color_payload(sources + length,
>                 color0, components,
>                 +                                    false);
>                     } else {
>                 -      length += setup_color_payload(sources + length,
>                 color0, components);
>                 -      length += setup_color_payload(sources + length,
>                 color1, components);
>                 +      length += setup_color_payload(sources + length,
>                 color0, components,
>                 +                                    use_2nd_half);
>                 +      length += setup_color_payload(sources + length,
>                 color1, components,
>                 +                                    use_2nd_half);
>                     }
>                 
>                     if (source_depth_to_render_target) {
>                 @@ -3640,12 +3673,6 @@ fs_visitor::emit_fb_writes()
>                     brw_wm_prog_data *prog_data = (brw_wm_prog_data*)
>                 this->prog_data;
>                     brw_wm_prog_key *key = (brw_wm_prog_key*)
>                 this->key;
>                 
>                 -   if (do_dual_src) {
>                 -      no16("GL_ARB_blend_func_extended not yet
>                 supported in SIMD16.");
>                 -      if (dispatch_width == 16)
>                 -         do_dual_src = false;
>                 -   }
>                 -
>                     fs_inst *inst;
>                     if (do_dual_src) {
>                        if (INTEL_DEBUG & DEBUG_SHADER_TIME)
>                 @@ -3656,6 +3683,30 @@ fs_visitor::emit_fb_writes()
>                        inst = emit_single_fb_write(this->outputs[0],
>                 this->dual_src_output,
>                                                    reg_undef, 4);
>                        inst->target = 0;
>                 +
>                 +      /* SIMD16 dual source blending requires to send
>                 two SIMD8 dual source
>                 +       * messages, where each message contains color
>                 data for 8 pixels. Color
>                 +       * data for the first group of pixels is stored
>                 in the "lower" half of
>                 +       * the color registers, so in SIMD16, the
>                 previous message did:
>                 +       * m + 0: r0
>                 +       * m + 1: g0
>                 +       * m + 2: b0
>                 +       * m + 3: a0
>                 +       *
>                 +       * Here goes the second message, which packs
>                 color data for the
>                 +       * remaining 8 pixels. Color data for these
>                 pixels is stored in the
>                 +       * "upper" half of the color registers, so we
>                 need to do:
>                 +       * m + 0: r1
>                 +       * m + 1: g1
>                 +       * m + 2: b1
>                 +       * m + 3: a1
>                 +       */
>                 +      if (dispatch_width == 16) {
>                 +         inst =
>                 emit_single_fb_write(this->outputs[0],
>                 this->dual_src_output,
>                 +                                     reg_undef, 4,
>                 true);
>                 +         inst->target = 0;
>                 +      }
>                 +
>                        prog_data->dual_src_blend = true;
>                     } else if (key->nr_color_regions > 0) {
>                        for (int target = 0; target <
>                 key->nr_color_regions; target++) {
>                 --
>                 1.9.1
>                 
>                 _______________________________________________
>                 mesa-dev mailing list
>                 mesa-dev at lists.freedesktop.org
>                 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>         
>         
> 
> 




More information about the mesa-dev mailing list