[Mesa-dev] [PATCH 8/8] i965: Enable EXT_shader_samples_identical

Jason Ekstrand jason at jlekstrand.net
Wed Nov 18 21:29:17 PST 2015


On Nov 18, 2015 5:02 PM, "Jason Ekstrand" <jason at jlekstrand.net> wrote:
>
> On Wed, Nov 18, 2015 at 4:06 PM, Kenneth Graunke <kenneth at whitecape.org>
wrote:
> > On Wednesday, November 18, 2015 03:46:54 PM Ian Romanick wrote:
> >> From: Ian Romanick <ian.d.romanick at intel.com>
> >>
> >> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp       |  1 +
> >>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 16 ++++++++++++++++
> >>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp     |  1 +
> >>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 11 +++++++++++
> >>  src/mesa/drivers/dri/i965/intel_extensions.c   |  1 +
> >>  5 files changed, 30 insertions(+)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> index 1f71f66..4af1234 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >> @@ -2550,6 +2550,7 @@ fs_visitor::nir_emit_texture(const fs_builder
&bld, nir_tex_instr *instr)
> >>           switch (instr->op) {
> >>           case nir_texop_txf:
> >>           case nir_texop_txf_ms:
> >> +         case nir_texop_samples_identical:
> >>              coordinate = retype(src, BRW_REGISTER_TYPE_D);
> >>              break;
> >>           default:
> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> >> index a7bd9ce..6688f6a 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> >> @@ -259,6 +259,22 @@ fs_visitor::emit_texture(ir_texture_opcode op,
> >>        lod = fs_reg(0u);
> >>     }
> >>
> >> +   if (op == ir_samples_identical) {
> >> +      fs_reg dst = vgrf(glsl_type::get_instance(dest_type->base_type,
1, 1));
> >> +
> >> +      if (mcs.file == BRW_IMMEDIATE_VALUE) {
> >> +         fs_reg tmp = vgrf(glsl_type::uint_type);
> >> +
> >> +         bld.MOV(tmp, mcs);
> >> +         bld.CMP(dst, tmp, src_reg(0u), BRW_CONDITIONAL_EQ);
> >
> > Seems a little strange to emit assembly to do the comparison when
> > you've already determined that the value is a compile time constant.
> >
> > Why not just:
> >
> >    bld.MOV(dst, fs_reg(mcs.ud == 0u ? ~0u : 0u));
>
> Actually, getting an immediate here means we don't have an MCS and we
> have no idea of the samples are identical, so we should return false
> always.
>
> >> +      } else {
> >> +         bld.CMP(dst, mcs, src_reg(0u), BRW_CONDITIONAL_EQ);
>
> We should also consider handling the clear color case.  In this case,
> we'll get 0xff for 2x and 0xffffffff for 4x or 8x.  Do we know the
> number of samples in the shader?  We should be able to get that from
> the sampler or something but then we would have to pass that through
> the key and that would get gross.

First off, I realized that the numbers I have there are wrong. It's 0xff
for 2x and 4x and 0xffffffff for 8x.  However, I also just realized that
the 8 8-bit values you get for 8x MSAA range from 0 to 7 but take up 4 bits
each.  This means that no valid 8x MSAA MCS value can have 0xff as its
bottom 8 bits unless it's the clear color.  This means that a simple and
with 0xff will get us a clear color check on all but 16x.  Unfortunately,
16x has a 64-bit MCS value and, unless the hardware provides is with some
extra guarantees, 0xff would be valid in the bottom 8 bits.

Going off into the world of speculation just a bit, can we make some
assumptions about the hardware?  Suppose for a moment that the used a
fairly greedy algorithm for determining which plane to store a value in:

1) If all samples are affected, store in slice zero
2) If not, store in the first available empty or completely overwritten
slice.

Such an algorithm would make sense and have the nice property of tending to
pack the samples in the earlier slices this decreasing the possibility of
ever touching slice 15.  This is good for cache locality.  It also has
another property that would be very useful for us, namely that it only
touches slice 15 if all 16 samples have different colors.  In particular,
it would mean that you can never have two samples that both lie in slice 15
and, more specifically, 0xff would also be invalid for 16x.

Unfortunately, that's entirely based on my speculation as to how the
hardware works.  Given that we don't actually know, it's not documented,
and that we're not liable to ever find anyone willing to give us those
kinds of details, we're not likely to find out without a very clever
experiment.

OK, enough hardware speculation for one night...
--Jason

> One other thought, 16x MSAA will break all this because it gives you a
> ivec4 value from the MCS (if I remember correctly).  Not sure if we've
> landed 16x MSAA yet though.
> --Jason
>
> >> +      }
> >> +
> >> +      this->result = dst;
> >> +      return;
> >> +   }
> >> +
> >>     if (coordinate.file != BAD_FILE) {
> >>        /* FINISHME: Texture coordinate rescaling doesn't work with
non-constant
> >>         * samplers.  This should only be a problem with GL_CLAMP on
Gen7.
> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> >> index 3c2674d..41c3c10 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> >> @@ -1615,6 +1615,7 @@ vec4_visitor::nir_emit_texture(nir_tex_instr
*instr)
> >>           switch (instr->op) {
> >>           case nir_texop_txf:
> >>           case nir_texop_txf_ms:
> >> +         case nir_texop_samples_identical:
> >>              coordinate = get_nir_src(instr->src[i].src,
BRW_REGISTER_TYPE_D,
> >>                                       src_size);
> >>              coord_type = glsl_type::ivec(src_size);
> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> >> index fda3d7c..2190a86 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> >> @@ -909,6 +909,17 @@ vec4_visitor::emit_texture(ir_texture_opcode op,
> >>        unreachable("TXB is not valid for vertex shaders.");
> >>     case ir_lod:
> >>        unreachable("LOD is not valid for vertex shaders.");
> >> +   case ir_samples_identical: {
> >> +      if (mcs.file == BRW_IMMEDIATE_VALUE) {
> >> +         const src_reg temp = src_reg(this, glsl_type::uint_type);
> >> +
> >> +         emit(MOV(dst_reg(temp), mcs));
> >> +         emit(CMP(dest, temp, src_reg(0u), BRW_CONDITIONAL_EQ));
> >
> > Ditto.
> >
> >    bld.MOV(dst, src_reg(mcs.ud == 0u ? ~0u : 0u));
> >
> >> +      } else {
> >> +         emit(CMP(dest, mcs, src_reg(0u), BRW_CONDITIONAL_EQ));
> >> +      }
> >> +      return;
> >> +   }
> >>     default:
> >>        unreachable("Unrecognized tex op");
> >>     }
> >> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c
b/src/mesa/drivers/dri/i965/intel_extensions.c
> >> index 386b63c..2e2459c 100644
> >> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> >> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> >> @@ -333,6 +333,7 @@ intelInitExtensions(struct gl_context *ctx)
> >>        ctx->Extensions.ARB_texture_compression_bptc = true;
> >>        ctx->Extensions.ARB_texture_view = true;
> >>        ctx->Extensions.ARB_shader_storage_buffer_object = true;
> >> +      ctx->Extensions.EXT_shader_samples_identical = true;
> >>
> >>        if (can_do_pipelined_register_writes(brw)) {
> >>           ctx->Extensions.ARB_draw_indirect = true;
> >>
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20151118/bf65f19b/attachment-0001.html>


More information about the mesa-dev mailing list