[Mesa-dev] [PATCH 8/8] i965: Enable EXT_shader_samples_identical

Jason Ekstrand jason at jlekstrand.net
Thu Nov 19 13:50:54 PST 2015


On Wed, Nov 18, 2015 at 9:29 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>
> On Nov 18, 2015 5:02 PM, "Jason Ekstrand" <jason at jlekstrand.net> wrote:
>>
>> On Wed, Nov 18, 2015 at 4:06 PM, Kenneth Graunke <kenneth at whitecape.org>
>> wrote:
>> > On Wednesday, November 18, 2015 03:46:54 PM Ian Romanick wrote:
>> >> From: Ian Romanick <ian.d.romanick at intel.com>
>> >>
>> >> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
>> >> ---
>> >>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp       |  1 +
>> >>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 16 ++++++++++++++++
>> >>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp     |  1 +
>> >>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 11 +++++++++++
>> >>  src/mesa/drivers/dri/i965/intel_extensions.c   |  1 +
>> >>  5 files changed, 30 insertions(+)
>> >>
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> index 1f71f66..4af1234 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> @@ -2550,6 +2550,7 @@ fs_visitor::nir_emit_texture(const fs_builder
>> >> &bld, nir_tex_instr *instr)
>> >>           switch (instr->op) {
>> >>           case nir_texop_txf:
>> >>           case nir_texop_txf_ms:
>> >> +         case nir_texop_samples_identical:
>> >>              coordinate = retype(src, BRW_REGISTER_TYPE_D);
>> >>              break;
>> >>           default:
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> index a7bd9ce..6688f6a 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> @@ -259,6 +259,22 @@ fs_visitor::emit_texture(ir_texture_opcode op,
>> >>        lod = fs_reg(0u);
>> >>     }
>> >>
>> >> +   if (op == ir_samples_identical) {
>> >> +      fs_reg dst = vgrf(glsl_type::get_instance(dest_type->base_type,
>> >> 1, 1));
>> >> +
>> >> +      if (mcs.file == BRW_IMMEDIATE_VALUE) {
>> >> +         fs_reg tmp = vgrf(glsl_type::uint_type);
>> >> +
>> >> +         bld.MOV(tmp, mcs);
>> >> +         bld.CMP(dst, tmp, src_reg(0u), BRW_CONDITIONAL_EQ);
>> >
>> > Seems a little strange to emit assembly to do the comparison when
>> > you've already determined that the value is a compile time constant.
>> >
>> > Why not just:
>> >
>> >    bld.MOV(dst, fs_reg(mcs.ud == 0u ? ~0u : 0u));
>>
>> Actually, getting an immediate here means we don't have an MCS and we
>> have no idea of the samples are identical, so we should return false
>> always.
>>
>> >> +      } else {
>> >> +         bld.CMP(dst, mcs, src_reg(0u), BRW_CONDITIONAL_EQ);
>>
>> We should also consider handling the clear color case.  In this case,
>> we'll get 0xff for 2x and 0xffffffff for 4x or 8x.  Do we know the
>> number of samples in the shader?  We should be able to get that from
>> the sampler or something but then we would have to pass that through
>> the key and that would get gross.
>
> First off, I realized that the numbers I have there are wrong. It's 0xff for
> 2x and 4x and 0xffffffff for 8x.  However, I also just realized that the 8
> 8-bit values you get for 8x MSAA range from 0 to 7 but take up 4 bits each.
> This means that no valid 8x MSAA MCS value can have 0xff as its bottom 8
> bits unless it's the clear color.  This means that a simple and with 0xff
> will get us a clear color check on all but 16x.  Unfortunately, 16x has a
> 64-bit MCS value and, unless the hardware provides is with some extra
> guarantees, 0xff would be valid in the bottom 8 bits.
>
> Going off into the world of speculation just a bit, can we make some
> assumptions about the hardware?  Suppose for a moment that the used a fairly
> greedy algorithm for determining which plane to store a value in:
>
> 1) If all samples are affected, store in slice zero
> 2) If not, store in the first available empty or completely overwritten
> slice.
>
> Such an algorithm would make sense and have the nice property of tending to
> pack the samples in the earlier slices this decreasing the possibility of
> ever touching slice 15.  This is good for cache locality.  It also has
> another property that would be very useful for us, namely that it only
> touches slice 15 if all 16 samples have different colors.  In particular, it
> would mean that you can never have two samples that both lie in slice 15
> and, more specifically, 0xff would also be invalid for 16x.
>
> Unfortunately, that's entirely based on my speculation as to how the
> hardware works.  Given that we don't actually know, it's not documented, and
> that we're not liable to ever find anyone willing to give us those kinds of
> details, we're not likely to find out without a very clever experiment.

So, chad has convinced my that my speculation is quite possibly bogus
and *very* hard to actually test, so just checking the bottom 8 bits
probably won't work.

Something else that came out of that conversation is that, for 2x
MSAA, we may get bogus data in all but the bottom 4 bits.  In other
words, just blindly checking for zero is probably a bad idea.  It'll
work because the extension spec lets us return false negatives, but it
isn't a good idea in general.  If we really want the implementation to
be solid, we need to mask off all but the bottom n * log2(n) bits
where n = number of samples.

> OK, enough hardware speculation for one night...
> --Jason
>
>> One other thought, 16x MSAA will break all this because it gives you a
>> ivec4 value from the MCS (if I remember correctly).  Not sure if we've
>> landed 16x MSAA yet though.
>> --Jason
>>
>> >> +      }
>> >> +
>> >> +      this->result = dst;
>> >> +      return;
>> >> +   }
>> >> +
>> >>     if (coordinate.file != BAD_FILE) {
>> >>        /* FINISHME: Texture coordinate rescaling doesn't work with
>> >> non-constant
>> >>         * samplers.  This should only be a problem with GL_CLAMP on
>> >> Gen7.
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> >> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> >> index 3c2674d..41c3c10 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> >> @@ -1615,6 +1615,7 @@ vec4_visitor::nir_emit_texture(nir_tex_instr
>> >> *instr)
>> >>           switch (instr->op) {
>> >>           case nir_texop_txf:
>> >>           case nir_texop_txf_ms:
>> >> +         case nir_texop_samples_identical:
>> >>              coordinate = get_nir_src(instr->src[i].src,
>> >> BRW_REGISTER_TYPE_D,
>> >>                                       src_size);
>> >>              coord_type = glsl_type::ivec(src_size);
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
>> >> b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
>> >> index fda3d7c..2190a86 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
>> >> @@ -909,6 +909,17 @@ vec4_visitor::emit_texture(ir_texture_opcode op,
>> >>        unreachable("TXB is not valid for vertex shaders.");
>> >>     case ir_lod:
>> >>        unreachable("LOD is not valid for vertex shaders.");
>> >> +   case ir_samples_identical: {
>> >> +      if (mcs.file == BRW_IMMEDIATE_VALUE) {
>> >> +         const src_reg temp = src_reg(this, glsl_type::uint_type);
>> >> +
>> >> +         emit(MOV(dst_reg(temp), mcs));
>> >> +         emit(CMP(dest, temp, src_reg(0u), BRW_CONDITIONAL_EQ));
>> >
>> > Ditto.
>> >
>> >    bld.MOV(dst, src_reg(mcs.ud == 0u ? ~0u : 0u));
>> >
>> >> +      } else {
>> >> +         emit(CMP(dest, mcs, src_reg(0u), BRW_CONDITIONAL_EQ));
>> >> +      }
>> >> +      return;
>> >> +   }
>> >>     default:
>> >>        unreachable("Unrecognized tex op");
>> >>     }
>> >> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c
>> >> b/src/mesa/drivers/dri/i965/intel_extensions.c
>> >> index 386b63c..2e2459c 100644
>> >> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
>> >> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
>> >> @@ -333,6 +333,7 @@ intelInitExtensions(struct gl_context *ctx)
>> >>        ctx->Extensions.ARB_texture_compression_bptc = true;
>> >>        ctx->Extensions.ARB_texture_view = true;
>> >>        ctx->Extensions.ARB_shader_storage_buffer_object = true;
>> >> +      ctx->Extensions.EXT_shader_samples_identical = true;
>> >>
>> >>        if (can_do_pipelined_register_writes(brw)) {
>> >>           ctx->Extensions.ARB_draw_indirect = true;
>> >>
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> >


More information about the mesa-dev mailing list