[Mesa-dev] [PATCH 2/7] i965: Simplify gl_SampleID setup on Gen8+.

Matt Turner mattst88 at gmail.com
Tue Apr 19 23:52:44 UTC 2016


On Tue, Apr 19, 2016 at 4:51 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On Tuesday, April 19, 2016 11:19:54 AM PDT Matt Turner wrote:
>> On Mon, Apr 18, 2016 at 11:48 PM, Kenneth Graunke <kenneth at whitecape.org>
> wrote:
>> > On Gen7+, the thread payload provides the sample ID - we can read it
>> > in two instructions, without any elaborate calculations.  We don't even
>> > need a state dependency - this will properly produce zero in the
>> > non-MSAA case.  Unfortunately, we need the state flag anyway, so we
>> > may as well continue to use it to produce a single MOV 0 instead of
>> > SHR/AND.
>> >
>> > For some reason, the sample ID field is always zero on Gen7/7.5, so
>> > we can't use this yet.  However, it works fine on Gen8+.  So, land the
>> > code and use it where it's working, and leave a TODO for later.
>> >
>> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 42 ++++++++++++++++++++++++++++++
> +-----
>> >  1 file changed, 37 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/
> i965/brw_fs.cpp
>> > index 551294d..41af35e 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > @@ -1343,6 +1343,41 @@ fs_visitor::emit_sampleid_setup()
>> >         *  rasterization is disabled, gl_SampleID will always be zero."
>> >         */
>> >        abld.MOV(*reg, brw_imm_d(0));
>> > +   } else if (devinfo->gen >= 8) {
>> > +      /* Sample ID comes in as 4-bit numbers in g1.0:
>> > +       *
>> > +       *    15:12 Slot 3 SampleID (only used in SIMD16)
>> > +       *     11:8 Slot 2 SampleID (only used in SIMD16)
>> > +       *      7:4 Slot 1 SampleID
>> > +       *      3:0 Slot 0 SampleID
>> > +       *
>> > +       * Each slot corresponds to four channels, so we want to replicate
> each
>> > +       * half-byte value to 4 channels in a row:
>> > +       *
>> > +       *    dst+0:    .7    .6    .5    .4    .3    .2    .1    .0
>> > +       *             7:4   7:4   7:4   7:4   3:0   3:0   3:0   3:0
>> > +       *
>> > +       *    dst+1:    .7    .6    .5    .4    .3    .2    .1    .0  (if
> SIMD16)
>> > +       *           15:12 15:12 15:12 15:12  11:8  11:8  11:8  11:8
>> > +       *
>> > +       * First, we read g1.0 with a <1,8,0>UB region, causing the first 8
>> > +       * channels to read the first byte (7:0), and the second group of 8
>> > +       * channels to read the second byte (15:8).  Then, we shift right
> by
>> > +       * a vector immediate of <4, 4, 4, 4, 0, 0, 0, 0>, moving the slot
> 1 / 3
>> > +       * values into place.  Finally, we AND with 0xf to keep the low
> nibble.
>> > +       *
>> > +       *    shr(16) tmp<1>UW g1.0<1,8,0>UB 0x44440000:V
>> > +       *    and(16) dst<1>UD tmp<8,8,1>UD  0xf:UD
>>
>> The comment doesn't match the code (and as the comment is, it wouldn't
> work).
>>
>> - tmp is typed W below, but is UW and UD in the comment
>> - g1.0 is B below, but UB in the comment
>> - 0xf is W below, but UD in the comment
>
> You're right - I changed the code, and neglected to update the comment.
>
> I can update the comment to match the code.
>
>> > +       *
>> > +       * TODO: These payload bits exist on Gen7 too, but they appear to
> always
>> > +       *       be zero, so this code fails to work.  We should find out
> why.
>> > +       */
>> > +      fs_reg tmp(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_W);
>> > +
>> > +      abld.SHR(tmp, fs_reg(stride(retype(brw_vec1_grf(1, 0),
>> > +                                         BRW_REGISTER_TYPE_B), 1, 8, 0)),
>> > +                    brw_imm_v(0x44440000));
>> > +      abld.AND(*reg, tmp, brw_imm_w(0xf));
>>
>> ANDing with 0xf is only necessary if you're operating on signed types,
>> which you don't need to be doing. Change the types to unsigned, and
>> replace the AND with a MOV that expands the UW source to UD.
>
> I don't understand.  I don't think signedness should matter.  I'm using
> SHR, not ASR - so there's no sign extension.  After that, "tmp" should
> contain a bunch of 4-bit values.  Reading it as W, the top bit should
> not ever be set.
>
> The byte being read contains:
>  - 7:4 Slot 1 SampleID
>  - 3:0 Slot 0 SampleID
>
> Shifting right by 0x44440000 means that the first four channels contain
>
>  - 7:4 Zero
>  - 3:0 Slot 1 SampleID
>
> while the second group of four channels remain untouched:
>
>  - 7:4 Slot 1 SampleID
>  - 3:0 Slot 0 SampleID
>
> So for that group, I need to AND with 0xf (4 bits) to mask out the
> "Slot 1 SampleID" in bits 7:4, so I just get "Slot 0" as desired.

Oh, right. The shift-by-zero case requires ANDing with 0xf. Makes sense.


More information about the mesa-dev mailing list