[Mesa-dev] [PATCH 2/7] i965: Simplify gl_SampleID setup on Gen8+.

Kenneth Graunke kenneth at whitecape.org
Wed Apr 20 00:07:20 UTC 2016


On Tuesday, April 19, 2016 4:52:44 PM PDT Matt Turner wrote:
> On Tue, Apr 19, 2016 at 4:51 PM, Kenneth Graunke <kenneth at whitecape.org> 
wrote:
> > On Tuesday, April 19, 2016 11:19:54 AM PDT Matt Turner wrote:
> >> On Mon, Apr 18, 2016 at 11:48 PM, Kenneth Graunke <kenneth at whitecape.org>
> > wrote:
> >> > On Gen7+, the thread payload provides the sample ID - we can read it
> >> > in two instructions, without any elaborate calculations.  We don't even
> >> > need a state dependency - this will properly produce zero in the
> >> > non-MSAA case.  Unfortunately, we need the state flag anyway, so we
> >> > may as well continue to use it to produce a single MOV 0 instead of
> >> > SHR/AND.
> >> >
> >> > For some reason, the sample ID field is always zero on Gen7/7.5, so
> >> > we can't use this yet.  However, it works fine on Gen8+.  So, land the
> >> > code and use it where it's working, and leave a TODO for later.
> >> >
> >> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> >> > ---
> >> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 42 ++++++++++++++++++++++++++++
++
> > +-----
> >> >  1 file changed, 37 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/
dri/
> > i965/brw_fs.cpp
> >> > index 551294d..41af35e 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> > @@ -1343,6 +1343,41 @@ fs_visitor::emit_sampleid_setup()
> >> >         *  rasterization is disabled, gl_SampleID will always be zero."
> >> >         */
> >> >        abld.MOV(*reg, brw_imm_d(0));
> >> > +   } else if (devinfo->gen >= 8) {
> >> > +      /* Sample ID comes in as 4-bit numbers in g1.0:
> >> > +       *
> >> > +       *    15:12 Slot 3 SampleID (only used in SIMD16)
> >> > +       *     11:8 Slot 2 SampleID (only used in SIMD16)
> >> > +       *      7:4 Slot 1 SampleID
> >> > +       *      3:0 Slot 0 SampleID
> >> > +       *
> >> > +       * Each slot corresponds to four channels, so we want to 
replicate
> > each
> >> > +       * half-byte value to 4 channels in a row:
> >> > +       *
> >> > +       *    dst+0:    .7    .6    .5    .4    .3    .2    .1    .0
> >> > +       *             7:4   7:4   7:4   7:4   3:0   3:0   3:0   3:0
> >> > +       *
> >> > +       *    dst+1:    .7    .6    .5    .4    .3    .2    .1    .0  
(if
> > SIMD16)
> >> > +       *           15:12 15:12 15:12 15:12  11:8  11:8  11:8  11:8
> >> > +       *
> >> > +       * First, we read g1.0 with a <1,8,0>UB region, causing the 
first 8
> >> > +       * channels to read the first byte (7:0), and the second group 
of 8
> >> > +       * channels to read the second byte (15:8).  Then, we shift 
right
> > by
> >> > +       * a vector immediate of <4, 4, 4, 4, 0, 0, 0, 0>, moving the 
slot
> > 1 / 3
> >> > +       * values into place.  Finally, we AND with 0xf to keep the low
> > nibble.
> >> > +       *
> >> > +       *    shr(16) tmp<1>UW g1.0<1,8,0>UB 0x44440000:V
> >> > +       *    and(16) dst<1>UD tmp<8,8,1>UD  0xf:UD
> >>
> >> The comment doesn't match the code (and as the comment is, it wouldn't
> > work).
> >>
> >> - tmp is typed W below, but is UW and UD in the comment
> >> - g1.0 is B below, but UB in the comment
> >> - 0xf is W below, but UD in the comment
> >
> > You're right - I changed the code, and neglected to update the comment.
> >
> > I can update the comment to match the code.
> >
> >> > +       *
> >> > +       * TODO: These payload bits exist on Gen7 too, but they appear 
to
> > always
> >> > +       *       be zero, so this code fails to work.  We should find 
out
> > why.
> >> > +       */
> >> > +      fs_reg tmp(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_W);
> >> > +
> >> > +      abld.SHR(tmp, fs_reg(stride(retype(brw_vec1_grf(1, 0),
> >> > +                                         BRW_REGISTER_TYPE_B), 1, 8, 
0)),
> >> > +                    brw_imm_v(0x44440000));
> >> > +      abld.AND(*reg, tmp, brw_imm_w(0xf));
> >>
> >> ANDing with 0xf is only necessary if you're operating on signed types,
> >> which you don't need to be doing. Change the types to unsigned, and
> >> replace the AND with a MOV that expands the UW source to UD.
> >
> > I don't understand.  I don't think signedness should matter.  I'm using
> > SHR, not ASR - so there's no sign extension.  After that, "tmp" should
> > contain a bunch of 4-bit values.  Reading it as W, the top bit should
> > not ever be set.
> >
> > The byte being read contains:
> >  - 7:4 Slot 1 SampleID
> >  - 3:0 Slot 0 SampleID
> >
> > Shifting right by 0x44440000 means that the first four channels contain
> >
> >  - 7:4 Zero
> >  - 3:0 Slot 1 SampleID
> >
> > while the second group of four channels remain untouched:
> >
> >  - 7:4 Slot 1 SampleID
> >  - 3:0 Slot 0 SampleID
> >
> > So for that group, I need to AND with 0xf (4 bits) to mask out the
> > "Slot 1 SampleID" in bits 7:4, so I just get "Slot 0" as desired.
> 
> Oh, right. The shift-by-zero case requires ANDing with 0xf. Makes sense.

So...what would you like me to change?  Just update the comment to match
the code?  Or change the code somehow as well?

Thanks for looking at this!

--Ken
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160419/5335ff88/attachment-0001.sig>


More information about the mesa-dev mailing list