[Mesa-dev] [PATCH 5/8] i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs.

Paul Berry stereotype441 at gmail.com
Sat Oct 19 17:13:52 CEST 2013


On 18 October 2013 17:04, Eric Anholt <eric at anholt.net> wrote:

> Paul Berry <stereotype441 at gmail.com> writes:
>
> > Geometry shaders that run in "DUAL_INSTANCED" mode store their inputs
> > in vec4's.  This means that when compiling gl_PointSize input
> > swizzling (a MOV instruction which uses a geometry shader input as
> > both source and destination), we need to do two things:
> >
> > - Set force_writemask_all to ensure that the MOV happens regardless of
> >   which channels are enabled.
> >
> > - Set the source register region to <4;4,1> (instead of <0;4,1> to
> >   satisfy register region restrictions.
>
> This sure sounds like something you empirically found, but I'm confused.
> I would have assumed that DUAL_INSTANCED with an instance count of 1
> meant that the channel enables the hardware gave you had the first 4
> enabled and the second 4 disabled.  And since the dst.width (and thus
> execsize) is 4, whether or not the second 4 are disabled wouldn't
> matter.  In that case, why do you need the writemask forced, since just
> the 4 channels you care about will be affected, anyway?
>

Setting force_writemask_all was not empirical--I did it out of general
caution. I couldn't find any documentation in the bspec to guarantee that
an instance count of 1 would get dispatched to the first 4 channels rather
than the second 4, and I didn't want to rely on undocumented behaviour.


>
> And, if dst.width == 4, then execsize == 4, and I'm confused what
> register region restriction is being honored by promoting the hstride to
> 4.
>

This part was empirical.  I discovered that if I don't set the hstride to
4, then I get an assertion failure here (in validate_reg() in
brw_eu_emit.c):

   /* 4. */
   if (execsize == width && hstride != 0) {
      assert(vstride == -1 || vstride == width * hstride);
   }

I don't know why the hardware cares about this, but I double-checked the
bspec and this restriction is really there.

Side note: I forgot to mention it in the comments, but in addition to
fixing up <0;4,1> -> <4;4,1>, this code fixes up <8;8,1> -> <4;4,1>.
That's necessary for a stupid reason: in the vec4 back-end we represent
GRFs as <8;8,1> so that guess_execution_size() will correctly guess an
execution size of 8.  However, in align16 mode, the hardware assumes a
width of 4, so it really needs to be <4;4,1>.  Normally, we fix that up in
brw_set_src0() and brw_set_src1(), but we do it *after* the call to
validate_reg().  So to avoid validate_reg() asserting on these width-4
instructions, we need to change the source register from <8;8,1> to <4;4,1>
before emitting the instruction.  One of these days, I swear I'm going to
get rid of guess_execution_size() so we can end this sort of madness.


>
> Putting these fixups for a couple of weird cases in just MOV and ADD
> feels wrong to me, but maybe when I understand better what's going on
> it'll seem more natural.
>

Another possibility I'd be equally happy with would be to put the fixup at
the top of vec4_generator::generate_vec4_instruction(), before the switch
statement.  It would look something like this:

   if (dst.width == BRW_WIDTH_4) {
      /* This happens in attribute fixups for "dual instanced" geometry
       * shaders, since they use attributes that are vec4's.  Since the exec
       * width is only 4, it's essential that the caller set
       * force_writemask_all in order to make sure the instruction is
executed
       * regardless of which channels are enabled.
       */
      assert(inst->force_writemask_all);

      /* Fix up any <8;8,1> or <0;4,1> source registers to <4;4,1> to
satisfy
       * the following register region restrictions (from Graphics BSpec:
       * 3D-Media-GPGPU Engine > EU Overview > Registers and Register
Regions
       * > Register Region Restrictions)
       *
       *     1. ExecSize must be greater than or equal to Width.
       *
       *     2. If ExecSize = Width and HorzStride != 0, VertStride must be
set
       *        to Width * HorzStride."
       */
      for (int i = 0; i < 3; i++) {
         if (src[i].file == BRW_GENERAL_REGISTER_FILE)
            src[i] = stride(src[i], 4, 4, 1);
      }
   }

Does that seem better to you?  I actually think I like it slightly better
because by making the assertion more general, I caught another case where I
think I should be setting force_writemask_all to be on the safe side (the
"clear r0.2" instruction in the gs prolog).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131019/2dd31f13/attachment.html>


More information about the mesa-dev mailing list