[Mesa-dev] [PATCH 5/8] i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs.

Eric Anholt eric at anholt.net
Tue Oct 22 03:34:14 CEST 2013


Paul Berry <stereotype441 at gmail.com> writes:

> On 18 October 2013 17:04, Eric Anholt <eric at anholt.net> wrote:
>> Putting these fixups for a couple of weird cases in just MOV and ADD
>> feels wrong to me, but maybe when I understand better what's going on
>> it'll seem more natural.
>>
>
> Another possibility I'd be equally happy with would be to put the fixup at
> the top of vec4_generator::generate_vec4_instruction(), before the switch
> statement.  It would look something like this:
>
>    if (dst.width == BRW_WIDTH_4) {
>       /* This happens in attribute fixups for "dual instanced" geometry
>        * shaders, since they use attributes that are vec4's.  Since the exec
>        * width is only 4, it's essential that the caller set
>        * force_writemask_all in order to make sure the instruction is
> executed
>        * regardless of which channels are enabled.
>        */
>       assert(inst->force_writemask_all);
>
>       /* Fix up any <8;8,1> or <0;4,1> source registers to <4;4,1> to
> satisfy
>        * the following register region restrictions (from Graphics BSpec:
>        * 3D-Media-GPGPU Engine > EU Overview > Registers and Register
> Regions
>        * > Register Region Restrictions)
>        *
>        *     1. ExecSize must be greater than or equal to Width.
>        *
>        *     2. If ExecSize = Width and HorzStride != 0, VertStride must be
> set
>        *        to Width * HorzStride."
>        */
>       for (int i = 0; i < 3; i++) {
>          if (src[i].file == BRW_GENERAL_REGISTER_FILE)
>             src[i] = stride(src[i], 4, 4, 1);
>       }
>    }
>
> Does that seem better to you?  I actually think I like it slightly better
> because by making the assertion more general, I caught another case where I
> think I should be setting force_writemask_all to be on the safe side (the
> "clear r0.2" instruction in the gs prolog).

I like this better -- it makes more sense to me for the fixup to be
non-opcode-specific.  Any I think I get the problem now -- our registers
would make a ton of sense as <4;4,1> in general (that's how I think of
our GRFs in align16, at least!), except that we can't because then we'd
guess an execsize of 4.  I'm in favor of the kill-guess-execsize plan,
even if we leave it in place for gen4/5 SF/CLIP threads (which bounce
execsize all over the place iirc and would be a pain to convert) and
only convert the new backends.

Another option: How about instead of that assert in brw_eu_emit.c, we
just smash the vstride to be width * hstride?  We know the vstride
doesn't matter, because you're only using execsize components, so let's
just not bother our brw_eu.c callers with this little problem.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 835 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131021/815aa34a/attachment.pgp>


More information about the mesa-dev mailing list