[Nouveau] Proper gl_SampleMask output

Ilia Mirkin imirkin at alum.mit.edu
Wed Apr 23 17:03:17 PDT 2014


On Wed, Apr 23, 2014 at 6:22 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> Hello,
>
> I've been trying to add ARB_sample_shading support to nouveau, and am
> being defeated by the gl_SampleMask tests. Everything else works fine.
> (And naturally the tests pass with the proprietary driver.) I'm trying
> to do this for both GT21x, as well as GF100+.
>
> In the GT21x case, it seems like the low bit of method 0x1928 needs to
> be set (as well as the second-to-lowest bit); for GF100+, the low bit
> of the last dword of the shader header needs to be set.
>
> But exactly which register is the output supposed to go into? It looks
> like with the proprietary driver, r0..r3 get the first color output,
> and r4 gets the sample mask. However the way that things are set up
> with nouveau, r4..r7 get the first color output (and that part works
> fine). But where should the sample mask go at the end of the fragment
> program? r0? r8? (I've tried all of those with minimal effect.)
> Perhaps there's more configuration that I'm missing regarding the
> sample mask? Also, how does this interact with the frag depth (which
> also gets implicitly assigned based on color outputs)?
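The two enable bits described in the quoted message can be sketched in C as follows. Only the bit positions come from the observations above. The helper names (`gt21x_sample_mask_ctrl`, `gf100_hdr_enable_sample_mask`) and the macro name for method 0x1928 are hypothetical. The 20-dword header size is taken from the HDR[00]..HDR[4c] dump in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; 0x1928 is the GT21x method offset from this thread. */
#define GT21X_METHOD_SAMPLE_MASK_CTRL 0x1928

/* GT21x: set the low bit and the second-to-lowest bit of the value
 * written to method 0x1928, preserving any other bits already set. */
static uint32_t gt21x_sample_mask_ctrl(uint32_t prev)
{
   return prev | 0x3u;
}

/* GF100+: the fragment shader header is 20 dwords (HDR[00]..HDR[4c]);
 * set the low bit of the last dword, HDR[4c]. */
#define GF100_FP_HDR_DWORDS 20

static void gf100_hdr_enable_sample_mask(uint32_t hdr[GF100_FP_HDR_DWORDS])
{
   assert(hdr != NULL);
   hdr[GF100_FP_HDR_DWORDS - 1] |= 0x1u;
}
```

In the nouveau HDR dump in this message, HDR[4c] is already 0x00000001, which is consistent with the note that nvc0 already sets the shader header bit.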

To clarify the r0..r3 vs. r4..r7 question for the first color output:
I've changed things around so that the first color output ends up in
r0..r3 in the nouveau shader too. The shader generated by nouveau is:

HDR[00] = 0x00021462
HDR[04] = 0x00000000
HDR[08] = 0x00000000
HDR[0c] = 0x00000000
HDR[10] = 0x00000000
HDR[14] = 0xf0000000
HDR[18] = 0x00000000
HDR[1c] = 0x00000000
HDR[20] = 0x00000000
HDR[24] = 0x00000000
HDR[28] = 0x00000000
HDR[2c] = 0x00000000
HDR[30] = 0x00000000
HDR[34] = 0x00000000
HDR[38] = 0x00000000
HDR[3c] = 0x00000000
HDR[40] = 0x00000000
HDR[44] = 0x00000000
HDR[48] = 0x0000000f
HDR[4c] = 0x00000001
shader binary code (0x80 bytes):
42e04237 22804280 fff01c00 c07e0070 fff05c00 c07e0074 10009de4 28004000
00105c00 30044000 01201c84 14060000 04001c02 10408102 05205c84 14060000
720042e7 22e20042 04105c02 10040404 04011c83 68000000 0000dde2 18fe0000
00001de2 18000000 0c005de4 28000000 00009de2 18000000 00001de7 80000000

which, when run through "nvdisas -b SM30 -raw", decodes to:

        /*0008*/                IPA.PASS R0, a[0x70], RZ;
        /*0010*/                IPA.PASS R1, a[0x74], RZ;
        /*0018*/                MOV R2, c[0x0][0x4];
        /*0020*/                FFMA R1, R1, c[0x0][0x0], R2;
        /*0028*/                F2I.S32.F32.TRUNC R0, R0;
        /*0030*/                IMUL32I.U32.U32 R0, R0, 0x10204081;
        /*0038*/                F2I.S32.F32.TRUNC R1, R1;
        /*0048*/                IMUL32I.U32.U32 R1, R1, 0x1010101;
        /*0050*/                LOP.XOR R4, R0, R1;
        /*0058*/                MOV32I R3, 0x3f800000;
        /*0060*/                MOV32I R0, 0x0;
        /*0068*/                MOV R1, R3;
        /*0070*/                MOV32I R2, 0x0;
        /*0078*/                EXIT ;

While the proprietary-driver-generated shader is [the trace shows
four-dword (128-bit) writes, and the right-most dword on each line is
the first of the four, so read each line right-to-left]:

--816-- w 27:0x0430, 0x00000000,0x00000000,0x00000000,0x00001462
--816-- w 27:0x0440, 0x00000000,0x00000000,0xb0000000,0x00000000
--816-- w 27:0x0450, 0x00000000,0x00000000,0x00000000,0x00000000
--816-- w 27:0x0460, 0x00000000,0x00000000,0x00000000,0x00000000
--816-- w 27:0x0470, 0x00000001,0x0000000f,0x00000000,0x00000000
--816-- w 27:0x0480, 0xc07e0074,0xfff05c00,0x22324232,0xa0423047
--816-- w 27:0x0490, 0xc07e0070,0xfff01c00,0x2800403c,0x10009de4
--816-- w 27:0x04a0, 0x3004803c,0x30105c40,0x2800403c,0x0000dde4
--816-- w 27:0x04b0, 0x14860000,0x05201c84,0x3006803c,0x20009c40
--816-- w 27:0x04c0, 0x14860000,0x09205c84,0x22004280,0x42304247
--816-- w 27:0x04d0, 0x28000000,0xfc001de4,0x10040404,0x04009ca2
--816-- w 27:0x04e0, 0x18fe0000,0x00005de2,0x10408102,0x0410dca2
--816-- w 27:0x04f0, 0x28000000,0xfc009de4,0x68000000,0x08311c83
--816-- w 27:0x0500, 0x28000000,0x0400dde4,0x20000000,0x0002e047
--816-- w 27:0x0510, 0x4003ffff,0xe0001de7,0x80000000,0x00001de7
--816-- w 27:0x0520, 0x40000000,0x00001de4,0x40000000,0x00001de4
--816-- w 27:0x0530, 0x40000000,0x00001de4,0x40000000,0x00001de4

Which decodes to:

        /*0008*/                IPA.PASS R1, a[0x74], RZ;
        /*0010*/                MOV R2, c[0x0][0xf04];
        /*0018*/                IPA.PASS R0, a[0x70], RZ;
        /*0020*/                MOV R3, c[0x0][0xf00];
        /*0028*/                FFMA.FTZ R1, R1, R2, c[0x0][0xf0c];
        /*0030*/                FFMA.FTZ R2, R0, R3, c[0x0][0xf08];
        /*0038*/                F2I.FTZ.S32.F32.TRUNC R0, R1;
        /*0048*/                F2I.FTZ.S32.F32.TRUNC R1, R2;
        /*0050*/                IMUL32I R2, R0, 0x1010101;
        /*0058*/                MOV R0, RZ;
        /*0060*/                IMUL32I R3, R1, 0x10204081;
        /*0068*/                MOV32I R1, 0x3f800000;
        /*0070*/                LOP.XOR R4, R3, R2;
        /*0078*/                MOV R2, RZ;
        /*0088*/                MOV R3, R1;
        /*0090*/                EXIT ;
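To compare the proprietary header with nouveau's HDR[] dump, each trace line can be reversed into memory order and the two headers diffed dword by dword. This is a small C sketch using only the values printed in this message (the trace lines at 0x0430..0x0470 are taken to be HDR[00]..HDR[4c]); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_DWORDS 20 /* HDR[00]..HDR[4c] */

/* nouveau's header, as dumped above (all other dwords are zero). */
static const uint32_t nouveau_hdr[HDR_DWORDS] = {
   [0x00 / 4] = 0x00021462, [0x14 / 4] = 0xf0000000,
   [0x48 / 4] = 0x0000000f, [0x4c / 4] = 0x00000001,
};

/* Proprietary header, copied exactly as printed in the trace. */
static const uint32_t blob_trace[HDR_DWORDS / 4][4] = {
   { 0x00000000, 0x00000000, 0x00000000, 0x00001462 }, /* 0x0430 */
   { 0x00000000, 0x00000000, 0xb0000000, 0x00000000 }, /* 0x0440 */
   { 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* 0x0450 */
   { 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* 0x0460 */
   { 0x00000001, 0x0000000f, 0x00000000, 0x00000000 }, /* 0x0470 */
};

/* Each printed line lists its four dwords right-to-left; reverse them. */
static void trace_line_to_mem(const uint32_t printed[4], uint32_t mem[4])
{
   for (int i = 0; i < 4; i++)
      mem[i] = printed[3 - i];
}

static void blob_hdr(uint32_t out[HDR_DWORDS])
{
   for (size_t line = 0; line < HDR_DWORDS / 4; line++)
      trace_line_to_mem(blob_trace[line], &out[line * 4]);
}

/* Stores the byte offset of every differing dword in `offsets` (which
 * must have room for n entries) and returns how many there were. */
static size_t hdr_diff(const uint32_t *a, const uint32_t *b, size_t n,
                       uint32_t *offsets)
{
   assert(a != NULL && b != NULL && offsets != NULL);
   size_t count = 0;
   for (size_t i = 0; i < n; i++)
      if (a[i] != b[i])
         offsets[count++] = (uint32_t)(i * 4);
   return count;
}
```

Run on these two headers, the diff reports only HDR[00] (0x00021462 in nouveau vs. 0x00001462 from the proprietary driver) and HDR[14] (0xf0000000 vs. 0xb0000000); HDR[48] and HDR[4c] match.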

(I'm not sure why the nouveau shader has only one FMA, but that's the
input shader we get from Gallium. I highly doubt it is the source of the
error, since it has nothing to do with sample masks, but my question
about the sample mask output stands even if it is. :) )
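For reference, both disassemblies above write the color to R0..R3 and the mask to R4 (LOP.XOR R4). A hypothetical C sketch of that register layout follows. It is inferred only from this single-color-output case: extending it to several color outputs is an assumption, and frag depth is deliberately left out because its placement is one of the open questions here.

```c
#include <assert.h>

/* Hypothetical layout: colour output i lives in r(4i)..r(4i+3). */
static unsigned fp_color_reg(unsigned color_index, unsigned component)
{
   assert(component < 4); /* R, G, B, A */
   return color_index * 4 + component;
}

/* Hypothetical: the sample mask takes the first register after the
 * last colour output (r4 with one colour output, as in the traces). */
static unsigned fp_sample_mask_reg(unsigned num_color_outputs)
{
   return num_color_outputs * 4;
}
```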

Oh, and for completeness, the input GLSL shader is:

"#version 130\n"
"#extension GL_ARB_sample_shading : enable\n"
"out vec4 out_color;\n"
"void main()\n"
"{\n"
  /* For 128x128 image size, below formula produces a bit
   * pattern where no two bits of gl_SampleMask[0] are
   * correlated.
   */
"  gl_SampleMask[0] = (int(gl_FragCoord.x) * 0x10204081) ^\n"
"                     (int(gl_FragCoord.y) * 0x01010101);\n"
"  out_color = vec4(0.0, 1.0, 0.0, 1.0);\n"
"}\n";
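When checking results by hand, it can help to compute the mask the shader should produce for a given pixel. This C sketch reproduces the gl_SampleMask[0] formula above; the function name is hypothetical. Since GLSL integer multiplication wraps modulo 2^32, unsigned 32-bit arithmetic is used to avoid signed-overflow undefined behavior in C, and int(gl_FragCoord.x) is taken to be the integer pixel column (pixel centers sit at x + 0.5).

```c
#include <assert.h>
#include <stdint.h>

/* Expected gl_SampleMask[0] for pixel (x, y) of the 128x128 test image. */
static uint32_t expected_sample_mask(uint32_t x, uint32_t y)
{
   assert(x < 128 && y < 128);
   return (x * 0x10204081u) ^ (y * 0x01010101u);
}
```

For example, pixel (1, 1) should get 0x10204081 ^ 0x01010101 = 0x11214180.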

>
> Any insight into this would be hugely helpful. In case you feel like
> taking a look at the actual code, these are my commits:
> https://github.com/imirkin/mesa/commits/sample_shading . Note that
> some bits of the sample mask support were already there for nvc0 (like
> setting the shader header bit), and thus don't appear in my change.
>
> Thanks,
>
>   -ilia

