[Nouveau] Proper gl_SampleMask output

Ilia Mirkin imirkin at alum.mit.edu
Wed Apr 30 09:10:02 PDT 2014


Great, thanks! In the meantime I've looked at a bunch more shaders
(unrelated to this), and I'm pretty sure the extra FMA business is
totally unrelated to the issue at hand. It appears that the NVIDIA
driver generates shaders that are capable of transforming the output
coordinates arbitrarily, whereas mesa-generated shaders (and by
extension, nouveau's) only apply the transformations that are actually
needed -- e.g. if a shader only needs to be able to invert Y, then only
the Y coordinate will be affected by the FMA.
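
To make the difference concrete, here is a minimal C sketch of the two
approaches. The struct and function names are made up for illustration,
and the scale/bias values in the self-check (a 128-pixel-high target with
Y inverted) are assumptions, not values read from either driver's
constant buffer.

```c
#include <assert.h>

/* Hypothetical names; neither driver exposes anything called this. */
struct coord_xform {
    float scale_x, bias_x;
    float scale_y, bias_y;
};

/* General form: one multiply-add per coordinate (the two-FFMA shader). */
static void xform_general(const struct coord_xform *t, float x, float y,
                          float *out_x, float *out_y)
{
    *out_x = x * t->scale_x + t->bias_x;
    *out_y = y * t->scale_y + t->bias_y;
}

/* Specialized form: only Y gets a multiply-add, X is passed through
 * (the single-FFMA shader). */
static void xform_y_only(float scale_y, float bias_y, float x, float y,
                         float *out_x, float *out_y)
{
    *out_x = x;
    *out_y = y * scale_y + bias_y;
}

/* With an identity X transform the two forms agree.
 * Assumed values: Y inverted on a 128-pixel-high target. */
static void xform_selfcheck(void)
{
    const struct coord_xform t = { 1.0f, 0.0f, -1.0f, 128.0f };
    float gx, gy, sx, sy;

    xform_general(&t, 3.5f, 10.5f, &gx, &gy);
    xform_y_only(t.scale_y, t.bias_y, 3.5f, 10.5f, &sx, &sy);
    assert(gx == sx && gy == sy);
}
```

As long as X needs no scaling or offset, the extra FMA in the general
form computes x * 1 + 0 and changes nothing, which would be consistent
with it having no bearing on the sample mask problem.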

On Wed, Apr 30, 2014 at 12:02 PM, Andy Ritger <aritger at nvidia.com> wrote:
> Hi Ilia.  I'll take a look and see what I can find out.
>
> Thanks,
> - Andy
>
>
> On Wed, Apr 23, 2014 at 05:03:17PM -0700, Ilia Mirkin wrote:
>> On Wed, Apr 23, 2014 at 6:22 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> > Hello,
>> >
>> > I've been trying to add ARB_sample_shading support to nouveau, and am
>> > being defeated by the gl_SampleMask tests. Everything else works fine.
>> > (And naturally the tests pass with the proprietary driver.) I'm trying
>> > to do this for both GT21x, as well as GF100+.
>> >
>> > In the GT21x case, it seems like the low bit of method 0x1928 needs to
>> > be set (as well as the second-to-lowest bit); for GF100+, the low bit
>> > of the last dword of the shader header needs to be set.
>> >
>> > But exactly which register is the output supposed to go into? It looks
>> > like with the proprietary driver, r0..r3 get the first color output,
>> > and r4 gets the sample mask. However the way that things are set up
>> > with nouveau, r4..r7 get the first color output (and that part works
>> > fine). But where should the sample mask go at the end of the fragment
>> > program? r0? r8? (I've tried all of those with minimal effect.)
>> > Perhaps there's more configuration that I'm missing regarding the
>> > sample mask? Also, how does this interact with the frag depth (which
>> > also gets implicitly assigned based on color outputs)?
>>
>> As a clarification to the r0..r3 vs r4..r7 for first color output,
>> I've changed things around to ensure that the first color output ends
>> up in r0..r3 in the nouveau shader too. The shader generated by
>> nouveau is:
>>
>> HDR[00] = 0x00021462
>> HDR[04] = 0x00000000
>> HDR[08] = 0x00000000
>> HDR[0c] = 0x00000000
>> HDR[10] = 0x00000000
>> HDR[14] = 0xf0000000
>> HDR[18] = 0x00000000
>> HDR[1c] = 0x00000000
>> HDR[20] = 0x00000000
>> HDR[24] = 0x00000000
>> HDR[28] = 0x00000000
>> HDR[2c] = 0x00000000
>> HDR[30] = 0x00000000
>> HDR[34] = 0x00000000
>> HDR[38] = 0x00000000
>> HDR[3c] = 0x00000000
>> HDR[40] = 0x00000000
>> HDR[44] = 0x00000000
>> HDR[48] = 0x0000000f
>> HDR[4c] = 0x00000001
>> shader binary code (0x80 bytes):
>> 42e04237 22804280 fff01c00 c07e0070 fff05c00 c07e0074 10009de4 28004000
>> 00105c00 30044000 01201c84 14060000 04001c02 10408102 05205c84 14060000
>> 720042e7 22e20042 04105c02 10040404 04011c83 68000000 0000dde2 18fe0000
>> 00001de2 18000000 0c005de4 28000000 00009de2 18000000 00001de7 80000000
>>
>> which, with "nvdisas -b SM30 -raw" decodes to
>>
>>         /*0008*/                IPA.PASS R0, a[0x70], RZ;
>>         /*0010*/                IPA.PASS R1, a[0x74], RZ;
>>         /*0018*/                MOV R2, c[0x0][0x4];
>>         /*0020*/                FFMA R1, R1, c[0x0][0x0], R2;
>>         /*0028*/                F2I.S32.F32.TRUNC R0, R0;
>>         /*0030*/                IMUL32I.U32.U32 R0, R0, 0x10204081;
>>         /*0038*/                F2I.S32.F32.TRUNC R1, R1;
>>         /*0048*/                IMUL32I.U32.U32 R1, R1, 0x1010101;
>>         /*0050*/                LOP.XOR R4, R0, R1;
>>         /*0058*/                MOV32I R3, 0x3f800000;
>>         /*0060*/                MOV32I R0, 0x0;
>>         /*0068*/                MOV R1, R3;
>>         /*0070*/                MOV32I R2, 0x0;
>>         /*0078*/                EXIT ;
>>
>> While the proprietary-driver-generated shader is: [the output consists
>> of quad-word writes, so the right-most dword is the first of the 4, i.e.
>> each line has to be read right-to-left]
>>
>> --816-- w 27:0x0430, 0x00000000,0x00000000,0x00000000,0x00001462
>> --816-- w 27:0x0440, 0x00000000,0x00000000,0xb0000000,0x00000000
>> --816-- w 27:0x0450, 0x00000000,0x00000000,0x00000000,0x00000000
>> --816-- w 27:0x0460, 0x00000000,0x00000000,0x00000000,0x00000000
>> --816-- w 27:0x0470, 0x00000001,0x0000000f,0x00000000,0x00000000
>> --816-- w 27:0x0480, 0xc07e0074,0xfff05c00,0x22324232,0xa0423047
>> --816-- w 27:0x0490, 0xc07e0070,0xfff01c00,0x2800403c,0x10009de4
>> --816-- w 27:0x04a0, 0x3004803c,0x30105c40,0x2800403c,0x0000dde4
>> --816-- w 27:0x04b0, 0x14860000,0x05201c84,0x3006803c,0x20009c40
>> --816-- w 27:0x04c0, 0x14860000,0x09205c84,0x22004280,0x42304247
>> --816-- w 27:0x04d0, 0x28000000,0xfc001de4,0x10040404,0x04009ca2
>> --816-- w 27:0x04e0, 0x18fe0000,0x00005de2,0x10408102,0x0410dca2
>> --816-- w 27:0x04f0, 0x28000000,0xfc009de4,0x68000000,0x08311c83
>> --816-- w 27:0x0500, 0x28000000,0x0400dde4,0x20000000,0x0002e047
>> --816-- w 27:0x0510, 0x4003ffff,0xe0001de7,0x80000000,0x00001de7
>> --816-- w 27:0x0520, 0x40000000,0x00001de4,0x40000000,0x00001de4
>> --816-- w 27:0x0530, 0x40000000,0x00001de4,0x40000000,0x00001de4
>>
>> Which decodes to:
>>
>>         /*0008*/                IPA.PASS R1, a[0x74], RZ;
>>         /*0010*/                MOV R2, c[0x0][0xf04];
>>         /*0018*/                IPA.PASS R0, a[0x70], RZ;
>>         /*0020*/                MOV R3, c[0x0][0xf00];
>>         /*0028*/                FFMA.FTZ R1, R1, R2, c[0x0][0xf0c];
>>         /*0030*/                FFMA.FTZ R2, R0, R3, c[0x0][0xf08];
>>         /*0038*/                F2I.FTZ.S32.F32.TRUNC R0, R1;
>>         /*0048*/                F2I.FTZ.S32.F32.TRUNC R1, R2;
>>         /*0050*/                IMUL32I R2, R0, 0x1010101;
>>         /*0058*/                MOV R0, RZ;
>>         /*0060*/                IMUL32I R3, R1, 0x10204081;
>>         /*0068*/                MOV32I R1, 0x3f800000;
>>         /*0070*/                LOP.XOR R4, R3, R2;
>>         /*0078*/                MOV R2, RZ;
>>         /*0088*/                MOV R3, R1;
>>         /*0090*/                EXIT ;
>>
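The header portions of the two dumps above can be compared mechanically
once the quad-word rows are put back into memory order. The following is
only a sketch: the array and function names are invented for
illustration, and the data is copied from the two listings in this
message. It makes no attempt to interpret what any header bit means.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define HDR_DWORDS 20 /* 0x50 bytes of shader header */

/* nouveau header, HDR[00]..HDR[4c], already in memory order. */
static const uint32_t nouveau_hdr[HDR_DWORDS] = {
    0x00021462, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0xf0000000, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x0000000f, 0x00000001,
};

/* Proprietary-driver rows 0x0430..0x0470, left to right as dumped. */
static const uint32_t nvidia_rows[HDR_DWORDS / 4][4] = {
    { 0x00000000, 0x00000000, 0x00000000, 0x00001462 },
    { 0x00000000, 0x00000000, 0xb0000000, 0x00000000 },
    { 0x00000000, 0x00000000, 0x00000000, 0x00000000 },
    { 0x00000000, 0x00000000, 0x00000000, 0x00000000 },
    { 0x00000001, 0x0000000f, 0x00000000, 0x00000000 },
};

/* Each dumped row is one quad-word write with its first dword on the
 * right, so reversing every row yields memory order. */
static void rows_to_memory_order(const uint32_t rows[][4], size_t nrows,
                                 uint32_t *out)
{
    for (size_t r = 0; r < nrows; r++)
        for (size_t c = 0; c < 4; c++)
            out[r * 4 + c] = rows[r][3 - c];
}

/* Print every differing dword and return how many there were. */
static size_t hdr_diff(const uint32_t *a, const uint32_t *b, size_t n)
{
    size_t ndiff = 0;

    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            printf("HDR[%02zx]: nouveau 0x%08x, nvidia 0x%08x\n",
                   i * 4, (unsigned)a[i], (unsigned)b[i]);
            ndiff++;
        }
    }
    return ndiff;
}
```

Running rows_to_memory_order() on nvidia_rows and passing the result to
hdr_diff() together with nouveau_hdr reports two differing dwords:
HDR[00] (0x00021462 vs 0x00001462) and HDR[14] (0xf0000000 vs
0xb0000000). The last dword, HDR[4c], is 0x00000001 in both.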
>> (Not sure why the nouveau shader only has 1 FMA, but that's the input
>> shader we get from Gallium. I highly doubt this is the source of the
>> error, since it has nothing to do with sample masks, but my question
>> about sample mask output still stands even if it is :) )
>>
>> Oh, and for completeness, the input GLSL shader is:
>>
>> "#version 130\n"
>> "#extension GL_ARB_sample_shading : enable\n"
>> "out vec4 out_color;\n"
>> "void main()\n"
>> "{\n"
>>   /* For 128x128 image size, below formula produces a bit
>>    * pattern where no two bits of gl_SampleMask[0] are
>>    * correlated.
>>    */
>> "  gl_SampleMask[0] = (int(gl_FragCoord.x) * 0x10204081) ^\n"
>> "                     (int(gl_FragCoord.y) * 0x01010101);\n"
>> "  out_color = vec4(0.0, 1.0, 0.0, 1.0);\n"
>> "}\n";
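For reference, the mask this shader asks for can be modelled on the CPU.
The sketch below is an assumption-laden illustration rather than the
test's own code: the function names are hypothetical, the arithmetic is
done in uint32_t to avoid signed overflow in C while keeping the same low
32 bits, and the helper that trims the mask assumes that only the low
`nsamples` bits are meaningful for a given framebuffer.

```c
#include <assert.h>
#include <stdint.h>

/* C model of:
 *   (int(gl_FragCoord.x) * 0x10204081) ^ (int(gl_FragCoord.y) * 0x01010101)
 * The float-to-int conversion truncates toward zero, matching the
 * F2I ... TRUNC instructions in both disassemblies. */
static uint32_t sample_mask_for(float frag_x, float frag_y)
{
    uint32_t x = (uint32_t)(int32_t)frag_x;
    uint32_t y = (uint32_t)(int32_t)frag_y;

    return (x * 0x10204081u) ^ (y * 0x01010101u);
}

/* Keep only the bits for an assumed sample count (nsamples < 32). */
static uint32_t live_mask_bits(uint32_t mask, unsigned nsamples)
{
    return mask & ((1u << nsamples) - 1u);
}
```

Since gl_FragCoord sits at pixel centers, the fragment at pixel (1, 0)
has coordinates (1.5, 0.5) and gets mask 0x10204081, while pixel (1, 1)
gets 0x10204081 ^ 0x01010101 = 0x11214180. With 4 samples assumed, the
former leaves only sample 0 enabled and the latter leaves none.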
>>
>> >
>> > Any insight into this would be hugely helpful. In case you feel like
>> > taking a look at the actual code, these are my commits:
>> > https://github.com/imirkin/mesa/commits/sample_shading . Note that
>> > some bits of the sample mask were already there for nvc0 (like setting
>> > the shader header bit), thus don't appear in my change.
>> >
>> > Thanks,
>> >
>> >   -ilia

