[Nouveau] Fermi+ shader header docs
Ilia Mirkin
imirkin at alum.mit.edu
Fri Aug 14 12:48:59 PDT 2015
And as I've just started looking at GM107 traces to fix up
tessellation shader attribute address calculations, I noticed the
following unknown bits in CommonWord3 of TCP shaders:
PB: 0x00000021 GM107_3D.SP[0x2].SELECT = { ENABLE | PROGRAM = TCP }
PB: 0x00000830 GM107_3D.SP[0x2].START_ID = 0x830
HEADER:
0x04210861 0 = { SPH = VTG | VERSION = 3 | KIND = TCP | GMEM_STORE | SASS_VERS
0x06000000 1 = { LMEM_POS_ALLOC = 0 | PATCH_ATTRIBUTES = 6 }
0x03000000 2 = { LMEM_NEG_ALLOC = 0 | THREADS_PER_PRIM = 3 }
0x60000000 3 = { WARP_CSTACK_SIZE = 0 | 0x60000000 }
0xff000000 4 = { MIN_OUT_READ_SLOT = 0 | MAX_OUT_READ_SLOT = 0xff }
0xf0000000 ATTR_EN_0 = 0xf0000000
0x00000000 ATTR_EN_1 = 0
0x00000000 ATTR_EN_2 = 0
0x00000000 ATTR_EN_3 = 0
0x00000000 ATTR_EN_4 = 0
0x00000000 ATTR_EN_5 = { 0 }
0x00000000 11 = 0
0x00000000 12 = 0
0x0000f000 EXPORT_EN_0 = { HPOS = 0xf }
0x00000000 EXPORT_EN_1 = 0
0x00000000 EXPORT_EN_2 = 0
0x00000000 EXPORT_EN_3 = 0
0x00000000 EXPORT_EN_4 = 0
0x00000000 EXPORT_EN_5 = { CLIP_DISTANCE = 0 | UNK12 = 0 }
0x00000000 19 = 0
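For reference, here is a minimal Python sketch of how the unknown bits in word 3 fall out of the raw values above. The field positions are assumptions inferred from the decoded dumps in this thread and from my reading of the published header spec (low 24 bits for the allocation/stack sizes, top 8 bits for PATCH_ATTRIBUTES and THREADS_PER_PRIM, OUTPUT_PRIM in bits 24..27). The helper names are hypothetical and this is not the decoder that produced the dump.

```python
# Sketch only: bit positions are assumptions, not taken from a driver.

def split_field(word, lo, width):
    """Extract `width` bits of `word` starting at bit `lo`."""
    return (word >> lo) & ((1 << width) - 1)

def decode_words_1_to_3(w1, w2, w3):
    # Assumed known part of word 3: WARP_CSTACK_SIZE (0..23) + OUTPUT_PRIM (24..27).
    known3 = 0x0fffffff
    return {
        "LMEM_POS_ALLOC": split_field(w1, 0, 24),
        "PATCH_ATTRIBUTES": split_field(w1, 24, 8),
        "LMEM_NEG_ALLOC": split_field(w2, 0, 24),
        "THREADS_PER_PRIM": split_field(w2, 24, 8),
        "WARP_CSTACK_SIZE": split_field(w3, 0, 24),
        "OUTPUT_PRIM": split_field(w3, 24, 4),
        # Whatever is left over is what the dump prints as a raw hex value.
        "WORD3_UNKNOWN": w3 & ~known3 & 0xffffffff,
    }

# Words 1..3 of the TCP header above.
tcp = decode_words_1_to_3(0x06000000, 0x03000000, 0x60000000)
unknown_bits = [b for b in range(32) if tcp["WORD3_UNKNOWN"] >> b & 1]
print(tcp, unknown_bits)
```

Under those assumptions the leftover in word 3 is bits 29 and 30, both above the assumed OUTPUT_PRIM field.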
Anything that we need to also be setting?
-ilia
On Mon, Jun 22, 2015 at 9:10 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> And an additional question: I have a trace here where a reserved bit
> from CommonWord0 is set. Is that just random values that aren't
> cleared by the driver, or does it have some significance? Here is the
> full shader:
>
> HEADER:
> 0x06040461 0 = { SPH = VTG | VERSION = 3 | KIND = VP_B | SASS_VERSION = 2 | LDST_ENABLE | SO_MASK = 0 | 0x2000000 }
> 0x00000000 1 = { LMEM_POS_ALLOC = 0 | PATCH_ATTRIBUTES = 0 }
> 0x00000000 2 = { LMEM_NEG_ALLOC = 0 | THREADS_PER_PRIM = 0 }
> 0x00000000 3 = { WARP_CSTACK_SIZE = 0 | OUTPUT_PRIM = 0 }
> 0x00000000 4 = { MAX_OUTPUT_VERTS = 0 | MIN_OUT_READ_SLOT = 0 | MAX_OUT_READ_SLOT = 0 }
> 0x00000000 ATTR_EN_0 = 0
> 0x00000000 ATTR_EN_1 = 0
> 0x00000000 ATTR_EN_2 = 0
> 0x00000000 ATTR_EN_3 = 0
> 0x00000000 ATTR_EN_4 = 0
> 0x00000000 ATTR_EN_5 = { 0 }
> 0x00000000 11 = 0
> 0x00000000 12 = 0
> 0x0001f000 EXPORT_EN_0 = { HPOS = 0xf | 0x10000 }
> 0x00000000 EXPORT_EN_1 = 0
> 0x00000000 EXPORT_EN_2 = 0
> 0x00000000 EXPORT_EN_3 = 0
> 0x00000000 EXPORT_EN_4 = 0
> 0x00000000 EXPORT_EN_5 = { CLIP_DISTANCE = 0 | UNK12 = 0 }
> 0x00000000 19 = 0
> CODE:
> 00000000: a01088b0 08bcb810 sched 0x2c 0x22 0x4 0x28 0x4 0x2e 0x2f
> 00000008: 0b1ffc1e 5b601c07 set $p0 0x1 ge u32 0x0 c0[0x3858]
> 00000010: 1000003c 12000000 $p0 bra 0x38
> 00000018: 0a1c0002 64c03c07 mov b32 $r0 c0[0x3850]
> 00000020: 0a9c0006 64c03c07 mov b32 $r1 c0[0x3854]
> 00000028: 001c0000 cc800000 ld b32 $r0 cg g[$r0d]
> 00000030: 041c003c 12000000 bra 0x40
>
> 00000038: 7f9c0002 e4c03c00 C mov b32 $r0 0x0
>
> 00000040: 9c108010 090c8c10 C sched 0x4 0x20 0x4 0x27 0x4 0x23 0x43
> 00000048: 001c2802 e5c00000 cvt rn f32 $r0 u32 $r0
> 00000050: 341c0006 64c03c00 mov b32 $r1 c0[0x1a0]
> 00000058: 349c000a 64c03c00 mov b32 $r2 c0[0x1a4]
> 00000060: 351c000e 64c03c00 mov b32 $r3 c0[0x1a8]
> 00000068: 359c0012 64c03c00 mov b32 $r4 c0[0x1ac]
> 00000070: 381ffc06 7f03fc00 st b32 a[0x70] $r1 0x0 0x0
> 00000078: 3a1ffc0a 7f03fc00 st b32 a[0x74] $r2 0x0 0x0
> 00000080: 3c110d0c 08000001 sched 0x43 0x43 0x4 0x4f 0x0 0x0 0x0
> 00000088: 3c1ffc0e 7f03fc00 st b32 a[0x78] $r3 0x0 0x0
> 00000090: 3e1ffc12 7f03fc00 st b32 a[0x7c] $r4 0x0 0x0
> 00000098: 401ffc02 7f03fc00 st b32 a[0x80] $r0 0x0 0x0
> 000000a0: 001c003c 18000000 exit
>
> 000000a8: fc1c003c 12007fff C bra 0xa8
> 000000b0: 001c3c02 85800000 nop
> 000000b8: 001c3c02 85800000 nop
>
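To make the CommonWord0 question concrete, here is a rough Python sketch that splits the first header word into fields. The layout is an assumption based on my reading of the published spec and on the two decoded word-0 values in this thread. The names for bits 14, 15 and 27 (MRT_ENABLE, KILLS_PIXELS, FP64) do not appear in the dumps and are illustrative, and the helper itself is hypothetical.

```python
# Sketch only: (name, low bit, width) triples are an assumed layout.
COMMON_WORD0_FIELDS = [
    ("SPH", 0, 5),
    ("VERSION", 5, 5),
    ("KIND", 10, 4),
    ("MRT_ENABLE", 14, 1),    # name assumed, not shown in the dumps
    ("KILLS_PIXELS", 15, 1),  # name assumed, not shown in the dumps
    ("GMEM_STORE", 16, 1),
    ("SASS_VERSION", 17, 4),
    ("RESERVED", 21, 5),
    ("LDST_ENABLE", 26, 1),
    ("FP64", 27, 1),          # name assumed, not shown in the dumps
    ("SO_MASK", 28, 4),
]

# Only the two kinds that appear in this thread.
KIND_NAMES = {1: "VP_B", 2: "TCP"}

def decode_common_word0(word):
    """Return a dict of field values plus the reserved bits as a mask."""
    out = {name: (word >> lo) & ((1 << width) - 1)
           for name, lo, width in COMMON_WORD0_FIELDS}
    out["RESERVED_MASK"] = out["RESERVED"] << 21
    out["KIND_NAME"] = KIND_NAMES.get(out["KIND"], "unknown")
    return out

vp = decode_common_word0(0x06040461)   # vertex shader header quoted above
tcp0 = decode_common_word0(0x04210861) # TCP header from the newer message
print(hex(vp["RESERVED_MASK"]), hex(tcp0["RESERVED_MASK"]))
```

With this layout the vertex shader's stray 0x2000000 is bit 25, inside the reserved range, and the TCP header has a different reserved bit (bit 21) set.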
> On Sat, May 23, 2015 at 5:35 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Thu, May 21, 2015 at 11:32 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>> On Thu, May 21, 2015 at 10:05 AM, Robert Morell <rmorell at nvidia.com> wrote:
>>>> Hi Ilia,
>>>>
>>>> On Sat, May 02, 2015 at 12:34:21PM -0400, Ilia Mirkin wrote:
>>>>> Hi,
>>>>>
>>>>> As I'm looking to add some support to nouveau for features like atomic
>>>>> counters and images, I'm running into some confusion about what the
>>>>> first word of the shader header means. Here is the definition as we
>>>>> have it today:
>>>>
>>>> [...]
>>>>
>>>>> However I know that these are somewhat wrong. I've seen shaders that
>>>>> use gmem accesses (e.g. mov r0, [r0]) that just have the LMEM enable
>>>>> bit set (and they use no lmem). And I've seen additional bits set,
>>>>> especially relating to images, but I haven't spent enough time looking at all the
>>>>> variations to make sense of it yet. For example, I think that Fermi
>>>>> and Kepler+ have different meanings for some of the bits.
>>>>
>>>> Those look pretty close :)
>>>>
>>>>> I was hoping you could just release the docs for the shader headers,
>>>>> or at least the first word of the shader header.
>>>>
>>>> We've posted the specification for the full Shader Program Header to our
>>>> GPU documentation site here:
>>>>
>>>> ftp://download.nvidia.com/open-gpu-doc/Shader-Program-Header/1/Shader-Program-Header.html
>>>>
>>>> I hope it helps clear things up.
>>>
>>> Yep, just a few follow-up questions:
>>>
>>> - SPH Type 1 and type 2 appear to be flipped wrt the tables -- "When
>>> PS is used, field SphType in CommonWord0 must be set to 1; similarly,
>>> when VTG is used, SphType in CommonWord0 must be set to 2." But the
>>> "Table 1. SPH Type 1 Definition" is clearly meant for VTG and table 2
>>> is clearly meant for PS...
>>> - You skip over SassVersion -- what is that?
>>> - You have a funny note in there -- "Triangles generated by the
>>> geometry shader always have all their edge flags set to TRUE" -- that
>>> is the *only* reference to edge flags in the whole document. Right now
>>> we do some crazy thing to get edge flags right on Fermi+ (and I think
>>> we just get them wrong on Tesla). Is there a way to emit edge flags
>>> from a vertex shader?
>>> - To be clear: DoesLoadOrStore -- *any* load/store? Even LDC? ALD?
>>
>> Oh, and one more little correction:
>>
>> """
>> The SPH field OutputTopology sets the primitive topology of the
>> vertices that are output from the pipe stage. This field is only used
>> with geometry shaders, where the value must be greater than zero and
>> has a maximum of 1024. The allowed values are: ... [the correct values
>> for OutputTopology]
>> """
>>
>> The 1024 thing seems like it probably applies to MaxOutputVertexCount
>> in CommonWord4.
>>
>> -ilia
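
If the 1024 limit does belong to MaxOutputVertexCount, a sanity check on CommonWord4 could look like the Python sketch below. The field positions (bits 0..11, 12..19 and 24..31) are assumptions based on the decoded dumps in this thread, the limit is the one quoted from the spec text, and the function names are hypothetical.

```python
# Sketch only: assumed CommonWord4 layout, not taken from a driver.
MAX_GS_OUTPUT_VERTS = 1024  # limit quoted in the spec text above

def decode_common_word4(word):
    return {
        "MAX_OUTPUT_VERTS": word & 0xfff,          # bits 0..11
        "MIN_OUT_READ_SLOT": (word >> 12) & 0xff,  # bits 12..19
        "MAX_OUT_READ_SLOT": (word >> 24) & 0xff,  # bits 24..31
    }

def gs_vertex_count_ok(word):
    """True if the vertex count is greater than zero and at most 1024."""
    n = decode_common_word4(word)["MAX_OUTPUT_VERTS"]
    return 0 < n <= MAX_GS_OUTPUT_VERTS

# Word 4 of the TCP header from the newest message in this thread.
tcp4 = decode_common_word4(0xff000000)
print(tcp4, gs_vertex_count_ok(1024), gs_vertex_count_ok(1025))
```

A 12-bit field can hold values up to 4095, so a separate upper bound of 1024 on the vertex count is at least consistent with that field width.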