[Nouveau] Fermi+ shader header docs

Ilia Mirkin imirkin at alum.mit.edu
Mon Jun 22 18:10:00 PDT 2015


And an additional question: I have a trace here in which a reserved bit
of CommonWord0 is set (the 0x2000000 in the decode below). Is that just
a stale value that the driver doesn't clear, or does it have some
significance? Here is the full shader:

HEADER:
0x06040461   0 = { SPH = VTG | VERSION = 3 | KIND = VP_B | SASS_VERSION = 2 | LDST_ENABLE | SO_MASK = 0 | 0x2000000 }
0x00000000   1 = { LMEM_POS_ALLOC = 0 | PATCH_ATTRIBUTES = 0 }
0x00000000   2 = { LMEM_NEG_ALLOC = 0 | THREADS_PER_PRIM = 0 }
0x00000000   3 = { WARP_CSTACK_SIZE = 0 | OUTPUT_PRIM = 0 }
0x00000000   4 = { MAX_OUTPUT_VERTS = 0 | MIN_OUT_READ_SLOT = 0 | MAX_OUT_READ_SLOT = 0 }
0x00000000   ATTR_EN_0 = 0
0x00000000   ATTR_EN_1 = 0
0x00000000   ATTR_EN_2 = 0
0x00000000   ATTR_EN_3 = 0
0x00000000   ATTR_EN_4 = 0
0x00000000   ATTR_EN_5 = { 0 }
0x00000000   11 = 0
0x00000000   12 = 0
0x0001f000   EXPORT_EN_0 = { HPOS = 0xf | 0x10000 }
0x00000000   EXPORT_EN_1 = 0
0x00000000   EXPORT_EN_2 = 0
0x00000000   EXPORT_EN_3 = 0
0x00000000   EXPORT_EN_4 = 0
0x00000000   EXPORT_EN_5 = { CLIP_DISTANCE = 0 | UNK12 = 0 }
0x00000000   19 = 0
CODE:
00000000: a01088b0 08bcb810     sched 0x2c 0x22 0x4 0x28 0x4 0x2e 0x2f
00000008: 0b1ffc1e 5b601c07     set $p0 0x1 ge u32 0x0 c0[0x3858]
00000010: 1000003c 12000000     $p0 bra 0x38
00000018: 0a1c0002 64c03c07     mov b32 $r0 c0[0x3850]
00000020: 0a9c0006 64c03c07     mov b32 $r1 c0[0x3854]
00000028: 001c0000 cc800000     ld b32 $r0 cg g[$r0d]
00000030: 041c003c 12000000     bra 0x40

00000038: 7f9c0002 e4c03c00  C  mov b32 $r0 0x0

00000040: 9c108010 090c8c10  C  sched 0x4 0x20 0x4 0x27 0x4 0x23 0x43
00000048: 001c2802 e5c00000     cvt rn f32 $r0 u32 $r0
00000050: 341c0006 64c03c00     mov b32 $r1 c0[0x1a0]
00000058: 349c000a 64c03c00     mov b32 $r2 c0[0x1a4]
00000060: 351c000e 64c03c00     mov b32 $r3 c0[0x1a8]
00000068: 359c0012 64c03c00     mov b32 $r4 c0[0x1ac]
00000070: 381ffc06 7f03fc00     st b32 a[0x70] $r1 0x0 0x0
00000078: 3a1ffc0a 7f03fc00     st b32 a[0x74] $r2 0x0 0x0
00000080: 3c110d0c 08000001     sched 0x43 0x43 0x4 0x4f 0x0 0x0 0x0
00000088: 3c1ffc0e 7f03fc00     st b32 a[0x78] $r3 0x0 0x0
00000090: 3e1ffc12 7f03fc00     st b32 a[0x7c] $r4 0x0 0x0
00000098: 401ffc02 7f03fc00     st b32 a[0x80] $r0 0x0 0x0
000000a0: 001c003c 18000000     exit

000000a8: fc1c003c 12007fff  C  bra 0xa8
000000b0: 001c3c02 85800000     nop
000000b8: 001c3c02 85800000     nop
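
To make it concrete which bit is the odd one out, here is a minimal sketch that splits the first header word (0x06040461) into the fields named in the decode above and reports whatever is left over. The field names are taken from that decode; the bit positions and widths are assumptions on my part, chosen so that they reproduce the decoded values, and the helper name decode_common_word0 is purely illustrative.

```python
# Assumed layout of CommonWord0: (name, lowest bit, width in bits).
# Names follow the header decode above; the positions are assumptions
# that happen to reproduce the values shown in that decode.
COMMON_WORD0_FIELDS = [
    ("SPH", 0, 5),
    ("VERSION", 5, 5),
    ("KIND", 10, 4),
    ("SASS_VERSION", 17, 4),
    ("LDST_ENABLE", 26, 1),
    ("SO_MASK", 28, 4),
]


def decode_common_word0(word):
    """Return (fields, leftover) for a 32-bit CommonWord0 value.

    fields maps each assumed field name to its value; leftover holds
    the bits that are set but not covered by any field listed above.
    """
    fields = {}
    known = 0
    for name, shift, width in COMMON_WORD0_FIELDS:
        mask = ((1 << width) - 1) << shift
        fields[name] = (word & mask) >> shift
        known |= mask
    leftover = word & ~known & 0xFFFFFFFF
    return fields, leftover


fields, leftover = decode_common_word0(0x06040461)
for name, value in fields.items():
    print(f"{name} = {value}")
print(f"leftover = {leftover:#x}")
```

Under those assumptions SPH comes out as 1, VERSION as 3, KIND as 1, SASS_VERSION as 2, LDST_ENABLE as 1 and SO_MASK as 0, and the only leftover bit is 0x2000000 (bit 25), which is the one I am asking about.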

On Sat, May 23, 2015 at 5:35 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Thu, May 21, 2015 at 11:32 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Thu, May 21, 2015 at 10:05 AM, Robert Morell <rmorell at nvidia.com> wrote:
>>> Hi Ilia,
>>>
>>> On Sat, May 02, 2015 at 12:34:21PM -0400, Ilia Mirkin wrote:
>>>> Hi,
>>>>
>>>> As I'm looking to add some support to nouveau for features like atomic
>>>> counters and images, I'm running into some confusion about what the
>>>> first word of the shader header means. Here is the definition as we
>>>> have it today:
>>>
>>> [...]
>>>
>>>> However I know that these are somewhat wrong. I've seen shaders that
>>>> use gmem accesses (i.e. mov r0, [r0]) that just have the LMEM enable
>>>> bit set (and they use no lmem). And I've seen additional bits set, esp
>>>> relating to images, but I haven't spent enough time looking at all the
>>>> variations to make sense of it yet. For example, I think that Fermi
>>>> and Kepler+ have different meanings for some of the bits.
>>>
>>> Those look pretty close :)
>>>
>>>> I was hoping you could just release the docs for the shader headers,
>>>> or at least the first word of the shader header.
>>>
>>> We've posted the specification for the full Shader Program Header to our
>>> GPU documentation site here:
>>>
>>> ftp://download.nvidia.com/open-gpu-doc/Shader-Program-Header/1/Shader-Program-Header.html
>>>
>>> I hope it helps clear things up.
>>
>> Yep, just a few follow-up questions:
>>
>> - SPH Type 1 and type 2 appear to be flipped wrt the tables -- "When
>> PS is used, field SphType in CommonWord0 must be set to 1; similarly,
>> when VTG is used, SphType in CommonWord0 must be set to 2." But the
>> "Table 1. SPH Type 1 Definition" is clearly meant for VTG and table 2
>> is clearly meant for PS...
>> - You skip over SassVersion -- what is that?
>> - You have a funny note in there -- "Triangles generated by the
>> geometry shader always have all their edge flags set to TRUE" -- that
>> is the *only* reference to edge flags in the whole document. Right now
>> we do some crazy thing to get edge flags right on fermi+ (and I think
>> we just get them wrong on tesla). Is there a way to emit edge flags
>> from vertex shader?
>> - To be clear: DoesLoadOrStore -- *any* load/store? Even LDC? ALD?
>
> Oh, and one more little correction:
>
> """
> The SPH field OutputTopology sets the primitive topology of the
> vertices that are output from the pipe stage. This field is only used
> with geometry shaders, where the value must be greater than zero and
> has a maximum of 1024. The allowed values are: ... [the correct values
> for OutputTopology]
> """
>
> The 1024 thing seems like it probably applies to MaxOutputVertexCount
> in CommonWord4.
>
>   -ilia

