[Nouveau] What are the restrictions around loading indirect constbuf values

Thu Jun 25 07:41:53 PDT 2015

Hello,

We recently tracked down a bug on Tesla GPUs (i.e. G80-GT218) in which
instructions like

00000028: b5000409 08000780     add rn f32 $r2 $r2 neg c0[$a1]
00000040: b500060d 08004780     add rn f32 $r3 $r3 neg c0[$a1+0x4]

or with nvdisasm:

        .headerflags    @"EF_CUDA_SM12 EF_CUDA_PTX_SM(EF_CUDA_SM12)"
        /*0000*/         FADD R2, R2, -c[0x0][A1+0x0];  /* 0x08000780b5000409 */
        /*0008*/         FADD R3, R3, -c[0x0][A1+0x1];  /* 0x08004780b500060d */

don't appear to execute properly. However, MOV'ing the values into
registers first works fine. This was observed on a G92 chip. See bug
https://bugs.freedesktop.org/show_bug.cgi?id=91056.
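
In case it helps with cross-checking the two listings above, here is a
small sketch of how they seem to line up. It assumes (this is inferred
from the quoted output only, not from any documentation) that the first
listing prints the low 32-bit word first, that nvdisasm prints the same
instruction as a single 64-bit value, and that the first listing's offset
is in bytes while nvdisasm's is in 32-bit words. The helper names are
made up for illustration.

```python
# Sketch only: relates the two disassembly listings quoted above.
# Assumptions (inferred from the quoted output, not from documentation):
#   - the first listing prints the low 32-bit word first, then the high word
#   - nvdisasm prints the same instruction as a single 64-bit value
#   - the first listing's offset is in bytes, nvdisasm's in 32-bit words


def combine_words(low, high):
    """Join two 32-bit words into one 64-bit encoding (hypothetical helper)."""
    return (high << 32) | low


def byte_to_word_offset(byte_offset):
    """Convert a byte offset into a 32-bit word offset (hypothetical helper)."""
    if byte_offset % 4 != 0:
        raise ValueError("offset is not a multiple of 4 bytes")
    return byte_offset // 4


# (low word, high word) pairs copied from the first listing.
WORD_PAIRS = [
    (0xB5000409, 0x08000780),
    (0xB500060D, 0x08004780),
]

# 64-bit encodings copied from the nvdisasm listing.
NVDISASM_ENCODINGS = [
    0x08000780B5000409,
    0x08004780B500060D,
]

for (low, high), expected in zip(WORD_PAIRS, NVDISASM_ENCODINGS):
    combined = combine_words(low, high)
    status = "match" if combined == expected else "MISMATCH"
    print(f"{low:08x} {high:08x} -> 0x{combined:016x} ({status})")

# c0[$a1+0x4] in the first listing vs c[0x0][A1+0x1] in nvdisasm.
print(f"byte offset 0x4 -> word offset {byte_to_word_offset(0x4):#x}")
```

If those assumptions hold, both tools are decoding the same bits, so the
listings agree with each other and the issue is in how the hardware
handles the operand.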

I was hoping you could save me some time and let me know which
instructions can load operands like c0[$a1+4] (or maybe the problem only
occurs in combination with the neg modifier?), and which Tesla-family
GPUs have those restrictions.

Thanks,

  -ilia