[Nouveau] Tesla shader ISA question

Wed Apr 9 12:41:30 PDT 2014

On Wed, Apr 9, 2014 at 3:30 PM, Andy Ritger <aritger at nvidia.com> wrote:
> Hi Ilia, sorry for the slow response.
>
> This isn't my area of expertise, but as I understand it:
>
> * You've correctly decoded both instructions:
>    * The first is a float32-to-float32 conversion, applying ceil()
>    * The second is a float32-to-signed-int32 conversion, rounding to
>      the nearest even integer
>
> * For the {float,int}-to-{float,int} operations, bit 48 indicates
>  whether the input is signed (1==signed), but that bit is ignored when
>  the source format is float (always signed). That should apply across
>  the entire Tesla architecture.
>
> Where did the Tesla shader come from that you were decoding? I assume
> it was produced by the NVIDIA proprietary driver?

Yes, it came from looking at how the ARB_texture_cube_map_array stuff
was implemented for GT21x (it has that texprep instruction that munges
the tex args). When comparing to the nouveau-generated code, I noticed
that we limit the array index to 511 "manually", whereas the NVIDIA
proprietary driver did not. But it was setting that unknown bit. I was
hoping that the unknown bit was some "limit to 512" magic. Perhaps we
can just get rid of the clamping; I'm sure that the spec would frown
on trying to access out-of-bounds layers...
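
For reference, the "manual" clamp amounts to something like the sketch below. This is only an illustration, not the actual nouveau codegen: the name clamp_layer is made up, the lower bound of 0 is an assumption, and it assumes the index is first rounded to the nearest even integer (as the cvt rni s32 does) before being limited to 511.

```python
# Illustrative only: hypothetical helper, not taken from nouveau.
MAX_LAYER = 511  # upper limit on the array index mentioned above


def clamp_layer(index: float) -> int:
    """Round a float layer index and clamp it to [0, MAX_LAYER]."""
    # Python's round() rounds halves to the nearest even integer,
    # which mirrors the rounding mode of "cvt rni s32 ... f32 ...".
    layer = round(index)
    # Assumption: negative indices are clamped to layer 0.
    return max(0, min(layer, MAX_LAYER))
```

Dropping the clamp would mean passing the rounded value straight through, so an out-of-range index would reach the texture instruction unchanged.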

>
> If I had to guess, I'd speculate that the shader compiler in the NVIDIA
> proprietary driver used an uninitialized variable, and then overwrote the
> bits that mattered for the opcode it was producing, leaving uninitialized
> data in the unused bits.

Makes sense, thanks so much for looking into it!

>
> I hope that helps,
> - Andy Ritger
>
>
> On Thu, Feb 27, 2014 at 11:37:40PM -0800, Ilia Mirkin wrote:
>> Hello,
>>
>> I've recently run into an unknown bit in Tesla shaders, and was hoping
>> you could shed some light on it. I believe it's related to clamping
>> of some sort. Here are 2 examples (from different shaders):
>>
>> a0000401 cc054780     cvt rpi f32 $r0 f32 $r2 [unknown: 00000000 00010000]
>> a000060d 8c014780     cvt rni s32 $r3 f32 $r3 [unknown: 00000000 00010000]
>>
>> [This is Intel-style syntax, cvt = convert/move, rni/rpi = rounding
>> mode stuff, hope that clears up the syntax...]
>>
>> The destination register tends to go to a texture-related instruction
>> input, in some cases the layer (which is why I suspect clamping). Both
>> of these were seen on shaders compiled for GT215+ chips. What effect
>> does turning it on have exactly? Also, is this bit available on
>> earlier chips (if so, how early)?
>>
>> Thanks,
>>
>>   -ilia
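
As a quick cross-check of the two dumps above, the unknown mask "00000000 00010000" lines up with bit 48 of the 64-bit instruction, which is the bit described as the "input is signed" flag. The sketch below assumes the first dumped word holds bits 0-31 and the second holds bits 32-63; the helper name has_bit48 is made up for illustration.

```python
# Illustrative only: hypothetical helper for checking the dumps in this thread.
# Assumption: first dumped word = bits 0-31, second dumped word = bits 32-63.


def has_bit48(lo_word: int, hi_word: int) -> bool:
    """Return True if bit 48 of the combined 64-bit instruction is set."""
    insn = (hi_word << 32) | lo_word
    return bool((insn >> 48) & 1)


# The "unknown" mask printed by the disassembler, as (low word, high word).
UNKNOWN_MASK = (0x00000000, 0x00010000)

# The two cvt instructions quoted in the original message.
EXAMPLES = [
    (0xA0000401, 0xCC054780),  # cvt rpi f32 $r0 f32 $r2
    (0xA000060D, 0x8C014780),  # cvt rni s32 $r3 f32 $r3
]

if __name__ == "__main__":
    print("mask is bit 48:", has_bit48(*UNKNOWN_MASK))
    for lo, hi in EXAMPLES:
        print(f"{lo:08x} {hi:08x} bit48 =", has_bit48(lo, hi))
```

Both instructions have a float32 source, so under the explanation given in the reply the bit would be ignored by the hardware in these two cases.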