[Nouveau] Debugging INVALID_OPCODE / MULTIPLE_WARP_ERRORS ?

Fri Dec 18 04:55:47 PST 2015

Hi,

On 16-12-15 18:24, Ilia Mirkin wrote:
> I believe that your problem is this:
>
>          /*01a0*/                   LD R8, [R8];    /* 0x8000000000821c85 */
>
> That needs to be LD.E (and your ST's need to be ST.E). You're using a
> 32-bit gmem address, but you need to be using a 64-bit one. I believe
> the 32-bit ones work on fermi, but afaik not on Kepler.

I do not think that is the problem. test_input_global() in
src/gallium/tests/trivial/compute.c has:

COMP
DCL SV[0], THREAD_ID
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM[0] UINT32 {8, 0, 0, 0}
   0: BGNSUB :0
   1:   UMUL TEMP[0], SV[0], IMM[0]
   2:   LOAD TEMP[1].xy, RES[32764], TEMP[0]
   3:   LOAD TEMP[0].x, RES[32767], TEMP[1].yyyy
   4:   UADD TEMP[1].x, TEMP[0], -TEMP[1]
   5:   STORE RES[32767].x, TEMP[1].yyyy, TEMP[1]
   6:   RET
   7: ENDSUB

Which translates to:

SUB:0 ()
BB:0 (7 instructions) - df = { }
  -> BB:1 (cross)
   0: rdsv u32 $r0 sv[TID:0] (8)
   1: shl u32 $r2 $r0 0x00000003 (8)
   2: ld u64 $r0d c0[$r2+0x0] (8)
   3: ld u32 $r2 g[$r1+0x0] (8)
   4: add u32 $r0 $r2 neg $r0 (8)
   5: st u32 # g[$r1+0x0] $r0 (8)
   6: ret (8)
BB:1 (0 instructions) - idom = BB:0, df = { }

MAIN:-1 ()
BB:0 (0 instructions) - df = { }

This also uses 32-bit loads from global memory,
and that works fine on my GK107 [GeForce GT 740].
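
For reference, the nv50 IR listing above can be modelled per thread in a
few lines of Python. This is only a rough sketch, not driver code: the
names run_thread, const_buf and gmem are made up for illustration, and it
assumes little-endian data and a flat byte array standing in for global
memory. It just makes explicit that the global address is held in the
single 32-bit register $r1:

```python
import struct

MASK32 = 0xFFFFFFFF


def run_thread(tid, const_buf, gmem):
    """Model one thread of the nv50 IR listing (hypothetical helper)."""
    r2 = (tid << 3) & MASK32                             # shl u32 $r2 $r0 0x3
    r0, r1 = struct.unpack_from("<II", const_buf, r2)    # ld u64 $r0d c0[$r2+0x0]
    (r2,) = struct.unpack_from("<I", gmem, r1)           # ld u32 $r2 g[$r1+0x0]
    r0 = (r2 - r0) & MASK32                              # add u32 $r0 $r2 neg $r0
    struct.pack_into("<I", gmem, r1, r0)                 # st u32 # g[$r1+0x0] $r0


# Two threads; each input pair is (value to subtract, 32-bit byte address).
const_buf = struct.pack("<4I", 1, 0, 5, 8)
gmem = bytearray(struct.pack("<4I", 100, 200, 300, 400))
for tid in range(2):
    run_thread(tid, const_buf, gmem)
result = struct.unpack("<4I", gmem)   # (99, 200, 295, 400)
```

Thread 0 rewrites the word at address 0 and thread 1 the word at
address 8; the other two words are left untouched.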

I think that for now I'll just focus on translating
the tests from src/gallium/tests/trivial/compute.c to
OpenCL and getting the entire OpenCL -> LLVM -> TGSI ->
nouveau_compiler -> hardware chain to work that way.

It would still be good to get nbody.c to work, though.

Regards,

Hans

>
> Cheers,
>
>    -ilia
>
>
>
> On Wed, Dec 16, 2015 at 12:06 PM, Hans de Goede <hdegoede at redhat.com> wrote:
>> Hi,
>>
>> On 15-12-15 20:04, Ilia Mirkin wrote:
>>>
>>> Also, where's the exit op? Perhaps what's happening is that you don't
>>> have an exit and it just goes off executing into the ether?
>>
>>
>> Sorry, I only included a small bit of the program in my original mail
>> because I found the use of "MOV" instructions to load constants
>> suspicious. Is that normal?
>>
>> I've put a log with NV50_PROG_DEBUG=1 output here:
>>
>> https://fedorapeople.org/~jwrdegoede/nbody.log
>>
>> nvdisasm -b SM30 for the generated binary code is here:
>>
>> https://fedorapeople.org/~jwrdegoede/nbody.disasm
>>
>> There are already .tgsi, .hex and .bin files there if
>> you find those easier to use than the
>> NV50_PROG_DEBUG=1 output.
>>
>>
>>>
>>> On Tue, Dec 15, 2015 at 12:00 PM, Ilia Mirkin <imirkin at alum.mit.edu>
>>> wrote:
>>>>
>>>> A few things that stand out:
>>>>
>>>>     0: ld u32 %r219 c0[0x0000000000000000+0x0] (0)
>>>>
>>>> wtf is that 0x0000000000000 thing doing there? Was it a %rX which got
>>>> constant-folded into 0? That indirectness should have then been
>>>> removed... that said, the final encoding looks fine.
>>
>>
>> I don't know, maybe there is a hint in the log file?
>>
>> Regards,
>>
>> Hans
>>
>>
>>
>>>>
>>>> I believe that kepler has this launch descriptor thing too... is that
>>>> being set correctly? Please generate a mmt trace, and we can see if
>>>> anything stands out compared to a blob trace that also does compute.
>>>>
>>>> Cheers,
>>>>
>>>>     -ilia
>>>>
>>>> On Tue, Dec 15, 2015 at 9:15 AM, Hans de Goede <hdegoede at redhat.com>
>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> As part of my compute work I'm trying to get some TGSI compute
>>>>> code to work. The code from mesa/src/gallium/tests/trivial.c
>>>>> works.
>>>>>
>>>>> So now I'm trying to get a "native" tgsi kernel to run via
>>>>> clover, I'm using Francisco's nbody.c example for this:
>>>>>
>>>>> https://fedorapeople.org/~jwrdegoede/nbody.c
>>>>>
>>>>> Which does not work. At first I thought there was an issue
>>>>> with the setup of the input / output buffers, but that seems to
>>>>> work fine, and moreover I finally got the smart idea to look
>>>>> in dmesg, which says:
>>>>>
>>>>> [ 9920.802435] nouveau 0000:01:00.0: gr: TRAP ch 6 [007f7fa000 nbody[31881]]
>>>>> [ 9920.802449] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000000 [] warp 10009 [INVALID_OPCODE]
>>>>> [ 9920.802456] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 20009 [INVALID_OPCODE]
>>>>>
>>>>> and that repeats for every "step" in the nbody simulation. This is on a
>>>>> GK107 card.
>>>>>
>>>>> So that seems to be the real problem. Since the
>>>>> error says "INVALID_OPCODE", I've put the TGSI code from nbody.c
>>>>> through "nouveau_compiler -a e4" and then run "nvdisasm -b SM30"
>>>>> on it, but the output looks ok. There is an 8-byte sequence which does
>>>>> not get decoded every 64 bytes, but AFAIK that is the scheduling info,
>>>>> so that should be fine.
>>>>>
>>>>> One thing which does stand out is that this:
>>>>>
>>>>>     0: ld u32 %r219 c0[0x0000000000000000+0x0] (0)
>>>>>     1: ld u32 %r222 c0[0x4] (0)
>>>>>     2: ld u64 { %r225 %r228 } c0[0x8] (0)
>>>>>     3: ld u32 %r234 c0[0x10] (0)
>>>>>
>>>>> Gets translated into (nvdisasm output) :
>>>>>
>>>>>           /*0008*/                   LDC R4, c[0x0][0x0];      /* 0x1400000003f11c86 */
>>>>>           /*0010*/                   MOV R2, c[0x0][0x4];      /* 0x2800400010009de4 */
>>>>>           /*0018*/                   LDC.64 R0, c[0x0][0x8];   /* 0x1400000023f01ca6 */
>>>>>           /*0020*/                   MOV R3, c[0x0][0x10];     /* 0x280040004000dde4 */
>>>>>
>>>>> Where I would expect four LDC instructions. Could that be the problem?
>>>>>
>>>>> If that is not the problem, then hints on how to debug this further
>>>>> would be greatly appreciated.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Hans
>>>>> _______________________________________________
>>>>> Nouveau mailing list
>>>>> Nouveau at lists.freedesktop.org
>>>>> http://lists.freedesktop.org/mailman/listinfo/nouveau