[Nouveau] GPU MMU faults (was Meaning of the engines in paramaters of nouveau module)

Paul Dufresne dufresnep at zoho.com
Tue Dec 5 14:27:55 UTC 2023


---- On Mon, 04 Dec 2023 22:27:49 -0500, Dave Airlie wrote ----

 > On Mon, 4 Dec 2023 at 05:04, Paul Dufresne <dufresnep at zoho.com> wrote: 
 > > 
 > > In https://nouveau.freedesktop.org/KernelModuleParameters.html, there is: 
 > > Here is a list of engines: 
 > >     DEVICE 
 > >     DMAOBJ 
...
 > >     PVP 
 > >     SW 
 > > Also, in debug: 
 > >    CLIENT 
 > > 
...
 > > Also, my interest is linked to the state of GPU graph given after a context switch timeout that looks like: 
 > > [ 1696.780305] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT] 
 > > [ 1696.780361] nouveau 0000:01:00.0: fifo:000000:00[      gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6 
 > > [ 1696.780422] nouveau 0000:01:00.0: fifo:000000:07[     ce2]: 00050005: busy 0 faulted 0 chsw 0 save 0 load 0 chid 5 -> chid 5 
 > > [ 1696.780476] nouveau 0000:01:00.0: fifo:000004:04[     ce0]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0 
 > > [ 1696.780529] nouveau 0000:01:00.0: fifo:000001:01[  mspdec]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0 
 > > [ 1696.780581] nouveau 0000:01:00.0: fifo:000002:02[   msppp]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0 
 > > [ 1696.780633] nouveau 0000:01:00.0: fifo:000003:03[   msvld]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0 
 > > [ 1696.780689] nouveau 0000:01:00.0: fifo:000000:00[      gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6 
 > > [ 1696.780744] nouveau 0000:01:00.0: fifo:000000:00[      gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6 
 > > [ 1696.780795] nouveau 0000:01:00.0: fifo:000000:00[      gr]: triggering mmu fault on 0x00 
 > > [ 1696.780835] nouveau 0000:01:00.0: fifo:000000:07[     ce2]: 00050005: busy 0 faulted 0 chsw 0 save 0 load 0 chid 5 -> chid 5 
 > > [ 1696.780942] nouveau 0000:01:00.0: fifo:000000:00[      gr]: 00000100: mmu fault triggered 
 > > [ 1696.780987] nouveau 0000:01:00.0: fifo:000000:00[      gr]: c006e005: busy 1 faulted 1 chsw 1 save 1 load 1 chid 5*-> chid 6 
 > > [ 1696.781040] nouveau 0000:01:00.0: fifo:000000:0005:[Renderer[13701]] rc scheduled 
 > > 
 > > where I suspect ce2, is linked to PCE2. 
 > > 
 > > Is there a documentation that describes those "engines"? 
 >  
 > CE is copy engine. 
 > But this looks like an mmu fault on the GPU side, so some shader is 
 > doing something wrong most likely. 
 >  
 > Dave. 
 > 

Sometimes the GPU MMU fault is on the gr engine, sometimes on the ce2 engine.
However, the driver is stable with nouveau.noaccel=1: with that setting I have seen neither these faults nor other kinds of errors, such as deadlock detections.
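
To tally which engine is busy or faulted across several of these dumps, something like the following Python sketch could help. It only assumes the line layout seen in the log quoted above; the names STATUS_RE, parse_status and count_flag are my own and are not part of nouveau, and the sketch does not try to interpret the "*" next to a chid.

```python
import re
from collections import Counter

# Per-engine status line as it appears in the quoted dmesg output, e.g.
# "fifo:000000:00[      gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6"
# The layout is inferred from that log only (an assumption, not a documented format).
STATUS_RE = re.compile(
    r"fifo:[0-9a-f]+:[0-9a-f]+\[\s*(?P<engine>\w+)\]: (?P<raw>[0-9a-f]{8}): "
    r"busy (?P<busy>\d) faulted (?P<faulted>\d) chsw (?P<chsw>\d) "
    r"save (?P<save>\d) load (?P<load>\d) "
    r"chid (?P<prev>\d+)\*?\s*-> chid (?P<next>\d+)\*?"
)


def parse_status(line):
    """Return a dict describing one engine status line, or None if the line does not match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    status = {"engine": fields["engine"], "raw": int(fields["raw"], 16)}
    for key in ("busy", "faulted", "chsw", "save", "load", "prev", "next"):
        status[key] = int(fields[key])
    return status


def count_flag(log_text, flag):
    """Count, per engine, how many status lines have the given flag set to 1."""
    counts = Counter()
    for line in log_text.splitlines():
        status = parse_status(line)
        if status is not None and status[flag] == 1:
            counts[status["engine"]] += 1
    return counts


# Three lines copied from the log above.
SAMPLE = """\
[ 1696.780361] nouveau 0000:01:00.0: fifo:000000:00[      gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6
[ 1696.780422] nouveau 0000:01:00.0: fifo:000000:07[     ce2]: 00050005: busy 0 faulted 0 chsw 0 save 0 load 0 chid 5 -> chid 5
[ 1696.780987] nouveau 0000:01:00.0: fifo:000000:00[      gr]: c006e005: busy 1 faulted 1 chsw 1 save 1 load 1 chid 5*-> chid 6
"""

if __name__ == "__main__":
    print("busy:", dict(count_flag(SAMPLE, "busy")))
    print("faulted:", dict(count_flag(SAMPLE, "faulted")))
```

Feeding it the output of dmesg (saved to a file and read as text) instead of SAMPLE should give a per-engine count, provided the lines keep the same layout.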

Looking at the code, I am beginning to think that noaccel=0 allows user-side channel creation, which in turn creates the need for context switching, but I am not sure.


