[Nouveau] GPU MMU faults (was Meaning of the engines in parameters of nouveau module)
Paul Dufresne
dufresnep at zoho.com
Tue Dec 5 14:27:55 UTC 2023
---- On Mon, 04 Dec 2023 22:27:49 -0500 Dave Airlie wrote ----
> On Mon, 4 Dec 2023 at 05:04, Paul Dufresne <dufresnep at zoho.com> wrote:
> >
> > In https://nouveau.freedesktop.org/KernelModuleParameters.html, there is:
> > Here is a list of engines:
> > DEVICE
> > DMAOBJ
...
> > PVP
> > SW
> > Also, in debug:
> > CLIENT
> >
...
> > Also, my interest is linked to the state of GPU graph given after a context switch timeout that looks like:
> > [ 1696.780305] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
> > [ 1696.780361] nouveau 0000:01:00.0: fifo:000000:00[ gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6
> > [ 1696.780422] nouveau 0000:01:00.0: fifo:000000:07[ ce2]: 00050005: busy 0 faulted 0 chsw 0 save 0 load 0 chid 5 -> chid 5
> > [ 1696.780476] nouveau 0000:01:00.0: fifo:000004:04[ ce0]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> > [ 1696.780529] nouveau 0000:01:00.0: fifo:000001:01[ mspdec]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> > [ 1696.780581] nouveau 0000:01:00.0: fifo:000002:02[ msppp]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> > [ 1696.780633] nouveau 0000:01:00.0: fifo:000003:03[ msvld]: 00000000: busy 0 faulted 0 chsw 0 save 0 load 0 chid 0 -> chid 0
> > [ 1696.780689] nouveau 0000:01:00.0: fifo:000000:00[ gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6
> > [ 1696.780744] nouveau 0000:01:00.0: fifo:000000:00[ gr]: 8006e005: busy 1 faulted 0 chsw 1 save 1 load 1 chid 5*-> chid 6
> > [ 1696.780795] nouveau 0000:01:00.0: fifo:000000:00[ gr]: triggering mmu fault on 0x00
> > [ 1696.780835] nouveau 0000:01:00.0: fifo:000000:07[ ce2]: 00050005: busy 0 faulted 0 chsw 0 save 0 load 0 chid 5 -> chid 5
> > [ 1696.780942] nouveau 0000:01:00.0: fifo:000000:00[ gr]: 00000100: mmu fault triggered
> > [ 1696.780987] nouveau 0000:01:00.0: fifo:000000:00[ gr]: c006e005: busy 1 faulted 1 chsw 1 save 1 load 1 chid 5*-> chid 6
> > [ 1696.781040] nouveau 0000:01:00.0: fifo:000000:0005:[Renderer[13701]] rc scheduled
> >
> > where I suspect ce2 is linked to PCE2.
> >
> > Is there any documentation that describes those "engines"?
>
> CE is copy engine.
> But this looks like an mmu fault on the GPU side, so some shader is
> doing something wrong most likely.
>
> Dave.
>
Sometimes the GPU MMU fault is on the gr engine, sometimes on the ce2 engine.
But the driver is stable with nouveau.noaccel=1 (I have not seen other kinds of errors either, such as deadlock detections, when using noaccel=1).
Looking at the code, I am beginning to think that noaccel=0 allows user-side channel creation, which in turn creates the need for context switching... not sure.
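
For reading the log quoted above: the hex word on each engine line (8006e005, c006e005, 00050005) appears to pack the same fields that are printed after it. Below is a minimal Python sketch that decodes such a word. The bit masks are an assumption, inferred only from those three quoted lines and not taken from any documentation, and the names EngineStatus and decode_engine_status are made up for this illustration.

```python
from typing import NamedTuple


class EngineStatus(NamedTuple):
    """Fields printed after the status word on each fifo engine line."""
    busy: bool
    faulted: bool
    chsw: bool
    save: bool
    load: bool
    prev_chid: int  # left-hand "chid" in the log line
    next_chid: int  # right-hand "chid" in the log line


def decode_engine_status(word: int) -> EngineStatus:
    # All masks below are assumptions inferred from the quoted log lines.
    return EngineStatus(
        busy=bool(word & 0x80000000),
        faulted=bool(word & 0x40000000),
        chsw=bool(word & 0x00008000),
        save=bool(word & 0x00004000),
        load=bool(word & 0x00002000),
        prev_chid=word & 0x00000FFF,
        next_chid=(word & 0x0FFF0000) >> 16,
    )


if __name__ == "__main__":
    # The three distinct status words that appear in the quoted log.
    for raw in ("8006e005", "c006e005", "00050005"):
        print(raw, decode_engine_status(int(raw, 16)))
```

With these masks, 8006e005 decodes to busy 1, faulted 0, chsw 1, save 1, load 1, chid 5 -> chid 6, and c006e005 differs only in faulted 1, which agrees with the gr lines before and after "mmu fault triggered".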