[Nouveau] Bug: nouveau DATA_ERROR / CACHE_ERROR on Quadro NVS 290

Clemens Koller clemens.ml at gmx.net
Fri Apr 11 09:36:51 PDT 2014


Hi, there!

About once a day, nouveau fails on the Quadro NVS 290 in my system. This
has been happening from roughly kernel 3.10 up to the current 3.14, so I
finally decided to report it, as it is getting really annoying. After the
bug appears, a few small, oddly shaped white block artefacts (about 20x20
to 40x40 pixels, one per DATA_ERROR line) remain stuck on my dual-monitor
desktop. After a restart of X, the artefacts disappear until the bug
triggers again.

I am running a current Arch Linux installation. The motherboard is from an
industrial system that otherwise runs fine and is very stable.

Here are some log outputs:

$ uname -a
Linux octo 3.14.0-4-ARCH #1 SMP PREEMPT Wed Apr 9 21:11:25 CEST 2014
x86_64 GNU/Linux

$ dmesg
...
[22616.270000] nouveau E[   PFIFO][0000:01:00.0] DMA_PUSHER - ch 2
[X[632]] get 0x0020029b08 put 0x0020029e60 ib_get 0x000003bd ib_put
0x000003d7 state 0x8000e6a8 (err: INVALID_CMD) push 0x00406040
[22616.270226] nouveau E[  PGRAPH][0000:01:00.0] DATA_ERROR BEGIN_END_ACTIVE
[22616.270232] nouveau E[  PGRAPH][0000:01:00.0] ch 2 [0x000fb33000
X[632]] subc 7 class 0x8297 mthd 0x1360 data 0x00000001
[22616.270260] nouveau E[  PGRAPH][0000:01:00.0] DATA_ERROR BEGIN_END_ACTIVE
[22616.270265] nouveau E[  PGRAPH][0000:01:00.0] ch 2 [0x000fb33000
X[632]] subc 7 class 0x8297 mthd 0x1340 data 0x00008006
[22616.270280] nouveau E[  PGRAPH][0000:01:00.0] DATA_ERROR BEGIN_END_ACTIVE
[22616.270284] nouveau E[  PGRAPH][0000:01:00.0] ch 2 [0x000fb33000
X[632]] subc 7 class 0x8297 mthd 0x1344 data 0x00004001
[22616.270298] nouveau E[  PGRAPH][0000:01:00.0] DATA_ERROR BEGIN_END_ACTIVE
[22616.270302] nouveau E[  PGRAPH][0000:01:00.0] ch 2 [0x000fb33000
X[632]] subc 7 class 0x8297 mthd 0x1348 data 0x00004303
[22616.270316] nouveau E[  PGRAPH][0000:01:00.0] DATA_ERROR BEGIN_END_ACTIVE
[22616.270321] nouveau E[  PGRAPH][0000:01:00.0] ch 2 [0x000fb33000
X[632]] subc 7 class 0x8297 mthd 0x134c data 0x00008006
[22616.270335] nouveau E[  PGRAPH][0000:01:00.0] DATA_ERROR BEGIN_END_ACTIVE
[22616.270340] nouveau E[  PGRAPH][0000:01:00.0] ch 2 [0x000fb33000
X[632]] subc 7 class 0x8297 mthd 0x1350 data 0x00004001
[22616.270352] nouveau E[  PGRAPH][0000:01:00.0] DATA_ERROR BEGIN_END_ACTIVE
[22616.270356] nouveau E[  PGRAPH][0000:01:00.0] ch 2 [0x000fb33000
X[632]] subc 7 class 0x8297 mthd 0x1358 data 0x00004303
[22642.053387] nouveau E[   PFIFO][0000:01:00.0] DMA_PUSHER - ch 2
[X[632]] get 0x002003e70c put 0x002003eb98 ib_get 0x00000278 ib_put
0x00000385 state 0xc000ef05 (err: MEM_FAULT) push 0x00406040
[22642.053426] nouveau E[     PFB][0000:01:00.0] trapped read at
0xfffffffffc on channel 0x0000fcb0 [unknown] PFIFO/PFIFO_READ/PUSHBUF
reason: PT_NOT_PRESENT
[22642.055251] nouveau E[  PGRAPH][0000:01:00.0] DATA_ERROR (unknown
enum 0x00000034)
[22642.055258] nouveau E[  PGRAPH][0000:01:00.0] ch 2 [0x000fb33000
X[632]] subc 2 class 0x502d mthd 0x08dc data 0x00000040
[22652.695809] nouveau E[   PFIFO][0000:01:00.0] DMA_PUSHER - ch 2
[X[632]] get 0x002000f840 put 0x002000f860 ib_get 0x0000039a ib_put
0x000003ac state 0x80004610 (err: INVALID_CMD) push 0x00406040
[22740.413503] nouveau E[   PFIFO][0000:01:00.0] CACHE_ERROR - ch 2
[X[632]] subc 0 mthd 0x0060 data 0xbeef0201
[22775.303885] nouveau E[   PFIFO][0000:01:00.0] CACHE_ERROR - ch 2
[X[632]] subc 0 mthd 0x0060 data 0xbeef0201



$ lspci
00:00.0 Host bridge: Intel Corporation Core Processor DMI (rev 11)
00:03.0 PCI bridge: Intel Corporation Core Processor PCI Express Root
Port 1 (rev 11)
00:08.0 System peripheral: Intel Corporation Core Processor System
Management Registers (rev 11)
00:08.1 System peripheral: Intel Corporation Core Processor Semaphore
and Scratchpad Registers (rev 11)
00:08.2 System peripheral: Intel Corporation Core Processor System
Control and Status Registers (rev 11)
00:08.3 System peripheral: Intel Corporation Core Processor
Miscellaneous Registers (rev 11)
00:10.0 System peripheral: Intel Corporation Core Processor QPI Link
(rev 11)
00:10.1 System peripheral: Intel Corporation Core Processor QPI Routing
and Protocol Registers (rev 11)
00:16.0 Communication controller: Intel Corporation 5 Series/3400 Series
Chipset HECI Controller (rev 06)
00:16.2 IDE interface: Intel Corporation 5 Series/3400 Series Chipset PT
IDER Controller (rev 06)
00:16.3 Serial controller: Intel Corporation 5 Series/3400 Series
Chipset KT Controller (rev 06)
00:19.0 Ethernet controller: Intel Corporation 82578DM Gigabit Network
Connection (rev 06)
00:1a.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset
USB2 Enhanced Host Controller (rev 06)
00:1b.0 Audio device: Intel Corporation 5 Series/3400 Series Chipset
High Definition Audio (rev 06)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI
Express Root Port 1 (rev 06)
00:1c.6 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI
Express Root Port 7 (rev 06)
00:1d.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset
USB2 Enhanced Host Controller (rev 06)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a6)
00:1f.0 ISA bridge: Intel Corporation 5 Series Chipset LPC Interface
Controller (rev 06)
00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset
6 port SATA AHCI Controller (rev 06)
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus
Controller (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation G86 [Quadro NVS
290] (rev a1)
03:00.0 Ethernet controller: Intel Corporation 82583V Gigabit Network
Connection
04:0c.0 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 61)
04:0c.1 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 61)
04:0c.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 63)
04:0c.3 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire
II(M)] IEEE 1394 OHCI Controller (rev 46)
04:0e.0 Mass storage controller: Promise Technology, Inc. PDC40775 (SATA
300 TX2plus) (rev 02)
ff:00.0 Host bridge: Intel Corporation Core Processor QuickPath
Architecture Generic Non-Core Registers (rev 04)
ff:00.1 Host bridge: Intel Corporation Core Processor QuickPath
Architecture System Address Decoder (rev 04)
ff:02.0 Host bridge: Intel Corporation Core Processor QPI Link 0 (rev 04)
ff:02.1 Host bridge: Intel Corporation Core Processor QPI Physical 0
(rev 04)
ff:03.0 Host bridge: Intel Corporation Core Processor Integrated Memory
Controller (rev 04)
ff:03.1 Host bridge: Intel Corporation Core Processor Integrated Memory
Controller Target Address Decoder (rev 04)
ff:03.4 Host bridge: Intel Corporation Core Processor Integrated Memory
Controller Test Registers (rev 04)
ff:04.0 Host bridge: Intel Corporation Core Processor Integrated Memory
Controller Channel 0 Control Registers (rev 04)
ff:04.1 Host bridge: Intel Corporation Core Processor Integrated Memory
Controller Channel 0 Address Registers (rev 04)
ff:04.2 Host bridge: Intel Corporation Core Processor Integrated Memory
Controller Channel 0 Rank Registers (rev 04)
ff:04.3 Host bridge: Intel Corporation Core Processor Integrated Memory
Controller Channel 0 Thermal Control Registers (rev 04)
ff:05.0 Host bridge: Intel Corporation Core Processor Integrated Memory
Controller Channel 1 Control Registers (rev 04)
ff:05.1 Host bridge: Intel Corporation Core Processor Integrated Memory
Controller Channel 1 Address Registers (rev 04)
ff:05.2 Host bridge: Intel Corporation Core Processor Integrated Memory
Controller Channel 1 Rank Registers (rev 04)
ff:05.3 Host bridge: Intel Corporation Core Processor Integrated Memory
Controller Channel 1 Thermal Control Registers (rev 04)

$ lspci -vvv (just the snippet for the NVIDIA card):

01:00.0 VGA compatible controller: NVIDIA Corporation G86 [Quadro NVS
290] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device 0492
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 51
	Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at f0000000 (64-bit, prefetchable) [size=64M]
	Region 3: Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
	Region 5: I/O ports at cc00 [size=128]
	Expansion ROM at fbde0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: nouveau
	Kernel modules: nouveau

Cooling seems to be fine:

$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +39.0°C  (high = +83.0°C, crit = +99.0°C)
Core 1:       +38.0°C  (high = +83.0°C, crit = +99.0°C)
Core 2:       +44.0°C  (high = +83.0°C, crit = +99.0°C)
Core 3:       +38.0°C  (high = +83.0°C, crit = +99.0°C)

nouveau-pci-0100
Adapter: PCI adapter
temp1:        +65.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +115.0°C, hyst =  +2.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)


Is this a hardware bug or some driver issue?
Any hints are welcome.

I am able to patch, compile and test a custom kernel (latest git)
if it's of any use.

Regards,

Clemens

-- 
Embeon Systemdesign und Elektronik
http://www.embeon.de