[Bug 92096] New: regression/bug introduced by commit [0e572fe7383a376992364914694c39aa7fe44c1d] drm/i915: runtime PM support for DPMS

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Sep 23 19:07:37 PDT 2015


https://bugs.freedesktop.org/show_bug.cgi?id=92096

            Bug ID: 92096
           Summary: regression/bug introduced by commit
                    [0e572fe7383a376992364914694c39aa7fe44c1d] drm/i915:
                    runtime PM support for DPMS
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/Intel
          Assignee: intel-gfx-bugs at lists.freedesktop.org
          Reporter: jairo.daniel.miramontes.caton at intel.com
        QA Contact: intel-gfx-bugs at lists.freedesktop.org
                CC: intel-gfx-bugs at lists.freedesktop.org

> Hello,
> 
> A regression/bug was introduced by commit 
> [0e572fe7383a376992364914694c39aa7fe44c1d] drm/i915: runtime PM 
> support for DPMS.
> It causes my linux box to emit PATA and NIC errors (PCI bus stops 
> working), SATA is unaffected. The system freezes shorty after. Trouble 
> starts within
> 12 hours, but not right away.
> Git bisect was used to find the regression/bug. It was initially 
> introduced somewhere around kernel v3.15.0-rc8 (code has moved around since).
> 
> *** for details and logs, please scroll down ***
> 
> Hardware details:
> - MSI MS-7732/PH61A-P35 (MS-7732), BIOS V2.4 09/19/2014
> - Sandy Bridge Intel(R) Pentium(R) CPU G620
> - 8 GiB RAM
> - CRT monitor without EDID (not "plug and play") hooked up to the i915 
> CRT port
> - 1x SSD connected to ASM1062 SATA controller
> - 2x HDD connected to Intel SATA controller
> - CDROM drive (ata9) and HDD (ata10) connected to Promise PDC20268
> - Uncommon hardware configuration (old PCI PATA controller and PCI 
> NIC)
> 
> Other:
> - Running on LFS 7.1+ x86_64
> - DPMS powersave enabled (setterm -powersave powerdown -powerdown 5)
> - text based, no X
> - Has VM's (QEMU KVM) running on top
> 
> Testing revealed:
> v3.18.21: NOT OK
> v3.17.8 : NOT OK
> v3.17.0 : NOT OK
> v3.16.7 : OK
> v3.16.0 : OK
> v3.14.51: OK
> 
> As a test, function intel_crtc_update_dpms() in 
> v3.18.20/drivers/gpu/drm/i915/intel_display.c was replaced with an 
> older revision (see below). The older revision works fine; it bypasses 
> the function intel_crtc_control().
> 
> So what is going wrong here? Perhaps something to do with the power 
> domains, lack of EDID? Or might it just trigger some bug/regression 
> hidden deep in the kernel?
> 
> I do not have the know-how to fix this. You guys do, ofcourse! :-)
> 
> 
> /**
>  * Sets the power management mode of the pipe and plane.
>  */
> void intel_crtc_update_dpms(struct drm_crtc *crtc) {
>         struct drm_device *dev = crtc->dev;
>         struct drm_i915_private *dev_priv = dev->dev_private;
>         struct intel_encoder *intel_encoder;
>         bool enable = false;
> 
>         for_each_encoder_on_crtc(dev, crtc, intel_encoder)
>                 enable |= intel_encoder->connectors_active;
> 
>         if (enable)
>                 dev_priv->display.crtc_enable(crtc);
>         else
>                 dev_priv->display.crtc_disable(crtc);
> 
>         intel_crtc_update_sarea(crtc, enable); }
> 
> 
> # /proc/cmdline
> # noirqdebug: "ASM1083/1085 PCIe to PCI Bridge" has hardware issues, 
> skips a IRQ beat now and then root=LABEL=ROOTFS selinux=0 
> video=800x600-32 at 72 acpi_enforce_resources=lax
> thermal.off=1 noirqdebug
> 
> 
> # /proc/interrupts
>   0:         14          0   IO-APIC-edge      timer
>   1:         12          0   IO-APIC-edge      i8042
>   4:      57380          0   IO-APIC-edge      serial
>   5:          0          0   IO-APIC-edge      parport0
>   8:        119          0   IO-APIC-edge      rtc0
>   9:          3          0   IO-APIC-fasteoi   acpi
>  12:        147          0   IO-APIC-edge      i8042
>  16:        746          0   IO-APIC  16-fasteoi   ehci_hcd:usb1
>  18:       7305          0   IO-APIC  18-fasteoi   0000:03:00.0
>  19:      64979          0   IO-APIC  19-fasteoi   eth0
>  23:         33          0   IO-APIC  23-fasteoi   ehci_hcd:usb3
>  24:        762          0   PCI-MSI-edge      0000:00:1f.2
>  25:      12926          0   PCI-MSI-edge      0000:06:00.0
>  26:      16017          0   PCI-MSI-edge      eth1
>  27:          0          0   PCI-MSI-edge      xhci_hcd
>  28:          0          0   PCI-MSI-edge      xhci_hcd
>  29:          0          0   PCI-MSI-edge      xhci_hcd
>  30:        517          0   PCI-MSI-edge      snd_hda_intel
>  31:          7          0   PCI-MSI-edge      i915
> NMI:          0          0   Non-maskable interrupts
> LOC:    1024537     205875   Local timer interrupts
> SPU:          0          0   Spurious interrupts
> PMI:          0          0   Performance monitoring interrupts
> IWI:          0          0   IRQ work interrupts
> RTR:          1          0   APIC ICR read retries
> RES:      17909      21506   Rescheduling interrupts
> CAL:       7297       7219   Function call interrupts
> TLB:        346        209   TLB shootdowns
> TRM:          0          0   Thermal event interrupts
> THR:          0          0   Threshold APIC interrupts
> MCE:          0          0   Machine check exceptions
> MCP:         11         11   Machine check polls
> ERR:          0
> MIS:          0
> 
> 
> # lspci
> 00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor 
> Family DRAM Controller (rev 09)
> 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation 
> Core Processor Family Integrated Graphics Controller (rev 09)
> 00:16.0 Communication controller: Intel Corporation 6 Series/C200 
> Series Chipset Family MEI Controller #1 (rev 04)
> 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset 
> Family USB Enhanced Host Controller #2 (rev 05)
> 00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset 
> Family High Definition Audio Controller (rev 05)
> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset 
> Family PCI Express Root Port 1 (rev b5)
> 00:1c.2 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
> 00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset 
> Family PCI Express Root Port 4 (rev b5)
> 00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset 
> Family PCI Express Root Port 5 (rev b5)
> 00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset 
> Family PCI Express Root Port 6 (rev b5)
> 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset 
> Family USB Enhanced Host Controller #1 (rev 05)
> 00:1f.0 ISA bridge: Intel Corporation H61 Express Chipset Family LPC 
> Controller (rev 05)
> 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series 
> Chipset Family SATA AHCI Controller (rev 05)
> 00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family 
> SMBus Controller (rev 05)
> 02:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI 
> Bridge (rev 01)
> 03:00.0 Mass storage controller: Promise Technology, Inc. PDC20268 
> [Ultra100 TX2] (rev 02)
> 03:01.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] 
> (rev
> 74)
> 04:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB 
> Host Controller
> 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
> RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
> 06:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA 
> Controller (rev 01)
> 
> 
> # logging: so it begins
> Sep  8 04:13:06 syscore kernel: ata9: drained 2 bytes to clear DRQ Sep  
> 8 04:18:13 syscore kernel: ata9: drained 2 bytes to clear DRQ Sep  8 
> 04:19:55 syscore kernel: ata9: drained 2 bytes to clear DRQ Sep  8 
> 04:37:42 syscore kernel: ata9: drained 2 bytes to clear DRQ Sep  8 
> 04:41:58 syscore kernel: ata9: drained 2 bytes to clear DRQ Sep  8 
> 04:47:06 syscore kernel: ata9: drained 2 bytes to clear DRQ
> 
> 
> # logging: and it ends like this (kernel lock-up if unlucky) Jun 14 
> 18:32:59 syscore kernel: ata9: drained 2 bytes to clear DRQ Jun 14 
> 18:33:04 syscore kernel: ata9.00: qc timeout (cmd 0xa0) Jun 14 
> 18:33:04 syscore kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr
> 0x0 action 0x6 frozen
> Jun 14 18:33:16 syscore kernel: sr 8:0:0:0: CDB:
> Jun 14 18:33:16 syscore kernel: Get event status notification: 4a 01 
> 00 00
> 10 00 00 00 08 00
> Jun 14 18:33:16 syscore kernel: ata9.00: cmd
> a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in Jun 14 18:33:16 
> syscore kernel: res 51/54:03:00:08:00/00:00:00:00:00/a0
> Emask 0x5 (timeout)
> Jun 14 18:33:16 syscore kernel: ata9.00: status: { DRDY ERR } Jun 14 
> 18:33:16 syscore kernel: ata9: soft resetting link Jun 14 18:33:16 
> syscore kernel: eth0: Transmit error, Tx status register ff.
> Jun 14 18:33:16 syscore kernel: Flags; bus-master 1, dirty 51677(13) 
> current
> 51677(13)
> Jun 14 18:33:16 syscore kernel: Transmit list ffffffff vs. ffff8800d781e9b8.
> Jun 14 18:33:16 syscore kernel: 0: @ffff8800d781e200  length 8000008d 
> status 0001008d Jun 14 18:33:16 syscore kernel: 1: @ffff8800d781e298  
> length 80000048 status
> 00010048
> Jun 14 18:33:16 syscore kernel: 2: @ffff8800d781e330  length 800000b7 
> status
> 000100b7
> Jun 14 18:33:16 syscore kernel: 3: @ffff8800d781e3c8  length 800000a7 
> status
> 000100a7
> Jun 14 18:33:16 syscore kernel: 4: @ffff8800d781e460  length 00000080 
> status
> 00010102
> Jun 14 18:33:16 syscore kernel: 5: @ffff8800d781e4f8  length 80000042 
> status
> 00010042
> Jun 14 18:33:16 syscore kernel: 6: @ffff8800d781e590  length 80000042 
> status
> 00010042
> Jun 14 18:33:16 syscore kernel: 7: @ffff8800d781e628  length 80000042 
> status
> 00010042
> Jun 14 18:33:16 syscore kernel: 8: @ffff8800d781e6c0  length 80000042 
> status
> 00010042
> Jun 14 18:33:16 syscore kernel: 9: @ffff8800d781e758  length 80000042 
> status
> 00010042
> Jun 14 18:33:16 syscore kernel: 10: @ffff8800d781e7f0  length 00000080 
> status 0001014d Jun 14 18:33:16 syscore kernel: 11: @ffff8800d781e888  
> length 80000048 status 00010048 Jun 14 18:33:16 syscore kernel: 12: 
> @ffff8800d781e920  length 800000a7 status 800100a7 Jun 14 18:33:16 
> syscore kernel: 13: @ffff8800d781e9b8  length 80000042 status 00010042 
> Jun 14 18:33:16 syscore kernel: 14: @ffff8800d781ea50  length 80000042 
> status 00010042 Jun 14 18:33:16 syscore kernel: 15: @ffff8800d781eae8  
> length 00000080 status 00010102 Jun 14 18:33:16 syscore kernel: eth0: 
> Updating statistics failed, disabling stats as an interrupt source Jun 
> 14 18:33:16 syscore kernel: eth0: Host error, FIFO diagnostic register 
> ffff.
> Jun 14 18:33:16 syscore kernel: eth0: PCI bus error, bus status 
> ffffffff Jun 14 18:33:16 syscore kernel: eth0:  setting full-duplex.
> Jun 14 18:33:16 syscore kernel: eth0: command 0x5800 did not complete!
> Status=0xffff
> Jun 14 18:33:16 syscore kernel: sched: RT throttling activated Jun 14 
> 18:33:16 syscore kernel: ata9.00: qc timeout (cmd 0xa1) Jun 14 
> 18:33:16 syscore kernel: ata9.00: failed to IDENTIFY (I/O error,
> err_mask=0x5)
> Jun 14 18:33:16 syscore kernel: ata9.00: revalidation failed 
> (errno=-5) Jun 14 18:33:16 syscore kernel: ata9: soft resetting link 
> Jun 14 18:33:16 syscore kernel: pata_pdc2027x: 40-conductor cable 
> detected on port 0 Jun 14 18:33:16 syscore kernel: ata9.00: configured 
> for MWDMA2 Jun 14 18:33:16 syscore kernel: ata9: EH complete
> 
> 
> # logging: more of the same
> Sep 10 15:34:21 syscore kernel: ata10.00: exception Emask 0x0 SAct 0x0 
> SErr
> 0x0 action 0x0
> Sep 10 15:34:21 syscore kernel: ata10.00: BMDMA stat 0xff Sep 10 
> 15:34:21 syscore kernel: ata10.00: failed command: READ DMA EXT Sep 10 
> 15:34:21 syscore kernel: ata10.00: cmd
> 25/00:00:48:d6:fd/00:01:21:00:00/e0 tag 0 dma 131072 in Sep 10 
> 15:34:21 syscore kernel: res 50/00:00:47:d7:fd/00:00:21:00:00/e0
> Emask 0x20 (host bus error)
> Sep 10 15:34:21 syscore kernel: ata10.00: status: { DRDY } Sep 10 
> 15:34:22 syscore kernel: ata10.00: configured for UDMA/100 Sep 10 
> 15:34:22 syscore kernel: ata10: EH complete Sep 10 15:34:23 syscore 
> kernel: ata10: drained 202 bytes to clear DRQ Sep 10 15:34:23 syscore 
> kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr
> 0x0 action 0x0
> Sep 10 15:34:23 syscore kernel: ata10.00: BMDMA stat 0xff Sep 10 
> 15:34:23 syscore kernel: ata10.00: failed command: READ DMA EXT Sep 10 
> 15:34:23 syscore kernel: ata10.00: cmd
> 25/00:00:48:d6:fd/00:01:21:00:00/e0 tag 0 dma 131072 in Sep 10 
> 15:34:23 syscore kernel: res 50/00:00:47:d7:fd/00:00:21:00:00/e0
> Emask 0x20 (host bus error)
> Sep 10 15:34:23 syscore kernel: ata10.00: status: { DRDY } Sep 10 
> 15:34:23 syscore kernel: ata10.00: n_sectors mismatch 586114704 !=
> 268435455
> Sep 10 15:34:23 syscore kernel: ata10.00: old n_sectors matches 
> native, probably late HPA lock, will try to unlock HPA Sep 10 15:34:23 
> syscore kernel: ata10.00: revalidation failed (errno=-5) Sep 10 
> 15:34:23 syscore kernel: ata10: soft resetting link Sep 10 15:34:33 
> syscore kernel: eth0: Transmit error, Tx status register ff.
> Sep 10 15:34:33 syscore kernel: Flags; bus-master 1, dirty 36208(0) 
> current
> 36208(0)
> Sep 10 15:34:33 syscore kernel: Transmit list ffffffff vs. ffff88001fafc200.
> Sep 10 15:34:33 syscore kernel: 0: @ffff88001fafc200  length 80000071 
> status
> 00010071
> Sep 10 15:34:33 syscore kernel: 1: @ffff88001fafc298  length 80000036 
> status
> 00010036
> Sep 10 15:34:33 syscore kernel: 2: @ffff88001fafc330  length 8000005c 
> status 0001005c Sep 10 15:34:33 syscore kernel: 3: @ffff88001fafc3c8  
> length 80000036 status
> 00010036
> Sep 10 15:34:33 syscore kernel: 4: @ffff88001fafc460  length 80000036 
> status
> 00010036
> Sep 10 15:34:33 syscore kernel: 5: @ffff88001fafc4f8  length 80000064 
> status
> 00010064
> Sep 10 15:34:33 syscore kernel: 6: @ffff88001fafc590  length 8000002a 
> status 0001002a Sep 10 15:34:33 syscore kernel: 7: @ffff88001fafc628  
> length 80000064 status
> 00010064
> Sep 10 15:34:33 syscore kernel: 8: @ffff88001fafc6c0  length 80000036 
> status
> 00010036
> Sep 10 15:34:33 syscore kernel: 9: @ffff88001fafc758  length 80000064 
> status
> 00010064
> Sep 10 15:34:33 syscore kernel: 10: @ffff88001fafc7f0  length 80000055 
> status 00010055 Sep 10 15:34:33 syscore kernel: 11: @ffff88001fafc888  
> length 80000036 status 00010036 Sep 10 15:34:33 syscore kernel: 12: 
> @ffff88001fafc920  length 8000002a status 0001002a Sep 10 15:34:33 
> syscore kernel: 13: @ffff88001fafc9b8  length 8000004a status 0c01004a 
> Sep 10 15:34:33 syscore kernel: 14: @ffff88001fafca50  length 8000004a 
> status 0c01004a Sep 10 15:34:33 syscore kernel: 15: @ffff88001fafcae8  
> length 8000004a status 8c01004a Sep 10 15:34:33 syscore kernel: eth0: 
> Updating statistics failed, disabling stats as an interrupt source.
> Sep 10 15:34:33 syscore kernel: eth0: Host error, FIFO diagnostic 
> register ffff.
> Sep 10 15:34:33 syscore kernel: eth0: PCI bus error, bus status 
> ffffffff Sep 10 15:34:34 syscore kernel: eth0:  setting full-duplex.
> Sep 10 15:34:34 syscore kernel: eth0: command 0x5800 did not complete!
> Status=0xffff
> Sep 10 15:34:34 syscore kernel: [sched_delayed] sched: RT throttling 
> activated Sep 10 15:34:34 syscore kernel: ata10.00: configured for 
> UDMA/100 Sep 10 15:34:34 syscore kernel: ata10: EH complete Sep 10 
> 15:34:34 syscore kernel: ata10: drained 292 bytes to clear DRQ Sep 10 
> 15:34:34 syscore kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr
> 0x0 action 0x0
> Sep 10 15:34:34 syscore kernel: ata10.00: BMDMA stat 0xff Sep 10 
> 15:34:34 syscore kernel: ata10.00: failed command: READ DMA EXT Sep 10 
> 15:34:34 syscore kernel: ata10.00: cmd
> 25/00:00:48:d6:fd/00:01:21:00:00/e0 tag 0 dma 131072 in Sep 10 
> 15:34:34 syscore kernel: res 50/00:00:47:d7:fd/00:00:21:00:00/e0
> Emask 0x20 (host bus error)
> Sep 10 15:34:34 syscore kernel: ata10.00: status: { DRDY } Sep 10 
> 15:34:34 syscore kernel: ata10.00: configured for UDMA/100 Sep 10 
> 15:34:34 syscore kernel: ata10: EH complete
> 
> 
> Kind regards,
> Frank de Jong

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are on the CC list for the bug.
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/intel-gfx-bugs/attachments/20150924/eaeb01b9/attachment-0001.html>


More information about the intel-gfx-bugs mailing list