[Intel-gfx] [regression] 3.6-rc6, gpu hang with vaapi

Guilherme M. Schroeder guialemas at gmail.com
Thu Sep 20 11:58:56 CEST 2012


Hi,

I get a GPU hang every time I play an MKV file using mplayer -vo vaapi on 3.6-rc6.
With 3.5.4 there is no GPU hang at all.

I'm using xf86-video-intel 2.20.8, xorg-server 1.12.4 and libva-1.1.0
on Intel(R) Sandybridge Mobile (GT2+).

Output from dmesg, i915_error_state and Xorg.0.log:

[  105.822821] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  105.822825] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[  111.871670] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  112.950746] ------------[ cut here ]------------
[  112.950751] WARNING: at mm/page_alloc.c:2349 __alloc_pages_nodemask+0x87c/0x960()
[  112.950753] Hardware name: 4177CTO
[  112.950753] Modules linked in: fuse ip6table_filter ip6_tables
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp
iptable_filter ip_tables x_tables bridge stp llc snd_hda_codec_hdmi
arc4 vhost_net tun macvtap macvlan snd_hda_codec_conexant iwldvm btusb
joydev mac80211 kvm_intel iwlwifi i915 bluetooth intel_agp intel_gtt
i2c_algo_bit drm_kms_helper cfg80211 drm thinkpad_acpi nvram rfkill
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc
crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 snd_timer
e1000e i2c_i801 aes_generic ablk_helper snd cryptd iTCO_wdt
iTCO_vendor_support soundcore lpc_ich i2c_core tpm_tis sdhci_pci sdhci
mei mmc_core psmouse serio_raw kvm acpi_cpufreq battery video button
thermal
[  112.950785]  mperf mfd_core ac pcspkr evdev wmi tpm processor
tpm_bios coretemp microcode autofs4 ext4 crc16 jbd2 mbcache sd_mod
ahci ehci_hcd libahci libata scsi_mod usbcore usb_common
[  112.950795] Pid: 1461, comm: cp Not tainted 3.6.0-rc6-00052-gc46de22 #33
[  112.950796] Call Trace:
[  112.950801]  [<ffffffff81055bff>] warn_slowpath_common+0x7f/0xc0
[  112.950803]  [<ffffffff81055c5a>] warn_slowpath_null+0x1a/0x20
[  112.950804]  [<ffffffff81124ffc>] __alloc_pages_nodemask+0x87c/0x960
[  112.950807]  [<ffffffff8115f6e0>] alloc_pages_current+0xb0/0x120
[  112.950809]  [<ffffffff81120b8e>] __get_free_pages+0xe/0x50
[  112.950811]  [<ffffffff81166329>] kmalloc_order_trace+0x39/0xf0
[  112.950812]  [<ffffffff81168d17>] __kmalloc+0x177/0x180
[  112.950815]  [<ffffffff811297cc>] ? put_page+0x2c/0x40
[  112.950818]  [<ffffffff811988de>] seq_read+0x10e/0x3b0
[  112.950821]  [<ffffffff81176a75>] vfs_read+0xa5/0x180
[  112.950822]  [<ffffffff81176b9a>] sys_read+0x4a/0xa0
[  112.950825]  [<ffffffff81486969>] system_call_fastpath+0x16/0x1b
[  112.950826] ---[ end trace b5c3af991219a333 ]---

# cat /sys/kernel/debug/dri/0/i915_error_state
cat: /sys/kernel/debug/dri/0/i915_error_state: Cannot allocate memory

[   115.163] [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.
[   115.163]
[   115.163] Backtrace:
[   115.174] 0: /usr/bin/Xorg (xorg_backtrace+0x36) [0x560366]
[   115.174] 1: /usr/bin/Xorg (mieqEnqueue+0x26b) [0x54161b]
[   115.174] 2: /usr/bin/Xorg (0x400000+0x47f92) [0x447f92]
[   115.174] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x7fb83c986000+0x5f74) [0x7fb83c98bf74]
[   115.174] 4: /usr/bin/Xorg (0x400000+0x6efd7) [0x46efd7]
[   115.174] 5: /usr/bin/Xorg (0x400000+0x93370) [0x493370]
[   115.174] 6: /usr/lib/libpthread.so.0 (0x7fb840d71000+0xf170) [0x7fb840d80170]
[   115.174] 7: /usr/lib/libc.so.6 (ioctl+0x7) [0x7fb83fcdb977]
[   115.174] 8: /usr/lib/libdrm.so.2 (drmIoctl+0x28) [0x7fb83e271978]
[   115.174] 9: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fb83dd49000+0x3735f) [0x7fb83dd8035f]
[   115.174] 10: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fb83dd49000+0x37c90) [0x7fb83dd80c90]
[   115.174] 11: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fb83dd49000+0x39980) [0x7fb83dd82980]
[   115.174] 12: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fb83dd49000+0x5c9e4) [0x7fb83dda59e4]
[   115.174] 13: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fb83dd49000+0x6b925) [0x7fb83ddb4925]
[   115.174] 14: /usr/bin/Xorg (WakeupHandler+0x9a) [0x4384ba]
[   115.174] 15: /usr/bin/Xorg (WaitForSomething+0x1a4) [0x55d7a4]
[   115.174] 16: /usr/bin/Xorg (0x400000+0x34281) [0x434281]
[   115.174] 17: /usr/bin/Xorg (0x400000+0x23615) [0x423615]
[   115.174] 18: /usr/lib/libc.so.6 (__libc_start_main+0xf5) [0x7fb83fc1d725]
[   115.174] 19: /usr/bin/Xorg (0x400000+0x238ed) [0x4238ed]
[   115.174]
[   115.174] [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
[   115.174] [mi] mieq is *NOT* the cause.  It is a victim.
[   116.576] [mi] Increasing EQ size to 512 to prevent dropped events.
[   116.577] [mi] EQ processing has resumed after 26 dropped events.
[   116.577] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.
[   134.817] (II) AIGLX: Suspending AIGLX clients for VT switch
[   142.167] (EE) intel(0): Detected a hung GPU, disabling acceleration.

Bisecting between 3.5 and 3.6-rc6 gives me:
bd22dc17e49973d3d4925970260e9e37f7580a9f is the first bad commit
But there are tons of patches inside it. How can I proceed?
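
One possible way to look into this, sketched under the assumption that
bd22dc17e49973d3d4925970260e9e37f7580a9f is a merge commit (I have not
confirmed that here). The helper names merge_parents and
merged_commit_count are made up for illustration; they only wrap
git rev-list. The idea is to list the parents of the merge, see how many
commits it pulled in, and then build and test each parent on its own.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of git).

# Print the parent hashes of a commit, one per line.
# A merge commit has two or more parents.
merge_parents() {
    git rev-list --parents -n 1 "$1" | tr ' ' '\n' | tail -n +2
}

# Count the commits the merge brought in: those reachable from the
# second parent but not from the first.
merged_commit_count() {
    git rev-list --count "$1^1..$1^2"
}

# Example use inside a kernel tree (not run here):
#   merge_parents bd22dc17e49973d3d4925970260e9e37f7580a9f
#   merged_commit_count bd22dc17e49973d3d4925970260e9e37f7580a9f
#   git checkout bd22dc17e49973d3d4925970260e9e37f7580a9f^1   # build, boot, test
#   git checkout bd22dc17e49973d3d4925970260e9e37f7580a9f^2   # build, boot, test
```

If one of the parents also hangs, an earlier bisect step was probably
marked wrong (for example if the hang does not trigger every time), and
the bisect could be rerun with that parent as the bad commit. If both
parents are fine, the problem would seem to come from the combination of
the two sides.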

If more information is needed, please let me know.

Thanks.


