[Bug 57136] [GM45 regression] GPU hang during disk io

bugzilla-daemon at freedesktop.org
Sat Dec 22 16:47:15 PST 2012


https://bugs.freedesktop.org/show_bug.cgi?id=57136

--- Comment #13 from Tom London <selinux at gmail.com> ---
Created attachment 72011
  --> https://bugs.freedesktop.org/attachment.cgi?id=72011&action=edit
tar.gz containing output of 'dmesg', Xorg.0.log, /var/gdm/:0*.log, empty
i915_error_state

I applied the patch suggested above and built a Fedora kernel,
kernel-3.7.0-6.local.fc19.x86_64.

I then booted that kernel and drove excessive I/O load on my Thinkpad X200: I
ran 'digikam', a qemu-kvm instance of a Win7 image configured to use 2 cores,
and a 'cat BIGFILES >/dev/null'.

The system held up until all of the above were running at once, and then it
hung/crashed.

After this crash, I could not recover /sys/kernel/debug/dri/0/i915_error_state:
I got a 'page allocation failure' when I attempted to copy it.

I have been tracking this in the Fedora Bugzilla for a while, here:
https://bugzilla.redhat.com/show_bug.cgi?id=877461

That ticket has numerous other such failures/logs, including a few with
non-zero-length i915_error_state files.

I believe the patch was built into this kernel; this is from the build output:

+ '[' '!' -f /home/tbl/rpmbuild/SOURCES/make-the-shrinker-less-aggressive.patch ']'
Patch33333: make-the-shrinker-less-aggressive.patch
+ case "$patch" in
+ patch -p1 -F1 -s
+ chmod +x scripts/checkpatch.pl


Here is what I see in dmesg:

[ 1103.968037] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU
hung
[ 1103.968330] [drm] capturing error event; look for more information in
/debug/dri/0/i915_error_state
[ 1105.845804] traps: gnome-shell[1259] trap int3 ip:39c9e4f597 sp:7fff189222d0
error:0
[ 1110.016026] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU
hung
[ 1110.070657] [drm:init_ring_common] *ERROR* render ring initialization failed
ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 1111.608050] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU
hung
[ 1111.609856] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 1111.609859] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 1114.433338] gnome-shell (1259) used greatest stack depth: 1392 bytes left
[ 1250.731117] gnome-shell[2381]: segfault at 230 ip 00007fcabc3fd89f sp
00007fff0d9909d0 error 4 in i965_dri.so[7fcabc3ab000+b3000]
[ 1305.904089] gnome-shell[2446]: segfault at 230 ip 00007ffe0688089f sp
00007fff2883df50 error 4 in i965_dri.so[7ffe0682e000+b3000]
[ 1332.132006] [sched_delayed] sched: RT throttling activated
[ 1375.751786] gnome-shell[2500]: segfault at 230 ip 00007fa9663bb89f sp
00007fff5ad6a660 error 4 in i965_dri.so[7fa966369000+b3000]
[ 1715.826609] cat: page allocation failure: order:9, mode:0x40d0
[ 1715.828604] Pid: 2789, comm: cat Not tainted 3.7.0-6.local.fc19.x86_64 #1
[ 1715.830463] Call Trace:
[ 1715.832239]  [<ffffffff81167469>] warn_alloc_failed+0xe9/0x150
[ 1715.834110]  [<ffffffff8116a090>] ? page_alloc_cpu_notify+0x50/0x50
[ 1715.835995]  [<ffffffff810d8b6d>] ? trace_hardirqs_on+0xd/0x10
[ 1715.837676]  [<ffffffff8116bc25>] __alloc_pages_nodemask+0x8b5/0xb40
[ 1715.839345]  [<ffffffff811ad460>] alloc_pages_current+0xb0/0x120
[ 1715.840971]  [<ffffffff8116991e>] ? __free_pages_ok.part.54+0x9e/0xe0
[ 1715.842522]  [<ffffffff8116632a>] __get_free_pages+0x2a/0x80
[ 1715.844143]  [<ffffffff811b9c89>] kmalloc_order_trace+0x39/0x190
[ 1715.845784]  [<ffffffff811ba07d>] __kmalloc+0x29d/0x2d0
[ 1715.847337]  [<ffffffff811f8fcf>] seq_read+0x11f/0x3e0
[ 1715.848948]  [<ffffffff811d320c>] vfs_read+0xac/0x180
[ 1715.850413]  [<ffffffff811d3335>] sys_read+0x55/0xa0
[ 1715.851846]  [<ffffffff816fbd19>] system_call_fastpath+0x16/0x1b
[ 1715.853273] Mem-Info:
[ 1715.854697] Node 0 DMA per-cpu:
[ 1715.856193] CPU    0: hi:    0, btch:   1 usd:   0
[ 1715.857537] CPU    1: hi:    0, btch:   1 usd:   0
[ 1715.858823] Node 0 DMA32 per-cpu:
[ 1715.860145] CPU    0: hi:  186, btch:  31 usd:   0
[ 1715.861417] CPU    1: hi:  186, btch:  31 usd:   0
[ 1715.862668] Node 0 Normal per-cpu:
[ 1715.864038] CPU    0: hi:  186, btch:  31 usd:  32
[ 1715.865257] CPU    1: hi:  186, btch:  31 usd:   0
[ 1715.866418] active_anon:366511 inactive_anon:174404 isolated_anon:0
 active_file:60034 inactive_file:192590 isolated_file:0
 unevictable:30 dirty:25 writeback:0 unstable:0
 free:39181 slab_reclaimable:21162 slab_unreclaimable:95728
 mapped:29886 shmem:23042 pagetables:10701 bounce:0
 free_cma:0
[ 1715.872852] Node 0 DMA free:15848kB min:264kB low:328kB high:396kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:40kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15648kB
mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1715.876063] lowmem_reserve[]: 0 2947 3892 3892
[ 1715.877257] Node 0 DMA32 free:111548kB min:50976kB low:63720kB high:76464kB
active_anon:1288852kB inactive_anon:502196kB active_file:207752kB
inactive_file:687716kB unevictable:32kB isolated(anon):0kB isolated(file):0kB
present:3018404kB mlocked:32kB dirty:36kB writeback:0kB mapped:94252kB
shmem:50888kB slab_reclaimable:39640kB slab_unreclaimable:128856kB
kernel_stack:840kB pagetables:20200kB unstable:0kB bounce:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1715.882062] lowmem_reserve[]: 0 0 945 945
[ 1715.883246] Node 0 Normal free:27972kB min:16340kB low:20424kB high:24508kB
active_anon:177192kB inactive_anon:195420kB active_file:32384kB
inactive_file:83896kB unevictable:88kB isolated(anon):0kB isolated(file):0kB
present:967680kB mlocked:88kB dirty:64kB writeback:0kB mapped:25292kB
shmem:41280kB slab_reclaimable:45008kB slab_unreclaimable:254040kB
kernel_stack:1960kB pagetables:22604kB unstable:0kB bounce:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1715.888714] lowmem_reserve[]: 0 0 0 0
[ 1715.890149] Node 0 DMA: 2*4kB 2*8kB 1*16kB 0*32kB 3*64kB 2*128kB 2*256kB
1*512kB 2*1024kB 2*2048kB 2*4096kB = 15848kB
[ 1715.891696] Node 0 DMA32: 1573*4kB 1401*8kB 906*16kB 510*32kB 424*64kB
202*128kB 26*256kB 3*512kB 0*1024kB 1*2048kB 0*4096kB = 111548kB
[ 1715.893244] Node 0 Normal: 1393*4kB 669*8kB 203*16kB 144*32kB 58*64kB
25*128kB 4*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 27228kB
[ 1715.894843] 282546 total pagecache pages
[ 1715.896341] 6307 pages in swap cache
[ 1715.897947] Swap cache stats: add 54887, delete 48580, find 8649/10442
[ 1715.899457] Free swap  = 6015380kB
[ 1715.900944] Total swap = 6127612kB
[ 1715.919139] 1032176 pages RAM
[ 1715.920639] 52602 pages reserved
[ 1715.922114] 714600 pages shared
[ 1715.923569] 896850 pages non-shared
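
For reference, 'order:9' in the failure above is a request for 2^9 contiguous
pages, which is 2048kB if pages are 4kB (an assumption here; that is the usual
x86_64 page size). Below is a minimal sketch for counting how many free blocks
of at least that size each zone reports in the per-zone free lists. The helper
name blocks_at_least is made up for illustration, and this is only a rough
check: it ignores watermarks and lowmem_reserve, so it does not by itself
explain why the allocator gave up.

```python
import re

PAGE_KB = 4  # assumption: 4kB pages


def blocks_at_least(freelist_line, order, page_kb=PAGE_KB):
    """Count free blocks big enough for a 2**order page allocation.

    freelist_line is one 'Node 0 <zone>: N*4kB N*8kB ...' line from the
    Mem-Info dump, with any mail line wrapping already joined.
    """
    need_kb = page_kb << order  # order 9 with 4kB pages -> 2048kB
    total = 0
    # Each entry looks like '<count>*<size>kB'; the trailing '= NNNkB'
    # total has no '*', so the pattern skips it.
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", freelist_line):
        if int(size_kb) >= need_kb:
            total += int(count)
    return total


# Free-list lines copied from the dmesg output above (wrapping joined).
normal = ("Node 0 Normal: 1393*4kB 669*8kB 203*16kB 144*32kB 58*64kB "
          "25*128kB 4*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 27228kB")
dma32 = ("Node 0 DMA32: 1573*4kB 1401*8kB 906*16kB 510*32kB 424*64kB "
         "202*128kB 26*256kB 3*512kB 0*1024kB 1*2048kB 0*4096kB = 111548kB")

for line in (normal, dma32):
    zone = line.split(":")[0]
    print(zone, "free blocks >= order 9:", blocks_at_least(line, 9))
```

On the lines above this reports no suitable block in the Normal zone and a
single 2048kB block in DMA32, so memory looks heavily fragmented at the time of
the read even though plenty of small pages are free.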

Let me know if I can provide more information or run more tests.
