[Intel-gfx] [PATCH] mm: Throttle shrinkers harder

Chris Wilson chris at chris-wilson.co.uk
Thu Apr 24 07:58:36 CEST 2014


On Wed, Apr 23, 2014 at 02:14:36PM -0700, Dave Hansen wrote:
> On 04/22/2014 12:30 PM, Daniel Vetter wrote:
> >> > > During testing of i915.ko with working texture sets larger than RAM, we
> >> > > encounter OOM with plenty of memory still trapped within writeback, e.g:
> >> > > 
> >> > > [   42.386039] active_anon:10134 inactive_anon:1900781 isolated_anon:32
> >> > >  active_file:33 inactive_file:39 isolated_file:0
> >> > >  unevictable:0 dirty:0 writeback:337627 unstable:0
> >> > >  free:11985 slab_reclaimable:9458 slab_unreclaimable:23614
> >> > >  mapped:41 shmem:1560769 pagetables:1276 bounce:0
> >> > > 
> >> > > If we throttle for writeback following shrink_slab, this gives us time
> >> > > to wait upon the writeback generated by the i915.ko shinker:
> >> > > 
> >> > > [ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
> >> > >  active_file:23 inactive_file:20 isolated_file:0
> >> > >  unevictable:0 dirty:0 writeback:0 unstable:0
> >> > >  free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
> >> > >  mapped:3 shmem:472393 pagetables:1249 bounce:0
> 
> Could you get some dumps of the entire set of OOM information?  These
> are only tiny snippets.

For reference the last oom report after flushing all the writeback:

[ 4756.749554] crond invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[ 4756.749603] crond cpuset=/ mems_allowed=0
[ 4756.749628] CPU: 0 PID: 3574 Comm: crond Tainted: G        W    3.14.0_prts_de579f_20140410 #2
[ 4756.749676] Hardware name: Gigabyte Technology Co., Ltd. H55M-UD2H/H55M-UD2H, BIOS F4 12/02/2009
[ 4756.749723]  0000000000000000 00000000000201da ffffffff81717273 ffff8800d235dc40
[ 4756.749762]  ffffffff81714541 0000000000000400 ffff8800cb6f3b10 ffff880117ff8000
[ 4756.749800]  ffffffff81072266 0000000000000206 ffffffff812d6ebe ffff880112f25c40
[ 4756.749838] Call Trace:
[ 4756.749856]  [<ffffffff81717273>] ? dump_stack+0x41/0x51
[ 4756.749881]  [<ffffffff81714541>] ? dump_header.isra.8+0x69/0x191
[ 4756.749911]  [<ffffffff81072266>] ? ktime_get_ts+0x49/0xab
[ 4756.749938]  [<ffffffff812d6ebe>] ? ___ratelimit+0xae/0xc8
[ 4756.749965]  [<ffffffff810a72a8>] ? oom_kill_process+0x76/0x32c
[ 4756.749992]  [<ffffffff810a706d>] ? find_lock_task_mm+0x22/0x6e
[ 4756.750018]  [<ffffffff810a7add>] ? out_of_memory+0x41c/0x44f
[ 4756.750045]  [<ffffffff810ab31d>] ? __alloc_pages_nodemask+0x680/0x78d
[ 4756.750076]  [<ffffffff810d4b7f>] ? alloc_pages_current+0xbf/0xdc
[ 4756.750103]  [<ffffffff810a61f8>] ? filemap_fault+0x266/0x38b
[ 4756.750130]  [<ffffffff810bc3f5>] ? __do_fault+0xac/0x3bf
[ 4756.750155]  [<ffffffff810bfb85>] ? handle_mm_fault+0x1e7/0x7e2
[ 4756.750181]  [<ffffffff810bc960>] ? tlb_flush_mmu+0x4b/0x64
[ 4756.750219]  [<ffffffff812d8ed5>] ? timerqueue_add+0x79/0x98
[ 4756.750254]  [<ffffffff8104d283>] ? enqueue_hrtimer+0x15/0x37
[ 4756.750287]  [<ffffffff8171f63d>] ? __do_page_fault+0x42e/0x47b
[ 4756.750319]  [<ffffffff8104d580>] ? hrtimer_try_to_cancel+0x67/0x70
[ 4756.750353]  [<ffffffff8104d595>] ? hrtimer_cancel+0xc/0x16
[ 4756.750385]  [<ffffffff81719807>] ? do_nanosleep+0xb3/0xf1
[ 4756.750415]  [<ffffffff8104def9>] ? hrtimer_nanosleep+0x89/0x10b
[ 4756.750447]  [<ffffffff8171cbf2>] ? page_fault+0x22/0x30
[ 4756.750476] Mem-Info:
[ 4756.750490] Node 0 DMA per-cpu:
[ 4756.750510] CPU    0: hi:    0, btch:   1 usd:   0
[ 4756.750533] CPU    1: hi:    0, btch:   1 usd:   0
[ 4756.750555] CPU    2: hi:    0, btch:   1 usd:   0
[ 4756.750576] CPU    3: hi:    0, btch:   1 usd:   0
[ 4756.750598] Node 0 DMA32 per-cpu:
[ 4756.750615] CPU    0: hi:  186, btch:  31 usd:   0
[ 4756.750637] CPU    1: hi:  186, btch:  31 usd:   0
[ 4756.750660] CPU    2: hi:  186, btch:  31 usd:   0
[ 4756.750681] CPU    3: hi:  186, btch:  31 usd:   0
[ 4756.750702] Node 0 Normal per-cpu:
[ 4756.750720] CPU    0: hi:   90, btch:  15 usd:   0
[ 4756.750742] CPU    1: hi:   90, btch:  15 usd:   0
[ 4756.750763] CPU    2: hi:   90, btch:  15 usd:   0
[ 4756.750785] CPU    3: hi:   90, btch:  15 usd:   0
[ 4756.750808] active_anon:24386 inactive_anon:900793 isolated_anon:0
 active_file:23 inactive_file:20 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 free:5550 slab_reclaimable:5184 slab_unreclaimable:4888
 mapped:3 shmem:472393 pagetables:1249 bounce:0
 free_cma:0
[ 4756.750938] Node 0 DMA free:14664kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:1024kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:412kB slab_reclaimable:80kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:48kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? yes
[ 4756.751103] lowmem_reserve[]: 0 3337 3660 3660
[ 4756.751133] Node 0 DMA32 free:7208kB min:7044kB low:8804kB high:10564kB active_anon:36172kB inactive_anon:3351408kB active_file:92kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3518336kB managed:3440548kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:1661420kB slab_reclaimable:17624kB slab_unreclaimable:14400kB kernel_stack:696kB pagetables:4324kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:327 all_unreclaimable? yes
[ 4756.751341] lowmem_reserve[]: 0 0 322 322
[ 4756.752889] Node 0 Normal free:328kB min:680kB low:848kB high:1020kB active_anon:61372kB inactive_anon:250740kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:393216kB managed:330360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:227740kB slab_reclaimable:3032kB slab_unreclaimable:5128kB kernel_stack:400kB pagetables:624kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:6 all_unreclaimable? yes
[ 4756.757635] lowmem_reserve[]: 0 0 0 0
[ 4756.759294] Node 0 DMA: 2*4kB (UM) 2*8kB (UM) 3*16kB (UEM) 4*32kB (UEM) 2*64kB (UM) 4*128kB (UEM) 2*256kB (EM) 2*512kB (EM) 2*1024kB (UM) 3*2048kB (EMR) 1*4096kB (M) = 14664kB
[ 4756.762776] Node 0 DMA32: 424*4kB (UEM) 171*8kB (UEM) 21*16kB (UEM) 1*32kB (R) 1*64kB (R) 1*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 0*4096kB = 7208kB
[ 4756.766284] Node 0 Normal: 26*4kB (UER) 18*8kB (UER) 3*16kB (E) 1*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 328kB
[ 4756.768198] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 4756.770026] 916139 total pagecache pages
[ 4756.771857] 443703 pages in swap cache
[ 4756.773695] Swap cache stats: add 15363874, delete 14920171, find 6533699/7512215
[ 4756.775592] Free swap  = 0kB
[ 4756.777505] Total swap = 2047996kB
[ 4756.779410] 981886 pages RAM
[ 4756.781307] 0 pages HighMem/MovableOnly
[ 4756.783192] 15714 pages reserved
[ 4756.785038] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[ 4756.786929] [ 2368]     0  2368    76187       93     153        0             0 systemd-journal
[ 4756.788846] [ 3204]     0  3204    10305      189      23        7         -1000 systemd-udevd
[ 4756.790789] [ 3223]     0  3223    24466       24      22       21             0 lvmetad
[ 4756.792749] [ 3297]     0  3297    12231       68      24       25         -1000 auditd
[ 4756.794715] [ 3306]     0  3306    20053       33       9        0             0 audispd
[ 4756.796680] [ 3308]     0  3308     5993       40      27        1             0 sedispatch
[ 4756.798654] [ 3315]     0  3315     9254       60      36        5             0 abrtd
[ 4756.800646] [ 3316]     0  3316     8725       60      35        1             0 abrt-watch-log
[ 4756.802627] [ 3324]     0  3324     8725       42      35       17             0 abrt-watch-log
[ 4756.804614] [ 3331]     0  3331     4778       43      21        9             0 irqbalance
[ 4756.806604] [ 3337]     0  3337     6069      131      16       20             0 smartd
[ 4756.808597] [ 3343]     0  3343     8249       80      20        0             0 systemd-logind
[ 4756.810593] [ 3344]     0  3344    65772      129      29       14             0 rsyslogd
[ 4756.812594] [ 3346]     0  3346    60608      305      50        9             0 NetworkManager
[ 4756.814602] [ 3347]    70  3347     7018       76      31        1             0 avahi-daemon
[ 4756.816619] [ 3352]    70  3352     6985       48      22        2             0 avahi-daemon
[ 4756.818629] [ 3353]    81  3353     6119      121      17        3          -900 dbus-daemon
[ 4756.820651] [ 3362]   993  3362     5647       55      15        4             0 chronyd
[ 4756.822694] [ 3363]     0  3363     1619       12      10       16             0 mcelog
[ 4756.824760] [ 3389]   999  3389   127896      746      47       65             0 polkitd
[ 4756.826827] [ 3397]     0  3397    40407      161      67       26          -900 modem-manager
[ 4756.828939] [ 3424]     0  3424    25498     2827      49      287             0 dhclient
[ 4756.831039] [ 3432]     0  3432   106838     1061     138      128             0 libvirtd
[ 4756.833154] [ 3446]     0  3446    20104      190      43       10         -1000 sshd
[ 4756.835275] [ 3453]    32  3453     9422       66      22       26             0 rpcbind
[ 4756.837308] [ 3461]     0  3461    25190      399      48       49             0 sendmail
[ 4756.839335] [ 3478]    51  3478    21452      361      41       16             0 sendmail
[ 4756.841386] [ 3573]     0  3573     5930       47      16        0             0 atd
[ 4756.843458] [ 3574]     0  3574     5126      147      14        2             0 crond
[ 4756.845578] [ 3579]     0  3579    27498       27      10        1             0 agetty
[ 4756.847706] [ 3582]     0  3582    32256      220      65       63             0 sshd
[ 4756.849844] [ 3586]     0  3586    29263      456      20       51             0 bash
[ 4756.851997] [ 3765]     0  3765    15968       93      47        0          1000 gem_tiled_swapp
[ 4756.854178] Out of memory: Kill process 3765 (gem_tiled_swapp) score 999 or sacrifice child
[ 4756.856377] Killed process 3765 (gem_tiled_swapp) total-vm:63872kB, anon-rss:368kB, file-rss:4kB

> Also, the vmstat output from the bug:
> 
> > https://bugs.freedesktop.org/show_bug.cgi?id=72742
> 
> shows there being an *AWFUL* lot of swap I/O going on here.  From the
> looks of it, we stuck ~2GB in swap and evicted another 1.5GB of page
> cache (although I guess that could be double-counting tmpfs getting
> swapped out too).  Hmmm, was this one of the cases where you actually
> ran _out_ of swap?

Yes. This bug is a little odd because they always run out of swap. We
have another category of bug (which appears to be fixed, touch wood)
where we trigger oom without even touching swap. The test case is
designed to only just swap (use at most 1/4 of the available swap space)
and checks that its working set should fit into available memory + swap.
However, when QA run the test, their systems run completely out of
virtual memory. There is a discrepancy on their machines where
anon_inactive is reported as being 2x shmem, but we only expect
anon_inactive to be our own shmem allocations. I don't know how to track
what else is using anon_inactive. Suggestions?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre



More information about the Intel-gfx mailing list