OOM Killer

Posted by estis on 2013/05/07 (Tue) 04:25

HACK #16 of "Linux Kernel Hacks" has an explanation of the OOM Killer.

It assigns points to each process and kills the one with the highest score.
If that process has child processes, a child is killed first.

Processes that use a lot of virtual memory, have many child processes, or have short CPU time and short uptime are more likely to be selected.
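You can peek at the score the kernel currently assigns to each process through /proc/<PID>/oom_score. A minimal sketch, not from the book (scores and PIDs will of course differ):

# for d in /proc/[0-9]*; do echo "$(cat $d/oom_score 2>/dev/null) ${d##*/}"; done | sort -rn | head

This prints "score PID" pairs for the ten likeliest victims.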

If you set
/proc/<PID>/oom_score_adj
to -1000, that process will not be killed.
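For example, to protect the current shell the way udevd, auditd and sshd appear protected in the task table below, something like this should do (a minimal sketch; root required):

# echo -1000 > /proc/$$/oom_score_adj   # OOM_SCORE_ADJ_MIN: exempt this process
# cat /proc/$$/oom_score                # the reported score should drop to 0

This kernel also still carries the older oom_adj knob; the paired -17 / -1000 values in the table below show the two interfaces being kept in sync.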

I tried out the stress command
http://weather.ou.edu/~apw/projects/stress/
following its example usage (--vm 2 spawns two workers, --vm-bytes 1G makes each allocate 1 GiB, and --vm-keep keeps the allocations touched instead of freeing and re-allocating them).

# stress --vm 2 --vm-bytes 1G --vm-keep
stress: info: [18905] dispatching hogs: 0 cpu, 0 io, 2 vm, 0 hdd
stress: FAIL: [18905] (415) <-- worker 18906 got signal 9
stress: WARN: [18905] (417) now reaping child worker processes
stress: FAIL: [18905] (451) failed run completed in 32s

and /var/log/messages showed:

May  7 03:13:03 localhost kernel: stress invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
May  7 03:13:03 localhost kernel: stress cpuset=/ mems_allowed=0
May  7 03:13:03 localhost kernel: Pid: 18906, comm: stress Not tainted 2.6.32-358.6.1.el6.i686 #1
May  7 03:13:03 localhost kernel: Call Trace:
May  7 03:13:03 localhost kernel: [<c04e7f34>] ? dump_header+0x84/0x190
May  7 03:13:03 localhost kernel: [<c04e82d8>] ? oom_kill_process+0x68/0x280
May  7 03:13:03 localhost kernel: [<c04e8212>] ? oom_badness+0x92/0xf0
May  7 03:13:03 localhost kernel: [<c04e8858>] ? out_of_memory+0xc8/0x1e0
May  7 03:13:03 localhost kernel: [<c04f51bd>] ? __alloc_pages_nodemask+0x7fd/0x810
May  7 03:13:03 localhost kernel: [<c050971f>] ? handle_pte_fault+0xa6f/0xdf0
May  7 03:13:03 localhost kernel: [<c0509bd1>] ? handle_mm_fault+0x131/0x1d0
May  7 03:13:03 localhost kernel: [<c04371fb>] ? __do_page_fault+0xfb/0x430
May  7 03:13:03 localhost kernel: [<c04be344>] ? __rcu_process_callbacks+0x44/0x2f0
May  7 03:13:03 localhost kernel: [<c04be625>] ? rcu_process_callbacks+0x35/0x40
May  7 03:13:03 localhost kernel: [<c045fb6e>] ? __do_softirq+0xae/0x1a0
May  7 03:13:03 localhost kernel: [<c084d3fa>] ? do_page_fault+0x2a/0x90
May  7 03:13:03 localhost kernel: [<c042bdc3>] ? smp_apic_timer_interrupt+0x53/0x90
May  7 03:13:03 localhost kernel: [<c084d3d0>] ? do_page_fault+0x0/0x90
May  7 03:13:03 localhost kernel: [<c084aea7>] ? error_code+0x73/0x78
May  7 03:13:03 localhost kernel: Mem-Info:
May  7 03:13:03 localhost kernel: DMA per-cpu:
May  7 03:13:03 localhost kernel: CPU    0: hi:    0, btch:   1 usd:   0
May  7 03:13:03 localhost kernel: Normal per-cpu:
May  7 03:13:03 localhost kernel: CPU    0: hi:  186, btch:  31 usd:  60
May  7 03:13:03 localhost kernel: active_anon:56565 inactive_anon:56545 isolated_anon:0
May  7 03:13:03 localhost kernel: active_file:0 inactive_file:30 isolated_file:0
May  7 03:13:03 localhost kernel: unevictable:0 dirty:0 writeback:241 unstable:0
May  7 03:13:03 localhost kernel: free:1186 slab_reclaimable:1147 slab_unreclaimable:7615
May  7 03:13:03 localhost kernel: mapped:24 shmem:800 pagetables:987 bounce:0
May  7 03:13:03 localhost kernel: DMA free:2032kB min:88kB low:108kB high:132kB active_anon:2716kB inactive_anon:2916kB active_file:0kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15864kB mlocked:0kB dirty:0kB writeback:0kB mapped:60kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:36kB kernel_stack:0kB pagetables:32kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:46 all_unreclaimable? no
May  7 03:13:03 localhost kernel: lowmem_reserve[]: 0 484 484 484
May  7 03:13:03 localhost kernel: Normal free:2712kB min:2768kB low:3460kB high:4152kB active_anon:223544kB inactive_anon:223264kB active_file:0kB inactive_file:64kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:495744kB mlocked:0kB dirty:0kB writeback:964kB mapped:36kB shmem:3200kB slab_reclaimable:4588kB slab_unreclaimable:30424kB kernel_stack:656kB pagetables:3916kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:17 all_unreclaimable? no
May  7 03:13:03 localhost kernel: lowmem_reserve[]: 0 0 0 0
May  7 03:13:03 localhost kernel: DMA: 1*4kB 2*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2036kB
May  7 03:13:03 localhost kernel: Normal: 36*4kB 5*8kB 10*16kB 2*32kB 0*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2712kB
May  7 03:13:03 localhost kernel: 2060 total pagecache pages
May  7 03:13:03 localhost kernel: 1223 pages in swap cache
May  7 03:13:03 localhost kernel: Swap cache stats: add 470474, delete 469251, find 415/560
May  7 03:13:03 localhost kernel: Free swap  = 0kB
May  7 03:13:03 localhost kernel: Total swap = 1015800kB
May  7 03:13:03 localhost kernel: 129007 pages RAM
May  7 03:13:03 localhost kernel: 0 pages HighMem
May  7 03:13:03 localhost kernel: 3330 pages reserved
May  7 03:13:03 localhost kernel: 868 pages shared
May  7 03:13:03 localhost kernel: 123334 pages non-shared
May  7 03:13:03 localhost kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
May  7 03:13:03 localhost kernel: [  418]     0   418      728        1   0     -17         -1000 udevd
May  7 03:13:03 localhost kernel: [ 1074]     0  1074      709        1   0       0             0 dhclient
May  7 03:13:03 localhost kernel: [ 1123]     0  1123     3233        1   0     -17         -1000 auditd
May  7 03:13:03 localhost kernel: [ 1139]     0  1139     8993        1   0       0             0 rsyslogd
May  7 03:13:03 localhost kernel: [ 1188]     0  1188     2144        1   0     -17         -1000 sshd
May  7 03:13:03 localhost kernel: [ 1264]     0  1264     3132       22   0       0             0 master
May  7 03:13:03 localhost kernel: [ 1273]    89  1273     3168        1   0       0             0 qmgr
May  7 03:13:03 localhost kernel: [ 1274]     0  1274     1483        1   0       0             0 crond
May  7 03:13:03 localhost kernel: [ 1287]     0  1287      502        1   0       0             0 mingetty
May  7 03:13:03 localhost kernel: [ 1289]     0  1289      502        1   0       0             0 mingetty
May  7 03:13:03 localhost kernel: [ 1291]     0  1291      502        1   0       0             0 mingetty
May  7 03:13:03 localhost kernel: [ 1293]     0  1293      502        1   0       0             0 mingetty
May  7 03:13:03 localhost kernel: [ 1297]     0  1297      502        1   0       0             0 mingetty
May  7 03:13:03 localhost kernel: [ 1298]     0  1298      859        1   0     -17         -1000 udevd
May  7 03:13:03 localhost kernel: [ 1299]     0  1299      859        1   0     -17         -1000 udevd
May  7 03:13:03 localhost kernel: [ 1301]     0  1301      502        1   0       0             0 mingetty
May  7 03:13:03 localhost kernel: [17420]    89 17420     3151       14   0       0             0 pickup
May  7 03:13:04 localhost kernel: [17483]     0 17483     3103       34   0       0             0 sshd
May  7 03:13:04 localhost kernel: [17487]     0 17487     1547        1   0       0             0 bash
May  7 03:13:04 localhost kernel: [17519]     0 17519      714        1   0       0             0 anacron
May  7 03:13:04 localhost kernel: [18888]     0 18888     1572        9   0       0             0 screen
May  7 03:13:04 localhost kernel: [18889]     0 18889     1836       53   0       0             0 screen
May  7 03:13:04 localhost kernel: [18890]     0 18890     1520        1   0       0             0 bash
May  7 03:13:04 localhost kernel: [18897]     0 18897     1520        1   0       0             0 bash
May  7 03:13:04 localhost kernel: [18904]     0 18904      675       63   0       0             0 top
May  7 03:13:04 localhost kernel: [18905]     0 18905      513        2   0       0             0 stress
May  7 03:13:04 localhost kernel: [18906]     0 18906   262658    56782   0       0             0 stress
May  7 03:13:04 localhost kernel: [18907]     0 18907   262658    54142   0       0             0 stress
May  7 03:13:04 localhost kernel: Out of memory: Kill process 18906 (stress) score 457 or sacrifice child
May  7 03:13:04 localhost kernel: Killed process 18906, UID 0, (stress) total-vm:1050632kB, anon-rss:227120kB, file-rss:8kB


The per-task table, i.e. the part starting with

May  7 03:13:03 localhost kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
May  7 03:13:03 localhost kernel: [  418]     0   418      728        1   0     -17         -1000 udevd

is not printed when
/proc/sys/vm/oom_dump_tasks
is set to 0.
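To toggle it (a sketch):

# echo 0 > /proc/sys/vm/oom_dump_tasks   # suppress the per-task table in OOM reports
# echo 1 > /proc/sys/vm/oom_dump_tasks   # print it, as seen in the log above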

When I rewrote the value of /proc/sys/vm/oom_kill_allocating_task, nothing more was recorded in /var/log/messages, even after I put the original value back (the flip itself is sketched below).
I wonder why?
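The flip is just a file write, something like this (a sketch; "sysctl -w vm.oom_kill_allocating_task=1" writes the same file):

# cat /proc/sys/vm/oom_kill_allocating_task       # 0 (default): scan all tasks and pick the worst
# echo 1 > /proc/sys/vm/oom_kill_allocating_task  # kill the allocating task directly instead
# echo 0 > /proc/sys/vm/oom_kill_allocating_task  # back to the original value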

Confirmed on CentOS 6.4, kernel 2.6.32-358.6.1.el6.i686.

Reference: Linux Kernel Hacks, Munehiro Ikeda et al. (O'Reilly Japan)