HACK #16 of 『LINUX カーネル HACKS』 has an explanation of the OOM Killer.
It assigns badness points to each process and kills the one with the highest score.
If the chosen process has children, the children are killed first.
Processes that use a lot of virtual memory, have many child processes, and have short CPU time or a short uptime are the most likely to be selected.
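You can peek at the score the kernel currently assigns to a process through procfs (a minimal sketch; these are the standard per-process files on a 2.6.32-era kernel like the one below):

```shell
# Print the OOM badness score and the two adjustment knobs for the current shell.
# /proc/<pid>/oom_score is the value the OOM Killer compares across processes.
cat /proc/self/oom_score      # current badness score
cat /proc/self/oom_adj        # legacy knob: -17 (never kill) .. 15
cat /proc/self/oom_score_adj  # newer knob: -1000 (never kill) .. 1000
```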
A process whose
/proc/
entry is set to "-1000" will not be killed.
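As an example of protecting a process this way (a sketch, assuming the file meant above is the per-process /proc/<pid>/oom_score_adj; writing it requires root):

```shell
# Mark the current shell as exempt from the OOM Killer (requires root).
# -1000 in oom_score_adj corresponds to -17 in the older oom_adj file,
# as seen for udevd/auditd/sshd in the task dump below.
echo -1000 > /proc/self/oom_score_adj
cat /proc/self/oom_score   # the badness score should now read 0
```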
I tried it out using the stress command
http://weather.ou.edu/~apw/projects/stress/
following the examples there:
# stress --vm 2 --vm-bytes 1G --vm-keep
stress: info: [18905] dispatching hogs: 0 cpu, 0 io, 2 vm, 0 hdd
stress: FAIL: [18905] (415)
stress: WARN: [18905] (417) now reaping child worker processes
stress: FAIL: [18905] (451) failed run completed in 32s
The run failed as above, and /var/log/messages recorded:
May 7 03:13:03 localhost kernel: stress invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
May 7 03:13:03 localhost kernel: stress cpuset=/ mems_allowed=0
May 7 03:13:03 localhost kernel: Pid: 18906, comm: stress Not tainted 2.6.32-358.6.1.el6.i686 #1
May 7 03:13:03 localhost kernel: Call Trace:
May 7 03:13:03 localhost kernel: [<c04e7f34>] ? dump_header+0x84/0x190
May 7 03:13:03 localhost kernel: [<c04e82d8>] ? oom_kill_process+0x68/0x280
May 7 03:13:03 localhost kernel: [<c04e8212>] ? oom_badness+0x92/0xf0
May 7 03:13:03 localhost kernel: [<c04e8858>] ? out_of_memory+0xc8/0x1e0
May 7 03:13:03 localhost kernel: [<c04f51bd>] ? __alloc_pages_nodemask+0x7fd/0x810
May 7 03:13:03 localhost kernel: [<c050971f>] ? handle_pte_fault+0xa6f/0xdf0
May 7 03:13:03 localhost kernel: [<c0509bd1>] ? handle_mm_fault+0x131/0x1d0
May 7 03:13:03 localhost kernel: [<c04371fb>] ? __do_page_fault+0xfb/0x430
May 7 03:13:03 localhost kernel: [<c04be344>] ? __rcu_process_callbacks+0x44/0x2f0
May 7 03:13:03 localhost kernel: [<c04be625>] ? rcu_process_callbacks+0x35/0x40
May 7 03:13:03 localhost kernel: [<c045fb6e>] ? __do_softirq+0xae/0x1a0
May 7 03:13:03 localhost kernel: [<c084d3fa>] ? do_page_fault+0x2a/0x90
May 7 03:13:03 localhost kernel: [<c042bdc3>] ? smp_apic_timer_interrupt+0x53/0x90
May 7 03:13:03 localhost kernel: [<c084d3d0>] ? do_page_fault+0x0/0x90
May 7 03:13:03 localhost kernel: [<c084aea7>] ? error_code+0x73/0x78
May 7 03:13:03 localhost kernel: Mem-Info:
May 7 03:13:03 localhost kernel: DMA per-cpu:
May 7 03:13:03 localhost kernel: CPU 0: hi: 0, btch: 1 usd: 0
May 7 03:13:03 localhost kernel: Normal per-cpu:
May 7 03:13:03 localhost kernel: CPU 0: hi: 186, btch: 31 usd: 60
May 7 03:13:03 localhost kernel: active_anon:56565 inactive_anon:56545 isolated_anon:0
May 7 03:13:03 localhost kernel: active_file:0 inactive_file:30 isolated_file:0
May 7 03:13:03 localhost kernel: unevictable:0 dirty:0 writeback:241 unstable:0
May 7 03:13:03 localhost kernel: free:1186 slab_reclaimable:1147 slab_unreclaimable:7615
May 7 03:13:03 localhost kernel: mapped:24 shmem:800 pagetables:987 bounce:0
May 7 03:13:03 localhost kernel: DMA free:2032kB min:88kB low:108kB high:132kB active_anon:2716kB inactive_anon:2916kB active_file:0kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15864kB mlocked:0kB dirty:0kB writeback:0kB mapped:60kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:36kB kernel_stack:0kB pagetables:32kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:46 all_unreclaimable? no
May 7 03:13:03 localhost kernel: lowmem_reserve[]: 0 484 484 484
May 7 03:13:03 localhost kernel: Normal free:2712kB min:2768kB low:3460kB high:4152kB active_anon:223544kB inactive_anon:223264kB active_file:0kB inactive_file:64kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:495744kB mlocked:0kB dirty:0kB writeback:964kB mapped:36kB shmem:3200kB slab_reclaimable:4588kB slab_unreclaimable:30424kB kernel_stack:656kB pagetables:3916kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:17 all_unreclaimable? no
May 7 03:13:03 localhost kernel: lowmem_reserve[]: 0 0 0 0
May 7 03:13:03 localhost kernel: DMA: 1*4kB 2*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2036kB
May 7 03:13:03 localhost kernel: Normal: 36*4kB 5*8kB 10*16kB 2*32kB 0*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2712kB
May 7 03:13:03 localhost kernel: 2060 total pagecache pages
May 7 03:13:03 localhost kernel: 1223 pages in swap cache
May 7 03:13:03 localhost kernel: Swap cache stats: add 470474, delete 469251, find 415/560
May 7 03:13:03 localhost kernel: Free swap = 0kB
May 7 03:13:03 localhost kernel: Total swap = 1015800kB
May 7 03:13:03 localhost kernel: 129007 pages RAM
May 7 03:13:03 localhost kernel: 0 pages HighMem
May 7 03:13:03 localhost kernel: 3330 pages reserved
May 7 03:13:03 localhost kernel: 868 pages shared
May 7 03:13:03 localhost kernel: 123334 pages non-shared
May 7 03:13:03 localhost kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
May 7 03:13:03 localhost kernel: [ 418] 0 418 728 1 0 -17 -1000 udevd
May 7 03:13:03 localhost kernel: [ 1074] 0 1074 709 1 0 0 0 dhclient
May 7 03:13:03 localhost kernel: [ 1123] 0 1123 3233 1 0 -17 -1000 auditd
May 7 03:13:03 localhost kernel: [ 1139] 0 1139 8993 1 0 0 0 rsyslogd
May 7 03:13:03 localhost kernel: [ 1188] 0 1188 2144 1 0 -17 -1000 sshd
May 7 03:13:03 localhost kernel: [ 1264] 0 1264 3132 22 0 0 0 master
May 7 03:13:03 localhost kernel: [ 1273] 89 1273 3168 1 0 0 0 qmgr
May 7 03:13:03 localhost kernel: [ 1274] 0 1274 1483 1 0 0 0 crond
May 7 03:13:03 localhost kernel: [ 1287] 0 1287 502 1 0 0 0 mingetty
May 7 03:13:03 localhost kernel: [ 1289] 0 1289 502 1 0 0 0 mingetty
May 7 03:13:03 localhost kernel: [ 1291] 0 1291 502 1 0 0 0 mingetty
May 7 03:13:03 localhost kernel: [ 1293] 0 1293 502 1 0 0 0 mingetty
May 7 03:13:03 localhost kernel: [ 1297] 0 1297 502 1 0 0 0 mingetty
May 7 03:13:03 localhost kernel: [ 1298] 0 1298 859 1 0 -17 -1000 udevd
May 7 03:13:03 localhost kernel: [ 1299] 0 1299 859 1 0 -17 -1000 udevd
May 7 03:13:03 localhost kernel: [ 1301] 0 1301 502 1 0 0 0 mingetty
May 7 03:13:03 localhost kernel: [17420] 89 17420 3151 14 0 0 0 pickup
May 7 03:13:04 localhost kernel: [17483] 0 17483 3103 34 0 0 0 sshd
May 7 03:13:04 localhost kernel: [17487] 0 17487 1547 1 0 0 0 bash
May 7 03:13:04 localhost kernel: [17519] 0 17519 714 1 0 0 0 anacron
May 7 03:13:04 localhost kernel: [18888] 0 18888 1572 9 0 0 0 screen
May 7 03:13:04 localhost kernel: [18889] 0 18889 1836 53 0 0 0 screen
May 7 03:13:04 localhost kernel: [18890] 0 18890 1520 1 0 0 0 bash
May 7 03:13:04 localhost kernel: [18897] 0 18897 1520 1 0 0 0 bash
May 7 03:13:04 localhost kernel: [18904] 0 18904 675 63 0 0 0 top
May 7 03:13:04 localhost kernel: [18905] 0 18905 513 2 0 0 0 stress
May 7 03:13:04 localhost kernel: [18906] 0 18906 262658 56782 0 0 0 stress
May 7 03:13:04 localhost kernel: [18907] 0 18907 262658 54142 0 0 0 stress
May 7 03:13:04 localhost kernel: Out of memory: Kill process 18906 (stress) score 457 or sacrifice child
May 7 03:13:04 localhost kernel: Killed process 18906, UID 0, (stress) total-vm:1050632kB, anon-rss:227120kB, file-rss:8kB
The per-task dump part,
May 7 03:13:03 localhost kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
May 7 03:13:03 localhost kernel: [ 418] 0 418 728 1 0 -17 -1000 udevd
is not printed when
/proc/sys/vm/oom_dump_tasks
is set to "0".
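The knob can also be read and flipped through sysctl, which is just the usual interface to the same /proc/sys path (changing it needs root):

```shell
# 1 (the default) prints the "[ pid ] uid tgid ..." task table on OOM; 0 suppresses it.
sysctl vm.oom_dump_tasks
sysctl -w vm.oom_dump_tasks=0   # same effect as: echo 0 > /proc/sys/vm/oom_dump_tasks
sysctl -w vm.oom_dump_tasks=1   # restore the default
```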
Oddly, once I rewrote the value of
/proc/sys/vm/oom_kill_allocating_task
, nothing more was recorded in
/var/log/messages
, even after I restored the original value.
I wonder why?
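For reference, flipping that knob and putting it back would look like this through the same sysctl interface (needs root):

```shell
# 0 (the default): the OOM Killer scans tasks and picks a victim by badness score.
# Nonzero: it instead kills the task that triggered the failing allocation.
sysctl vm.oom_kill_allocating_task        # show the current value
sysctl -w vm.oom_kill_allocating_task=1
sysctl -w vm.oom_kill_allocating_task=0   # restore the default
```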
Confirmed on CentOS 6.4, kernel 2.6.32-358.6.1.el6.i686.