Server went down for a few hours. Why?

My server went down at lunchtime today and could only be brought back by rebooting it. Looking at the logs, the following is repeated several times a second for a while:

    Nov 18 13:56:36 alpha kernel:
    Nov 18 13:56:36 alpha kernel: Mem-info:
    Nov 18 13:56:36 alpha kernel: DMA per-cpu:
    Nov 18 13:56:36 alpha kernel: cpu 0 hot: high 186, batch 31 used:169
    Nov 18 13:56:36 alpha kernel: cpu 0 cold: high 62, batch 15 used:28
    Nov 18 13:56:36 alpha kernel: cpu 1 hot: high 186, batch 31 used:33
    Nov 18 13:56:36 alpha kernel: cpu 1 cold: high 62, batch 15 used:50
    Nov 18 13:56:36 alpha kernel: DMA32 per-cpu: empty
    Nov 18 13:56:36 alpha kernel: Normal per-cpu: empty
    Nov 18 13:56:36 alpha kernel: HighMem per-cpu: empty
    Nov 18 13:56:36 alpha kernel: Free pages: 3820kB (0kB HighMem)
    Nov 18 13:56:36 alpha kernel: Active:117331 inactive:65552 dirty:0 writeback:0 unstable:0 free:955 slab:9267 mapped-file:1 mapped-anon:187110 pagetables:10769
    Nov 18 13:56:36 alpha kernel: DMA free:3820kB min:3848kB low:4808kB high:5772kB active:469324kB inactive:262208kB present:925696kB pages_scanned:7572464 all_unreclaimable? yes
    Nov 18 13:56:36 alpha kernel: lowmem_reserve[]: 0 0 0 0
    Nov 18 13:56:36 alpha kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
    Nov 18 13:56:36 alpha kernel: lowmem_reserve[]: 0 0 0 0
    Nov 18 13:56:36 alpha kernel: Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
    Nov 18 13:56:36 alpha kernel: lowmem_reserve[]: 0 0 0 0
    Nov 18 13:56:36 alpha kernel: HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
    Nov 18 13:56:36 alpha kernel: lowmem_reserve[]: 0 0 0 0
    Nov 18 13:56:36 alpha kernel: DMA: 283*4kB 146*8kB 13*16kB 1*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 3820kB
    Nov 18 13:56:36 alpha kernel: DMA32: empty
    Nov 18 13:56:36 alpha kernel: Normal: empty
    Nov 18 13:56:36 alpha kernel: HighMem: empty
    Nov 18 13:56:36 alpha kernel: 36 pagecache pages
    Nov 18 13:56:36 alpha kernel: Swap cache: add 7004981, delete 7004974, find 18566623/19220363, race 5+564
    Nov 18 13:56:36 alpha kernel: Free swap = 0kB
    Nov 18 13:56:36 alpha kernel: Total swap = 1835000kB
    Nov 18 13:56:36 alpha kernel: Free swap: 0kB
    Nov 18 13:56:36 alpha kernel: 231424 pages of RAM
    Nov 18 13:56:36 alpha kernel: 7321 reserved pages
    Nov 18 13:56:36 alpha kernel: 12885 pages shared
    Nov 18 13:56:36 alpha kernel: 7 pages swap cached
    Nov 18 13:56:36 alpha kernel: perl invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0
    Nov 18 13:56:36 alpha kernel:
    Nov 18 13:56:36 alpha kernel: Call Trace:
    Nov 18 13:56:36 alpha kernel: [] out_of_memory+0x8b/0x203
    Nov 18 13:56:36 alpha kernel: [] __alloc_pages+0x245/0x2ce
    Nov 18 13:56:36 alpha kernel: [] __lock_page+0x5e/0x64
    Nov 18 13:56:36 alpha kernel: [] read_swap_cache_async+0x42/0xd1
    Nov 18 13:56:36 alpha kernel: [] swapin_readahead+0x4e/0x77
    Nov 18 13:56:36 alpha kernel: [] __handle_mm_fault+0xcfc/0x11f6
    Nov 18 13:56:36 alpha kernel: [] _spin_lock_irqsave+0x9/0x14
    Nov 18 13:56:36 alpha kernel: [] do_page_fault+0xf7b/0x12e0
    Nov 18 13:56:36 alpha kernel: [] hrtimer_cancel+0xc/0x16
    Nov 18 13:56:36 alpha kernel: [] do_nanosleep+0x47/0x70
    Nov 18 13:56:36 alpha kernel: [] hrtimer_nanosleep+0x58/0x118
    Nov 18 13:56:36 alpha kernel: [] error_exit+0x0/0x6e
    Nov 18 13:56:36 alpha kernel:
    Nov 18 13:56:36 alpha kernel: Mem-info:
    Nov 18 13:56:36 alpha kernel: DMA per-cpu:
    Nov 18 13:56:36 alpha kernel: cpu 0 hot: high 186, batch 31 used:166
    Nov 18 13:56:36 alpha kernel: cpu 0 cold: high 62, batch 15 used:28
    Nov 18 13:56:36 alpha kernel: cpu 1 hot: high 186, batch 31 used:33
    Nov 18 13:56:36 alpha kernel: cpu 1 cold: high 62, batch 15 used:50
    Nov 18 13:56:36 alpha kernel: DMA32 per-cpu: empty

I have no idea what this means (though there are a lot of references to memory), but I assume the problem will be obvious to someone with more experience. Can anybody help?

Comments

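It looks like something used up all the RAM and all the swap (the log shows "Free swap = 0kB"), so the kernel's OOM (out-of-memory) killer started killing other processes to free memory.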
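The log itself bears this out. On this machine all memory sits in the DMA zone (DMA32, Normal and HighMem are all empty), so the "DMA free:" line is the one to read: free memory (3820kB) is below the zone's min watermark (3848kB), the kernel reports "all_unreclaimable? yes", and swap is completely used. A minimal Python sketch that pulls those figures out of lines copied from the log above, assuming 4 kB pages (standard on x86/x86_64); the function name memory_summary is just illustrative:

```python
import re

# A few lines copied verbatim from the kernel log in the question.
LOG = """\
Nov 18 13:56:36 alpha kernel: DMA free:3820kB min:3848kB low:4808kB high:5772kB active:469324kB inactive:262208kB present:925696kB pages_scanned:7572464 all_unreclaimable? yes
Nov 18 13:56:36 alpha kernel: Free swap = 0kB
Nov 18 13:56:36 alpha kernel: Total swap = 1835000kB
Nov 18 13:56:36 alpha kernel: 231424 pages of RAM
"""

PAGE_KB = 4  # assumption: 4 kB pages, as on x86/x86_64


def memory_summary(log: str) -> dict:
    """Extract the headline memory figures from an OOM 'Mem-info' dump."""
    zone = re.search(r"DMA free:(\d+)kB min:(\d+)kB", log)
    free_swap = re.search(r"Free swap = (\d+)kB", log)
    total_swap = re.search(r"Total swap = (\d+)kB", log)
    ram_pages = re.search(r"(\d+) pages of RAM", log)
    return {
        "ram_kb": int(ram_pages.group(1)) * PAGE_KB,
        "zone_free_kb": int(zone.group(1)),
        "zone_min_kb": int(zone.group(2)),
        "free_swap_kb": int(free_swap.group(1)),
        "total_swap_kb": int(total_swap.group(1)),
    }


s = memory_summary(LOG)
print(f"RAM: {s['ram_kb']} kB, swap: {s['total_swap_kb']} kB")
# Free memory under the zone's 'min' watermark plus no swap left: nothing to reclaim.
print("below min watermark:", s["zone_free_kb"] < s["zone_min_kb"])
print("swap exhausted:", s["free_swap_kb"] == 0)
```

For this log it works out to about 904 MB of RAM (231424 pages × 4 kB = 925696 kB) and about 1.75 GB of swap, with both exhausted.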

If you had run "top" when this happened and pressed "M" (sort by memory usage), you would have been able to see which processes were responsible, although that would be hard if you couldn't log in!
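Since you can't run top during an outage, one option is to record similar information ahead of time. The sketch below is a hypothetical helper (top_by_memory and rss_kb are made-up names) that reads VmRSS from /proc/<pid>/status for every process, roughly the resident size top sorts on when you press "M". Run periodically (for example from cron) with its output appended to a file, it would leave a record of the biggest processes in the minutes before a crash. Linux only.

```python
import os


def rss_kb(status_text: str) -> int:
    """Return VmRSS in kB from the text of /proc/<pid>/status (0 if absent)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def top_by_memory(limit: int = 10) -> list[tuple[int, int, str]]:
    """List (rss_kb, pid, name) for the largest processes, like top's 'M' sort."""
    try:
        entries = os.listdir("/proc")
    except FileNotFoundError:
        return []  # not a Linux system
    procs = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                text = f.read()
        except OSError:
            continue  # process exited while we were scanning
        name = next((ln.split(":", 1)[1].strip() for ln in text.splitlines()
                     if ln.startswith("Name:")), "?")
        procs.append((rss_kb(text), int(entry), name))
    procs.sort(reverse=True)  # biggest resident size first
    return procs[:limit]


if __name__ == "__main__":
    for rss, pid, name in top_by_memory():
        print(f"{rss:>10} kB  {pid:>7}  {name}")
```

Saved as, say, topmem.py (a hypothetical filename), something like `python3 topmem.py >> topmem.log` once a minute would be enough to show which process was growing.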

There isn't much that can be done to debug this after the fact.
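One thing the existing logs do record is which process was asking for memory each time the OOM killer fired (the "perl invoked oom-killer" line in the question). That process is not necessarily the one using the most memory, but if one name dominates it is a useful lead. A rough Python sketch that tallies those lines; the default path /var/log/messages is an assumption, since the syslog location varies between distributions:

```python
import re
import sys
from collections import Counter

# Matches lines like:
#   Nov 18 13:56:36 alpha kernel: perl invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0
OOM_RE = re.compile(r"kernel: (\S+) invoked oom-killer:")


def oom_triggers(lines) -> Counter:
    """Count how often each process name invoked the OOM killer."""
    counts = Counter()
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def main() -> None:
    # Assumption: kernel messages end up in /var/log/messages; pass another path if not.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as f:
            counts = oom_triggers(f)
    except OSError as e:
        print(f"cannot read {path}: {e}", file=sys.stderr)
        return
    for name, n in counts.most_common():
        print(f"{n:>6}  {name}")


if __name__ == "__main__":
    main()
```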