OOM killer is frequently killing mysqld and Webmin on CentOS 7

Hi,

We have a new CentOS 7 x64 Virtualmin server running on an AWS m3.medium instance. Memory has not been a problem so far: the instance has about 4 GB, we never see it go into swap, and usage usually hovers around 1.9 GB.

Today I noticed that all sites were down: the mysqld service had been killed by the OOM killer.

Why would this happen? Is it something you've seen before? This server ran on CentOS 5 with 4 GB for years and this never happened once. I'm regretting the move to CentOS 7 a bit now. Should we consider building a new CentOS 6.5 server instead?

As I type this, Webmin was just OOM-killed...

Status: 
Active

Comments

Howdy -- we haven't heard of stability problems with CentOS 7.

If processes are being killed by the OOM killer, the server is running out of memory at least some of the time, which points to a resource issue of some kind.

What is the output of this command:

free -m
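On this older procps layout (the one with a "-/+ buffers/cache" row), the figures that matter for the OOM killer are on that row, since buffers and page cache can be reclaimed, and on the Swap: row. A minimal sketch, assuming that layout, that pulls those numbers out of live or saved output (free_summary is a hypothetical helper name):

```shell
# Summarise old-style `free -m` output (the procps layout that still has a
# "-/+ buffers/cache" row). Reads the output on stdin, live or saved.
free_summary() {
    awk '
        # Memory used/free once reclaimable buffers and page cache are discounted.
        /^-\/\+ buffers\/cache:/ { app_used = $3; app_free = $4 }
        # Total swap configured (0 means there is no swap at all).
        /^Swap:/ { swap_total = $2 }
        END {
            printf "app_used_mb=%s app_free_mb=%s swap_total_mb=%s\n", app_used, app_free, swap_total
        }
    '
}
```

Usage:

free -m | free_summary

Fed the output posted later in this thread, it reports app_used_mb=2250 app_free_mb=1353 swap_total_mb=0, i.e. roughly 1.3 GB free for applications at that moment and no swap configured to absorb a spike.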

Also, I'm curious whether Apache is logging any errors that would indicate heavy traffic. What is the output of this command:

tail -30 /var/log/httpd/error_log

Thanks for the help, as usual. Since I'm going to be sending logs, can you make this private?

$ free -m
             total       used       free     shared    buffers     cached
Mem:          3603       2582       1020        184          0        332
-/+ buffers/cache:       2250       1353
Swap:            0          0          0

$ sudo tail -50 /var/log/httpd/error_log
PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0
PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0
sh: /usr/local/bin/zip: No such file or directory
sh: zip: command not found
sh: /usr/bin/zip: No such file or directory
sh: /usr/local/sbin/zip: No such file or directory
sh: /usr/sbin/zip: No such file or directory
sh: /sbin/zip: No such file or directory
sh: /bin/zip: No such file or directory
sh: /usr/local/bin/unzip: No such file or directory
[Mon Feb 23 16:05:15.961424 2015] [fcgid:warn] [pid 1233] mod_fcgid: process 4900 graceful kill fail, sending SIGKILL
[Tue Feb 24 02:47:57.021135 2015] [mpm_prefork:notice] [pid 940] AH00171: Graceful restart requested, doing restart
[Tue Feb 24 02:47:57.991809 2015] [auth_digest:notice] [pid 940] AH01757: generating secret for digest authentication ...
[Tue Feb 24 02:47:57.995421 2015] [lbmethod_heartbeat:notice] [pid 940] AH02282: No slotmem from mod_heartmonitor
[Tue Feb 24 02:47:58.006998 2015] [ssl:warn] [pid 940] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Tue Feb 24 02:47:58.094363 2015] [mpm_prefork:notice] [pid 940] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 PHP/5.4.16 SVN/1.7.14 configured -- resuming normal operations
[Tue Feb 24 02:47:58.094384 2015] [core:notice] [pid 940] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0
PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0
[Tue Feb 24 03:12:03.761667 2015] [mpm_prefork:notice] [pid 940] AH00171: Graceful restart requested, doing restart
[Tue Feb 24 03:12:07.646939 2015] [auth_digest:notice] [pid 940] AH01757: generating secret for digest authentication ...
[Tue Feb 24 03:12:07.647602 2015] [lbmethod_heartbeat:notice] [pid 940] AH02282: No slotmem from mod_heartmonitor
[Tue Feb 24 03:12:07.650712 2015] [ssl:warn] [pid 940] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Tue Feb 24 03:12:07.782511 2015] [mpm_prefork:notice] [pid 940] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 PHP/5.4.16 SVN/1.7.14 configured -- resuming normal operations
[Tue Feb 24 03:12:07.782538 2015] [core:notice] [pid 940] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Tue Feb 24 03:12:49.986019 2015] [mpm_prefork:notice] [pid 940] AH00171: Graceful restart requested, doing restart
[Tue Feb 24 03:12:50.490702 2015] [auth_digest:notice] [pid 940] AH01757: generating secret for digest authentication ...
[Tue Feb 24 03:12:50.491279 2015] [lbmethod_heartbeat:notice] [pid 940] AH02282: No slotmem from mod_heartmonitor
[Tue Feb 24 03:12:50.494308 2015] [ssl:warn] [pid 940] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Tue Feb 24 03:12:50.607950 2015] [mpm_prefork:notice] [pid 940] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 PHP/5.4.16 SVN/1.7.14 configured -- resuming normal operations
[Tue Feb 24 03:12:50.607969 2015] [core:notice] [pid 940] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0
PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0
PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0
PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0
PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0
PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0
sh: /usr/local/bin/zip: No such file or directory
sh: zip: command not found
sh: /usr/bin/zip: No such file or directory
sh: /usr/local/sbin/zip: No such file or directory
sh: /usr/sbin/zip: No such file or directory
sh: /sbin/zip: No such file or directory
sh: /bin/zip: No such file or directory
sh: /usr/local/bin/unzip: No such file or directory
[Tue Feb 24 16:04:31.672367 2015] [fcgid:warn] [pid 4218] mod_fcgid: process 21372 graceful kill fail, sending SIGKILL
[Wed Feb 25 04:42:23.054389 2015] [fcgid:error] [pid 4218] (12)Cannot allocate memory: mod_fcgid: can't run /home/3rcamera/fcgi-bin/php5.fcgi
[Wed Feb 25 04:42:23.266614 2015] [fcgid:warn] [pid 4218] (12)Cannot allocate memory: mod_fcgid: spawn process /home/3rcamera/fcgi-bin/php5.fcgi error
[Wed Feb 25 04:42:25.283379 2015] [fcgid:error] [pid 4218] (12)Cannot allocate memory: mod_fcgid: can't run /home/3rcamera/fcgi-bin/php5.fcgi
[Wed Feb 25 04:42:25.500473 2015] [fcgid:warn] [pid 4218] (12)Cannot allocate memory: mod_fcgid: spawn process /home/3rcamera/fcgi-bin/php5.fcgi error

By the way, looking through /var/log/messages I see that quite a few processes have been OOM-killed, including mysqld. That was never noticed because mysqld seems to have restarted itself. Is there some mechanism that restarts mysqld if it is OOM-killed?
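The search through /var/log/messages can be scripted to show which processes the OOM killer has picked most often. This is a sketch assuming the kernel logs lines of the form "Killed process <pid> (<name>) ...", as 3.10-era kernels do; the function name oom_victims is hypothetical:

```shell
# Count OOM-killer victims per process name in a syslog file.
# Assumes kernel lines like "Killed process <pid> (<name>) total-vm:...";
# defaults to /var/log/messages when no file is given.
oom_victims() {
    grep -o 'Killed process [0-9]* ([^)]*)' "${1:-/var/log/messages}" \
        | sed 's/.*(\(.*\))/\1/' \
        | sort | uniq -c | sort -rn   # most frequently killed first
}
```

Run it as root, since /var/log/messages is normally readable only by root:

oom_victims /var/log/messages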

Ah, I just investigated and found that mysqld_safe restarts mysqld if it goes down, except that one time it didn't for some reason. That explains why I never noticed it.
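To check that mysqld really is running under mysqld_safe, look at each mysqld process's parent. A minimal sketch; the helper name mysqld_parent is hypothetical and it assumes the ps output format given in the comment:

```shell
# Reads `ps -eo pid=,ppid=,comm=` output on stdin and prints, for each
# mysqld process, the command name of its parent process.
mysqld_parent() {
    awk '
        { comm[$1] = $3; ppid[$1] = $2 }    # index every process by PID
        END {
            for (pid in comm)
                if (comm[pid] == "mysqld") {
                    p = ppid[pid]
                    print "mysqld", pid, "parent:", ((p in comm) ? comm[p] : "unknown")
                }
        }
    '
}
```

Usage:

ps -eo pid=,ppid=,comm= | mysqld_parent

If the parent is not mysqld_safe, that script is not what restarts mysqld on this server.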

You may want to find a way to monitor memory usage over time; it really does look like you're seeing a number of resource-related problems. Apache's log shows mod_fcgid failing to spawn PHP processes because memory could not be allocated.
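The mod_fcgid allocation failures can be tallied per hour, which helps line them up with the OOM kills in /var/log/messages. A sketch assuming the Apache 2.4 error_log timestamp format shown in the log above; enomem_by_hour is a hypothetical name:

```shell
# Count mod_fcgid "(12)Cannot allocate memory" errors per hour in an
# Apache 2.4 error_log ("[Wed Feb 25 04:42:23.054389 2015] ..." timestamps).
enomem_by_hour() {
    grep 'fcgid:.*(12)Cannot allocate memory' "${1:-/var/log/httpd/error_log}" \
        | awk '{
              split($4, t, ":")          # t[1] is the hour
              y = $5; sub(/\]/, "", y)   # strip the closing bracket from the year
              print $2, $3, y, t[1] ":00"
          }' \
        | uniq -c                        # log is chronological, so no sort needed
}
```

Usage:

enomem_by_hour /var/log/httpd/error_log

Because the log is already in time order, uniq -c is applied without sorting, so the counts come out chronologically.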

I'm wondering if you're seeing occasional memory spikes of some kind. If so, you might want to monitor both the process list and memory usage, so you can tell which process is growing when a spike happens.
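One low-tech way to catch such spikes is to record memory and the largest processes at a fixed interval. A minimal sketch; the log path /var/log/mem-snapshot.log and the function name are hypothetical, and it reads /proc/meminfo directly (the same source free uses) rather than parsing free:

```shell
# Append one timestamped snapshot of memory and the biggest processes to a
# log file. Meant to be run repeatedly; the default log path is hypothetical.
mem_snapshot() {
    logfile=${1:-/var/log/mem-snapshot.log}
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        # The same counters `free` reports, read straight from the kernel.
        grep -E '^(MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree):' /proc/meminfo
        # Header plus the ten largest processes by resident set size (KB).
        ps -eo pid,rss,comm --sort=-rss | head -n 11
    } >> "$logfile"
}
```

Called once a minute from a root crontab (the script path /root/mem-snapshot.sh is hypothetical), for example:

* * * * * . /root/mem-snapshot.sh && mem_snapshot

the log will show what was running when the next spike or OOM kill happens.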

Also, I just wanted to verify -- which kernel are you using? You can determine that with the command "uname -a".

I'll keep an eye on it.

$ uname -a
Linux ip-172-31-37-191.ec2.internal 3.10.0-123.20.1.el7.x86_64 #1 SMP Thu Jan 29 18:05:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Ah, I see that SMTP (Postfix) is using far more memory than usual: 250 MB, plus a lot of CPU. On the old server it averaged only 2-3 MB.

Any idea what could cause SMTP to use so many resources? This server does not accept inbound email on port 25; it only sends outbound.

...and there are 60 SMTP processes running. Hmm.
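Counting the Postfix smtp delivery agents and their combined memory is a quick way to see whether this keeps growing. A sketch; smtp_usage is hypothetical, and summing RSS overstates real usage because shared pages (the smtp binary and its libraries) are counted once per process:

```shell
# Reads `ps -eo rss=,comm=` output on stdin and sums the Postfix smtp
# client processes (outbound delivery agents; smtpd is excluded).
smtp_usage() {
    awk '$2 == "smtp" { n++; kb += $1 }
         END { printf "smtp_processes=%d total_rss_mb=%.1f\n", n, kb / 1024 }'
}
```

Usage:

ps -eo rss=,comm= | smtp_usage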

You may want to review the mail queue to see whether there are outgoing messages waiting. You can view it in Webmin under Servers > Postfix > Mail Queue.
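On the command line, the same queue can be listed with postqueue -p (or mailq), whose last line summarises the queue. A minimal sketch that extracts the number of queued messages, assuming Postfix's usual "-- N Kbytes in M Requests." and "Mail queue is empty" summary lines; queue_size is hypothetical:

```shell
# Reads `postqueue -p` (or `mailq`) output on stdin and prints the number
# of queued messages, or "unknown" if no summary line is found.
queue_size() {
    awk '
        /^Mail queue is empty/                     { n = 0;  found = 1 }
        /^-- [0-9]+ Kbytes in [0-9]+ Requests?\./  { n = $5; found = 1 }
        END { if (found) print n; else print "unknown" }
    '
}
```

Usage:

postqueue -p | queue_size

Running it a few minutes apart shows whether the queue is growing.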

You could also check the mail logs for clues about what Postfix is doing.
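On CentOS the Postfix log is /var/log/maillog. A sketch that tallies delivery outcomes from the smtp client's status= field; delivery_status is hypothetical, and it assumes the standard "postfix/smtp[pid]: ... status=..." line format:

```shell
# Tally status=sent/deferred/bounced for outbound Postfix smtp deliveries.
# Lines from smtpd (inbound) and other daemons are ignored.
delivery_status() {
    grep -o 'postfix/smtp\[[0-9]*\]: .*status=[a-z]*' "${1:-/var/log/maillog}" \
        | sed 's/.*status=//' \
        | sort | uniq -c | sort -rn
}
```

Usage:

delivery_status /var/log/maillog

A large share of deferred deliveries would mean messages are sitting in the queue and being retried.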