System Slowdown Catastrophic!

13 posts / 0 new
Last post
#1 Tue, 03/17/2015 - 13:08
Shinzan

System Slowdown Catastrophic!

Ubuntu 14.04 LTS, 16 GB RAM, Dell PowerEdge, dual Xeon 1.9 GHz (8 cores total)

About once a week I experience a horrendous slowdown on the websites.

Mail seems to run fine. I've installed a ton of diagnostic tools to find the problem but I can't fix it: htop shows all cores pegged, and websites take around 10 seconds to first byte, or error 500 out altogether on complex PHP sites.
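(For reference, a curl timing one-liner along these lines is how that kind of time-to-first-byte can be measured; example.com is just a placeholder for one of the affected sites:)

curl -o /dev/null -s -w 'TTFB: %{time_starttransfer}s  total: %{time_total}s\n' http://example.com/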

The box hosts 17 sites.

Attached are some htop results.

Any thoughts or direction on how I can fix this? I'd like to put another important site on here, but I'm wary because of these slowdowns.

Tue, 03/17/2015 - 13:26
andreychek

Howdy,

Your htop output there does show a load of 10.00, which is a bit on the high end. It's tough to determine the culprit from that though... next time that occurs, could you run the following commands, and share the output they produce:

ps auxwf
mailq | tail -1
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10

Also, just to reduce the load on your server -- you may want to go into System Settings -> Virtualmin Config -> Status Collection, and there, set "Interval between status collection job runs" to something a bit higher than the default... perhaps 60 minutes would be a good place to start.
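If it's hard to catch in the act, one option -- just a generic sketch, not anything Virtualmin-specific, and the threshold of 8 plus the paths are my own placeholder values -- is a small cron script that saves those three outputs whenever the 1-minute load average gets high, so the data is there even if nobody is logged in at the time:

#!/bin/bash
# hypothetical /usr/local/sbin/capture-load.sh -- run from cron every few minutes
THRESHOLD=8                                    # placeholder value; pick something near your normal peak load
LOAD=$(cut -d' ' -f1 /proc/loadavg)            # 1-minute load average
if awk -v l="$LOAD" -v t="$THRESHOLD" 'BEGIN { exit !(l > t) }'; then
    OUT="/var/log/load-capture-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "== load: $LOAD =="
        ps auxwf
        mailq | tail -1
        netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10
    } > "$OUT" 2>&1
fi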

-Eric

Tue, 03/17/2015 - 13:59
Welshman
Welshman's picture

Who is your server with? I have 16 GB servers with up to a hundred sites and they don't break a sweat.

The server needs tweaking, I bet, or you're on a lousy network.

Chaos Reigns Within, Reflect, Repent and Reboot, Order Shall Return.

Tue, 03/17/2015 - 20:36
Shinzan

The server is on a dedicated 20/20 fiber-optic connection with Suddenlink Communications.

This server is WAY overpowered for what we are doing. I migrated to Virtualmin from an IMSCP box (those guys in their forums are real a-holes; I started with VHCS, then forked to ISPCP, then finally IMSCP). The server I came from hosted the same websites but was a tenth as powerful and did fine, so this server has plenty of power; something has to be misconfigured.

So I migrated both the Linux control panel and the physical server at the same time (I'm also running on vSphere).

ps auxwf
mailq | tail -1
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10

Output:

ps auxwf

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         2  0.0  0.0      0     0 ?        S    Mar12   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Mar12   1:31  \_ [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [kworker/0:0]
root         5  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kworker/0:0H]
root         7  0.6  0.0      0     0 ?        S    Mar12  46:30  \_ [rcu_sched]
root         8  0.2  0.0      0     0 ?        S    Mar12  16:03  \_ [rcuos/0]
root         9  0.2  0.0      0     0 ?        S    Mar12  15:34  \_ [rcuos/1]
root        10  0.2  0.0      0     0 ?        S    Mar12  15:43  \_ [rcuos/2]
root        11  0.1  0.0      0     0 ?        S    Mar12  13:07  \_ [rcuos/3]
root        12  0.1  0.0      0     0 ?        S    Mar12  12:47  \_ [rcuos/4]
root        13  0.2  0.0      0     0 ?        S    Mar12  15:26  \_ [rcuos/5]
root        14  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [rcu_bh]
root        15  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [rcuob/0]
root        16  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [rcuob/1]
root        17  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [rcuob/2]
root        18  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [rcuob/3]
root        19  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [rcuob/4]
root        20  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [rcuob/5]
root        21  0.0  0.0      0     0 ?        S    Mar12   1:28  \_ [migration/0]
root        22  0.0  0.0      0     0 ?        S    Mar12   0:11  \_ [watchdog/0]
root        23  0.0  0.0      0     0 ?        S    Mar12   0:10  \_ [watchdog/1]
root        24  0.0  0.0      0     0 ?        S    Mar12   1:12  \_ [migration/1]
root        25  0.0  0.0      0     0 ?        S    Mar12   1:20  \_ [ksoftirqd/1]
root        27  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kworker/1:0H]
root        28  0.0  0.0      0     0 ?        S    Mar12   0:08  \_ [watchdog/2]
root        29  0.0  0.0      0     0 ?        S    Mar12   1:07  \_ [migration/2]
root        30  0.0  0.0      0     0 ?        S    Mar12   1:19  \_ [ksoftirqd/2]
root        31  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [kworker/2:0]
root        32  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kworker/2:0H]
root        33  0.0  0.0      0     0 ?        S    Mar12   0:09  \_ [watchdog/3]
root        34  0.0  0.0      0     0 ?        S    Mar12   1:00  \_ [migration/3]
root        35  0.0  0.0      0     0 ?        S    Mar12   1:01  \_ [ksoftirqd/3]
root        36  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [kworker/3:0]
root        37  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kworker/3:0H]
root        38  0.0  0.0      0     0 ?        S    Mar12   0:10  \_ [watchdog/4]
root        39  0.0  0.0      0     0 ?        S    Mar12   0:57  \_ [migration/4]
root        40  0.0  0.0      0     0 ?        S    Mar12   0:51  \_ [ksoftirqd/4]
root        41  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [kworker/4:0]
root        42  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kworker/4:0H]
root        43  0.0  0.0      0     0 ?        S    Mar12   0:11  \_ [watchdog/5]
root        44  0.0  0.0      0     0 ?        S    Mar12   1:01  \_ [migration/5]
root        45  0.0  0.0      0     0 ?        S    Mar12   1:20  \_ [ksoftirqd/5]
root        46  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [kworker/5:0]
root        47  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kworker/5:0H]
root        48  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [khelper]
root        49  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [kdevtmpfs]
root        50  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [netns]
root        51  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [writeback]
root        52  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kintegrityd]
root        53  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [bioset]
root        54  0.0  0.0      0     0 ?        S<   Mar12   0:12  \_ [kworker/u13:0]
root        55  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kblockd]
root        56  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [ata_sff]
root        57  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [khubd]
root        58  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [md]
root        59  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [devfreq_wq]
root        60  0.0  0.0      0     0 ?        S    Mar12   1:04  \_ [kworker/3:1]
root        61  0.0  0.0      0     0 ?        S    Mar12   0:38  \_ [kworker/0:1]
root        62  0.0  0.0      0     0 ?        S    Mar12   0:35  \_ [kworker/1:1]
root        63  0.0  0.0      0     0 ?        S    Mar12   0:35  \_ [kworker/4:1]
root        64  0.0  0.0      0     0 ?        S    Mar12   0:34  \_ [kworker/5:1]
root        65  0.0  0.0      0     0 ?        S    Mar12   0:39  \_ [kworker/2:1]
root        67  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [khungtaskd]
root        68  0.0  0.0      0     0 ?        S    Mar12   0:21  \_ [kswapd0]
root        69  0.0  0.0      0     0 ?        SN   Mar12   0:00  \_ [ksmd]
root        70  0.0  0.0      0     0 ?        SN   Mar12   0:41  \_ [khugepaged]
root        71  0.0  0.0      0     0 ?        S    Mar12   0:01  \_ [fsnotify_mark]
root        72  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [ecryptfs-kthrea]
root        73  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [crypto]
root        85  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kthrotld]
root        87  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [scsi_eh_0]
root        88  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [scsi_eh_1]
root       109  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [deferwq]
root       110  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [charger_manager]
root       163  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kpsmoused]
root       164  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [mpt_poll_0]
root       165  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [mpt/0]
root       181  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [scsi_eh_2]
root       189  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kdmflush]
root       190  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [bioset]
root       191  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [kdmflush]
root       193  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [bioset]
root       208  0.1  0.0      0     0 ?        S    Mar12  10:50  \_ [jbd2/dm-0-8]
root       209  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [ext4-rsv-conver]
root       357  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [ext4-rsv-conver]
root       480  0.0  0.0      0     0 ?        S<   Mar12   0:00  \_ [ttm_swap]
root      1968  0.0  0.0      0     0 ?        S    Mar12   0:00  \_ [kauditd]
root     25942  0.0  0.0      0     0 ?        S<   Mar14   0:16  \_ [kworker/u13:2]
root     12655  0.0  0.0      0     0 ?        S    14:03   0:00  \_ [kworker/1:0]
root     19830  0.4  0.0      0     0 ?        S    19:39   0:05  \_ [kworker/u12:2]
root     20396  0.5  0.0      0     0 ?        S    19:42   0:06  \_ [kworker/u12:3]
root     22090  1.2  0.0      0     0 ?        S    19:51   0:06  \_ [kworker/u12:0]
root     23878  0.0  0.0      0     0 ?        S    19:58   0:00  \_ [kworker/u12:1]
root         1  0.0  0.0  33620  2948 ?        Ss   Mar12   4:55 /sbin/init
root       385  0.0  0.0  19608   920 ?        S    Mar12   0:04 upstart-udev-bridge --daemon
root       394  0.0  0.0  51360  1664 ?        Ss   Mar12   0:01 /lib/systemd/systemd-udevd --daemon
message+   435  0.0  0.0  39228  1324 ?        Ss   Mar12   0:19 dbus-daemon --system --fork
root       475  0.0  0.0  43528  1956 ?        Ss   Mar12   0:21 /lib/systemd/systemd-logind
root       960  0.0  0.0  16200  1432 ?        S    Mar12   0:00 upstart-file-bridge --daemon
root      1152  0.0  0.0  15656   920 ?        S    Mar12   0:00 upstart-socket-bridge --daemon
root      1278  0.0  0.0  15820   952 tty4     Ss+  Mar12   0:00 /sbin/getty -8 38400 tty4
root      1282  0.0  0.0  15820   956 tty5     Ss+  Mar12   0:00 /sbin/getty -8 38400 tty5
root      1287  0.0  0.0  15820   948 tty2     Ss+  Mar12   0:00 /sbin/getty -8 38400 tty2
root      1288  0.0  0.0  15820   956 tty3     Ss+  Mar12   0:00 /sbin/getty -8 38400 tty3
root      1292  0.0  0.0  15820   948 tty6     Ss+  Mar12   0:00 /sbin/getty -8 38400 tty6
root      1310  0.0  0.0  61364  3076 ?        Ss   Mar12   0:00 /usr/sbin/sshd -D
root     23204  0.2  0.0 105632  4320 ?        Ss   19:56   0:00  \_ sshd: root@pts/1
root     23690  0.5  0.0  23084  4256 pts/1    Ss   19:57   0:01      \_ -bash
root     24345  0.0  0.0  18608  1452 pts/1    R+   20:00   0:00          \_ ps auxwf
root      1314  0.0  0.0  17776  1536 ?        Ss   Mar12   2:06 /usr/sbin/dovecot -F -c /etc/dovecot/dovecot.conf
dovecot   1494  0.0  0.0   9284   948 ?        S    Mar12   0:52  \_ dovecot/anvil
root      1495  0.0  0.0   9412  1136 ?        S    Mar12   0:46  \_ dovecot/log
root     30330  0.0  0.0  18812  2304 ?        S    Mar16   1:25  \_ dovecot/config
dovecot  32328  0.1  0.0  16264  1656 ?        S    11:17   0:45  \_ dovecot/auth
1028     27213  0.2  0.0  22672  3964 ?        S    17:34   0:21  \_ dovecot/imap
root     20389  1.3  0.0  29040  2192 ?        S    19:42   0:15  \_ dovecot/auth -w
dovenull 23983  0.2  0.0  18088  2828 ?        S    19:58   0:00  \_ dovecot/pop3-login
lincoln+ 23985  0.1  0.0  21360  2340 ?        S    19:58   0:00  \_ dovecot/pop3
root     24185  1.2  0.0  28808  1996 ?        S    20:00   0:00  \_ dovecot/auth -w
root      1328  0.0  0.0  23656  1056 ?        Ss   Mar12   0:07 cron
daemon    1329  0.0  0.0  19140   160 ?        Ss   Mar12   0:00 atd
root      1333  0.0  0.0   4368   664 ?        Ss   Mar12   0:00 acpid -c /etc/acpi/events -s /var/run/acpid.socket
ntp       1462  0.0  0.0  31444  2132 ?        Ss   Mar12   1:01 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 117:124
mysql     1481  0.9  2.6 2785728 348000 ?      Ssl  Mar12  73:40 /usr/sbin/mysqld
root      1665  0.2  0.0  91772  4640 ?        S    Mar12  16:57 /usr/sbin/vmtoolsd
list      1742  0.0  0.0  60556  9248 ?        Ss   Mar12   0:00 /usr/bin/python /usr/lib/mailman/bin/mailmanctl -s -q start
list      1745  0.0  0.0  60496 11448 ?        S    Mar12   2:34  \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
list      1746  0.0  0.0  60496 11480 ?        S    Mar12   2:40  \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=BounceRunner:0:1 -s
list      1747  0.0  0.0  60500 11452 ?        S    Mar12   2:34  \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=CommandRunner:0:1 -s
list      1748  0.0  0.0  60432 11444 ?        S    Mar12   2:34  \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
list      1749  0.0  0.0  60452 11520 ?        S    Mar12   2:35  \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
list      1751  0.0  0.0  60472 11604 ?        S    Mar12   2:37  \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
list      1752  0.0  0.0  60488 11532 ?        S    Mar12   2:32  \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=VirginRunner:0:1 -s
list      1754  0.0  0.0  60488 11432 ?        S    Mar12   0:02  \_ /usr/bin/python /var/lib/mailman/bin/qrunner --runner=RetryRunner:0:1 -s
root      1858  0.0  0.0  25344  1696 ?        Ss   Mar12   1:52 /usr/lib/postfix/master
postfix   1950  0.0  0.0  40400  3140 ?        S    Mar12   0:56  \_ tlsmgr -l -t unix -u -c
postfix    611  0.0  0.0  28484  2848 ?        S    Mar12   1:33  \_ qmgr -l -t unix -u
postfix  32297  0.0  0.0  27408  1548 ?        S    08:30   0:09  \_ anvil -l -t unix -u -c
postfix  13376  0.1  0.0  27540  1556 ?        S    19:03   0:05  \_ showq -t unix -u -c
postfix  17377  0.0  0.0  27640  2176 ?        S    19:26   0:02  \_ cleanup -z -t unix -u -c
postfix  18927  0.0  0.0  27456  2160 ?        S    19:36   0:00  \_ local -t unix
postfix  22396  0.0  0.0  27420  1892 ?        S    19:53   0:00  \_ trivial-rewrite -n rewrite -t unix -u -c
postfix  22420  0.0  0.0  27408  1616 ?        S    19:53   0:00  \_ pickup -l -t unix -u -c
postfix  22437  0.0  0.0  27456  2128 ?        S    19:53   0:00  \_ local -t unix
postfix  22975  0.2  0.0  59524  4576 ?        S    19:55   0:00  \_ smtpd -n 173.219.81.61:smtp -t inet -u -c -o stress= -o smtpd_sasl_auth_enable=yes
postfix  23008  0.0  0.0  27456  2124 ?        S    19:56   0:00  \_ local -t unix
root      1907  0.0  0.0  93324  2192 ?        Ss   Mar12   0:53 /usr/sbin/saslauthd -a pam -m /var/spool/postfix/var/run/saslauthd -r -n 5
root      1908  0.0  0.0  93320  2184 ?        S    Mar12   0:52  \_ /usr/sbin/saslauthd -a pam -m /var/spool/postfix/var/run/saslauthd -r -n 5
root      1909  0.0  0.0  93320  2184 ?        S    Mar12   0:54  \_ /usr/sbin/saslauthd -a pam -m /var/spool/postfix/var/run/saslauthd -r -n 5
root      1910  0.0  0.0  93324  2184 ?        S    Mar12   0:55  \_ /usr/sbin/saslauthd -a pam -m /var/spool/postfix/var/run/saslauthd -r -n 5
root      1911  0.0  0.0  93320  2184 ?        S    Mar12   0:52  \_ /usr/sbin/saslauthd -a pam -m /var/spool/postfix/var/run/saslauthd -r -n 5
root      1993  0.0  0.1  76676 18972 ?        Ss   Mar12   0:14 /usr/bin/perl /usr/share/usermin/miniserv.pl /etc/usermin/miniserv.conf
clamav    4017  0.3  2.4 1066144 333284 ?      Ssl  Mar12  22:50 /usr/sbin/clamd
bind      4131  2.1  0.5 559868 79124 ?        Ssl  Mar12 164:24 /usr/sbin/named -u bind
root      4150  0.0  0.4  91516 62832 ?        Ss   Mar12   3:23 /usr/share/webmin/virtual-server/lookup-domain-daemon.pl
root      4170  0.0  0.4 137520 64492 ?        Ss   Mar12   5:44 /usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir -d --pidfile=/var/run/spamd.pid
root       962 21.9  0.6 153964 81508 ?        S    18:02  25:54  \_ spamd child
root      1422  8.2  0.5 151600 79212 ?        S    18:05   9:34  \_ spamd child
root      4393  0.0  0.5 141544 73836 ?        Ss   Mar12   2:15 /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root     22462  0.1  0.5 141544 73836 ?        S    19:53   0:00  \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root     23110  0.1  0.5 141544 73832 ?        S    19:56   0:00  \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root     23623  0.1  0.5 141544 73836 ?        S    19:57   0:00  \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root     23624  0.1  0.5 141544 73768 ?        S    19:57   0:00  \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root     23947  0.1  0.5 141544 73836 ?        S    19:58   0:00  \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root     24081  0.1  0.5 141544 73772 ?        S    19:59   0:00  \_ /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
root      4410  0.0  0.0  15820   952 tty1     Ss+  Mar12   0:00 /sbin/getty -8 38400 tty1
root      8697  0.0  0.1 434740 25060 ?        Ss   Mar12   1:01 /usr/sbin/apache2 -k start
www-data 30765  0.0  0.0 434680 10896 ?        S    Mar15   0:00  \_ /usr/sbin/apache2 -k start
www-data 30820  0.0  0.0 241604  9328 ?        S    Mar15   0:22  \_ /usr/sbin/apache2 -k start
usamaf   30830  0.2  0.3 340976 42524 ?        S    Mar15   7:28  |   \_ /usr/bin/php5-cgi
otakuan+  5008  0.0  0.1 327880 16720 ?        S    Mar15   0:00  |   \_ /usr/bin/php5-cgi
usamaf   28613  0.1  0.2 335480 34564 ?        S    Mar15   4:30  |   \_ /usr/bin/php5-cgi
mthopef+  4064  0.0  0.2 333644 27572 ?        S    05:03   0:34  |   \_ /usr/bin/php5-cgi
idsnetw+  4113  0.2  0.2 331444 36316 ?        S    11:32   1:21  |   \_ /usr/bin/php5-cgi
wvkarate 11088  0.3  0.2 331328 31952 ?        S    13:57   1:22  |   \_ /usr/bin/php5-cgi
fastfoo+ 15738  0.0  0.1 328444 18292 ?        S    14:14   0:15  |   \_ /usr/bin/php5-cgi
usamaf   20031  0.4  0.2 334200 30896 ?        S    14:27   1:38  |   \_ /usr/bin/php5-cgi
1113     24500  2.2  0.2 329876 36224 ?        S    14:50   6:54  |   \_ /usr/bin/php5-cgi
tomscot+ 12815  0.9  0.3 338796 47468 ?        S    16:34   1:57  |   \_ /usr/bin/php5-cgi
idsnetw+ 18042  0.4  0.2 337972 28496 ?        S    17:00   0:44  |   \_ /usr/bin/php5-cgi
idsnetw+ 18045  0.5  0.2 337196 28004 ?        S    17:00   0:54  |   \_ /usr/bin/php5-cgi
1113      2167  2.2  0.2 332460 38792 ?        S    18:07   2:31  |   \_ /usr/bin/php5-cgi
planodo+  8880  3.8  0.4 341284 57456 ?        S    18:39   3:06  |   \_ /usr/bin/php5-cgi
1006     11625  4.3  0.2 330028 29144 ?        S    18:55   2:48  |   \_ /usr/bin/php5-cgi
1006     11635  7.1  0.2 330020 29092 ?        R    18:55   4:38  |   \_ /usr/bin/php5-cgi
1006     11638  4.4  0.2 330020 29048 ?        S    18:55   2:54  |   \_ /usr/bin/php5-cgi
www-data 18878  0.0  0.0 435512 12764 ?        S    17:03   0:07  \_ /usr/sbin/apache2 -k start
www-data 25003  0.0  0.0 435376 11932 ?        S    17:26   0:06  \_ /usr/sbin/apache2 -k start
www-data 31694  0.0  0.0 435400 12680 ?        S    17:55   0:03  \_ /usr/sbin/apache2 -k start
www-data 31696  0.0  0.0 435452 11976 ?        S    17:55   0:03  \_ /usr/sbin/apache2 -k start
www-data 31697  0.0  0.0 435376 11916 ?        S    17:55   0:04  \_ /usr/sbin/apache2 -k start
www-data 10812  0.0  0.0 435360 11904 ?        S    18:53   0:02  \_ /usr/sbin/apache2 -k start
www-data 10813  0.0  0.0 435320 12636 ?        S    18:53   0:02  \_ /usr/sbin/apache2 -k start
www-data 10820  0.0  0.0 435424 12652 ?        S    18:53   0:02  \_ /usr/sbin/apache2 -k start
www-data 10822  0.0  0.0 435328 11868 ?        S    18:53   0:02  \_ /usr/sbin/apache2 -k start
www-data 10823  0.0  0.0 435364 11888 ?        S    18:53   0:02  \_ /usr/sbin/apache2 -k start
www-data 10825  0.0  0.0 435368 11904 ?        S    18:53   0:02  \_ /usr/sbin/apache2 -k start
www-data 10847  0.0  0.0 435352 11888 ?        S    18:53   0:02  \_ /usr/sbin/apache2 -k start
www-data 10853  0.0  0.0 435344 11868 ?        S    18:53   0:02  \_ /usr/sbin/apache2 -k start
www-data 19856  0.0  0.0 435276 11632 ?        S    19:39   0:00  \_ /usr/sbin/apache2 -k start
www-data 22323  0.0  0.0 435276 11628 ?        S    19:52   0:00  \_ /usr/sbin/apache2 -k start
www-data 22324  0.0  0.0 435224 11636 ?        S    19:52   0:00  \_ /usr/sbin/apache2 -k start
www-data 22326  0.0  0.0 434820 10876 ?        S    19:52   0:00  \_ /usr/sbin/apache2 -k start
www-data 22327  0.0  0.0 435276 11532 ?        S    19:52   0:00  \_ /usr/sbin/apache2 -k start
www-data 22328  0.0  0.0 435276 11400 ?        S    19:52   0:00  \_ /usr/sbin/apache2 -k start
www-data 22329  0.0  0.0 435324 11692 ?        S    19:52   0:00  \_ /usr/sbin/apache2 -k start
proftpd  30651  0.0  0.0 113904  2480 ?        Ss   Mar15   0:31 proftpd: (accepting connections)
syslog    3290  0.1  0.1 256040 14680 ?        Ssl  Mar15   4:07 rsyslogd
root     23784  0.1  0.0  17160  4888 ?        S<L  00:00   1:13 /usr/bin/atop -a -w /var/log/atop/atop_20150317 600
root     23804  0.4  0.1  65628 20308 ?        Ss   00:00   5:31 lfd - sleeping
root     24342  110  0.1  66156 19912 ?        R    20:00   0:01  \_ lfd - (child) process tracking...
root     24079  7.9  0.1 127420 20456 ?        S    19:59   0:04 /usr/bin/perl -w ./clean_graph.pl 60 cpu

mailq | tail -1

-- 795 Kbytes in 54 Requests.

(Honestly I monitor the mailq via historical stats and it rarely exceeds 100 at any one time)

netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10

      7 173.80.16.220
      4 127.0.0.1
      2
      1 servers)
      1 Address
      1 76.9.85.41
      1 68.180.228.90
      1 209.85.160.140
      1 207.236.147.203
      1 174.22.182.197
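Side note: the "Address" and "servers)" rows above are just netstat's two header lines being counted; a variant of that one-liner which skips them would be:

netstat -ntu | tail -n +3 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10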

As you can see, there is no obvious reason for this massive slowdown, but it hits about once a week or so: everything slows down and stays crazy slow on all websites for 10+ hours, then it goes back to normal for a week or two, then it gets hammered again.

The only thing I see is a lot of name lookups, with /usr/sbin/named -u bind taking up a lot of CPU time, but I'm not sure otherwise. I also see "root 24342 110 0.1 66156 19912 ? R 20:00 0:01 \_ lfd - (child) process tracking...", so is LFD going crazy? I really like LFD and CSF for the auto-banning features, because I see hacking attempts and DoS all the time from the script kiddies.

Thanks for any input. I am at a loss; this server and connection should not be having this issue, so there must be a misconfiguration somewhere.

I enabled greylisting and everything ironed out. Maybe it's a coincidence?

Don't know.

Tue, 03/17/2015 - 23:02
andreychek

Yeah I don't see anything too unusual in that process list there... whenever you ran that, were you experiencing the problem you're describing?

If it comes up again, you could always try disabling lfd, or any other process, just to make sure that isn't related.
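On a stock csf/lfd install that's typically just the following (from memory of the standard csf tooling, so verify on your own box):

service lfd stop     # stops lfd until you start it again
csf -x               # or disable csf and lfd entirely; csf -e re-enables them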

-Eric

Wed, 03/18/2015 - 12:05 (Reply to #5)
Shinzan

I was having the problem, then I enabled greylisting and within a few minutes I was back to normal. It may have been a coincidence; I'll know in a week, because it generally happens every 5-10 days.

David

Tue, 04/07/2015 - 16:56
Shinzan

This is bad; I am not going to be able to use this system in production if I can't solve this problem. Again today the problem is a catastrophic slowdown. Any thoughts?

This is two weeks after the last slowdown. The small daily spikes are the midnight backups; the two HUGE spikes are... well, I don't know, the problem comes and goes...

It seems there could be a correlation between the mailq and my problem.

Tue, 04/07/2015 - 16:58 (Reply to #7)
Shinzan

2nd view of CPU

Tue, 04/07/2015 - 18:21
andreychek

Howdy,

You mentioned a correlation between this CPU issue and mailq... whenever this occurs, how many messages are showing up in your email queue?

-Eric

Tue, 04/14/2015 - 21:54 (Reply to #9)
Shinzan

Well, it seemed like there was a correlation, but only 35 messages were in there, and even after I deleted the whole queue it was still slammed. I rebooted and it ironed itself out, but it was so bad that the stats on my server didn't even record for several hours; I have a big white gap in my stats for that period where it was very bad. It seems like this just happens every 2 weeks or so, very strange. I'm trying to figure this out before I put two of my major clients' websites on this box, so any help is appreciated.

David

Mon, 04/20/2015 - 20:38
andreychek

I'm having trouble remembering what all we tried -- but it might be worth checking Email Messages -> Spam and Virus Scanning: we'd recommend setting "SpamAssassin client program" to "spamc (Client for SpamAssassin filter server spamd)", and "Virus scanning program" to "Server scanner (clamdscan)".

Those settings can each make a pretty big difference, if they aren't already set that way.
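The reason they matter is that spamc and clamdscan hand each message to the already-running spamd and clamd daemons instead of loading a fresh SpamAssassin or ClamAV engine per message, which is far heavier. A quick sanity check that both daemons are actually running (standard Ubuntu service names, so adjust if yours differ):

service spamassassin status     # spamc needs the spamd daemon
service clamav-daemon status    # clamdscan needs clamd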

-Eric

Wed, 04/22/2015 - 08:57 (Reply to #11)
Shinzan

Thanks for the reply. I did change SpamAssassin over to spamc and clamscan over to the server scanner, and so far I'm at:

System uptime: 13 days, 19 hours, 19 minutes
Running processes: 230
CPU load averages: 0.06 (1 min), 0.15 (5 mins), 0.13 (15 mins)

No issues so far; then, out of the blue, it will get hammered for a day. When it happens, what kind of log files do you think I should be looking in? The server has 16 GB RAM and 6 cores across two processors, so it should eat 17 websites and email for breakfast.

David

Tue, 07/19/2016 - 09:37
Shinzan

So after opening a support ticket, the folks at Virtualmin really helped me figure it out.

It's CSF/LFD: the process-tracking checks were firing off so fast that it basically started a denial of service against my own server!

Long story short: disable LFD, delete the thousands of emails it's dumping into root's mailbox, and modify the CSF config to raise the process-tracking limits (or lengthen the reporting interval) so the thresholds aren't set too low -- because once the chain starts, it just gets exponentially worse and drags down the whole server!
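For anyone else hitting this, the knobs involved live in /etc/csf/csf.conf. The numbers below are only illustrative (mine will differ), and the option names are from CSF's standard config, so double-check them against the comments in your own csf.conf:

PT_LIMIT = "120"        # seconds a process may run before lfd reports it (raise from the low default)
PT_INTERVAL = "120"     # how often process tracking runs
PT_USERPROC = "30"      # processes per user before an alert fires
PT_USERMEM = "512"      # per-process memory (MB) before an alert fires

Then restart lfd (service lfd restart) and, to clear the alert backlog, empty root's mailbox, e.g. "> /var/mail/root".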

Topic locked