Webmin connection failure

Seems I spoke too soon, unless this is another issue. The MySQL service seemed to stop running (WordPress sites failed to connect to the database). I couldn't connect to Webmin either, so I thought I would do a reboot. However, the server didn't boot but got stuck on "fsck exited with status code 4". A quick Google showed me how to run fsck manually, which I did, and after quite a few errors the system started again, but the WordPress sites again failed to connect to their databases. I logged in via PuTTY and ran "service mysql restart" and everything started working again. I could log into Webmin again too. My question is: which logs should I look at to see what may have caused the problem?

Status: 
Active

Comments

Howdy -- unfortunately, you could be seeing the beginning of a hard disk failure. That could cause all the problems you're experiencing.

Is this a VPS, or dedicated server? If it's a dedicated server, is it using RAID?

Also, what errors did you run into while running the fsck?

Lastly, what is the output of this command:

dmesg | tail -30

Hi,

No, it's a VPS (in that it's a virtual machine under Hyper-V). There are two other VMs on the same host, a Windows VM and another Linux VM (the original Ubuntu server). Neither of these is displaying the same issue though. The fsck errors are too numerous to list (plus I can't remember them, I just went yes to fix them). Is there a log I can access for that? I haven't rebooted the machine for a while, but things went funny after the Webmin update.

Output below

[ 6.431063] systemd-udevd[208]: starting version 215
[ 7.501616] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[ 7.501625] ACPI: Power Button [PWRF]
[ 7.577334] hv_vmbus: registering driver hyperv_fb
[ 7.578257] hyperv_fb: Screen resolution: 1152x864, Color depth: 32
[ 7.584749] Console: switching to colour frame buffer device 144x54
[ 7.686099] input: PC Speaker as /devices/platform/pcspkr/input/input4
[ 7.688612] hv_utils: Registering HyperV Utility Driver
[ 7.688616] hv_vmbus: registering driver hv_util
[ 7.690291] piix4_smbus 0000:00:07.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
[ 7.695039] hv_vmbus: registering driver hyperv_keyboard
[ 7.695682] input: AT Translated Set 2 keyboard as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/vmbus_0_4/serio2/input/input5
[ 8.020536] psmouse serio1: alps: Unknown ALPS touchpad: E7=12 00 64, EC=12 00 64
[ 8.224139] psmouse serio1: trackpoint: failed to get extended button data
[ 9.004617] Adding 2170876k swap on /dev/sda5. Priority:-1 extents:1 across:2170876k FS
[ 9.039823] AVX version of gcm_enc/dec engaged.
[ 9.044719] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
[ 9.068024] alg: No test for crc32 (crc32-pclmul)
[ 9.196447] EXT4-fs (sda1): re-mounted. Opts: data=ordered,grpquota,errors=remount-ro,usrquota
[ 12.304179] psmouse serio1: trackpoint: IBM TrackPoint firmware: 0x01, buttons: 0/0
[ 12.305880] input: TPPS/2 IBM TrackPoint as /devices/platform/i8042/serio1/input/input6
[ 15.204052] systemd-journald[206]: Received request to flush runtime journal from PID 1
[ 21.864197] RPC: Registered named UNIX socket transport module.
[ 21.864201] RPC: Registered udp transport module.
[ 21.864202] RPC: Registered tcp transport module.
[ 21.864203] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 22.074266] FS-Cache: Loaded
[ 22.128345] FS-Cache: Netfs 'nfs' registered for caching
[ 22.433884] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[ 7711.963266] traps: php5-cgi[14587] general protection ip:70e439 sp:7ffcb361d570 error:0 in php5-cgi[400000+7e1000]

Unfortunately, fsck doesn't log any errors.

If you run it again and see errors, let us know some of the ones you see there.

As far as log files go, you could try looking at /var/log/syslog, /var/log/kern.log, and /var/webmin/miniserv.error.

You may also want to keep an eye on that dmesg output to see if it shows errors in the future.
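
A minimal sketch of how those logs could be scanned for filesystem trouble. The function name fs_errors and the pattern list are illustrative assumptions, not an exhaustive set of kernel messages, so an empty result does not prove the disk is healthy.

```shell
# Filter kernel log text on stdin for lines that usually accompany
# filesystem trouble. The pattern list is an assumption, not a
# complete catalogue of what the kernel can print.
fs_errors() {
    grep -iE 'ext4-fs error|i/o error|remounting filesystem read-only'
}

# Typical use on the server:
#   dmesg | fs_errors
#   fs_errors < /var/log/kern.log
#   fs_errors < /var/log/syslog
```

Ordinary boot lines such as "EXT4-fs (sda1): re-mounted" are deliberately not matched, since they appear on every healthy boot.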

What kind of VPS is it that you have there? Is it OpenVZ? If so, could you share your /proc/user_beancounters file?

It's a Hyper-V Virtual Machine. Full blown server running Debian Linux, but in a virtualised environment (so maybe it isn't a VPS). I've just checked the Host machine's event viewer and noticed it complaining that the Integrated Services on the server (host) is 6 whereas the client is 5.1. From another Google search there doesn't appear to be a way to fix this, as the Linux OS is supposed to have LIS built in. So I'm not sure if this is the problem or not. Ubuntu is OK and so maybe that's the reason it reboots fine. Might have to find a forum somewhere that can help with Hyper-V integration if this is the problem and it continues.

Ah, you did indeed mention hyper-v above.

I'd definitely suggest reviewing the logs above, looking back to roughly the time you began having some recent problems. In particular, the logs "syslog" and "kern.log" might have the best info.

Also, you may want to make sure you're using the most recent packages -- and in particular, make sure you're using the most recent kernel version for your distro.

Mmmm. The trouble was apparent when I arrived at work at around 8. Snippet from syslog below.

Oct 11 19:55:01 debian CRON[6956]: (root) CMD (/etc/webmin/status/monitor.pl)
Oct 12 08:32:35 debian rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="672" x-info="http://www.rsyslog.com"] start

We jump from 19:55 on the 11th to 8:32 on the 12th. The 8:32 time is after the problem was fixed using a manual fsck.

A snippet from "kern.log".

Oct 9 22:02:53 debian kernel: [1571658.828657] sd 0:0:0:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automatically adjust these parameters.
Oct 10 22:40:13 debian kernel: [1660298.841532] traps: php5-cgi[61349] general protection ip:70e439 sp:7ffc2a235180 error:0 in php5-cgi[400000+7e1000]
Oct 12 08:32:35 debian kernel: [ 0.000000] Initializing cgroup subsys cpuset

I can't see anything obvious but my knowledge is pretty limited.

Ran uname -a and that returned debian 3.16.0-4 which looks like the latest kernel. Maybe it was just a hold over from the webmin update. If there are any other logs you can think of that may help let me know.

Thanks for your help by the way.

Well, your server seems to be seeing some pretty significant low level issues, at the filesystem level or below.

This error here is concerning:

Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automatically adjust these parameters.

That seems to indicate that the Linux kernel thinks something "underneath" it has changed, such as what could happen if a VPS setting was modified. But if that error occurs regularly, it may indicate a kernel bug or VPS bug of some sort.

That looks a bit like this particular issue here:

https://social.technet.microsoft.com/Forums/windowsserver/en-US/8807f61c...

Is there anything else "heavy" that's running at the time when this occurs? Such as backups of some sort?

If it happens again, what I'd suggest doing is to run the "mount" command.

The users in the above forum suggested their filesystem was made read-only when that error occurred... if that's the case in your situation, that may explain why things like MySQL stopped working.

Below are some snippets from webmin miniserv.error logs

[11/Oct/2015:15:33:56 +1100] [121.214.130.40] Document follows : This web server is running in SSL mode. Try the URL https://debian.host.net.au:10000/ instead.
Error: Server no longer exists! Warning: something's wrong at /usr/share/webmin/authentic-theme/authentic.pl line 8.

Mind you I get the same error message from Ubuntu except for the Error: Server no longer exist! That's a little different but maybe that was at the time webmin shutdown after the update.

I looked at the Microsoft blog and the problems reported there were seemingly solved by doing a reboot. The difference with our server is that a reboot actually seemed to spark the issue. We back up every day (well, every night, using Altaro backup software), and back up three VMs, one Windows VM and two Linux VMs. The Debian one was the problem machine. But we have been backing up these machines without fail for the last few weeks. Were there any significant changes with Webmin being updated that may have affected the Debian install? That was the only change that I can see. There were a couple of package updates but nothing significant last week. Debian is pretty stable and needs infrequent updating, which is one reason I chose it over other distros.

If it happens again I may have to revert to Ubuntu or CentOS for backup reasons.

The issue wouldn't be related to Webmin or other user-facing services.

Your server seems to be experiencing some pretty low level issues.

I'd be looking at lower level things. On the host, there is Hyper-V itself and the hardware, but since the other VMs aren't experiencing an issue, those don't stand out to me either.

On your Debian server, I'm suspicious of the kernel. But I'm also curious what other packages on there were updated recently, as there could possibly be other low-level tools in use there.

While we normally recommend against this sort of thing for stability reasons, since you're already experiencing stability issues, you might consider looking into whether moving to a non-standard kernel might be worth a try.

For example, perhaps the kernel from Debian Testing.

Now, you'd want to be able to revert back to this current setup if that doesn't work, so if you choose to do that, I'd highly recommend having an excellent backup of your entire server. But if you get stuck and just aren't sure what else to change -- I personally might try that, before migrating everything to a new distro.

However, I might wait until you see this issue again, and if you do, see if those same errors show up in the syslog file. Also, use the "mount" command to see if the filesystem is in readonly mode.

That will help us determine if you're experiencing the issue in the link I shared above.

Well, all good this morning. Server was backed up last night using Altaro, as well as the internal backup through Webmin. I can log in to Webmin without issues and all websites seem to be working OK.

Pasted below is the output from mount

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=633729,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,relatime,size=1017300k,mode=755)
/dev/sda1 on / type ext4 (rw,relatime,quota,usrquota,grpquota,errors=remount-ro,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=23,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw,relatime)

I'd be interested in the output of "mount" only when experiencing a problem; that will show if the filesystem is in read-only mode.

You could also determine that by running "dmesg | tail -30" most likely.
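
As a rough sketch of that read-only check: the helper below (is_readonly is a made-up name) reads /proc/mounts-style text rather than the "mount" output, on the assumption that each line has the usual field order of device, mount point, type, options. The root filesystem here is mounted with errors=remount-ro, so "read-only" on / while the problem is occurring would fit the pattern discussed above.

```shell
# Report whether a mount point is read-only, given /proc/mounts-style
# text on stdin (fields: device mountpoint fstype options ...).
# If the mount point is listed more than once, the last entry wins.
is_readonly() {
    awk -v mp="${1:-/}" '
        $2 == mp { opts = $4; found = 1 }
        END {
            if (!found) { print "not-found"; exit }
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == "ro") { print "read-only"; exit }
            print "read-write"
        }'
}

# Typical use on the server:
#   is_readonly / < /proc/mounts
```

The option list is split on commas and compared exactly, so errors=remount-ro on its own is not mistaken for an "ro" mount.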

Well it happened again but this time I did a screen capture of the mount command. However when I look before the mount command it seems the server actually runs out of memory. This is odd as most of the time it runs at around 50% usage. I would notice if the memory was fast depleting as I check daily.
After running the mount command I also ran fsck /dev/sda1 to repair the drive. Sorry I didn't take note of the fixes but there was a heap of them. I didn't reboot the server before I did all this so not sure if the out of memory issue would have resolved itself doing the reboot. Would memory issues cause corruption of files on the drive? Perhaps this has been the problem all along. The server has 5Gb of memory assigned to it, but I'm not even sure that this is the problem. I can send through the image of the terminal output, just can't see anything obvious where I can attach it.

Sorry to hear you're still experiencing this issue!

Running out of memory would not cause filesystem corruption though. It would cause processes to be killed off by the Linux kernel, and those processes would stay unavailable until they are restarted.

And yeah, 5GB of RAM should be more than plenty.

If you run out of RAM again, you may want to review your running processes to see what's eating up all the memory. You can do that with "ps auxw". While that's a different issue than what's causing the filesystem corruption, it would be good to sort out what's causing that.
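
A small sketch for making that "ps auxw" output easier to read when it is long. It assumes the standard "ps aux" column layout (PID in column 2, RSS in KiB in column 6, command in column 11); top_rss is just an illustrative name.

```shell
# Read `ps aux`-style text on stdin and print the N processes with the
# largest resident memory (RSS, column 6, in KiB), largest first.
# Output columns: RSS PID COMMAND
top_rss() {
    n=${1:-10}
    awk 'NR > 1 { print $6, $2, $11 }' | sort -rn | head -n "$n"
}

# Typical use on the server:
#   ps auxw | top_rss 5
```

With the procps version of ps, "ps aux --sort=-rss | head" should give a similar view without the helper.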

Lastly -- attachments may not work at the moment, unfortunately, though you could always upload the image somewhere else, including on your server, and give us a link to that.

Thanks for getting back. The system doesn't notify me anymore when there is a response. Link to the screen grab is below.

Debian_Mount_Output_13-3-2016.png

Hopefully this will give some light to what's happening. Thanks again.

Yeah it looks like the filesystem is indeed going into read-only mode, due to filesystem errors.

That doesn't identify what exactly the issue is, but it does mean that you're seeing some disk level issues.

I was previously suggesting you try a newer kernel on your Linux system... I would definitely suggest that still. However, the other thought I had, are there updates to Hyper-V available? It wouldn't hurt to try updating that as well.

Interestingly enough we also have an Ubuntu VM that never shows this behaviour, but it does get rebooted quite often for updates. The Debian VM usually never needs to reboot (except when this happens) and this lock up usually occurs after 60 days or so of uptime. I'm thinking that every month during windows updates (which always require a reboot) I could just make it a practice to reboot the Debian server too and see if the problem presents itself again.

But having said all that, just had the server nearly run out of virtual memory. There was still plenty of ram left though, close to 60%. I will add that the symptoms prior to an emergency reboot were database connectivity issues with Wordpress sites. The reboot settled things down though. Could this be a MySQL issue?

As for Hyper-V on the host server, it is always updated so I would assume Hyper-V is updated also during these monthly patches too. I noticed today that there are a few optional updates (about 24 of them) that were posted on the 15th of this month.

Unfortunately, you'd have to look while the RAM is being used, in order to determine what's using it. After the problem has stopped, there's nothing that would show what was using it.

However, what you could always do is set up monitoring to notify you when there's a problem... and when that occurs, you could take a look and review the situation.

One way to do that would be to use Webmin -> Others -> System and Server Status. There, you can set up various monitors, including RAM monitoring.

However, I don't think the issue you're seeing with filesystem corruption is RAM related -- but it would also be good to resolve the RAM usage.

-Eric

After the reboot, virtual RAM has steadily increased: real memory usage is 42%, virtual memory usage is 48%. Virtual server count is 63. The Ubuntu server, on the other hand, is running at 37% real memory and 1% virtual memory, and has 28 virtual servers running. When you mentioned the kernel I noticed that Ubuntu is running Linux 3.16.0-67-generic on x86_64 whereas Debian is using Linux 3.16.0-4-amd64 on x86_64. So it looks like Ubuntu is using a later kernel (unless I'm not understanding the kernel numbering system).

So maybe I should be using a later kernel. Just unsure how to go about it.

Well, I'm just taking a bit of a guess here.

All I can say for certain is, if you're seeing filesystem errors, something very wrong is occurring, at a low level.

That's different than the RAM issues you are seeing. If RAM usage is going up, you would want to monitor what processes are using more RAM.

But as far as filesystem errors go -- you're looking at low-level problems there. Is there a Hyper-V update available? If so, that might be an easier place to start.

They are optional updates and the Hyper-V ones seem more to do with a Windows VM. I keep wondering why Ubuntu doesn't suffer from the same problem if it's a Hyper-V issue. What I can do if it happens again is copy the Debian VM first before fixing it and then maybe run it up without network connectivity to see what file system errors there are. If I fix the original VM then I can keep customers happy and investigate the other one to see what the problem may be.

As for the memory issue, I'm running htop in putty at the moment but can't make head nor tail of some of the figures. Is there a command that gives me application usage or something similar in webmin that can do the same?

Don't worry just found it :)

PID 1234 mysql 1.39 GB.

Not sure if it's swap that it's using. Is there a way to check this please? And if it is, is there a way to get it to use real memory instead?

Well, there isn't a simple way to see what process is using swap.

Though if the goal is to lower swap usage, it might be best just to lower RAM usage in general.
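
That said, a rough per-process view is possible on kernels that expose a VmSwap line in /proc/<pid>/status, which is an assumption worth checking first with "grep VmSwap /proc/self/status". The sketch below is not a standard tool; the function name and the optional directory argument are made up for illustration.

```shell
# List per-process swap use from the VmSwap line in
# <procroot>/<pid>/status. Prints "KiB PID NAME", largest first.
# procroot defaults to /proc; processes with no swap use are skipped.
swap_by_process() {
    root=${1:-/proc}
    for f in "$root"/[0-9]*/status; do
        [ -r "$f" ] || continue
        awk '
            /^Name:/   { name = $2 }
            /^Pid:/    { pid = $2 }
            /^VmSwap:/ { swap = $2 }
            END { if (swap > 0) print swap, pid, name }' "$f" 2>/dev/null
    done | sort -rn
}

# Typical use on the server (run as root to see every process):
#   swap_by_process | head
```

The figures are a snapshot, so it only helps when run while swap usage is actually high.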

What is the output of these two commands:

free -m
ps auxwf

Again swap got higher than 60% before I finished up last night. As I was concerned it would run out of virtual memory over night, I thought about the website I was working on when all this started. So as a test I disabled it and then enabled it again. Almost instantly swap went down to a respectable 20% and memory usage down to around 30%. However it all started to climb again but after 8 or so hours had gotten to around 52% virtual memory and memory to 40%. So this morning I disabled and enabled it again (about an hour and half ago) and the same thing happened. Almost instant drop again and we are now running at 42% memory usage and 29% virtual memory usage.

Have you ever heard of something like this happening? Is it possible for a website to cause these issues?

Output from free -m

             total       used       free     shared    buffers     cached
Mem:          4967       4782        184       2425         31       2653
-/+ buffers/cache:       2097       2869
Swap:          2119        644       1475

The output from ps auxwf is fairly large. Would you prefer that in a text file?

Still unsure what is going on here with virtual memory. However there were 52 updates yesterday that I installed and overnight Virtual Mem stayed at around 40 - 45%. During the day it has steadily increased to 77%. Not sure why unless it's MySQL?

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mysql 57351 0.7 2.8 2651976 143076 ? Sl Apr01 41:32 \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plug

Should I start a new ticket for this? Any help appreciated.

The output I'm seeing above isn't too unusual.
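
To put numbers on that: in the "free -m" output, the "-/+ buffers/cache" row is the one that reflects what applications actually hold, since buffers and cache can be reclaimed. That is 2097 MB of 4967 MB, or roughly 42%, with swap at 644 MB of 2119 MB, or roughly 30%. A small sketch of the same arithmetic, assuming the older "free" layout shown in this thread (newer versions drop the "-/+" row); free_summary is an illustrative name.

```shell
# Parse old-style `free -m` output (with a "-/+ buffers/cache" row) on
# stdin and print application memory and swap use as whole percentages
# (fractions are truncated).
free_summary() {
    awk '
        $1 == "Mem:"  { mem_total = $2 }
        $1 == "-/+"   { app_used = $3 }
        $1 == "Swap:" { swap_total = $2; swap_used = $3 }
        END {
            if (mem_total <= 0) { print "no Mem: line found"; exit 1 }
            swap_pct = (swap_total > 0) ? 100 * swap_used / swap_total : 0
            printf "mem_used_pct=%d swap_used_pct=%d\n",
                   100 * app_used / mem_total, swap_pct
        }'
}

# Typical use on the server:
#   free -m | free_summary
```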

Of course if you're running out of swap that is unusual. But since the above looks okay, it would be difficult to say what exactly is causing the problem; we'd need to see it when the RAM usage is more of a problem.

However, it can't hurt to tweak MySQL to lower how much RAM it's using.

A simple way to do that would be to re-run the setup wizard (in System Settings). When doing that, try choosing one of the lower memory options for MySQL.

Thanks for that. Did about 52 updates yesterday for the Debian server. Virtual usage started to rise again so I did the usual trick of disabling a virtual server and enabling it again. That dropped it all back to respectable levels. I didn't check last night, but did this morning and I'm happy to report that virtual memory is running at 29% and memory is 43%. Maybe there was an issue that was fixed in yesterday's patches. Will keep an eye on it and see how it goes. Will also put into practice my new policy of rebooting the server during Windows updates and see if the file system error returns. If it doesn't then that one can be crossed off the list too.

I just wanted to follow up, are things still looking good on your server there?

Virtual memory is still going up and down, but hasn't maxed out like the other day. It's at 69% at present. Perhaps the problem stems from the fact that I didn't assign enough virtual memory when I set the server up. Ubuntu has roughly the same amount of virtual memory as it has RAM and its virtual memory is running at 0%.

When I set Debian up it defaulted to about half. However memory usage always drops significantly whenever I disable a virtual server, not sure what processes are happening when that occurs though. But I will edit this to qualify the statement, not every domain disabling will drop virtual memory.

Just had to restart webmin. Virtual memory was getting high at the time however it wasn't catastrophic. Have just gone through the logs for around that period of time and copied and pasted into notepad. There is quite a bit in there about the kernel and hyper v stuff too. Have uploaded the text file for you to look at as I can't see if there's an issue or not.

https://www.rdweb.com.au/mem_issue_15-4-2016.txt

Not sure if this is related to the corruption that occurs once every few months. Was about to ask the question about uninstalling clamav as it's mentioned a couple of times and that question was, as the server doesn't handle email, can it be safely removed? That would free up a little bit of memory too.

Also need to add that this morning I rebooted the host machine to install the latest server updates. There were quite a few, around 32 in total.

Thanks for the help.

Ah if you aren't using email, you may want to disable the ClamAV and SpamAssassin services. Both of those can use a decent amount of RAM.

I wouldn't remove them altogether, but disabling the service should do the trick.

The text file you shared does suggest that you're experiencing low memory situations.

Any tips on how to permanently disable both? The only one that shows up under status is spamassassin, which is already marked as disabled, but clam isn't listed. If it isn't running, any idea why it consumes around 600MB as shown in the processes list?

If the feature is disabled in Virtualmin, that doesn't necessarily mean it's not running, it just means Virtualmin isn't allowed to use it.

To disable a service, you can go into Webmin -> System -> Bootup and Shutdown, and there you can see the services that are running, and you can optionally disable them.

Did that and it effectively shut it down. However, when I refresh the page it shows clamav-daemon and clamav-daemon.socket as both starting at boot. Can't seem to stop them from starting at boot (even though they aren't running). Also, BIND is back too. It's consuming around 400MB. Can I do the disable thing to it as well? These Linux boxes don't handle DNS; that's still handled by the Windows server.

By default, BIND is used for all DNS lookups.

However, if you edit /etc/resolv.conf and change the nameserver there to use your ISP's nameservers, it would then be possible to disable it.

For anything you're disabling, you'd want to make sure you're selecting the disable now and on boot option on the Bootup and Shutdown screen.

Thanks for that. BIND isn't running on the ubuntu server so I assume that disabling it on debian shouldn't have any effect either. Having said all that I seem to recall that BIND was disabled earlier, see this thread. However it is still running as a process after reboots. I don't think the server uses it as it isn't ticked under System Settings>Features and Plugins. Is it safe then to stop and not have it start up after a reboot?

The other thing is that clamav-daemon and clamav-daemon.socket continually stay as "Yes" for start at boot, despite numerous attempts at selecting them and clicking "Disable Now and On Boot". So I tried clicking on the service and opening up the page "Edit Systemd Service" for clamav-daemon.socket, selecting "Start at boot time = No" and hitting save. That throws me to an error page that says "Failed to save systemd service : No systemd configuration entered" and I have to return to the previous page. But if I go back to "Bootup and Shutdown" the service is still listed as start at boot = yes. Maybe I need to add something to systemd to get it to run but am unsure. Any ideas?

Thanks again Roger

Whether or not Virtualmin is using BIND -- you'd definitely want to ensure that 127.0.0.1 isn't listed as a nameserver in /etc/resolv.conf. That can be listed there, even if Virtualmin isn't using BIND. If that's in there, that would cause DNS lookups to fail if BIND were stopped.
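
A quick sketch of that check before stopping BIND. The function name is made up, and treating any 127.x.x.x or ::1 entry as "local resolver" is an assumption about what should count.

```shell
# Succeed (exit status 0) if the given resolv.conf-style file lists a
# loopback nameserver (127.x.x.x or ::1), meaning DNS lookups depend
# on a resolver running on this machine.
uses_local_resolver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+(127\.|::1)' \
        "${1:-/etc/resolv.conf}"
}

# Typical use on the server:
#   if uses_local_resolver; then
#       echo "resolv.conf points at this machine; fix that before stopping BIND"
#   fi
```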

Have done that, thank you, and disabled BIND from running now and at reboot. Clam, on the other hand, is still persistent in wanting to start at boot. Any ideas as to what may be causing this? Again, your help is appreciated.

Should I start another ticket for this issue? Seems we are going off track a little.

Roger

Hmm, what distro/version is it that you're using there? (my apologies if I asked that already, though I didn't see it above at a glance)

Debian Linux 8.

I have setup monitoring as you suggested earlier. Last night I received two emails, the first one said "Monitor on debian.host.net.au for 'MySQL Database Server' has detected that the service has gone down at 22/Apr/2016 23:30". The second email came shortly after "Monitor on debian.host.net.au for 'MySQL Database Server' has detected that the service has gone back up at 22/Apr/2016 23:35". The Server logs show at that time the below entry.

Apr 22 23:30:27 debian kernel: [1947991.864393] mysqld invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Apr 22 23:30:27 debian kernel: [1947991.864397] mysqld cpuset=/ mems_allowed=0
Apr 22 23:30:27 debian kernel: [1947991.864402] CPU: 3 PID: 61834 Comm: mysqld Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u4
Apr 22 23:30:27 debian kernel: [1947991.864624] [30702] 0 30702 1084 1 7 41 0 mysqld_safe
Apr 22 23:30:27 debian kernel: [1947991.864626] [50350] 108 50350 662419 17937 241 52565 0 mysqld
Apr 22 23:30:27 debian kernel: [1947991.865017] Out of memory: Kill process 50350 (mysqld) score 38 or sacrifice child
Apr 22 23:30:27 debian kernel: [1947991.865049] Killed process 50350 (mysqld) total-vm:2649676kB, anon-rss:71748kB, file-rss:0kB

Is it correct to assume that MySQL is the culprit that is consuming Virtual Memory? If so do you know how I can force it to use RAM over Virtual Memory or a way I can increase Swap on a running server?

Thanks

Usually when a process is killed for the system being out of memory, it is the culprit, as Linux generally kills the process with the highest OOM score, which is typically the one using the most memory.

If you have swap configured, that will be used automatically as a fallback.
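
To see how often this has happened, the kernel log can be filtered for the same messages that appear in the entry above. A minimal sketch (oom_events is an illustrative name; the patterns are taken from the log lines quoted in this thread and may differ on other kernel versions):

```shell
# Filter kernel log text on stdin for OOM-killer activity.
oom_events() {
    grep -E 'invoked oom-killer|Out of memory: Kill process|Killed process'
}

# Typical use on the server:
#   oom_events < /var/log/kern.log
#   dmesg | oom_events
```

Comparing the timestamps with what was running at the time (backups, cron jobs, traffic spikes) is usually more telling than the name of the killed process alone.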

Thanks Jamie. I assumed that swap is virtual memory, but I may be wrong. Can you enlighten me please? Also when I setup the server it defaulted to 2GB of virtual memory, perhaps I need to increase that amount. Is there a way of doing that on an already running server?

Thanks again.

That's correct, swap is virtual memory.

I'm surprised your server is having so much trouble with RAM; going by the free output you posted, you have roughly 7GB including swap.

What you could always try is to go into System Settings, and re-run the setup wizard.

In there is an option to specify which MySQL config to use. You may want to try setting it to use the smallest MySQL config, which will significantly reduce how much RAM it uses.

After that, restart MySQL, and see if that helps with your issue.
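
On the question of adding swap to a running server: one common approach is a swap file, which needs no repartitioning. The sketch below is only an outline under stated assumptions: /swapfile and 2048 MB are placeholder values, the helper name is made up, the commands need root, and making the swap file survive a reboot (an /etc/fstab entry) is not covered. By default it only prints the commands so they can be reviewed first.

```shell
# Print (default) or run (DRY_RUN=0, as root) the steps to create and
# enable a swap file. The path and size are placeholders.
add_swapfile() {
    swap_path=${1:-/swapfile}
    size_mb=${2:-2048}
    # run: echo the command in dry-run mode, otherwise execute it.
    run() {
        if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
    }
    run dd if=/dev/zero of="$swap_path" bs=1M count="$size_mb"
    run chmod 600 "$swap_path"
    run mkswap "$swap_path"
    run swapon "$swap_path"
}

# Review the commands first:
#   add_swapfile /swapfile 2048
# Then, as root, actually run them:
#   DRY_RUN=0 add_swapfile /swapfile 2048
```

More swap only buys headroom; it does not explain why memory use keeps climbing.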

Yep I re ran the installer a while back when you suggested it but didn't restart MySQL. Think I have had a reboot since then too. When I re-ran the installer I selected 1GB for MySQL however last night it was running at 2.3GB. I'm thinking that I need to increase the size of the swap partition or alternatively setup a new server with a much larger swap and move everyone over to that server. I could run that as a GPL and when I'm done switch it over to the Pro version. Any thoughts please.

Needing more than 2GB of swap, when you already have about 5GB of RAM, seems like quite a bit. It really shouldn't need anything near that... and in general I'd suggest we try to configure your server so that it doesn't feel the need to use as much swap.

I mean, you could certainly do that, but in my opinion I'd work to figure out what's going on to cause it to need so much RAM.

You really may want to try the smallest MySQL size, at least for the time being (and then yeah, you may need to manually restart MySQL afterwards).

Another thing you could try is to edit /etc/apache2/apache2.conf, and set "MaxClients" to something smaller... it generally defaults to 150 or more, you might want to try setting it to say, 30 or 40 just for the moment (and then restart Apache).

That will make it so that a sudden flood of traffic can't bring your server down.

After making those changes and restarting the services (or rebooting), could you share what "ps auxf" shows?

It seems weird, but whatever is using swap is not using RAM, as RAM seems to hover around 35% to 45%. Swap goes up and down like a yo-yo. Will reducing RAM for MySQL put more load onto the processor? Also tried to find "MaxClients" in apache2.conf but can't find it.

MySQL is running at 1.21GB at the moment, steadily increasing along with swap.

Real memory: 4.85 GB total / 2.88 GB free / 2.63 GB cached Swap space: 2.07 GB total / 884.90 MB free

Ok just re ran the installer. MySQL is running at 714MB. Real memory: 4.85 GB total / 2.78 GB free / 2.62 GB cached Swap space: 2.07 GB total / 864.32 MB free.

Swap is still high even though MySQL has dropped by about 500MB.

ps auxf output

That's not abnormal to see something using swap in that case -- it just means something that was previously using swap is still using that.

I think changing MySQL for the time being is the best thing to do until we're sure you aren't running low on memory.

Now, it is going to grow, but it should slow down at some point.

From here, we just need to review what else is using RAM.

Which version of Debian is it you're using there? That will help me understand what your Apache config should look like so we can modify it to allow fewer connections.

Also, could you run the "ps auxwf" again and share that?

Just as a comparison Ubuntu is running as below

Real memory: 4.84 GB total / 3.50 GB free / 2.10 GB cached Swap space: 5 GB total / 4.97 GB free

MySQL is 2.60GB.

There are a lot of variables that contribute to how much RAM is used by a server. It's not surprising that another system would be using a much different amount of RAM.

We'll need to see the output to the "ps" command above, as well as the exact Debian version, to offer some additional input on all that.

Hi Eric,

It's just above. But if you need the link again then please see below.

ps auxf output

Also the server is running Debian Jessie (8).

Thanks

Ah, my bad, looks like I missed your link!

Okay, so MySQL is definitely looking good there. It's not actually using that full 700MB, some of that is shared. It may really be in the 500's.

It looks like pulsewayd is using a decent amount of RAM (though, again, it should be okay, and having 6GB of RAM should be plenty for you to use tools like it).

Are you using Mailman? If not, you may want to disable that service, and then shut off the feature in Virtualmin.

However, after reviewing what you have there, I do think I'd focus on how many people are allowed to connect to Apache at one time, and possibly on how much RAM each PHP process is taking.

If there are any PHP modules installed that you don't need, you may want to consider removing or disabling them. Each installed module takes up RAM during every Apache request.

As far as the Apache config goes -- it looks like they may have moved the files around where those configuration items are stored.

What is the output of this command:

ls /etc/apache2/mods-enabled | grep mpm

If it's just one file, could you also paste in the contents of that file?

Hi Eric,

Various websites on the server send email notifications and submissions but it doesn't receive email, so I'm not sure if mailman is what is used for this purpose. If it isn't using a lot of RAM I might leave that. But then again, looking at the list of Features and Plugins, it isn't even enabled so maybe I can get rid of it.

Output from "ls /etc/apache2/mods-enabled | grep mpm"

> ls /etc/apache2/mods-enabled | grep mpm
mpm_prefork.conf
mpm_prefork.load

You would know if you were using Mailman... it's not used on most systems (we're actually beginning the process of disabling this by default).

The Apache config file I'm interested in seeing is "mpm_prefork.conf" -- can you paste in the contents of that file?

mpm_prefork.conf

# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# MaxRequestWorkers: maximum number of server processes allowed to start
# MaxConnectionsPerChild: maximum number of requests a server process serves

<IfModule mpm_prefork_module>
StartServers 5
MinSpareServers   5
MaxSpareServers 10
MaxRequestWorkers   150
MaxConnectionsPerChild   0
</IfModule>

# vim: syntax=apache ts=4 sw=4 sts=4 sr noet

Try setting the "MaxRequestWorkers" parameter to "40", and then restart Apache, and see if that helps.

Done. Real memory: 4.85 GB total / 3.53 GB free / 1.99 GB cached Swap space: 2.07 GB total / 1.81 GB free. However I don't believe it has impacted the system as it was around the same before I restarted Apache. The file I modified was in the mods-available directory. Does that mean it's active or inactive if it's in that directory?
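On the mods-available question: in Debian's Apache 2.4 layout, mods-available holds every module config that is installed, and a module is active when mods-enabled contains a symlink pointing at it (a2enmod creates those links). Since mpm_prefork.conf was listed under mods-enabled earlier, an edit to the copy in mods-available should be an edit to the live file, assuming the usual symlink is in place. A small sketch for checking that; mod_conf_active is a hypothetical helper name, and the second argument exists only so it can be pointed at a different Apache directory.

```shell
# Report whether <module>.conf is linked into mods-enabled, and where the link points.
mod_conf_active() {
    mod=$1
    base=${2:-/etc/apache2}
    link="$base/mods-enabled/$mod.conf"
    if [ -e "$link" ]; then
        echo "active: $link -> $(readlink -f "$link")"
    else
        echo "inactive: no $link"
        return 1
    fi
}

# Typical use:
#   mod_conf_active mpm_prefork
```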

That looks excellent. Yeah, your memory usage isn't too bad, when the system is recently rebooted or the services restarted. It seems to be running into periods of high resource usage though... so the goal of what we're doing is to make sure it can withstand those periods of time (and also, prevent over-usage).

Let's keep an eye on that and see how things go for a few days. Let us know what you see!

Good idea and many thanks for your help. Will keep an eye on things. At the moment we are running at Real memory: 4.85 GB total / 3.15 GB free / 2.95 GB cached Swap space: 2.07 GB total / 1.80 GB free, which I think is slightly better than an hour ago.

After a few days, we can see how things look. At that point, if you aren't experiencing problems, we can review how much RAM is being used, and look into increasing the RAM MySQL is using, and/or increasing the max connections Apache is allowed to use.

Have also just stopped mailman with no apparent problems (forms on websites seem to still send through ok). So we are now cruising at - Real memory: 4.85 GB total / 3.01 GB free / 2.81 GB cached Swap space: 2.07 GB total / 1.88 GB free. I know it isn't a couple of days yet but hey this is looking promising. If swap remains low for the whole day I'll be stoked.

Happy to report that swap is less. It's not 0% but runs from around 10% to 50% but hasn't maxed out for a while (currently 44%). So this has helped but hasn't completely eliminated swap usage. MySQL is running at 2.20GB

mysql 34326 0.9 2.6 2310636 135768 ? Sl Apr27 69:55 _ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306

The only error I have had was a notification that services had shut down and restarted after a few minutes. This was on Sunday morning and was regarding 10 sites. I couldn't find any corresponding errors in the Apache logs so maybe it was just an aberration with the service monitor.

Hmm, Sunday morning is the log rotation, it's possible that it was noticing that Apache was restarting a few times during the early hours.

What does "free -m" show now?

And can you share your "ps auxwf" output again? Thanks!

> free -m
             total       used       free     shared    buffers     cached
Mem:          4967       4710        256       2666         16       2784
-/+ buffers/cache:       1909       3057
Swap:         2119        990       1129

Hmm, and if you restart MySQL with "service mysql restart", what is the output of these two commands:

ps auxw | grep mysql
free -m

MySQL is using a bit more RAM than I'd expect on your system, especially since we've given it a config file designed to run in a smaller footprint.

But, it does indeed seem a bit better.

I'd also be curious to see the output of these commands:

free -m
service apache2 restart
free -m

That will be a view of how much RAM your server is using before and after Apache is restarted. If that makes a big difference, another option will be to switch away from using a PHP Execution Mode that stores processes in memory.

ps auxw | grep mysql output

root     31542  0.0  0.0   4336   756 ?        S    08:17   0:00 sh -c (ps auxw | grep mysql) 2>&1
root     31543  0.0  0.0   4336   104 ?        S    08:17   0:00 sh -c (ps auxw | grep mysql) 2>&1
root     31545  0.0  0.0  11128   976 ?        S    08:17   0:00 grep mysql
root     33919  0.0  0.0   4336   288 ?        S    Apr27   0:00 /bin/sh /usr/bin/mysqld_safe
mysql    34326  0.9  2.3 2310636 119252 ?      Sl   Apr27  81:34 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306

free -m output

             total       used       free     shared    buffers     cached
Mem:          4967       4457        509       2348        102       2818
-/+ buffers/cache:       1536       3431
Swap:         2119        570       1549

Will do the other shortly.

Could you share the above info again, but restart the MySQL service first?

You can restart it with the command "service mysql restart".

Have restarted the Apache service. "free -m" beforehand was the same as earlier; the output after the restart is below.

             total       used       free     shared    buffers     cached
Mem:          4967       1392       3574        167        110        685
-/+ buffers/cache:        596       4370
Swap:         2119        360       1759

Will do the MySQL restart later.

> ps auxw | grep mysql
root     30912  0.0  0.0   4336   768 ?        S    16:40   0:00 sh -c (ps auxw | grep mysql) 2>&1
root     30913  0.0  0.0   4336   104 ?        S    16:40   0:00 sh -c (ps auxw | grep mysql) 2>&1
root     30915  0.0  0.0  11128  1028 ?        S    16:40   0:00 grep mysql
root     33919  0.0  0.0   4336    44 ?        S    Apr27   0:00 /bin/sh /usr/bin/mysqld_safe
mysql    34326  0.9  2.6 2310636 132528 ?      Sl   Apr27  86:43 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306

Restarted service

> ps auxw | grep mysql
root     31355  0.1  0.0   4336  1612 ?        S    16:41   0:00 /bin/sh /usr/bin/mysqld_safe
mysql    31762 11.3  1.9 599764 99200 ?        Sl   16:41   0:01 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306
root     31818  0.0  0.0  21704  2484 ?        S    16:41   0:00 /bin/bash /etc/mysql/debian-start
root     31843  0.0  0.0   6248  1788 ?        S    16:41   0:00 xargs -i /usr/bin/mysql --defaults-file=/etc/mysql/debian.cnf --skip-column-names --silent --batch --force -e {}
root     31857  0.0  0.0 104412  4568 ?        S    16:41   0:00 /usr/bin/mysql --defaults-file=/etc/mysql/debian.cnf --skip-column-names --silent --batch --force -e select count(*) into @discard from `information_schema`.`PARTITIONS`
root     31869  0.0  0.0   4336   796 ?        S    16:41   0:00 sh -c (ps auxw | grep mysql) 2>&1
root     31870  0.0  0.0   4336   104 ?        S    16:41   0:00 sh -c (ps auxw | grep mysql) 2>&1
root     31872  0.0  0.0  11132  1028 ?        S    16:41   0:00 grep mysql

But forgot to do the free -m beforehand. Output after is below.

> free -m
             total       used       free     shared    buffers     cached
Mem:          4967       4795        172       2380         41       2783
-/+ buffers/cache:       1970       2996
Swap:         2119        918       1201

Hasn't made any difference to usage from what I can see.

Heads up. Swap is on the increase again.

> free -m
             total       used       free     shared    buffers     cached
Mem:          4967       4788        178       2491         19       2744
-/+ buffers/cache:       2023       2943
Swap:         2119       1300        819

Okay, try this:

free -m
# Restart Apache
free -m
# Restart MySQL
free -m

I have a sneaking suspicion one or both of the above two services are contributing to the issue you're seeing, the above should help us understand which.
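To make the three readings easier to compare, each "free -m" can be boiled down to the two numbers that matter here. This is a sketch under assumptions: summarize_free and snapshot are made-up names, and the parsing relies on the older "free" layout seen in this thread, which has a "-/+ buffers/cache:" line (newer procps releases print a different layout, in which case app_used would come out as 0).

```shell
# Reduce `free -m` output to MB used by applications (excluding buffers/cache)
# and MB of swap in use.
summarize_free() {
    awk '
        /buffers\/cache:/ { app = $3 }
        /^Swap:/          { swap = $3 }
        END               { printf "app_used=%dMB swap_used=%dMB\n", app, swap }
    '
}

# Print one labelled reading from the live system.
snapshot() {
    printf '%s: ' "$1"
    free -m | summarize_free
}

# Intended sequence (as root, only while the problem is happening):
#   snapshot before
#   service apache2 restart; snapshot after-apache
#   service mysql restart;   snapshot after-mysql
```

Whichever restart produces the larger drop in app_used and swap_used is the more likely contributor.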

Swap has dropped over night. Do you want me to run these tests now or later tonight when swap will probably grow?

Only run the commands above if/when there is a problem.

It's doing it again.

> free -m
             total       used       free     shared    buffers     cached
Mem:          4967       4212        754       2074         63       2401
-/+ buffers/cache:       1747       3219
Swap:         2119       1431        688

Restart Apache

> free -m
             total       used       free     shared    buffers     cached
Mem:          4967       1024       3942         75         68        434
-/+ buffers/cache:        521       4446
Swap:         2119        297       1822

Restart MySQL

> free -m
             total       used       free     shared    buffers     cached
Mem:          4967       3242       1724       1178         84       1996
-/+ buffers/cache:       1162       3804
Swap:         2119        209       1910

Okay, according to that, it's looking like an Apache related issue. Apache and the processes it's spawning are using a large amount of RAM.

There are two things we can do to help --

One, we can reduce how many processes it's allowed to spawn, as well as have it restart individual workers more often to ensure they aren't growing too large.

To do that, try setting the following in the mpm_prefork.conf file:

StartServers 5
MinSpareServers   5
MaxSpareServers 10
MaxRequestWorkers   30
MaxConnectionsPerChild   150

The other thing we can't help as much with... but it would be to see if you can reduce the PHP modules and Apache modules that are being loaded.

Only you can do that though, as we don't know which ones are being used on your server.

You can see what Apache modules are being loaded in /etc/apache2/mods-enabled.

And the PHP modules are loaded in /etc/php5/conf.d.

If you find some you think you don't need, try temporarily removing them, restart Apache, and then ensure that your websites still work properly.

Currently in mpm_prefork.conf we have

<IfModule mpm_prefork_module>
StartServers 5
MinSpareServers   5
MaxSpareServers 10
MaxRequestWorkers   40
MaxConnectionsPerChild   0
</IfModule>

Yup, understood! Replace those values with the ones I shared. That should help the issue you're experiencing.

Have made those changes and restarted Apache. Will wait and see. Thank you.

Received a low memory alert this morning at 4:03am. At 4:10am received another email stating free memory was ok again. When I saw these emails I tried to login to webmin only to be greeted by the "site not found" error message. The websites running on the server were ok, just webmin. I logged in via SSH and restarted webmin and all is good. Question, would low memory cause webmin to fail?

Hmm, what output do you receive if you run the command "dmesg | tail -50"?

I get this

> dmesg | tail -50
[3173814.617639] [19207]    33 19207    93264      674     135     1805             0 apache2
[3173814.617640] [19208]  1056 19208    84108     3574     117       43             0 php5-cgi
[3173814.617642] [19211]    33 19211    93211      604     134     1810             0 apache2
[3173814.617643] [19215]  1056 19215    84119     3583     117       45             0 php5-cgi
[3173814.617645] [19220]  1056 19220    84119     3568     114       60             0 php5-cgi
[3173814.617646] [19223]  1056 19223    84108     3572     117       44             0 php5-cgi
[3173814.617648] [19224]  1056 19224    84108     3609     116        8             0 php5-cgi
[3173814.617649] [19226]  1056 19226    84108     3536     117       80             0 php5-cgi
[3173814.617650] [19227]  1056 19227    84108     3608     118        8             0 php5-cgi
[3173814.617652] [19233]  1056 19233    84108     3360     116      256             0 php5-cgi
[3173814.617653] [19236]    33 19236    93202      596     134     1805             0 apache2
[3173814.617655] [19241]  1056 19241    83112     2322     108      209             0 php5-cgi
[3173814.617656] [19245]  1056 19245    82917     2263     109      104             0 php5-cgi
[3173814.617658] [19248]  1056 19248    82917     1870     110      499             0 php5-cgi
[3173814.617659] [19253]  1056 19253    82903     2200     102      189             0 php5-cgi
[3173814.617660] [19255]  1056 19255    82984     2195     105      203             0 php5-cgi
[3173814.617662] [19258]  1056 19258    82984     2202     104      176             0 php5-cgi
[3173814.617663] [19262]  1056 19262    82984     2299     104       80             0 php5-cgi
[3173814.617665] [19263]  1056 19263    84108     3419     115      197             0 php5-cgi
[3173814.617666] [19266]  1056 19266    85068    10147     119       50             0 php5-cgi
[3173814.617667] [19273]  1056 19273    82984     1976     103      402             0 php5-cgi
[3173814.617669] [19282]  1056 19282    82550     1797     103      240             0 php5-cgi
[3173814.617670] [19304]  1056 19304    81897     1363      94       16             0 php5-cgi
[3173814.617672] [19317]   108 19317   218120    14155      76        9             0 mysqld
[3173814.617673] [19345]     0 19345    10589       82      26       25             0 cron
[3173814.617675] [19347]     0 19347     1084       22       7        0             0 sh
[3173814.617676] [19348]     0 19348    56505    11620      81       25             0 monitor.pl
[3173814.617678] [19401]     0 19401    38275     5906      79    14354             0 /usr/share/webm
[3173814.617679] [19441]     0 19441    42709    11830      86    12943             0 /usr/share/webm
[3173814.617681] [19483]    33 19483    93193      581     134     1807             0 apache2
[3173814.617682] [19489]    33 19489    93193      581     134     1807             0 apache2
[3173814.617684] [19490]    33 19490    93193      581     134     1807             0 apache2
[3173814.617685] [19491]    33 19491    93193      581     134     1807             0 apache2
[3173814.617687] [19492]    33 19492    93175      881     132     1834             0 apache2
[3173814.617688] [19493]    33 19493    93175      884     132     1834             0 apache2
[3173814.617689] [19494]    33 19494    93175      881     132     1834             0 apache2
[3173814.617691] [19495]    33 19495    93175      884     132     1834             0 apache2
[3173814.617692] [19497]    33 19497    93175      985     132     1834             0 apache2
[3173814.617693] [19498]    33 19498    93175      983     132     1834             0 apache2
[3173814.617695] [19499]    33 19499    93175      985     132     1834             0 apache2
[3173814.617696] [19500]    33 19500    93175      985     132     1834             0 apache2
[3173814.617697] [19501]    33 19501    93175      985     132     1834             0 apache2
[3173814.617699] [19502]    33 19502    93175      983     132     1834             0 apache2
[3173814.617700] [19503]    33 19503    93175      985     132     1834             0 apache2
[3173814.617701] [19504]    33 19504    93175      985     132     1834             0 apache2
[3173814.617703] Out of memory: Kill process 19441 (/usr/share/webm) score 13 or sacrifice child
[3173814.617737] Killed process 19441 (/usr/share/webm) total-vm:170836kB, anon-rss:47088kB, file-rss:232kB
[3176455.767563] traps: php5-cgi[12435] general protection ip:70eb29 sp:7ffdaadd3920 error:0 in php5-cgi[400000+7e2000]
[3183658.234774] traps: php5-cgi[29145] general protection ip:70eb29 sp:7ffd443fd170 error:0 in php5-cgi[400000+7e2000]
[3201785.253753] traps: php5-cgi[41364] general protection ip:70eb29 sp:7fff01e74260 error:0 in php5-cgi[400000+7e2000]
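The table the OOM killer dumps is easier to read when totalled per process name. A rough sketch follows; oom_rss_by_name is a made-up helper. The header row is cut off in the paste above, so the column order is an assumption based on kernels of this era: [pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj name. rss and swapents are page counts, and 4 kB pages are assumed.

```shell
# Total the OOM killer's per-process table by process name.
# Output: "<name> <process_count> <rss_kB> <swap_kB>", largest rss first.
# Fields are counted from the end of the line so that the width of the
# timestamp and [pid] columns does not matter.
oom_rss_by_name() {
    awk '
        NF >= 9 && /\] \[ *[0-9]+\] / &&
        $(NF-4) ~ /^[0-9]+$/ && $(NF-2) ~ /^[0-9]+$/ {
            count[$NF]++
            rss[$NF]  += $(NF-4)   # resident pages
            swap[$NF] += $(NF-2)   # swapped-out pages
        }
        END {
            for (name in rss)
                printf "%s %d %d %d\n", name, count[name], rss[name] * 4, swap[name] * 4
        }
    ' | sort -k3,3nr
}

# Typical use, right after an out-of-memory event:
#   dmesg | oom_rss_by_name
```

Using plain "dmesg" rather than "dmesg | tail -50" feeds it the whole table instead of only the last rows.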

Yeah it is looking like you're still ending up in low memory situations. In addition to removing unneeded Apache and PHP modules, we could further tweak "MaxRequestWorkers". Is that currently set to 30?

And how many domains do you have on your server? I'm tempted to suggest switching your domains from FCGID to CGI, which will make it so that there aren't any processes being stored in memory, which may be contributing to the issue.

MaxRequestWorkers is currently set to 40 as per your instructions from last week. Can drop it lower if you think that will help. Also not too sure as to which Apache and PHP modules aren't needed. If I can generate a list can you recommend ones to disable? Also will changing websites from FCGID to CGI impact their speed?

Thanks again.

Well, let's start here -- can you double-check that you've implemented the settings mentioned in Comment #80 above?

That should have been 30 by the way. Current mpm_prefork.conf is :

<IfModule mpm_prefork_module>
StartServers 5
MinSpareServers   5
MaxSpareServers 10
MaxRequestWorkers   30
MaxConnectionsPerChild   150
</IfModule>

Also forgot to mention that we are currently running 69 virtual servers on the Debian box. Was thinking of swapping around and moving everything to the Ubuntu server, then the 33 currently on it to the debian server and swapping over the pro licence to ubuntu. Would this work do you think?

Did the webmin update yesterday and had to restart webmin as it crashed. Noticed this morning that things were nice and stable. Real memory was getting up to around 45% and swap was lingering at 19%. However in the space of an hour, swap has increased to 56% and real memory has dropped to 37%. Must be something that is moving stuff from real memory to swap. Is there a way to increase swap on a running system? Maybe that will solve the problem.

You're seeing something fairly odd going on there... I'm not sure what it is. I haven't seen a system with 6GB of real memory, and 2GB of swap, constantly struggle to not run out of RAM.

Yes, you could add additional swap, but that could create some serious performance issues on your system there.
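For reference, swap can be added to a running system with a swap file. The sketch below is illustrative only: the path and size are placeholders, run and add_swapfile are made-up helper names, and with DRY_RUN=1 the commands are printed instead of executed so they can be reviewed first.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create and enable an extra swap file (needs root when not in dry-run mode).
add_swapfile() {
    file=${1:-/swapfile-extra}   # placeholder path
    size_mb=${2:-1024}           # placeholder size in MB
    run dd if=/dev/zero of="$file" bs=1M count="$size_mb" &&
    run chmod 600 "$file" &&
    run mkswap "$file" &&
    run swapon "$file"
}

# Review first:   DRY_RUN=1 add_swapfile /swapfile-extra 2048
# A swap file enabled this way is gone after a reboot unless it is also
# listed in /etc/fstab.
```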

You also asked if Ubuntu would make a difference... I'm unfortunately not sure yet (though, since that does come with a different kernel, I am somewhat interested in seeing if it makes a difference... but moving to a new distro is a big step).

Personally, I'd keep digging into the processes running on your server, and work on ways to reduce memory usage.

For starters, just to see if it helps, you could try changing "MaxRequestWorkers" to "20" rather than "30". And make sure that Apache is restarted afterwards.

This is somewhat extreme, but you could always setup a cron job to restart the Apache and MySQL processes once a day, which should keep memory usage down as well.
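A sketch of what such a cron job could look like, written as a small function that generates a file in /etc/cron.d format. The file name and the 04:30 / 04:35 times are arbitrary placeholders, and write_restart_cron is a made-up helper; the service names match the ones used earlier in this thread.

```shell
# Write a cron.d-style file that restarts Apache and MySQL once a day.
write_restart_cron() {
    target=${1:-/etc/cron.d/nightly-restart}   # placeholder file name
    cat > "$target" <<'EOF'
# m h dom mon dow user command
30 4 * * * root /usr/sbin/service apache2 restart
35 4 * * * root /usr/sbin/service mysql restart
EOF
}

# Typical use (as root):
#   write_restart_cron
```

Sites will be briefly unavailable during each restart, so a quiet hour is the better choice.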

I could just migrate virtual servers one at a time and see how it goes. Once the load is taken off the Debian server I might be able to balance things a little better. Tomorrow is patch Tuesday here in Oz (even though it's Wednesday). Might reboot the debian server at that time or Thursday morning to see if that settles things down a bit. Uptime is currently 40 days. Could also shut it off and assign more real memory to it.

Ok so we are there again. I had thought that for the last two weeks that we were on top of memory usage. However today I received notification from the server that memory was down. About seven minutes later it notified me that all was good. Memory at the moment is running at

             total       used       free     shared    buffers     cached
Mem:          4967       4723        243       2416         32       2608
-/+ buffers/cache:       2083       2884
Swap:         2119       1560        559

So any idea as to what I can trim to see if that makes a difference? Must be something to do with Apache, as restarting that always drops usage dramatically. As I stated earlier, I could migrate sites to the Ubuntu server. It's not being stressed. So the question is, if I migrate just one virtual server at a time, is that found in "Virtualmin>virtual_server_name>Server Configuration>Transfer Virtual Server"?

Thanks.

If the problem gets better when restarting Apache, that means it could be related to either Apache or PHP.

If you didn't already, my suggestion would be to review all of your Apache and PHP modules that you're using, and disable ones you aren't using.

I suppose an option that could help prevent this from occurring is to try restarting Apache each night from within cron. Maybe there is a misbehaving module that is using up too much memory, restarting Apache could help keep that under control.

I restarted Apache last night manually and yes, that settled things down a bit. A quick comparison of the Ubuntu and Debian Apache modules shows the same ones enabled and disabled. Not sure if there is anything there I can prune.

I'll ask again. If I want to migrate a virtual server to the Ubuntu box from the Debian box, do I do that through "Virtualmin>virtual_server_name>Server Configuration>Transfer Virtual Server"? I'm thinking of moving some non-critical virtual servers to Ubuntu (it needs rebooting more often than Debian) and sharing the load a bit that way. I must be reaching some sort of limit on Debian with the way swap is behaving. It will also be interesting to see if Ubuntu starts behaving the same if I put more load on it. If it doesn't, then perhaps I'll transfer the pro license to it and use the Debian box for a few critical sites.

Sure, we'd definitely suggest using the distro you feel most comfortable with. I personally have had excellent luck on Ubuntu, and haven't seen the issues you're describing. But it could be something with the particular hardware combination or other unusual issue.

To migrate to a new server, these instructions here can assist with that:

https://www.virtualmin.com/documentation/system/migrate

Oh, and you also may want to compare PHP modules, which you can do by looking in /etc/php5/conf.d/.

Thanks for that. I don't want to migrate all domains, just one at a time though. There's a feature under Virtualmin>virtual_server>Server Configuration>Transfer Virtual Server. Will this do a transfer to the other box of that domain? That way I can move them one at a time and see how things travel as I do so. Call it a work in progress.

Will also look into the PHP modules and do a comparison. Report back shortly.

Ah right, you did mention that earlier.

You could either modify the instructions for moving one domain at a time, or you could use the GUI option you mentioned, either would work fine.

Mmmm can't find /etc/php5/conf.d/. The directory php5 only contains apache2, cgi, cli and mods-available directories, no files. Same on both servers.

Okay, how about this -- what is the output of this command:

dpkg -l 'php5-*'

Ubuntu

> dpkg -l 'php5-*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name            Version                Architecture Description
+++-===============-======================-============-==========================================================
ii  php5-cgi        5.5.9+dfsg-1ubuntu4.17 amd64        server-side, HTML-embedded scripting language (CGI binary)
ii  php5-cli        5.5.9+dfsg-1ubuntu4.17 amd64        command-line interpreter for the php5 scripting language
ii  php5-common     5.5.9+dfsg-1ubuntu4.17 amd64        Common files for packages built from the php5 source
ii  php5-curl       5.5.9+dfsg-1ubuntu4.17 amd64        CURL module for php5
un  php5-dev        <none>                 <none>       (no description available)
un  php5-fpm        <none>                 <none>       (no description available)
ii  php5-gd         5.5.9+dfsg-1ubuntu4.17 amd64        GD module for php5
ii  php5-json       1.3.2-2build1          amd64        JSON module for php5
ii  php5-mcrypt     5.4.6-0ubuntu5         amd64        MCrypt module for php5
un  php5-mhash      <none>                 <none>       (no description available)
ii  php5-mysql      5.5.9+dfsg-1ubuntu4.17 amd64        MySQL module for php5
un  php5-mysqli     <none>                 <none>       (no description available)
un  php5-mysqlnd    <none>                 <none>       (no description available)
ii  php5-readline   5.5.9+dfsg-1ubuntu4.17 amd64        Readline module for php5
un  php5-suhosin    <none>                 <none>       (no description available)
un  php5-user-cache <none>                 <none>       (no description available)
un  php5-xcache     <none>                 <none>       (no description available)
un  php5-xdebug     <none>                 <none>       (no description available)

Debian

> dpkg -l 'php5-*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name            Version              Architecture Description
+++-===============-====================-============-==========================================================
ii  php5-cgi        5.6.20+dfsg-0+deb8u1 amd64        server-side, HTML-embedded scripting language (CGI binary)
ii  php5-cli        5.6.20+dfsg-0+deb8u1 amd64        command-line interpreter for the php5 scripting language
ii  php5-common     5.6.20+dfsg-0+deb8u1 amd64        Common files for packages built from the php5 source
ii  php5-curl       5.6.20+dfsg-0+deb8u1 amd64        CURL module for php5
un  php5-dev        <none>               <none>       (no description available)
un  php5-fpm        <none>               <none>       (no description available)
ii  php5-gd         5.6.20+dfsg-0+deb8u1 amd64        GD module for php5
ii  php5-json       1.3.6-1              amd64        JSON module for php5
un  php5-mhash      <none>               <none>       (no description available)
ii  php5-mysql      5.6.20+dfsg-0+deb8u1 amd64        MySQL module for php5
un  php5-mysqli     <none>               <none>       (no description available)
un  php5-mysqlnd    <none>               <none>       (no description available)
ii  php5-readline   5.6.20+dfsg-0+deb8u1 amd64        Readline module for php5
un  php5-suhosin    <none>               <none>       (no description available)
un  php5-user-cache <none>               <none>       (no description available)
un  php5-xcache     <none>               <none>       (no description available)
un  php5-xdebug     <none>               <none>       (no description available)

Yeah that all does seem fairly normal.

Sorry I'm not sure what else the problem might be... you could certainly try migrating some domains to another server to see if that makes a difference.

Tried to transfer one virtual server over to ubuntu but it fails with this error : "Failed to transfer server : Failed to contact Virtualmin on destination system"

Is there anything that would be preventing port 10000 - 10010 from being accessible from your Ubuntu server?

Not that I'm aware of. Can access Virtualmin from a browser on that port number. Just did a Google and am looking at swappiness. It's set to 60 on both systems. What if I reduced the amount on the Debian box to 40 to see if that has an impact on swap size?

RAM or swap should not be the cause of the problem you're seeing... the error is indicating that Webmin on your server is having difficulty accessing Webmin on the remote server.

A way around that would just be to create a backup file of your domain, copy that backup file to your other server, and then restore it on the other server. That's essentially what the transfer option you're attempting to use does.

Just used telnet to port 10000 and it connected. So that's not the issue. Could it be that root access is needed yet by default root is denied to external connections?

Root logins shouldn't be denied by default... but if root isn't able to log in from your Ubuntu server, that could cause the problem you're experiencing.

I think that root is denied from external logins. I can't SSH in using the root account; I have to log in using another account with sudo access on the Ubuntu box. I can then change to root once logged in (which I rarely do). So can I back up and restore using the LAN IP addresses of both boxes rather than their external facing IP addresses?

You would need to temporarily set a password for root, if you wish to use the Transfer Virtual Server feature in the GUI.

The other option is to just generate a Virtualmin backup of that domain, copy it, and then restore it on the other server.
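The backup, copy and restore route can also be done from the command line. The sketch below only prints the three commands so they can be checked before anything is run; plan_domain_move is a made-up helper, and the domain, remote login and backup path are placeholders. It assumes the "virtualmin backup-domain" and "virtualmin restore-domain" commands are available on both servers and that the copy step can reach the destination over SSH.

```shell
# Print the commands for moving one domain via a backup file.
plan_domain_move() {
    domain=$1                    # e.g. example.com (placeholder)
    remote=$2                    # e.g. root@192.168.0.20 (placeholder)
    file=/root/$domain.tar.gz    # placeholder backup location
    echo "virtualmin backup-domain --domain $domain --all-features --dest $file"
    echo "scp $file $remote:$file"
    echo "ssh $remote virtualmin restore-domain --source $file --domain $domain --all-features"
}

# Typical use:
#   plan_domain_move example.com root@192.168.0.20
```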

As far as what IP address you can use to access the remote system -- I unfortunately don't know the answer to that, it would depend on your network architecture. However, you could certainly try using the internal LAN IP to see if that works, I don't believe there is any kind of need to use an external IP.

Yep have managed to get root access remotely working. Can log in from the shop but the transfer still fails. The issue I can see from the logs is that the ip address of the debian box resolves to a different hostname rather than the one it should. Using the DNS on the windows box appears to be the problem in this instance. If I can get it not to do a DNS lookup for that ip address I reckon it would work. Is there a way to whitelist an ip address in SSHD that you know of?

You really might want to consider just making a backup and restoring that on the other server, that's likely to be a much simpler process.

You shouldn't need to whitelist any IP addresses in SSH though. However, you could always try SSH'ing to the Debian server from the Ubuntu one just to ensure that works.

Nailed it. Working now. Transfer seems to be more elegant as it backs up and restores with one click on the new system without user intervention. Like it. Will now proceed to move a few sites over and see how things pan out. Will keep you posted.

Not sure what effect this will have, but I have also set swappiness = 30 instead of the default 60. Different sites give different takes on this option: some say to leave it as is, while others say that on systems with sufficient RAM, lowering it may benefit overall performance. After changing it, swap usage has dropped with a corresponding increase in RAM usage. However, it's not a dramatic climb or decrease: swap dropped from 75% to 63% and RAM has increased from 38% to 45%. Hopefully there will be no implications from doing this. Tomorrow I'm intending to move 14 sites off and run them on the Ubuntu server. May have to increase RAM on the Debian server too, perhaps another 5GB, which will bring it to 10 in total. Have to shut the VM down to do this, so am a little loath to do it.
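A swappiness value set at runtime does not survive a reboot by itself. A minimal sketch of keeping it, assuming the system reads drop-in files from /etc/sysctl.d at boot; the file name is a placeholder and persist_swappiness is a made-up helper.

```shell
# Write a sysctl drop-in so the chosen swappiness is applied at boot.
persist_swappiness() {
    value=$1
    conf=${2:-/etc/sysctl.d/99-swappiness.conf}   # placeholder file name
    printf 'vm.swappiness = %s\n' "$value" > "$conf"
}

# Typical use (as root):
#   sysctl -w vm.swappiness=30      # apply now
#   persist_swappiness 30           # keep after reboot
#   cat /proc/sys/vm/swappiness     # confirm the current value
```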

Yeah there really seems to be a lot of PHP processes hanging around from the FCGID caching.

You really might want to try setting some domains to use CGI rather than FCGID. Yes, FCGID is generally a bit faster due to that caching, but it also uses more memory :-)

I personally set most of my servers to use CGI.

That's an option you can change in Server Configuration -> Website Options.
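To get a rough idea of how much memory those cached PHP processes are holding, something along these lines may help. This is only a sketch: the helper name sum_rss is made up, and the process name php-cgi is an assumption (it may be php5-cgi or similar on your system).

```shell
# Hypothetical helper: sum RSS values (in kB, one per line on stdin)
# and print the process count and the total in MB.
sum_rss() {
    awk '{kb += $1; n++}
         END {printf "%d processes, %d MB\n", n, kb / 1024}'
}

# Usage, assuming the PHP workers are named php-cgi:
#   ps -C php-cgi -o rss= | sum_rss
```

Comparing the total before and after switching a few domains from FCGID to CGI should show whether the change is freeing up memory.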

Good spot. Am in the process of changing them all. As all sites are WordPress sites, I can add the WP Super Cache plugin on all of them to speed things up if need be (although this can present problems with clients making changes that don't show up). If this doesn't help swap I still have a list of servers to move :-)

Will report back when done after 24 hours. Thanks for your patience.

Brilliant. Memory is sitting at 23% and swap at 12%. Been pretty rock solid all the time. The trade-off may be slightly higher CPU usage, but apart from that there have been no out of memory error messages at all. I actually switched three sites back to FCGID yesterday as they were ones that had a fair amount of traffic. Also haven't rebooted the server like I said I would; it's been up and running for 62 days.

Many thanks.

That's great to hear, thanks for letting us know how things are going!

All has been running sweetly, however this morning it all happened again: the server went into read-only mode. This time I grabbed some screenshots of what was going on as I was trying to get it fixed. Thankfully it does respond well to self repair but I just don't know what the root cause is. Those screen captures are now on the server and can be found at the link below. It seems to have happened at around 23:00 last night, as the next entry in the logs was after I restarted the server. The last log entry for last night is "Jun 13 23:10:30 debian postfix/anvil[59011]: statistics: max connection count 1 for (submission:168.103.85.18) at Jun 13 23:00:49". Not sure if this means anything though. The next entry is "Jun 14 11:40:51 debian rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="627" x-info="http://www.rsyslog.com"] start". The backup of the server started last night at 22:46 and ran for 13 minutes. The Ubuntu server was backed up after this and started at 22:59. Can't see any logs that indicate what may have transpired.

https://www.rdweb.com.au/DebianOutput14-6-2016_1.png

For the rest of them just substitute the last digit with 2 through 7. Not sure if I got enough screen captures, if not will have to wait till next time.

That generally means that something is occurring on the disk to cause a problem.

I understand that this is a VPS, but could there be a bad sector on the host that's causing you some trouble? I think something along these lines is the most likely issue.

You're looking at some kind of very low level problem there though. Either there's a hard drive problem, or potentially a kernel problem or bug.

If there's some way you could run a disk scan on the host, that would be my recommendation as to where to start.
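On the guest side, the dmesg output earlier in this thread shows sda1 mounted with errors=remount-ro, so ext4 will switch the filesystem to read-only when it hits an error. A rough sketch for confirming that state the next time it happens, before rebooting, is below. The helper name ro_mounts is made up for illustration, and the log path assumes a default Debian rsyslog setup.

```shell
# Hypothetical helper: print mount points that are mounted read-only,
# given /proc/mounts-style input on stdin
# (fields: device, mount point, fs type, options, dump, pass).
ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }'
}

# Usage:
#   ro_mounts < /proc/mounts
# Kernel messages about the filesystem, while the system is still up:
#   dmesg | grep -i 'ext4-fs'
# After a reboot, if anything was written before the remount:
#   grep -i 'ext4-fs' /var/log/kern.log
```

Some virtual filesystems are normally mounted read-only, so the line to look for is the root filesystem. Once the root filesystem is read-only nothing more can be written to the logs, which would explain the gap in the log entries, so capturing the dmesg output on screen before restarting is likely the most useful step.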

I've checked the drive for errors but there don't appear to be any. Event Viewer doesn't show any hardware failures either. Usually any underlying hard drive issue will show up there.