Dedicated server locks up when attempting an install

#1 Tue, 02/25/2014 - 13:28
kevanharding

Attempts to install Virtualmin on a dedicated server result in the server freezing. I then have to visit my provider's control panel and reboot the server. Once I have done that, I re-run the install.sh script and it swiftly runs to completion without error or lock up.

The server is a quad core machine with 4GB DDR3 memory and 2 x 500GB disks and my chosen OS is Debian 7 64 bit. I monitored the install process in another terminal window and the server load did peak at 1.87 for a second but settled down quite quickly. Memory usage climbed no higher than 1.2GB throughout - and was at that level when the machine locked up.

I have looked at the kern.log and dmesg files and apart from a reference to a missing hibernation file there is no obvious error that I can see (though I'm no expert with such log files). I have attached two virtualmin log files and both appear to suggest the install routine went the full distance. I could access the second log file straight away because the server did not lock up on me when I ran the install routine for the second time, but to get to the first virtualmin log file I had to reboot the server, as mentioned above.

With all that done, I could log in to my virtualmin interface and on a quick tour all seemed perfectly okay to me.

But, I'm afraid paranoia has got the better of me. The first run of install.sh seems to have completed quite successfully despite the machine then locking up. And it is that wretched lock up: it is not normal and has got me fretting - do I have some hardware or software issue waiting to bite me in the butt once I've taken the time and trouble to load up all my domains, and set up my databases, and my mailboxes, make my DNS changes, etc.?

My service provider declines to help. It's an unmanaged server and any software problems are down to me to resolve - which I know and accept, but that is based on the initial predicate that the hardware is okay. And it seems it's down to me to prove it's not if I suspect otherwise. Though unfair, that's what I am trying to do.

I have run some quick diagnostics (smartctl) on the hard disks but that reports no errors. Does any kind soul have any suggestions about any other steps I can take to pacify my paranoid concerns? Or, heh, it all seems to work, so should I bury my fears and plough ahead - in hope?

Tue, 02/25/2014 - 13:42
Locutus

Can you elaborate on "locked up"? What exactly happened?

Normally the Virtualmin installer is very reliable on the grade A supported systems. I've never had any trouble with it on my Ubuntu 12.04 x64 installations.

Furthermore, even if the installer has some issue, no software install should be able to cause a server to crash/freeze, if that's what you meant. The installer script doesn't do anything unusual, it's mostly checking OS versions, using package manager to fetch stuff, edit config files and so on.

I don't know about your hoster, but mine offers a so-called "rescue mode" for dedicated servers which also includes an extensive hardware test procedure. You might ask yours if they have something like that too.

Tue, 02/25/2014 - 15:03 (Reply to #2)
kevanharding

Thank you for swift feedback. My thinking is in line with yours - it's highly unlikely to be the install.sh script. Besides, I created a virtual machine using VirtualBox on my own desktop, gave it one processor, 1.5GB RAM, 30GB disk space, installed Debian 7 64 bit, then ran the Virtualmin installer. No surprises: it finished normally - no errors. Okay, it took a tad longer but then it was a much lower spec virtual machine than the dedicated "real" machine.

I'm currently trying to find out how the rescue mode facility my hoster provides works. I have previously asked them about hardware tests but their answer was not too helpful. I'll plug away at that one.

Tue, 02/25/2014 - 14:18
andreychek

Along the lines of the rescue mode that Locutus mentioned -- do you perhaps have a way to run some sort of memory test on your server?

I'm not entirely certain how you'd do that remotely, though it's possible your provider offers a means to do that.

-Eric

Tue, 02/25/2014 - 15:12 (Reply to #4)
kevanharding

Thank you too Eric, for swift comments. As with you, I'm not sure how to run memory tests remotely - something I have asked my hoster about, but their comments tend to focus on the fact that all software problems are mine to resolve. Mmm. I'll try again.

I am also interested to note that, like me, both you and Locutus have focussed on the lock up and not followed up on my hinting that all appears well, should we just carry on regardless?

For Locutus, I forgot to explain my term "lock up". I had two ssh sessions open: one with the installer running and one with "top" running so I could monitor activity. Once the script arrived at the end (which it did according to the installer log file), neither ssh session responded and I had to kill both. I could then not ping the server, but once restarted and it was back on line, everything just seemed normal. Except - a lock up like that is NOT normal.

Tue, 02/25/2014 - 15:39
andreychek

Well, hardware problems can be funny things... since what you're describing is extremely unusual, and may very well be a hardware issue -- I'd be wary of moving production sites onto a system that locks up under load.

Could you move forward with some additional testing? Sure! Perhaps you could set up a site on there, and run some benchmarking software against the site, in order to generate some load.

Unfortunately though, if you see lockups while running the install.sh script, I'd expect to see other lockups as well.
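
As a rough illustration of the load-generation idea, here is a minimal Python sketch that fires a batch of concurrent HTTP requests at a test site and reports how many failed. It is only an assumption about how such a test might look, not a substitute for a real benchmarking tool; the function names and the URL in the usage note are hypothetical.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10.0):
    """Fetch one URL and return (ok, seconds). Any exception counts as a failure."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            ok = 200 <= resp.status < 400
    except Exception:
        ok = False
    return ok, time.monotonic() - start


def run_load(url, total=200, workers=10, fetcher=fetch):
    """Issue `total` requests using `workers` threads and summarise the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetcher, [url] * total))
    failures = sum(1 for ok, _ in results if not ok)
    slowest = max((secs for _, secs in results), default=0.0)
    return {"requests": len(results), "failures": failures, "slowest": slowest}
```

Calling something like `run_load("http://test.example.com/", total=500, workers=20)` from another machine, while watching `top` on the server, should show whether sustained load brings the lock up back. A sudden run of failures partway through would be the thing to look for.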

-Eric

Fri, 02/28/2014 - 14:18
kevanharding

I have taken time over the last few days to run further tests. The results have led me to revise my view about the hardware of my dedicated server - I no longer believe it to be at fault here. Beyond that, I'm a touch unsure what to conclude from my findings and would welcome other views.

The tests have centred around the fact that, as with most hosters, mine allows you to choose your os whenever you do a reinstall. My wish is to use Debian 7, but that was the one that has caused all the bother which in turn has led to this forum topic, and lengthy debates with my hoster during which I have said to them that I don't trust the hardware and will not make this my production box until the issue is resolved. So, I tried Debian 6, Centos 6 and Ubuntu 12.04, using 64 bit versions in all cases.

Debian 6 locked up the server just as Debian 7 does, but interestingly, both Centos and Ubuntu install normally, leaving a working server at the end of the installation process. In combination with other servers that I have already deployed in production and some other tests I have done, the overall results are:

  1. Hosting company's first VPS 1GB machine: Debian 6 installed fine a year ago and is still running.
  2. Hosting company's second VPS 4GB machine: Debian 6 installed fine a year ago and is still running.
  3. Hosting company's dedicated server 4GB quad core machine: Debian 6 & 7 both lock up the server, which goes off line and necessitates a reboot to get it back on line. After that, all the evidence is that Virtualmin appears to have been fully and correctly installed.
  4. Hosting company's same dedicated server: both Centos 6 and Ubuntu 12.04 install successfully, leaving the server in a full working state without lock ups and its attendant need for a reboot.
  5. My own server in my own office, AMD dual core with 4GB RAM: both Debian 6 & 7 install normally with no issues.
  6. A Virtualbox machine on my own desktop, and allocated 1 core CPU and 1.5GB RAM: Debian 6 & 7 install normally with no issues.

My hosting company has already declared that any software I use must be compatible with any of the os's they supply, so I know where they'll point the finger of blame. On the other hand, I have already installed Debian 6 on two of their own VPS machines (albeit a year ago), and during this testing period installed Debian 6 & 7 on two of my own machines in my office; but I cannot install either Debian 6 or 7 on the new dedicated machine I've just bought from them.

I hope I've presented the results clearly enough for others to follow. And my questions are:

Can anything useful, or conclusive be deduced from these results?

Is there an incompatibility between Virtualmin's install script and my hosting company's Debian install images, and if so, how can I resolve that?

Even if no-one can give unequivocal answers to such questions, have I done enough to bury fears about the hardware, and can I proceed with my business plans to put this machine into production?

Fri, 02/28/2014 - 14:58
Locutus

Maybe what you perceive as "server locks up" is more like a networking issue? Like during the installation, for some reason, the server loses network connectivity or configuration?

Does your hoster offer something like a directly connected console? Mine has what they call a "LARA", a web browser operated device that connects to the keyboard and monitor ports of the server. I can request one of these whenever a server is unreachable via network.

I'd recommend trying that when the situation occurs that makes you think the system "locked up".
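
If no console is available, one way to tell a network drop from a genuine freeze might be to leave a small heartbeat writer running in the background during the install. The sketch below is an assumption about how that could be done in Python; the function names and the file path in the usage note are hypothetical.

```python
import os
import time
from datetime import datetime


def write_heartbeats(path, interval=5.0, count=None):
    """Append a timestamp to `path` every `interval` seconds.

    Each line is flushed and fsynced so it has the best chance of
    surviving a hard reset. `count=None` means run until interrupted.
    """
    written = 0
    with open(path, "a") as fh:
        while count is None or written < count:
            fh.write(datetime.now().isoformat(timespec="seconds") + "\n")
            fh.flush()
            os.fsync(fh.fileno())
            written += 1
            if count is None or written < count:
                time.sleep(interval)
    return written


def last_heartbeat(path):
    """Return the most recent readable timestamp in `path`, or None."""
    try:
        with open(path) as fh:
            lines = [line.strip() for line in fh if line.strip()]
    except FileNotFoundError:
        return None
    for line in reversed(lines):
        try:
            return datetime.fromisoformat(line)
        except ValueError:
            continue  # skip a line truncated by a hard reset
    return None
```

Start `write_heartbeats("/root/heartbeat.log")` (under `nohup` or `screen`, so it survives the ssh session dying) before running install.sh. After the reboot, compare `last_heartbeat("/root/heartbeat.log")` with the time the ssh sessions stopped responding: if the heartbeats carried on right up to the reboot, the machine was alive and only the network went away; if they stop at about the moment ssh died, it really froze.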

Sat, 03/01/2014 - 08:46
BossHog

Howdy,
marksporr, during the time frame that the server was unresponsive, was there any activity being written to any log files? I'm building on Locutus' assumption that maybe the server wasn't locked up and you were just disconnected. If the logs were being written during the "locked" (disconnected) period, I would install and enjoy V-min.
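
A minimal sketch of that log check in Python, assuming the logs use the traditional syslog timestamp format ("Feb 25 13:28:01 host ...") with English month names. The function names and the five-minute threshold are hypothetical choices, not anything Virtualmin provides.

```python
from datetime import datetime


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a traditional syslog line.

    Traditional syslog lines carry no year, so the caller supplies it.
    Returns None for lines that do not start with such a timestamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_silences(lines, year, min_gap_seconds=300):
    """Yield (last_before, first_after) pairs where the log went quiet."""
    previous = None
    for line in lines:
        stamp = parse_syslog_time(line, year)
        if stamp is None:
            continue
        if previous is not None and (stamp - previous).total_seconds() >= min_gap_seconds:
            yield previous, stamp
        previous = stamp
```

Feeding it the lines of a log such as /var/log/syslog, with the year set to 2014, would list the quiet periods. A silence that starts when the ssh sessions died and ends at the boot messages points to a real freeze; entries that keep arriving in that window point to a lost connection. An idle server can be quiet for a while on its own, so treat short gaps with caution.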

It's a bummer to hear your provider is going right to the finger pointing out of the gate. Have they been fine to work with previously?
Joe

Sat, 03/01/2014 - 15:18 (Reply to #9)
kevanharding

Thank you for your comments. My hoster was maybe just a tad too willing to play the "this is an unmanaged service" card, but in fairness to them, they have subsequently offered me a new server. It might have been an academically interesting exercise to explore the current machine further, and work out whether it was "just" the network connection that was dropping, and indeed whether log files continued to be written to, but the fact that it locks up is simply not normal behaviour however you look at it. The questions remain:

Is it the current hardware? Maybe, yet it works normally with Centos and Ubuntu.

Is it the Debian image my hoster uses? Maybe, yet it works on other servers of theirs I have deployed.

Is it the Virtualmin script? But the script works on all other servers I have tried.

This is all so much smoke and mirrors at the moment and it's getting beyond my diagnostic capabilities. I can't change the Debian image that my hoster provides to create the os. I can't change the Virtualmin install script (well, I could, but I wouldn't have a clue what to change). But I can change the hardware, so I'm going to accept my hoster's welcome offer. Let's see if that brings a resolution.
