
TOPIC: Re:server crashed and trying to figure out why

#13800
DonaldPlummer (User)
Posts: 18
server crashed and trying to figure out why 2008/06/13 07:59  
I've got Virtualmin running on a CentOS 5 box. This morning, the machine locked up and we had to hard reboot it. I've checked /var/log/messages, but I just see the machine working up until 12:39 am, and then nothing until we rebooted it.

We were able to see the Virtualmin usage graphs, which show memory and swap each climbing to 4 GB in the hour preceding the freeze. A runaway script seems like a likely cause.

Any suggestions on how to find out what crashed or what was using all the memory?
#13858
andreychek (User)
Posts: 269
Re:server crashed and trying to figure out why 2008/06/15 06:56  
DonaldPlummer wrote:
I've got Virtualmin running on a CentOS 5 box. This morning, the machine locked up and we had to hard reboot it. I've checked /var/log/messages, but I just see the machine working up until 12:39 am, and then nothing until we rebooted it.

We were able to see the Virtualmin usage graphs, which show memory and swap each climbing to 4 GB in the hour preceding the freeze. A runaway script seems like a likely cause.

Any suggestions on how to find out what crashed or what was using all the memory?

Howdy,

Debugging crashes can be tough!

If you aren't seeing anything in the logs, I'm not sure there's a way to get the data you're after. But it might be possible to set some things up for future reference.

I'm a big fan of the tool "monit":

http://www.tildeslash.com/monit/

You can set it up to monitor various aspects of your box, and optionally have it react to certain circumstances.

For example, you can have it watch to make sure Apache is always running and, if it isn't, start Apache back up.

But you can go a step further and have it restart Apache if Apache ever takes up more than, say, 75% of your available memory.

Similarly, you can have it monitor your memory or CPU as a whole. If your system ever has less than some percentage of memory available (let's say 10%), you can have it email you an alert containing a process list (you'd have to use its exec option and tell it to run 'ps aux' whenever the low-memory condition is met).
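
If you'd like to see the low-memory idea spelled out, here is a rough Python sketch of the same logic. To be clear, this is not monit and not monit's configuration syntax; it's just an illustration you could run from cron. The function names, the 10% threshold, and the log path /var/log/lowmem-ps.log are all made up for the example, and it assumes Python 3.7 or newer and a Linux /proc/meminfo.

```python
import subprocess


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def percent_available(info):
    """Return the percentage of memory that looks available."""
    total = info["MemTotal"]
    avail = info.get("MemAvailable")
    if avail is None:
        # Older kernels don't report MemAvailable, so approximate it
        # (assumption: free + buffers + cached is close enough here).
        avail = (info.get("MemFree", 0)
                 + info.get("Buffers", 0)
                 + info.get("Cached", 0))
    return 100.0 * avail / total


def snapshot_if_low(threshold=10.0,
                    meminfo_path="/proc/meminfo",
                    log_path="/var/log/lowmem-ps.log"):  # hypothetical path
    """Append a 'ps aux' listing to log_path when memory runs low."""
    with open(meminfo_path) as f:
        pct = percent_available(parse_meminfo(f.read()))
    if pct >= threshold:
        return False
    ps = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    with open(log_path, "a") as log:
        log.write("available memory: %.1f%%\n%s\n" % (pct, ps))
    return True
```

Calling snapshot_if_low() once a minute from cron would leave a process list on disk shortly before a lockup, so the culprit would still be visible after a hard reboot. Writing to a file rather than only emailing is deliberate here, since mail may never get out of a box that is already swapping heavily.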

While this seems unlikely in your case, another cause of odd lockups is power and heat issues. Those are also hard to diagnose, but regularly running the "sensors" tool from the lm_sensors package can help alert you to problems with your power supply, fans, and such. I have it running hourly from cron, and it notifies me if the output of "sensors" contains the text "ALARM", which would happen if the fan RPMs were too low or the power supply wasn't providing enough juice (or was providing too much!).
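
The hourly check can be sketched roughly like this in Python. This isn't my actual script, just a minimal illustration; it assumes the "sensors" command is installed and on the PATH, and the function names are invented for the example.

```python
import subprocess


def alarm_lines(sensors_output):
    """Return the lines of 'sensors' output that contain the text ALARM."""
    return [line.strip()
            for line in sensors_output.splitlines()
            if "ALARM" in line]


def check_sensors():
    """Run 'sensors' and return any ALARM lines it reports."""
    result = subprocess.run(["sensors"], capture_output=True, text=True)
    return alarm_lines(result.stdout)
```

If a cron job prints whatever check_sensors() returns, cron's usual behavior of mailing a job's output to its owner means you only hear about it when something is actually in an alarm state.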

I hope that helps!
-Eric
#13889
DonaldPlummer (User)
Posts: 18
Re:server crashed and trying to figure out why 2008/06/16 13:47  
Thanks! We'll definitely take a look at monit.
Copyright 2005-2007 Virtualmin, Inc. All rights reserved.