HELP!! Apache crashing / hanging ?? what .. ??

#1 Wed, 01/06/2010 - 00:13
Daworm

I seem to be having an issue with my server. Apache seems to crash on me randomly but doesn't appear to be showing in the logs. All I see is a graceful restart request log around the times that apache dies, but nothing saying it didn't restart.

In googling I stumbled across someone mentioning that their SSL key seemed to be butchered, and they generated a new key for their admin tool (they were using ISPConfig or something).

I do know that I cannot access my server via the https:// address. I get an error when attempting to do so, and as a result I have disabled the HTTPS requirement for logging in to the admin panel.

How/where do I go within VirtualMin/Webmin to configure a new SSL Key to see if that will alleviate my issue?

EDIT: See FOLLOWUP POST!!!

Wed, 01/06/2010 - 09:05
andreychek

Howdy,

Yeah, trying to figure out what's wrong when there aren't any error messages can be a toughy!

I've seen times when Apache -- upon being told to restart -- doesn't fully exit. For whatever reason, one of its processes gets hung up in memory. From there, as Apache attempts to restart, it can't, since the old processes that wouldn't die are still bound to ports 80 and/or 443.

I don't have a particularly good fix for this, other than trying to kill any stray apache2/httpd processes after telling Apache to "stop" or "graceful-stop".
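A minimal sketch of that check, assuming a procps-style ps and the net-tools netstat found on CentOS of this era. The helper name list_apache_pids is made up for illustration; it only filters ps output, so nothing gets killed until you decide to.

```shell
#!/bin/bash
# list_apache_pids: hypothetical helper. Reads "PID PPID COMMAND" lines on
# stdin and prints the PIDs whose command name is httpd or apache2.
list_apache_pids() {
    awk '$3 == "httpd" || $3 == "apache2" { print $1 }'
}

# Typical use after "service httpd stop" (run as root):
#   ps -eo pid,ppid,comm | list_apache_pids     # anything listed is a leftover
#   netstat -tlnp | grep -E ':(80|443) '        # who still holds the web ports
#   kill <pid>                                  # kill -9 only if it ignores TERM
```

If the list is empty and nothing is listening on 80 or 443, a plain start should be able to bind the ports again.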

As far as SSL goes -- if you want to try that route, for any Virtual Server that has SSL enabled, you can select Server Configuration -> Manage SSL Certificate.

From there, go into "Signing Request", and you'll see a button on the bottom labeled "Generate new Self-signed key".

-Eric

Fri, 01/15/2010 - 01:15
Daworm

Ok - keeping my original thread. Still not sorted this. I have only seen it happen once when I was online at the time and since then another person I know has seen it several times.

The server seems to crash but I can SSH into it without an issue (at least I could when I noticed it happen). Trying to get more information from my friend.

Additionally this is the most recent log

[Sun Jan 10 04:02:12 2010] [notice] Digest: generating secret for digest authentication ...
[Sun Jan 10 04:02:12 2010] [notice] Digest: done
[Sun Jan 10 04:02:12 2010] [notice] mod_python: Creating 4 session mutexes based on 256 max processes and 0 max threads.
[Sun Jan 10 04:02:13 2010] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations
[Sun Jan 10 04:02:13 2010] [notice] Graceful restart requested, doing restart
[Sun Jan 10 04:02:13 2010] [error] (9)Bad file descriptor: apr_socket_accept: (client socket)
[Sun Jan 10 04:02:13 2010] [error] (9)Bad file descriptor: apr_socket_accept: (client socket)
[Sun Jan 10 04:02:16 2010] [notice] Digest: generating secret for digest authentication ...
[Sun Jan 10 04:02:16 2010] [notice] Digest: done
[Sun Jan 10 04:02:16 2010] [notice] mod_python: Creating 4 session mutexes based on 256 max processes and 0 max threads.
[Sun Jan 10 04:02:16 2010] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations

Please note though the fault happened earlier today sometime, not 5 days ago so this doesn't appear to match up... anything else anyone can suggest?

Just been advised that trying to access the e-mail was not working either, via HTTP or email client (POP3) so I'm wondering if there's a DNS issue?

I've set up a ping test now to check every 15 minutes for the next time it goes down.

Fri, 01/15/2010 - 09:33
andreychek

Howdy,

Well, you might consider setting up a monitoring system that watches the individual services on your server. If DNS were to stop functioning, for example, a monitoring system should be able to alert you to that fact.

Virtualmin has status monitoring built in, you could always use that. Personally, I like "monit" -- and a lot of other folks use Nagios -- those two are good as well.

Another thing about tools like Monit and Nagios -- they can watch that a particular website is working. If it doesn't, you can configure it to restart Apache (and then send you an alert that Apache was restarted).

However, the thing with monitoring -- you'll know exactly what is, and isn't, working at any given moment... and if Apache goes down, you'll know whether it's just Apache having trouble, or whether other services are too.
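To make the idea concrete, here is a rough sketch of the probe-then-restart step that such tools run for you on a schedule. It is not how Monit or Nagios are implemented. The function name check_and_restart, the URL and the init script path in the example are illustrative assumptions.

```shell
#!/bin/bash
# check_and_restart NAME PROBE_CMD RESTART_CMD   (hypothetical helper)
# Runs PROBE_CMD; if it fails, reports the failure and runs RESTART_CMD.
check_and_restart() {
    local name=$1 probe=$2 restart=$3
    if eval "$probe" >/dev/null 2>&1; then
        echo "$name: ok"
    else
        echo "$name: probe failed, restarting"
        eval "$restart" >/dev/null 2>&1
    fi
}

# Example, e.g. from a cron job every minute (URL and path are assumptions):
#   check_and_restart apache "curl -fsS -m 10 http://localhost/" "/etc/init.d/httpd restart"
```

A real monitoring tool adds what this sketch lacks: alert emails, retry counts, and a record of when each service went down.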

-Eric

Fri, 01/15/2010 - 19:37
Daworm

Righto - I'll get that going now.

Fri, 01/15/2010 - 21:14
Daworm

Munin AND Monit both configured...

Apache and MySQL are definitely crashing. When I got it all going, it showed an uptime of 2 hours and 9 minutes for Apache and 2 hours and 16 minutes for MySQL, while proftpd and postfix are running at 16d and 36d uptime.

Now to find out WHY it crashes I guess.... Monitoring service will kick it back into gear within 1 minute anyway

Fri, 01/15/2010 - 23:19
andreychek

Yeah, that's certainly odd!

Just to verify, you have Monit running on the same server as Apache and MySQL are on?

How much RAM does your server have -- can you paste in the output of "free"?

Also, do you see any "strange" kernel errors in the dmesg output?

-Eric

Sat, 01/16/2010 - 02:05
Daworm

Yes, running Monit on same server as Apache and MySQL.

Apache has restarted between original post and now... MySQL is still powering on without a restart as yet.

             total       used       free     shared    buffers     cached
Mem:       2066968    1717336     349632          0     286896    1043504
-/+ buffers/cache:     386936    1680032
Swap:      4200988          0    4200988

2GB ram in total.
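On this older free layout, memory held in buffers and cache is reclaimable, so the figure that matters is free + buffers + cached, which is what the "-/+ buffers/cache" line reports. A small sketch of that sum, assuming the six-column Mem: layout shown above (newer versions of free print different columns); the function name is made up.

```shell
#!/bin/bash
# reclaimable_free_kb: hypothetical helper. Reads old-style "free" output on
# stdin and prints free + buffers + cached from the Mem: line, in kB.
reclaimable_free_kb() {
    awk '/^Mem:/ { print $4 + $6 + $7 }'
}

# Usage: free | reclaimable_free_kb
```

With the numbers above this gives 1680032 kB, about 1.6 GB, so the box was not short of memory at the moment free was run.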

Checking dmesg - nothing jumps out at me...

Copy of last error entries in error_log

[Sat Jan 16 14:18:53 2010] [notice] caught SIGTERM, shutting down
[Sat Jan 16 14:18:54 2010] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Sat Jan 16 14:18:54 2010] [notice] Digest: generating secret for digest authentication ...
[Sat Jan 16 14:18:54 2010] [notice] Digest: done
[Sat Jan 16 14:18:55 2010] [notice] mod_python: Creating 4 session mutexes based on 256 max processes and 0 max threads.
[Sat Jan 16 14:18:55 2010] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations

Sat, 01/16/2010 - 02:14
Daworm

Monit Service has given me the following email

Resource limit succeeded Service apache
 
       Date:        Sat, 16 Jan 2010 14:19:55 +1100
       Action:      alert
       Host:        server.daworm.net
       Description: 'apache' total mem amount check succeeded [current total mem amount=85132kB]
Your faithful employee,
monit

Sat, 01/16/2010 - 07:36
sgrayban

turn on debug in apache instead of info for logging.

You will get a very detailed log output of what is going on but this will also create a very large log file.

Sun, 01/17/2010 - 16:05
Daworm

I'll do that now.

This time though. Apache crashed and Monit cannot find the "apache" service to restart it.... o.O

Manual restart of the httpd service and it can monitor the apache service again. Hurm...

Sun, 01/17/2010 - 16:18
andreychek

What exactly was the error you got?

I'd make sure that the start/stop scripts listed in the Monit config point to the correct init scripts.

-Eric

Sun, 01/17/2010 - 18:18
Daworm

It said it could not find service "apache" and the script is configured correctly.

Going to put it into debug mode now and see what I can reveal.

EDIT: To confirm loglevel of "debug" will capture ALL errors ? Until now it was just going for warns I think.

Sun, 01/17/2010 - 19:12
sgrayban

Read the apache docs about logging... seems you haven't even done that.

Sun, 01/17/2010 - 19:37 (Reply to #14)
Daworm

Read it and not exactly sure what I should be looking at, however based on what I can see so far, I've set the loglevel to crit.

Sun, 01/17/2010 - 20:00
Daworm

Looking at this more. I cannot seem to locate a httpd.pid file... wondering if that's the cause of some things.

If I do a manual stop/start of Apache - the httpd.pid file gets generated.. but after a crash (at this stage) it does not generate a httpd.pid in /var/run/

Yet Apache works... this might be the cause of my monit issues in that regards.
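Monit process checks are commonly configured against a pidfile, so a missing /var/run/httpd.pid would plausibly explain the "cannot find service" message. A quick sketch for checking whether a pidfile exists and still points at a live process; pidfile_status is a made-up helper name.

```shell
#!/bin/bash
# pidfile_status FILE   (hypothetical helper)
# Prints "missing", "running (PID)" or "stale (PID)" for the given pidfile.
pidfile_status() {
    local pidfile=$1 pid
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 1
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests that the process exists
    if kill -0 "$pid" 2>/dev/null; then
        echo "running ($pid)"
    else
        echo "stale ($pid)"
        return 2
    fi
}

# Usage (as root): pidfile_status /var/run/httpd.pid
```

"missing" while httpd is clearly serving pages matches what is described above: Apache is up, but whatever restarted it did not leave a pidfile behind.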

Mon, 01/18/2010 - 10:18
andreychek

Well, not finding the httpd.pid isn't likely the problem, though it could potentially be a symptom.

Regarding Monit not being able to find the Apache service -- could you post the part of your Monit config where the Apache tests are set up?

Also, when you have Apache up and running, could you run "ps auxw", and add the output as an attachment to this thread?

Thanks,

-Eric

Tue, 01/19/2010 - 17:58
Daworm

auxw attached as requested.

I have just altered my MaxClients setting to 19 and lowered starting server threads etc. as appropriate. For some reason (it might be normal, I don't know) the mem use slowly creeps up, and when a process hits around 50MB mem usage it spawns a new thread...
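For a rough sanity check on those numbers: with the prefork MPM (which the "0 max threads" in the mod_python log line suggests), the worst case is about MaxClients times the per-child memory. The sketch below does that arithmetic with the figures from this thread; the function names are made up, and 1640 MB is the free-after-cache figure from the free output earlier in the thread, rounded down.

```shell
#!/bin/bash
# worst_case_mb MAXCLIENTS PER_CHILD_MB: memory if every child slot is in use
worst_case_mb() {
    echo $(( $1 * $2 ))
}

# estimate_max_clients AVAILABLE_MB PER_CHILD_MB: how many children fit
estimate_max_clients() {
    local avail_mb=$1 per_child_mb=$2
    if [ "$per_child_mb" -le 0 ]; then
        echo "per-child size must be positive" >&2
        return 1
    fi
    echo $(( avail_mb / per_child_mb ))
}

worst_case_mb 19 50            # prints 950
estimate_max_clients 1640 50   # prints 32
```

So 19 children at 50MB each would top out around 950 MB, which fits in 2GB with room left for MySQL and the rest.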

Thu, 01/21/2010 - 00:58
Daworm

Well - after adjusting MaxClients to 19, and starting 3 by default. I'm only up to 6 clients running at the moment and Apache (this time) is now up to 1Day 6Hours run time (this last week it's been struggling to get to 20Hours).

BUT! On the same note it has had uptimes of 6-7Days before it randomly dies on me... so time will tell.

On that note though: can I safely re-install Apache from the Virtualmin repos without having to do anything extra on my server? I mean if I back up my httpd.conf and stop Apache, would a

service httpd stop
yum remove httpd
yum install httpd
service httpd start

be sufficient without breaking the server terribad?

Thu, 01/21/2010 - 13:08
andreychek

Howdy,

Well, if your goal is to verify that nothing is corrupt with Apache -- you can run this to verify the contents of a given package:

rpm -Vv PACKAGE_NAME

That is, in your case, you could use:

rpm -Vv httpd

That will have RPM check each file that was included in the httpd installation. Doing that checks the file size, MD5 sum, permissions, type, owner and group of each file.

If that doesn't show anything unusual, you won't gain anything by reinstalling Apache :-)

-Eric

Thu, 01/21/2010 - 13:26
Daworm

Not able to make sense of the big list, did it without verbose and got this.

missing   c /etc/httpd/conf.d/welcome.conf
S.5....T  c /etc/httpd/conf/httpd.conf

Thu, 01/21/2010 - 13:33
andreychek

Howdy,

Well, you'll need to peek in the rpm manpage to get a list of what all that means :-)

In short, though, it's saying the only files different now from when the package was installed are those two config files.

It's saying the welcome.conf file isn't there (which is fine), and that the httpd.conf file has changed (which we know).

So long story short -- outside of those two config files, everything else relating to the httpd package is completely normal.
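For reference, each letter in the rpm -V flag column stands for one attribute that failed verification, a "." means that test passed, and "?" means the test could not be performed. A small decoder sketch based on the letters listed in the rpm manpage; decode_rpm_verify is a made-up name.

```shell
#!/bin/bash
# decode_rpm_verify FLAGS   (hypothetical helper)
# Turns an "rpm -V" flag string such as "S.5....T" into attribute names.
decode_rpm_verify() {
    local flags=$1 out="" i c
    for (( i = 0; i < ${#flags}; i++ )); do
        c=${flags:i:1}
        case $c in
            S) out+="size " ;;
            M) out+="mode " ;;
            5) out+="digest " ;;        # MD5 sum on older rpm versions
            D) out+="device " ;;
            L) out+="symlink " ;;
            U) out+="owner " ;;
            G) out+="group " ;;
            T) out+="mtime " ;;
            P) out+="capabilities " ;;
            '?') out+="unreadable " ;;  # test could not be performed
        esac
    done
    echo "${out% }"
}

decode_rpm_verify 'S.5....T'   # prints: size digest mtime
```

The "c" after the flags marks the file as a config file, which is why an edited httpd.conf shows up with size, checksum and modification time all changed.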

If you're concerned, you could expand your search to other related packages, such as PHP... you'd want to be on the lookout for anything it says about binary files, I wouldn't be concerned with config files.

-Eric

Thu, 01/21/2010 - 16:52
Daworm

php gives only this:

[root@server ~]# rpm -V php-cli
prelink: /usr/bin/php: at least one of file's dependencies has changed since prelinking
S.?.....    /usr/bin/php
prelink: /usr/bin/php-cgi: at least one of file's dependencies has changed since prelinking
S.?.....    /usr/bin/php-cgi

In other news, I have a funny feeling it might have been that MaxClients was set way too high. Since tuning this back I have only seen a maximum of 6 child processes running (it can spawn up to 19; well, 12-15 depending on whether it uses the min or max spare clients settings).

I have even seen the mem use drop after getting to around 20% back to around 17% and Apache seems to be behaving.

This is the script I borrowed to tell me the recommended MaxClients

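#!/bin/bash
# Rough MaxClients estimate: free RAM with Apache stopped, divided by the
# largest Apache child's resident memory. Briefly stops Apache; run as root.
echo "This is intended as a guideline only!"
if [ -e /etc/debian_version ]; then
    APACHE="apache2"
elif [ -e /etc/redhat-release ]; then
    APACHE="httpd"
else
    echo "Unsupported distribution" >&2
    exit 1
fi
# Largest RSS (column 8 of "ps -ly" output, in kB) among the Apache processes
RSS=$(ps -aylC "$APACHE" | grep "$APACHE" | awk '{print $8}' | sort -n | tail -n 1)
RSS=$((RSS / 1024))    # kB -> MB
if [ "$RSS" -eq 0 ]; then
    echo "No $APACHE process found (or RSS under 1 MB); cannot estimate" >&2
    exit 1
fi
echo "Stopping $APACHE to calculate free memory"
/etc/init.d/$APACHE stop &> /dev/null
# "free" column of the Mem: line, in MB (buffers/cache are not counted as free)
MEM=$(free -m | awk '/^Mem:/ {print $4}')
echo "Starting $APACHE again"
/etc/init.d/$APACHE start &> /dev/null
echo "MaxClients should be around $((MEM / RSS))"

Fri, 01/22/2010 - 10:51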
andreychek

"php gives only this:"

Those may actually be okay too, I believe both of those are just symlinks...

"Tuning this back I have only seen a maximum of 6 child processes running"

Well, while I still find all this a little odd, it's great that it's working for you now :-)

Usually, Apache wouldn't just silently crash during a low-memory situation; and while it's possible that the kernel's OOM-Killer could have killed Apache to prevent the server from dying, that would have shown up in both the system logs as well as the dmesg output.
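One way to rule the OOM-Killer in or out is to search the kernel messages for its usual wording. The exact text varies between kernel versions, so the pattern below is a loose assumption rather than a definitive match, and find_oom_events is a made-up helper name.

```shell
#!/bin/bash
# find_oom_events: hypothetical helper. Filters kernel log lines on stdin
# down to the ones that look like OOM-killer activity.
find_oom_events() {
    grep -i -E 'out of memory|oom-killer|killed process'
}

# Usage (as root):
#   dmesg | find_oom_events
#   find_oom_events < /var/log/messages
```

No output from either command makes an OOM kill an unlikely explanation for the restarts.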

I don't recall if you're using Virtualmin Pro or GPL -- but if you're using Pro, you may want to take a peek at the system statistics during the time Apache has been crashing to see if your RAM was all used up.

Either way though, I'm glad you haven't seen this issue come up lately!

-Eric

Fri, 01/22/2010 - 21:29
Daworm

Best thing too, while I've been playing with all of this at work, 3 work mates have gone "Oh, are you doing hosting? I need some webspace..." :)

So if this goes nicely for the next week... YIPPIE!! I've just restored a forum with about 50-100 members posting in a given week and a core group of 10-12 people on daily, with over 55K posts... so I'll see how it behaves now.

Sat, 01/23/2010 - 22:02
Daworm

Nup - didn't work. Apache still crashing / restarting on me and no errors jump out and say why. :(

At least the PID isn't changing (monit tells me when it changes).

Sun, 01/24/2010 - 18:03
Daworm

Ok - going out of my mind here.

I restarted Apache as I had 1 website that wasn't responding to http requests (even after restarting) but it came good on its own...

Then within 30 minutes, Apache had restarted again, on its own... and Monit advertised the PID had changed. NO errors as to why, and Monit didn't do it, as it alerts me if it restarts it...

I'm tempted to scratch what I have and re-install and try again...

Sun, 01/24/2010 - 19:31
andreychek

Well, to try another avenue... you say this typically happens right after a graceful restart?

Apache should be able to come back online after such a restart. If you manually try a graceful restart, does that work? Or does Apache fail?

Also, as Scott mentioned earlier, you may want to tinker with increasing the Apache loglevel. Setting it to "crit" as you did would actually log fewer messages, showing just "critical" and more severe errors rather than all messages. You may want to experiment with "info" and "debug".
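To see why "crit" is quieter than the default, it helps to list the Apache 2.2 levels from most to least severe: a LogLevel setting logs that level and everything more severe. A small sketch of that ordering; levels_logged_at is a made-up helper, not an Apache tool.

```shell
#!/bin/bash
# levels_logged_at LEVEL   (hypothetical helper)
# Prints the severities that get logged when LogLevel is set to LEVEL.
levels_logged_at() {
    local threshold=$1 level out=""
    for level in emerg alert crit error warn notice info debug; do
        out+="$level "
        if [ "$level" = "$threshold" ]; then
            echo "${out% }"
            return 0
        fi
    done
    echo "unknown level: $threshold" >&2
    return 1
}

levels_logged_at crit    # prints: emerg alert crit
levels_logged_at debug   # prints all eight levels
```

Per the LogLevel documentation, notice-level messages are still written when logging to a regular file, which is why the [notice] restart lines appear regardless of the setting.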

If you're discovering that Apache fails each time you do a graceful restart, that's no good :-) That could certainly be the issue then, which makes the key to figure out what's causing Apache to not start back up...

If you can reproduce the issue by performing graceful re-starts, perhaps increasing the log messages would explain why it's failing.

-Eric

Sun, 01/24/2010 - 20:02
Daworm

Yeah - I've set it to Debug and I'll let it go and see what happens. Thing I noticed is the /var/log/httpd/error_log doesn't get any errors in it, but the virtualmin error log seems to capture it all.

So I might download those and start trawling through it for what's going on (didn't realise the errors were being redirected to it).
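Before reading the logs line by line, a per-severity count can show quickly whether anything above [notice] was logged at all. A sketch assuming the Apache 2.2 error log format seen earlier in this thread; count_by_level is a made-up name, and the /var/log/virtualmin/ path in the usage line is an assumption about where the per-domain logs live.

```shell
#!/bin/bash
# count_by_level: hypothetical helper. Reads Apache 2.2 style error log lines
# ("[date] [level] message") on stdin and prints "level count" per severity.
count_by_level() {
    awk -F'[][]' 'NF >= 4 { n[$4]++ } END { for (l in n) print l, n[l] }' | sort
}

# Usage (path is an assumption, adjust to your domain's log):
#   count_by_level < /var/log/virtualmin/example.com_error_log
```

Any crit, alert or emerg entries near the time of a restart are the ones worth reading first.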

Mon, 01/25/2010 - 18:03
Daworm

Trawled and nothing... Gotta wait for a proper crash I guess while debug is on. Just a bunch of errors from Wordpress due to duplicate IP entries in the database (spam bot registrations), and there were some errors from an old simple gallery script I had up. But I've since removed that just in case (some undeclared functions and what not).

I DID get the script about 2-3 years ago, nothing fancy. And no updates to it since I got it.

Time will only tell now.

Wed, 01/27/2010 - 16:31
Daworm

So far so good.

I have a funny feeling it was this script...

Fingers crossed. If I get more than 7 days uptime then I KNOW it's resolved and that the problem was a fairly outdated script (and the author's e-mail is no longer active). Pity, it was a very simple gallery too.

Topic locked