Apache hangs every two weeks after logfile rotation [#25864]

Submitted by webinteractive on Fri, 03/22/2013 - 18:11

Hi,

On our system we now have the problem that every two weeks Apache completely dies.
We've been searching and searching and it appears that it happens after the logfile rotation.
We currently host about 173 sites, which means that Apache gets gracefully restarted 173 times during each rotation.

What is the best way to configure Virtualmin so that it does not add these logfile rotation entries?
I want to add these lines to handle it for all sites:

/var/log/virtualmin/*_access_log /var/log/virtualmin/*_error_log {
    rotate 5
    weekly
    compress
    postrotate
        /usr/sbin/apache2ctl graceful
    endscript
}

That way Apache gets restarted only once.
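
Note that logrotate runs a postrotate script once for every rotated log file matching the pattern unless the stanza also contains sharedscripts, so the stanza above would probably still reload Apache many times. A minimal sketch of a helper that prints such a stanza for review; the function name emit_shared_stanza and its two arguments are made up for illustration, and the default path and reload command are simply the ones from this post:

```shell
# Print one logrotate stanza covering all per-domain logs.
# $1: log directory (default taken from the post above)
# $2: reload command (default taken from the post above)
emit_shared_stanza() {
    log_dir="${1:-/var/log/virtualmin}"
    reload_cmd="${2:-/usr/sbin/apache2ctl graceful}"
    # sharedscripts makes logrotate run postrotate once for the whole
    # pattern instead of once per rotated file.
    cat <<EOF
${log_dir}/*_access_log ${log_dir}/*_error_log {
    rotate 5
    weekly
    compress
    sharedscripts
    postrotate
        ${reload_cmd}
    endscript
}
EOF
}
```

The output could be redirected to a scratch file and checked with logrotate -d (debug mode, nothing is rotated) before it goes anywhere near /etc. This does not stop Virtualmin from adding its own per-domain entries.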

Best regards,

Joshua.

Status:

Active

Comments

Submitted by JamieCameron on Fri, 03/22/2013 - 19:45 Comment #1

We don't yet have a solution for minimizing the number of times a graceful reload is triggered, sorry.

However, this shouldn't cause Apache to hang. Calling /usr/sbin/apache2ctl graceful is just supposed to signal that a config reload is needed, not actually restart the running Apache process.

Submitted by CarlLindner on Tue, 03/26/2013 - 22:58 Pro Licensee Comment #2

I'm seeing something similar over the course of a two-week period myself. Thanks for narrowing it down for me, Joshua. I only have a few domains and it's been more of a nuisance, until today. I've noticed that in /etc/logrotate.conf the settings actually differ between some of my domains. You have a lot more domains; are they all the same?

/usr/sbin/apachectl graceful
/etc/rc.d/init.d/httpd restart ; sleep 5
/etc/rc.d/init.d/httpd graceful ; sleep 5

Is there any rhyme or reason for these being different? When the service monitor goes off for httpd, I find that there is one httpd process that won't go away. I have to kill -9 it, and then the regular service httpd start works just fine.

[root@cserver1a ~]# ps -alef | grep httpd
5 R apache   18486     1  5  90  10 - 66006 -      Mar24 ?        02:43:42 /usr/sbin/httpd
0 S root     22222 22121  0  80   0 - 25811 pipe_w 09:30 pts/0    00:00:00 grep httpd
[root@cserver1a ~]# service httpd restart
Stopping httpd:                                            [FAILED]
Starting httpd: (98)Address already in use: make_sock: could not bind to address [::]:80
(98)Address already in use: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs
                                                           [FAILED]
[root@cserver1a ~]# kill -9 18486
[root@cserver1a ~]# service httpd start
Starting httpd:                                            [  OK  ]
[root@cserver1a ~]#
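
To see at a glance which reload commands the logrotate stanzas use, and how often each one appears, something like the following could help. It is only a sketch: list_postrotate_cmds is a made-up name, and it assumes every command sits on its own line between postrotate and endscript.

```shell
# Print each distinct postrotate command with the number of stanzas using it.
# Arguments: one or more logrotate config files.
list_postrotate_cmds() {
    awk '
        /^[ \t]*endscript/  { in_script = 0 }           # leave the script body
        in_script           { sub(/^[ \t]+/, ""); print } # emit the command, unindented
        /^[ \t]*postrotate/ { in_script = 1 }           # enter the script body
    ' "$@" | sort | uniq -c | sort -rn
}
```

For example: list_postrotate_cmds /etc/logrotate.conf /etc/logrotate.d/*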

Submitted by JamieCameron on Tue, 03/26/2013 - 23:26 Comment #3

These may be different due to Virtualmin changing the command it uses over different versions.

You could try changing them all to /etc/rc.d/init.d/httpd graceful ; sleep 5

Submitted by CarlLindner on Tue, 03/26/2013 - 23:45 Pro Licensee Comment #4

Thanks Jamie, will give it a shot.

Submitted by jrhosting on Tue, 04/16/2013 - 07:36 Comment #5

We are seeing the same (daily). We run our web service in a jail (FreeBSD's lightweight virtualization) and trigger a graceful restart via the jail management software. This didn't cause problems before, but now we are seeing that the service is down every night for about 10 minutes.

One thing that prevents this from happening for every single domain is the following in /etc/logrotate.conf (/usr/local/etc/logrotate.conf for FreeBSD users):

postrotate
/usr/local/bin/ezjail-admin console -e "/usr/local/bin/apachectl graceful" webinstance
endscript
# default script
# see "man logrotate" for details
weekly
# keep 4 weeks worth of backlogs
rotate 4

Together with:

#/usr/home/domain/logs/access_log /usr/home/domain/logs/error_log {
# rotate 5
# weekly
# compress
# sharedscripts
#}

This makes logrotate rotate the logfiles and restart (or gracefully reload) the instance once, after ALL logfiles have been processed (we have around the same number of hosted domains).

Sadly, that doesn't make our problem go away. It also seems that Apache isn't restarting but simply stops responding; it looks like a script that runs around 0.00 (bw.pl?) is so heavy that the httpd processes no longer respond. I am not 100% sure, but this behaviour seems to have started causing problems after the cron jobs were consolidated under Webmin's supervision instead of being supervised separately by Virtualmin.

Thanks
Remko (JR-Hosting)

Submitted by JamieCameron on Tue, 04/16/2013 - 12:15 Comment #6

Using a shared script isn't always a good idea, as it only gets run after all the logs have been rotated.

This could cause some log entries to get lost, as Apache will continue to write to the old log file until it gets restarted.

Submitted by webinteractive on Tue, 04/16/2013 - 15:03 Comment #7

We are going to try restarting apache2 every week. This is not a real solution, but it will hopefully stop the apache2 processes from hanging every two weeks.

Submitted by jrhosting on Wed, 04/17/2013 - 02:16 Comment #8

Well, restarting apache after every logrotation isn't great either :-).

Recently we stopped logging separately for each vhost and now log everything centrally. We are in the process of writing a script that will split the logfiles (Apache has a tool for that) and put the results in the users' home directories.

Using the native Apache tools for that, we get automatic log rotation, and we can schedule the splitting of the logfiles for a time and place of our choosing. No mucking around with logrotate and * restarts per period :)
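
A rough sketch of what such a splitting step could look like. It assumes the central log is written with a LogFormat whose first field is the virtual host name (%v), and it uses plain awk instead of Apache's own tool so the logic is visible. The function name split_by_vhost and the output layout are made up for illustration.

```shell
# Split a central access log into one file per virtual host.
# $1: central log file, first field of each line is the vhost name
# $2: output directory (created if missing)
split_by_vhost() {
    src="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    awk -v dir="$out_dir" '
        # Only accept plain host names, so a crafted Host header
        # cannot make us write outside the output directory.
        NF > 1 && $1 ~ /^[A-Za-z0-9.-]+$/ {
            file = dir "/" tolower($1) ".log"
            line = $0
            sub(/^[^ ]+ /, "", line)   # drop the vhost field again
            print line >> file
            close(file)                # avoid running out of file handles
        }
    ' "$src"
}
```

Lines whose first field does not look like a host name are silently skipped here; a real script would probably want to collect those somewhere.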

Submitted by jrhosting on Fri, 04/19/2013 - 02:48 Comment #9

Sadly, I have to admit that while our setup works fine for us as far as logfile management goes, it does not resolve the daily "outage" yet.

I am still looking at what runs around 0.00 that causes Apache to stop responding. Is there a good way to make sure that some scripts start at 23.00 and some at 01.00 instead of all running at 0.00? Perhaps that makes a difference? (I will look into that during the day.)

Submitted by andreychek on Fri, 04/19/2013 - 08:11 Comment #10

You can review the various crontab entries for root that start at midnight.

A quick way to do that is to run this command as root:

crontab -l

Any entry beginning with "@daily" will run at midnight, as will anything showing "0 0 * * *".

None of the system cron jobs should be running at midnight; distributions schedule those to run later in the morning. So I'd suggest starting with cron jobs owned by users such as root.
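
If the crontab is long, a small filter can pick out the entries that fire at 00:00. This is only a sketch: midnight_jobs is a made-up name, it reads crontab text on stdin, and it only understands literal 0 in the minute field plus 0 or * in the hour field (so hourly jobs are included). Lists, ranges and step values such as */5 are not parsed and will be missed.

```shell
# Print crontab entries that run at 00:00. Reads crontab text on stdin.
midnight_jobs() {
    awk '
        /^[ \t]*#/ { next }                              # skip comments
        NF == 0    { next }                              # skip blank lines
        $1 == "@daily" || $1 == "@midnight" { print; next }
        # minute is 0 and hour is 0 (daily) or * (hourly, so also at midnight)
        $1 ~ /^0+$/ && ($2 ~ /^0+$/ || $2 == "*") { print }
    '
}
```

Usage: crontab -l | midnight_jobs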

Submitted by jrhosting on Sun, 04/21/2013 - 02:08 Comment #11

Hi Andrey,

I updated all the crontabs last night to reshuffle everything. Even our own system administration jobs that run close to 0.00hrs (nothing actually at 0.00 though) have been moved well away from it.

Luckily, it seems that your advice may be spot-on. Not entirely because of jobs running at exactly 0.00hrs, but the ones around it are now reshuffled, and last night our monitoring did not report an outage.

I will update this post over the next couple of days with the outcome of the nightly jobs.

I have to admit, though, that I also changed the webmin*.cron job that runs bw.pl so that it no longer runs every hour (including at 0.00) but every odd hour (1,3,5,7,9,11,13[etc]). It would be interesting to know whether we can use a more elaborate schedule for it.

Cheers Remko

Submitted by jrhosting on Mon, 04/22/2013 - 01:23 Comment #12

Again no hang. The jobs that ran just after 00.00 are most likely the cause, which means this might not have been related to Virtualmin on our end.

A possible solution for the problem originally reported is either doing a single restart at the end (a graceful restart could also be used of course, but sometimes that still restarts the Apache process), or using Apache's rotatelogs facility to rotate the files automatically every day (this can be done per vhost of course, or for one single central log that is split later).

Apologies for steering the discussion in the wrong direction.

Remko