Too many open log files - causing server instability

Problem: there are too many open file handles for the access_log and error_log files on the server. This is making the server unstable and crashing various server daemons on a daily basis. I've had to write a cron job that checks for daemons that have died because of this and restarts them as necessary. I am seeing about 30 minutes of accumulated downtime each day because of this problem.

I've counted: I have 356 log files on my Fedora 7 server:

# /bin/ls -1d /home/*/logs/*_log /home/*/domains/*/logs/*_log | wc -l
356
#

I then counted the number of open file handles:

# lsof | grep '/home/.*_log' | wc -l
5696
#

I then counted the number of open file handles per log file:

# lsof | grep '/home/.*_log' | sed 's#.* /home#/home#' | sort | uniq -c
     16 /home/account1/logs/access_log
     16 /home/account1/logs/error_log
     16 /home/account1/domains/subserver1/logs/access_log
     ...

Note that 16*356 = 5696.

I then checked how many httpd processes are running:

# ps ax | grep httpd | grep -v "grep httpd" | wc -l
16
#

So, it is pretty clear that the number of open files attributable to log files is equal to the number of httpd processes times the number of log files.
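To double-check that relationship from the other direction, the same lsof output can be tallied per process instead of per file. This is only a rough sketch: the function name count_log_handles_per_pid is made up for illustration, and it assumes the PID is the second column and the file name the last column of each lsof line, which is the usual layout.

```shell
# Tally open log-file handles per process from `lsof` output on stdin.
# Hypothetical helper; assumes column 2 is the PID and the last column
# is the file name, as in the usual lsof layout.
count_log_handles_per_pid() {
    awk '$NF ~ /^\/home\/.*_log$/ { n[$2]++ }
         END { for (p in n) print p, n[p] }' | sort -n
}

# Usage (as root, so that every process is visible):
#   lsof | count_log_handles_per_pid
```

If every httpd PID shows the same count, and that count matches the number of log files, it would confirm that each httpd process holds every log file open once.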

The problem is that this is not scalable and it is causing problems on my server. So, what I want to do is to have just two log files (access_log and error_log) into which all transactions go, and then have these split out by a separate process into individual user log files. Or something like that. Something that does not involve having thousands of open files, which is what is making my system unstable.
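To make the idea concrete, here is a rough sketch of what the splitting step could look like. It assumes the combined access log has the virtual host name as its first field (for example from an Apache LogFormat that starts with %v). The function name split_vhost_log and the output file naming are made up for illustration; Apache also ships a split-logfile script that works on the same principle. This is not something Virtualmin provides as far as I know.

```shell
# Split a combined access log (virtual host name as the first field) into
# one file per virtual host. Hypothetical helper: the function name and
# the "<vhost>_access_log" naming are assumptions for illustration.
split_vhost_log() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        {
            vhost = tolower($1)
            # Skip names that could escape the output directory.
            if (vhost !~ /^[a-z0-9._-]+$/ || vhost ~ /\.\./) next
            file = dir "/" vhost "_access_log"
            # Drop the leading vhost field so the rest keeps the usual format.
            sub(/^[^ ]+ /, "")
            print >> file
            close(file)   # keep at most one output file open at a time
        }
    '
}

# Usage (the paths here are placeholders):
#   split_vhost_log /tmp/split_logs < /var/log/httpd/access_log
```

Because the vhost field is stripped before writing, the per-host files stay in the ordinary access log layout, so tools such as awstats should still be able to read them. Closing each file after every write is slow on large logs but keeps the number of open handles at one, which is the point of the exercise.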

This problem is not specific to Fedora 7. It is a general problem with scalability that should be solved by having only two open files for logging (access_log and error_log). The solution to this problem should allow software such as awstats and other log analysis software to continue to work properly.

So, the question is: how do I do this with Virtualmin? I am a paid subscriber.

Status: 
Active

Comments

Linux should allow processes to open tens of thousands of log files with no problems, so I am not sure why even having 5696 open would cause problems, unless you are hitting the total or per-process file descriptor limits, both of which are tunable.
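As a starting point for checking those limits, something along these lines should work on a Linux system. Treat it as a sketch: the open_fds helper is made up for illustration, and the limit reported by ulimit applies to the current shell, so httpd may be running with a different one depending on how it was started.

```shell
# Count the descriptors a process currently holds (Linux /proc; reading
# another user's process normally requires root). Hypothetical helper.
open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Per-process soft limit for processes started from this shell.
echo "soft limit (ulimit -n): $(ulimit -n)"

# System-wide: allocated handles, unused handles, maximum.
if [ -r /proc/sys/fs/file-nr ]; then
    echo "file-nr: $(cat /proc/sys/fs/file-nr)"
fi

# Descriptors held by this shell, as a demonstration of open_fds.
echo "this shell holds $(open_fds $$) descriptors"
```

Running open_fds against each httpd PID and comparing the result with the per-process limit would show whether the daemons are anywhere near it.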

Could you tell us more about these crashes you are seeing? What error messages are you getting exactly?