Per-domain logrotate brings down the server because of many graceful restarts

Hello All,

We have a Virtualmin setup with 400 websites (domains). When the weekly logrotate run starts, the server freezes for half an hour, with heavy I/O and unresponsive services.

Why this happens: the built-in logrotate module of Virtualmin creates one logrotate configuration file per domain in /etc/logrotate.d/. Each file tells logrotate to rotate the access and error logs for that domain, perform a graceful restart of httpd, wait 5 seconds, and then continue. As a result, when logrotate runs we get 400 graceful restarts of Apache, one every 5 seconds. This not only makes Apache inaccessible but also creates heavy I/O that ends up blocking all other services too.
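As a rough check of the numbers: 400 restarts with a 5 second pause each add up to 2000 seconds, or about 33 minutes, which matches the half-hour freeze. The sketch below counts the configuration files in a directory that contain a postrotate block and multiplies by the pause. The function name is hypothetical, and it assumes every such file triggers exactly one restart followed by the pause.

```shell
#!/bin/sh
# Estimate how long logrotate spends pausing between graceful restarts.
# Assumption: each file with a "postrotate" line causes one restart plus
# a fixed pause (5 seconds in this thread). The function name is made up.
estimate_restart_seconds() {
    conf_dir=$1
    pause=${2:-5}
    # grep -l prints each matching file once; -s hides unreadable entries.
    count=$(grep -ls 'postrotate' "$conf_dir"/* | wc -l)
    echo $((count * pause))
}
```

Calling `estimate_restart_seconds /etc/logrotate.d 5` on the affected server would print the total pause time in seconds. It ignores the time taken by the restarts and the compression themselves.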

Here are other people mentioning the same:

https://www.virtualmin.com/node/25864

https://www.virtualmin.com/node/22024

To resolve this, I completely disabled the built-in Virtualmin logrotate module, removed all the per-domain logrotate configuration files, and created a single logrotate configuration file that rotates all the domains' logs at once:

/etc/logrotate.d/virtualmin

/var/log/virtualmin/*log {
    rotate 5
    daily
    missingok
    notifempty
    sharedscripts
    delaycompress
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
}

This solution works as expected: there is only one graceful restart, after all the domains' logs have been rotated.
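To see which files a pattern like /var/log/virtualmin/*log would pick up, a small sketch such as the following can help. It lists only names ending in "log", so files that were already rotated (for example with a .1 or .2.gz suffix) are not matched again. The function name and the example file names are assumptions for illustration.

```shell
#!/bin/sh
# List the files in a directory that the shell pattern "*log" matches.
# Hypothetical helper; logrotate does its own glob expansion, this only
# previews the same kind of pattern.
list_rotation_targets() {
    log_dir=$1
    for f in "$log_dir"/*log; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        basename "$f"
    done
}
```

With files such as example.com_access_log, example.com_error_log and example.com_access_log.1 in the directory, only the first two are listed.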

I have a few questions, though:

  1. Why did you choose to create a per-domain logrotate configuration in the first place? There are several arguments against it:

- All the domain log files go in /var/log/virtualmin, so it is very easy to create one wildcard logrotate configuration for all of them.
- There are no per-domain logrotate settings. There are per-plan settings (you can customize logrotate directives per plan), but that makes little difference: all the domains' logrotate config files are basically identical except for the filename.
- Until the graceful restart, Apache keeps writing to the old, already renamed log file, and with a single restart this may last a few seconds for all domains. This is not a big deal when using delaycompress, which keeps the most recently rotated file uncompressed. Apache will continue writing to the old file for a few more seconds, but no messages will be lost.

  2. Will I run into any incompatibilities or restrictions if I keep the Virtualmin logrotate module disabled? Does it only create these per-domain logrotate configurations, or is it responsible for other operations too? Does anything depend on the logrotate module?

  3. If you wish to keep per-domain config files, you could remove the graceful restart from the per-domain configs and add it to the main config so that it runs after all of them. This would achieve the same result. Unfortunately, the interface does not allow creating custom logrotate directives without a postrotate directive (presumably because the graceful restart is considered mandatory there).

Let me know what you think!

Status: Active

Comments

Howdy -- I talked to Jamie about this a bit, and it should indeed be possible now that we've moved the logs from $HOME/logs/ to /var/log/virtualmin/.

We haven't yet decided whether we're going to implement that by default, though we'd like to offer you assistance in setting that up, and maybe we can see how that's working for you.

It shouldn't be a problem to disable the Virtualmin logrotate module. It would mean that you can't have per-domain logrotation configs, but that may not come up all that often.

Now, if you wanted to continue using the logrotation module, currently Virtualmin enforces that any custom logrotate template have a postrotate block. However, you could work around this by setting it to:

postrotate
/bin/true
endscript

What are your thoughts on those two methods of rotating logs? Does one stand out to you as preferable in your environment?

Howdy andreychek,

Thanks a lot for the time spent on this issue!

I decided to stay with one wildcard configuration for all logs in /var/log/virtualmin. Here are my arguments for that:

Logrotation depends primarily on the space available for the log files and then on the Apache server itself. Both are shared resources and should be planned and controlled by the master server admin. For example, if there is not enough space for log files, the master server admin might change the logrotation settings to keep only two weeks of logs. Or, if there is plenty of space, he/she might decide to turn off gzip in order to save CPU work. On the other hand, if the virtual server owner has control over the logrotation for his/her virtual server, he/she might decide to keep the logs forever (or too long), eventually filling all the shared space and compromising the whole server. That's why I think per-domain logrotation control is not a good idea.

If the logrotation of /var/log/virtualmin is controlled by the master server admin, then it is much more convenient for the configuration to live in a single file, possibly managed from the Virtualmin interface.

If a virtual server owner needs custom logrotation for some files in the user's home directory, it can be achieved with a custom cron job that runs as the server owner's user and is limited to his/her home dir.

Here are the exact steps to switch to one logrotation config:

  1. Disable the logrotate feature for all existing virtual servers from the command line:
         virtualmin disable-feature --logrotate --all-domains
     When doing this, the Virtualmin disable-feature script does not delete the per-domain logrotate configuration files in /etc/logrotate.d/ but instead empties them (size 0). You can delete them with:
         find /etc/logrotate.d -size 0 -print0 | xargs -0 rm
  2. Disable the logrotate plugin on the System Settings -> Features and Plugins page.
  3. Create a file /etc/logrotate.d/virtualmin with the following content (I have changed the settings from the previous post, which were for testing only):
    /var/log/virtualmin/*log {
        rotate 5
        weekly
        missingok
        notifempty
        sharedscripts
        compress
        delaycompress
        postrotate
            /sbin/service httpd reload > /dev/null 2>/dev/null || true
        endscript
    }
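Before deleting the emptied per-domain files in step 1, it may be worth previewing what would be removed. The sketch below only prints the zero-size regular files in the directory and deletes nothing. The function name is hypothetical, and -maxdepth assumes GNU or BSD find.

```shell
#!/bin/sh
# Print the zero-size regular files directly inside a logrotate.d directory.
# Hypothetical helper for a dry run; it does not remove anything.
list_empty_configs() {
    find "${1:-/etc/logrotate.d}" -maxdepth 1 -type f -size 0 -print
}
```

Once the list looks right, the find | xargs rm command from step 1 removes the same files. After creating the new file, running logrotate -d /etc/logrotate.d/virtualmin shows what logrotate would do in debug mode, without changing any logs.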

I have tested this configuration and it works as expected. There is only one graceful restart, after all of the logs in /var/log/virtualmin have been rotated, and the first rotated file stays uncompressed (it is just renamed), so Apache keeps writing to it until the graceful restart (the gzip of the older files takes place first and can take up to 30 minutes). No messages are lost.

It also makes sense to combine /etc/logrotate.d/virtualmin and /etc/logrotate.d/httpd into a single file that rotates all the Apache-related logs at once, with a single graceful restart.

Something I noticed is that the gzip of the old files is very heavy on I/O. The hanging of the server with per-domain logrotate configs and multiple graceful restarts is actually the result of heavy gzipping and graceful restarts combined: Apache needs to reread all the website files, but the I/O is already under pressure from the gzip process.

On CentOS 6, the jobs in /etc/cron.daily (where the logrotate job lives) are invoked by anacron via /etc/anacrontab:

#period in days   delay in minutes   job-identifier   command
1                 5                  cron.daily       nice run-parts /etc/cron.daily
7                 25                 cron.weekly      nice run-parts /etc/cron.weekly
@monthly          45                 cron.monthly     nice run-parts /etc/cron.monthly

I have added ionice -c3 (idle class) to keep these jobs, including logrotate, at very low I/O priority:

#period in days   delay in minutes   job-identifier   command
1                 5                  cron.daily       ionice -c3 nice run-parts /etc/cron.daily
7                 25                 cron.weekly      ionice -c3 nice run-parts /etc/cron.weekly
@monthly          45                 cron.monthly     ionice -c3 nice run-parts /etc/cron.monthly
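The same change can be sketched as a small filter, so the new anacrontab can be generated and reviewed before it replaces the original. The function name is hypothetical. Lines that already mention ionice are left alone, so running the filter twice gives the same result.

```shell
#!/bin/sh
# Read anacrontab lines on stdin and prefix the run-parts jobs with
# "ionice -c3". Hypothetical helper; it writes to stdout and edits no files.
add_idle_io_class() {
    # The /ionice/! guard skips lines that were already converted.
    sed '/ionice/!s/nice run-parts/ionice -c3 nice run-parts/'
}
```

For example, `add_idle_io_class < /etc/anacrontab > /tmp/anacrontab.new` writes a candidate file that can be compared with the original using diff.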

It takes more time to gzip the old logs, but it has almost no effect on the performance of the server. I run the Virtualmin backups in the idle class too.

Another thing I have noticed is that when a virtual server is deleted, its access and error log files are deleted, but the old, already rotated and compressed log files are not. Eventually /var/log/virtualmin fills up with rotated logs of deleted virtual servers. They can be cleaned up manually, for example with a script that deletes all files older than 3 months, but it would be much better if the Virtualmin script that deletes a virtual server also deleted the old rotated logs for that server.
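A cleanup along those lines could start from a listing like the one below. It is only a sketch: the function name is hypothetical, it assumes rotated files carry a suffix after "log" (such as .5.gz), and it prints candidates instead of deleting them. It cannot tell whether a file belongs to a deleted virtual server, only that it is an old rotated file.

```shell
#!/bin/sh
# Print rotated log files whose last modification is older than N days.
# Hypothetical helper; active logs (names ending in "log") are not matched.
list_stale_rotated_logs() {
    log_dir=$1
    days=${2:-90}
    find "$log_dir" -maxdepth 1 -type f -name '*log.*' -mtime +"$days" -print
}
```

Running `list_stale_rotated_logs /var/log/virtualmin 90` and checking the output by hand is a safer first step than deleting by age alone.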

And yet another thing I have found, probably a separate issue: when the Virtualmin scheduled backup starts, it seems to cache the settings (such as enabled features and IP address) of all the virtual servers. Because the backup takes quite a while (a full backup of 400 websites takes 7 hours), this creates two issues:

  1. If a feature is removed from a virtual server after the backup has started (even though the backup of that particular server has not started yet), that feature fails during the backup (because the files have already been removed, for example). If a feature is added, it is not backed up (because the backup script does not know it was added).
  2. Sometimes settings of virtual servers that were changed during the backup are reverted when the backup completes. For example, when I disabled the --logrotate feature for all the virtual servers from the command line (see above), the backup was running, and a few minutes later the logrotate feature was back to enabled for some of the virtual servers. The same happens if you change a virtual server's IP: when the backup completes, the IP is reverted to the previous value, but only in the settings kept by Virtualmin. The actual configuration files (DNS zone, Apache config) still show the new, correct IP.

I think this happens because the backup script reads the settings of all the virtual servers when it starts, and then, when it completes the backup of a virtual server, it saves them back, reverting any changes made in the meantime. The backup script should probably read the settings of each virtual server just before starting that particular backup, and then save back only the backup-related settings and not the IP, for example.
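The suspected behavior can be illustrated with a toy model. This is not Virtualmin code, only a sketch of the hypothesis: the settings file, its keys (logrotate, last_backup) and the function name are all made up for the example.

```shell
#!/bin/sh
# Toy model of a stale write-back. Mode "stale" reuses the settings read
# when the backup started; mode "fresh" re-reads them just before saving.
simulate_backup() {
    mode=$1
    conf=$(mktemp)
    printf 'logrotate=1\nlast_backup=never\n' > "$conf"

    # Backup starts and takes a snapshot of the settings.
    snapshot=$(cat "$conf")

    # Meanwhile the admin disables the logrotate feature.
    printf 'logrotate=0\nlast_backup=never\n' > "$conf"

    # Backup ends: the fresh variant re-reads the settings first.
    if [ "$mode" = "fresh" ]; then
        snapshot=$(cat "$conf")
    fi
    # Only the backup-related key is changed, then everything is saved.
    printf '%s\n' "$snapshot" | sed 's/^last_backup=.*/last_backup=done/' > "$conf.new"
    mv "$conf.new" "$conf"

    grep '^logrotate=' "$conf"
    rm -f "$conf"
}
```

`simulate_backup stale` prints logrotate=1, so the admin's change is lost, while `simulate_backup fresh` prints logrotate=0 and keeps it.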

Should I file a separate issue for this one?

Thanks a lot, guys, for all the support. I will be happy to help improve Virtualmin. I will let you know how the new logrotation setting works over time.

Let me know what you think.

Georgi

I'll respond to the first part of your recent comment shortly. However, regarding the two issues you're seeing during your backup, it would be great if you could open a separate request for those.

Then we'll keep that separate from this topic, and it'll be easier to work with Jamie regarding what might be going on (we'll need his help to sort that out). Thanks!

Sure andreychek, I am filing a separate issue report for the backups reverting virtual server settings.

Thanks a lot!