We use a mounted NFS share for backups because it is the most efficient option. Both the FTP and SSH/SFTP modes first create a temporary tar file on the local filesystem of the Virtualmin server and then upload it to the backup server. That means every backup does double work: "read the data to be backed up + compress + write a tmp file, then read the tmp file + write to the destination server". By mounting the backup storage as a local folder instead, we cut out all of that extra work and write straight to the destination.
So far so good. But there is a very serious issue:
- /mnt/backups on the Virtualmin system is a folder that's actually mounting an NFS share.
- Sometimes the backup server is down for maintenance, which makes the NFS share unavailable. At that point /mnt/backups reverts to being just a regular folder on the / filesystem of the Virtualmin server.
- If Virtualmin tries to perform a backup while the NFS share is unmounted, it will 1) think that all old backups have been deleted, since the /mnt/backups folder now appears "empty", and 2) start writing the new backup into the folder on the local / filesystem.
- This is very, very bad! Not only does it risk filling up the local filesystem with backup files; more seriously, the next time /mnt/backups mounts the NFS filesystem again, the old NFS contents of /mnt/backups will magically reappear, and any new backups that Virtualmin made while the NFS share was down will "vanish", since they live on another filesystem.
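The core hazard above, writing into a mountpoint that isn't actually mounted, is cheap to detect. As a minimal sketch (assuming a Linux system where /proc/mounts is available; the `is_mounted` helper name is mine, not anything Virtualmin provides):

```shell
#!/bin/sh
# Return 0 if the given directory is an active mountpoint, 1 otherwise.
# Reads /proc/mounts directly instead of parsing "mount" output.
is_mounted() {
    # The mountpoint is the second field of each /proc/mounts line;
    # match it exactly so /mnt/backups2 does not count for /mnt/backups.
    awk -v d="$1" '$2 == d { found = 1 } END { exit !found }' /proc/mounts
}

if is_mounted /mnt/backups; then
    echo "safe to back up"
else
    echo "NFS share is not mounted - aborting backup" >&2
fi
```

A wrapper like this could guard the backup command today, without any changes to Virtualmin itself.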
Idea for a solution:
- A checkbox in the "Backup Schedule" dialog of Virtualmin, where instead of typing "Backup destinations: Local file or directory = /mnt/backups", you choose "Backup destinations: Mounted filesystem = /mnt/backups".
- In this mode, before doing any backups, Virtualmin first checks (e.g. via the "mount" command) that /mnt/backups actually has a filesystem mounted on it. If it doesn't, the backup is deferred: the Virtualmin admin is emailed an error report saying that the backup server is down, and a cron job is scheduled to retry later.
- It retries either indefinitely or a set number of times, possibly with staggered intervals: first retrying after 5 minutes, then if that fails waiting another 20 minutes, then 40 minutes, then 60 minutes, and so on. This is the same strategy Postfix uses when delivering mail to non-responding servers, and it is very efficient: it first assumes the server may only be down briefly and retries at tight intervals, then gradually slows down as it becomes clear the downtime will be long.
- Note: for this to work, the /mnt/backups location of course has to be mounted via "autofs" so that it reconnects automatically. Otherwise it will simply stay unmounted even after the server comes back online, and the deferred backup will never succeed.
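The staggered retry schedule described above is easy to express as a tiny helper. This is only an illustrative sketch: the 5/20/40/60-minute values come from the example above, with 60 minutes as the cap for every later attempt:

```shell
#!/bin/sh
# Given a retry attempt number (1, 2, 3, ...), print the number of minutes
# to wait before that attempt. Intervals widen and cap at 60 minutes,
# similar in spirit to Postfix's delivery backoff.
retry_delay() {
    case $1 in
        1) echo 5 ;;
        2) echo 20 ;;
        3) echo 40 ;;
        *) echo 60 ;;   # every later attempt waits a full hour
    esac
}

# Example: print the schedule for the first six attempts.
for attempt in 1 2 3 4 5 6; do
    printf 'attempt %d: wait %d minutes\n' "$attempt" "$(retry_delay "$attempt")"
done
```

The retry job itself could be a one-shot "at" job or a temporary cron entry that re-runs the mount check and removes itself on success.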
The deferred processing is the most difficult part of this idea. Running the "mount" command to validate the mountpoint before proceeding, and deferring + emailing the admin on failure, is easy. Ensuring that the mountpoint is set up with "autofs" so that it actually comes back online is easy too. But rewriting how Virtualmin backups are scheduled to allow for deferred processing is tougher, mainly because of the need to avoid pileups when "the next scheduled backup fires while an earlier backup is still in the deferred queue". For that case it would have to be more clever and run only one of the backups whenever such a clash occurs.
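One simple way to handle the clash scenario is a mutual-exclusion lock around the whole backup run, so a scheduled backup and a deferred retry can never run at the same time, and whichever fires second just skips. A sketch using flock(1) (the lock-file path is an assumption, and the actual backup command is elided):

```shell
#!/bin/sh
# Serialize backup runs: if another backup (scheduled or deferred) already
# holds the lock, exit immediately instead of piling up a second run.
LOCKFILE=/tmp/virtualmin-backup.lock   # assumed path for this sketch

(
    # -n: non-blocking; fail at once rather than queueing behind the holder.
    flock -n 9 || {
        echo "another backup is already running - skipping this run"
        exit 0
    }
    # ... perform the actual backup here ...
    echo "backup completed"
) 9>"$LOCKFILE"
```

Skipping (rather than queueing) is the right behavior here, because each backup run covers the same data: if one run is already in flight or pending, a second one adds nothing.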
What are your thoughts? Any other ways this setup could be improved?