Enhance backup process by excluding non-standard files and directories in home directories

In the same spirit as the previously proposed and accepted idea of excluding certain databases and tables to make the Virtualmin backup system more efficient, I'd like to file another feature request, this time for excluding non-standard files and directories in home directories.

The rationale behind this request is that end-users tend to keep non-standard files and directories in the home directories of their websites (i.e. in the /home/username directory), which are included in Virtualmin's automatic backup schedules. It becomes especially problematic when end-users store very large compressed archives there, which makes Virtualmin back up the same large archive files over and over again and puts unnecessary load on the system. Warning and educating end-users about why they should keep their home directories clean does not help, as they keep cluttering them anyway.

So we would greatly enhance the Virtualmin backup system and save tons of bandwidth for all Virtualmin users by making it back up only the standard directories in home directories, for example:

drwxr-x--- 2 username username 4096 Dec 20  2012 cgi-bin
drwxr-xr-x 9 username username 4096 Oct 25 00:00 domains
drwxr-xr-x 3 username username 4096 Mar 27  2015 etc
drwxr-xr-x 2 username username 4096 Mar 27  2015 fcgi-bin
drwxr-x--- 2 username username 4096 Mar 27  2015 logs
drwx------ 7 username username 4096 Nov 20  2013 Maildir
drwxr-x--- 9 username username 4096 Oct 26 17:11 public_html
drwxr-x--- 5 username username 4096 Oct 26 07:51 tmp
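
To illustrate the idea, here is a minimal Python sketch that lists the top-level entries of a home directory that are not in the standard set shown above. The function name `non_standard_entries` and the `STANDARD_DIRS` set are hypothetical helpers made up for this example, not part of Virtualmin; the directory names are simply copied from the listing.

```python
import os

# Directory names copied from the listing above; adjust to your own layout.
STANDARD_DIRS = {
    "cgi-bin", "domains", "etc", "fcgi-bin",
    "logs", "Maildir", "public_html", "tmp",
}

def non_standard_entries(home_dir, standard=STANDARD_DIRS):
    """Return the sorted top-level names in home_dir that are not standard."""
    return sorted(name for name in os.listdir(home_dir) if name not in standard)
```

Calling `non_standard_entries("/home/username")` would return the names that an inclusion-based backup would leave out. Dotfiles such as shell profiles show up in the result too, so the output should be reviewed before it is used for anything.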

Alternatively, the Virtualmin backup system could offer an inclusion option, so that users could schedule backups to their liking. I know there is already a "Files to exclude from each domain" option. While it is good for individual websites and individual backup schedules, the opposite option, such as "Files or directories to include for each domain", would better cover most websites and keep backups automatic.

Another idea is to provide a new "Include the document root only" option (the /home/username/public_html directory) next to the "Include homes directory" option. In this case we could leave all the clutter in the website's home directory out of regular backups and include only the files and subdirectories of the document root, which are necessary for the website to function.

Status: Active

Comments

I see where you are coming from, but I think that backing up only expected directories runs the risk of missing files that users really do care about but happened to place in the domain's home directory.

What specific types of files do you want to exclude? Maybe allowing the exclude line to contain patterns like *.gz would solve this better.

> I see where you are coming from, but I think that backing up only expected directories runs the risk of missing files that users really do care about but happened to place in the domain's home directory.

I understand, but there could be lots of cases where inclusion based on user preferences would fit better than exclusion. Would providing an inclusion rule alongside the existing exclusion rule hurt? I don't think so.

> What specific types of files do you want to exclude? Maybe allowing the exclude line to contain patterns like *.gz would solve this better.

Users keep their files in lots of different formats, but yes, very often in .gz, .tar, .zip and .sql formats. Will a wildcard in the exclusion field work? If so, we would love to have a way to set this kind of exclusion rule by default, as we manage lots of Virtualmin servers and it would be a disaster to write this kind of exclusion rule manually on each of them, especially when customers can change them or add new backup schedules at any time. We need some kind of universal method to exclude large archive files in home directories.

Well, the trouble is that having too many options confuses people.

We take every request made here seriously, and always consider the things being asked. We highly appreciate your feedback.

However, in this case, we don't get many requests for what you're suggesting (you're the first, actually). So we'd like to see if there's a way to resolve the issue you're mentioning, without creating a new option on the screen.

I see. What about your comment:

> Maybe allowing the exclude line to contain patterns like *.gz would solve this better.

Is it functional now, or was it just an intention to start supporting wildcards in the exclusion field?

Exclusion by wildcard doesn't work currently, but it would be pretty easy to add.

Could you please do so, so that we could exclude archive files? Thanks!

Good news - you can already enter wildcards into the list of files to exclude! Entering *.gz will skip all files with a .gz extension.
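
As a rough illustration of what a pattern like *.gz covers, here is a small Python sketch using shell-style matching from the standard library. This is an assumption for illustration only: Python's `fnmatchcase` stands in for whatever matcher Virtualmin actually uses, which may behave differently (for example in how it treats upper-case extensions). The helper `is_excluded` is a hypothetical name.

```python
from fnmatch import fnmatchcase

def is_excluded(filename, patterns):
    """True if the bare file name matches any shell-style pattern (case-sensitive)."""
    return any(fnmatchcase(filename, p) for p in patterns)

patterns = ["*.gz"]
for name in ["site.tar.gz", "dump.sql.gz", "index.php", "ARCHIVE.GZ"]:
    # Print each sample name with whether the pattern list would catch it.
    print(name, is_excluded(name, patterns))
```

Under these matching rules, *.gz also catches names ending in .tar.gz and .sql.gz, while a file with an upper-case .GZ extension is not matched.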

Excellent, thank you!

I tried different methods and this worked perfectly. Please hit "Like" if it works for you as well.

Here is a solution:

Case I: Single Directory

Exclude a directory from backups:

/home/example/public_html/docs/


Then the syntax in "Files to exclude from each domain" should be:

public_html/docs


Case II: Multiple Directories

Exclude multiple directories from backups:

/home/example/public_html/docs/
/home/example/public_html/var/
/home/example/domains/testbackups.example.com/public_html/media/cache/


Then the syntax in "Files to exclude from each domain" should be:

public_html/docs
public_html/var
public_html/media/cache


Case III: Exclude file extensions

Exclude .exe, .tar and .tar.gz files:


Then the syntax in "Files to exclude from each domain" should be:

*.exe
*.tar
*.tar.gz


Case IV: All above cases at once

public_html/docs
public_html/var
public_html/media/cache
*.exe
*.tar
*.tar.gz


Note: These exclusions apply globally to all websites/subdomains included in the backup. If you want website-specific exclusions, you need to create a separate backup schedule for each domain.
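
If you want to sanity-check a combined list like the one in Case IV before saving it, a rough Python sketch along these lines may help. The matching rules here are my assumption of how the entries behave (entries without a wildcard are treated as directories relative to the domain's home, wildcard entries are matched against the file name only); this is not Virtualmin's actual implementation, and `would_skip` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The combined exclude list from Case IV.
EXCLUDES = [
    "public_html/docs",
    "public_html/var",
    "public_html/media/cache",
    "*.exe",
    "*.tar",
    "*.tar.gz",
]

def would_skip(rel_path, excludes=EXCLUDES):
    """Guess whether a path (relative to the domain's home) would be skipped."""
    path = PurePosixPath(rel_path)
    for entry in excludes:
        if "*" in entry:
            # Wildcard entry: compare against the file name only.
            if fnmatchcase(path.name, entry):
                return True
        else:
            # Directory entry: skip the directory itself and everything below it.
            target = PurePosixPath(entry)
            if path == target or target in path.parents:
                return True
    return False
```

For example, `would_skip("public_html/docs/manual.pdf")` and `would_skip("backups/site.tar.gz")` both return True, while `would_skip("public_html/index.php")` returns False. Always confirm against a real test backup, since the real matching may differ from this sketch.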