BUG+Solution: Nginx module site-config backup/edit/delete chews up configs

The Problem

Nginx configurations are almost always edited by hand to add htaccess-style directives to sites. Moreover, it is extremely common to have a single server take up more than one server{} block, for redirect reasons (i.e. one block listening to www.domain.com on port 80/443 and redirecting to https://domain.com, another block listening to domain.com on port 80 and redirecting to https://domain.com, and finally one block listening to domain.com on port 443 - hosting the actual server), or when doing complex setups (i.e. port 80 of 3 different domains all redirect you to https://domain1.com, but each domain has its own unique site on port 443).

If Nginx is set to use separate per-site config files (in sites-available), these heavily edited configs are improperly parsed by Virtualmin (backup/edit/delete sites) - only a handful of lines are backed up.

If Nginx is set to write all servers to the main Nginx.conf, things are even more hopelessly broken.

The New Algorithm

  • If Nginx is setup to use a folder of per-site config files, then back up/restore the complete site file, untouched. Never even attempt to parse it.
  • If Nginx/Virtualmin is in its default mode of putting everything in a single main config file, use my new algorithm to identify and extract every server{} block related to the server, thus including all edits that have been added to the domain, untouched.

The latter is the only harder part, but that single-file nginx.conf scenario can be solved with this algorithm:

 step 1: find each server{ block start-position (as int-offsets into the main nginx.conf-file) using the following regex:

    /\bserver\s*\{/

 step 2: extract all individual server {} blocks using a simple nesting-count algorithm:

    foreach server_start_pos
         pos = server_start_pos
         openbr = 0
         while pos < filesize
             chr = input[pos]
             if chr == "{"
                openbr++;
             elseif chr == "}" {
                openbr--;
                if openbr == 0 {
                   // we have found the closing } for the server{} block
                   server_end_pos = pos
                   // store the entire server{} text block in an array
                   copy_server_block_to_array(server_start_pos, server_end_pos)
                }
             }

 step 3: loop through all of the server{} blocks you have extracted from the main nginx.conf file,
 and look for every server{} block that belongs to the domain name we are backing up/editing/deleting.

     $dom = regex-escaped domain name string, i.e. "mydomain\.com", and NO "www\." prefix(!)
     foreach server_blocks as server_block_text
         test if server_block_text contains /\bserver_name\b[^;]*?\s(?:www\.)?$dom[\s;]/
             // if true: this is a server{} block with data for the domain we desire.
             // so extract this block, and then keep scanning for more matching blocks...

   This crazy section: \bserver_name\b[^;]*?\s(?:www\.)?$dom[\s;] just enforces boundaries and validity and ensures that
   domain\.com doesn't give false positives for similar domains like "server_name mydomain.com;".
   It also guarantees that a search for example.com matches a block for www.example.com.

       broken down:
           \bserver_name\b = enforced boundaries to ensure that we are looking at the correct property
           [^;]*? = allows there to be other domain-names preceding the one we are looking for (non-greedy)
           \s = ensures that there is whitespace before the domain we are searching for; if we had used \b for word boundary instead,
                then sub1.example.com would have matched during a search for example\.com
           (?:www\.)?$dom = match both www.example.com and example.com
           [\s;] = ensure that there is either whitespace or the closing ; after our desired domain name,
                again to ensure that "example.com.tv" won't give a false-positive during a search for example\.com

 step 4: now that we've found the start/end offsets of every server{} block related to the domain,
 they should be handled as follows:

     if backing up:
        concatenate all found server{}-blocks that belong to the domain, i.e.
        "server{...}\nserver{...}\nserver{...}" and save that to a single backup file.

     if editing:
        upgrade the edit-interface to support understanding multiple blocks -
        by making a new gui editor that literally shows all X related server{} blocks
        and their various editable property fields - or alternatively giving up and showing
        "this is a complex domain with more than one server{} block - we won't even
        attempt to understand what you are doing, go edit it manually instead,
        to be sure that our web gui doesn't corrupt things for you"
        the latter is a completely okay solution.

     if deleting:
        delete all of the found server{} blocks for that domain.

That is a great algorithm, but all is still not well...

Read post #2 for why.

Status: 
Active

Comments

This new algorithm is the only way to accurately find heavily-edited server{} blocks in the main nginx.conf file, by identifying the full extent of a domain's block no matter how heavily edited it is, and to also find all server{} blocks that have something to do with the domain.

But - even with this new algorithm, there is an issue that can never be solved for as long as we hold onto the idea that putting server{} configs in Nginx.conf is a "good idea" in the first place: Multiple, different domains can be sharing the same server{} block for various reasons.

If that's the case - then backing up/editing/deleting server{} blocks related to domainA, would lead to duplication/corruption/over-extensive deletion (respectively) from Nginx.conf, cloning/nuking/editing other domains in the process (domainB, domainC, ...).

Here's an example in the backup-case:

Imagine this potential scenario that could still arise even with my algorithm, given a more complex relationship between servers than just "one block = just one domain":

=== nginx.conf:

server { // the real server block for https://test1.com and https://test1.com
   listen 443 ssl;
   server_name test1.com www.test1.com;
   // ...
}
server { // sends port 80 traffic from test1.com, www.test1.com and www.test2.com to https://test2.com
   listen 80;
   server_name test1.com www.test1.com www.test2.com;
   return 301 https://test2.com$request_uri;
}
server { // the real server block for https://test2.com
   listen 443 ssl;
   server_name test2.com;
   // ...
}

Now the user backs up "test.com" and "test2.com". My algorithm (which, DESPITE THIS PROBLEM, is as good as it gets; better parsing cannot be done since we have no more knowledge than what the flawed, intermixed Nginx.conf format provides us) would end up with these backup files:

=== test1.com-nginx-backup:

server { // the real server block for https://test1.com and https://test1.com
   listen 443 ssl;
   server_name test1.com www.test1.com;
   // ...
}
server { // sends port 80 traffic from test1.com, www.test1.com and www.test2.com to https://test2.com
   listen 80;
   server_name test1.com www.test1.com www.test2.com;
   return 301 https://test2.com$request_uri;
}

=== test2.com-nginx-backup:

server { // sends port 80 traffic from test1.com, www.test1.com and www.test2.com to https://test2.com
   listen 80;
   server_name test1.com www.test1.com www.test2.com;
   return 301 https://test2.com$request_uri;
}
server { // the real server block for https://test2.com
   listen 443 ssl;
   server_name test2.com;
   // ...
}

Finally, the user restores both backups (doesn't matter if they do it into nginx.conf or sites-available at this stage, because the irrecoverable cloning damage has already been done):

=== nginx.conf:

server { // the real server block for https://test1.com and https://test1.com
   listen 443 ssl;
   server_name test1.com www.test1.com;
   // ...
}
server { // sends port 80 traffic from test1.com, www.test1.com and www.test2.com to https://test2.com
   listen 80;
   server_name test1.com www.test1.com www.test2.com;
   return 301 https://test2.com$request_uri;
}
server { // sends port 80 traffic from test1.com, www.test1.com and www.test2.com to https://test2.com
   listen 80;
   server_name test1.com www.test1.com www.test2.com;
   return 301 https://test2.com$request_uri;
}
server { // the real server block for https://test2.com
   listen 443 ssl;
   server_name test2.com;
   // ...
}

This is completely unavoidable for as long as people are allowed to put servers into nginx.conf. But if that ugly config-mode was removed from Virtualmin, and sites-available was enforced, backups become as clear and simple as "cp sites-available/test1.com ..." - without ever intermixing data, getting partial backups, corrupt backups, losing comments, or any other things that can happen when trying to parse nginx.conf. It won't matter how complex sites are; their configs will always be perfectly backed up.

You'll have to think about this of course; but what is more important? Preserving an ugly "insert into nginx.conf" mode of operation which nobody is recommended to use, or actually getting clean separation and always-perfect, non-corrupt backups?

Either way, if you maintain nginx.conf-mode support, you should at least use the proposed new algorithm. It will perfectly find every server{} block that relates to any given domain name, and the issue I just described will only arise if someone has set up a very complex relationship between domains, with multiple overlapping domains in certain server blocks (I do). If someone does have such a complex setup, they are unlikely to still be using the messy, newbie "nginx.conf"-mode (I don't), and this bug-waiting-to-happen of course won't hit them. Still, the bug will be there waiting for some user to come along with just such an edge-case setup... ;-)

In my opinion, unavoidable configuration corruption is at the top of things that must take precedence, even if it means dropping nginx.conf-mode.

As good as the algorithm is, there is no way to redesign it to intelligently figure out which server-configfile is more in need of a shared, multi-domain server{} block. Who's to say which domain deserves the block more? And if one domain-backup gets it and the other doesn't, then that'd break restores if only the domain without it was restored, etc. But - the light at the end of the tunnel - drop nginx.conf-mode and everything just works itself out naturally - the user makes the decision which config file deserves the conflicting server{} block! No more ambiguity or parsing errors ever again.

The best solution of all? It's to not even "play the game" at all: Remove support for nginx.conf-based server{} blocks. No algorithms needed.

Deleted all subsequent posts, and rewrote everything into post #1 and #2 to combine everything into an easier to read format.

I think a simple first step to fixing this would be to just backup the whole .conf file for the domain from the sites-available directory. That way multiple or oddly-formatted server blocks will be preserved.

This ticket is now restructured and rewritten, because it was difficult to read. You'll have to re-read it top to bottom, but this time it'll actually flow naturally and make sense to you. ;-)

Actually, forget deprecating Nginx.conf mode. I have a better idea.

Keep Nginx.conf-mode as a valid mode of operation, implement my new algorithm, and finally implement a third check which is a sanity-check that says: "If nginx.conf-mode is used, none of the found server{} blocks are allowed to have anything except completely related ${DOM} or www.${DOM} server_name entries - NO mixing of multiple different domains allowed. In case they've mixed things, then give the user a warning and force them to migrate to per-site configs."

I like this option a lot, as it won't harm existing users: Most users that use Nginx.conf mode will never have that advanced configs (that they'd mix domains), so my algorithm will cover them perfectly. For the very rare ones that do have complex relationships, they're going to be smart enough to know what your Nginx module means when it says "Okay dude, slow down... I am getting confused here. You are going to have to migrate all your sites out of Nginx.conf because you are doing some complex shizzlewizzle here and I don't know how to tell which server{} blocks belong to which sites."

And of course, all of this algorithm-talk only relates to the Nginx.conf-mode. In per-site mode, just back up/edit/delete the exact config file that's named /etc/nginx/sites-available/${DOM} - do NOT scan/parse/edit/delete/do anything to ANY other configs in that folder. Just go DIRECTLY to the one with the exact sitename. No parsing allowed - cuz that way be' dragons and false positives.

I've been standing by for 3 days and checking back to see if you have any questions that arise. Unfortunately I have to leave on a business trip now and will be gone for two weeks, so all I can do is briefly summarize everything instead, and pre-emptively cover everything you might ask about:

Nginx.conf mode issues:

  • The problem in Nginx.conf-mode is that nginx configs can (and extremely often do) consist of more than one server{} block per server.
  • Moreover, nginx site configurations do the entire job of .htaccess files, so it's never enough to just back up some of the lines. All lines in every server{} block must be preserved.
  • No assumptions must ever be made about the format/order of lines in server{} blocks, since users might have heavily edited them.
  • The algorithm I provided allows you to find every server{} block for a given domain, in Nginx.conf-mode.
  • A further problem is that complex setups might have more than one UNRELATED domain names per server{} blocks, which would make Nginx.conf-extraction impossible, since it wouldn't be possible to determine which domain the server{} block works for.

The solutions:

  • If Nginx is set to per-site config mode, then just back up/edit the actual /sites-available/${DOM} file directly - ZERO parsing, to avoid to corruption, and zero scanning of other config files. Stick to the one with the exact name.
  • If Nginx is set to Nginx.conf mode, then use my algorithm to find every server{} block for the given domain. Moreover, you must scan the server_names tag of each such found server block to make sure it is NOT a cross-domain block (i.e. NOT a "server_name example1.com example2.com" - all that is legal is "server_name ${DOM} and/or www.${DOM}".
  • When it comes to restoring a config from backup, the IPs and php-socket lines might have to be adapted to the new server.

That's ALL the info you need.

I have to leave now and wish you the best of luck. Thanks to the already-provided algorithm, you don't have to do much work to fix these bugs.

I think what we'll do as a first step is implement backing up of the whole .conf for for a domain, if per-domain files are in use. That solves the issue for 99% of users, without getting into the complexity of figuring out which server blocks belong to which domains.

Hi Jamie, I'm back from the business meeting/trip.

Yes, I agree that the most important step is just to simply back up the whole .conf as-is for every domain. Almost anyone that edits their Nginx files manually or has multiple server blocks will have switched to per-site files for management reasons, so that one small change would indeed cover 99% of users.

The rest can be covered with the provided algorithm as a starting point whenever you have time to analyze it and try it out. (I've already tested it and the regexps; it works very well.)

Bumping the ticket so it doesn't fall off the board, since this is quite a serious flaw in the Nginx module. Almost any website you can imagine today needs multiple modifications to the server{} block to convert .htaccess rules to Nginx rules to make the site scripts work, and currently the process of doing so corrupts the configuration during site backup/restore. So I'd say that this is serious.

Not in any hurry with this at all myself (can always restore the uncorrupt configs via the latest full filesystem backup) but just want to make sure the ticket doesn't fall away, that's all.

I doubt that you've forgotten about this issue, but still, just a harmless bump just in case.

A code change to backup the whole .conf file for the domain has been implemented, and should be rolled out already in an update to the Nginx plugin for Virtualmin.