We've bought a 4-node rack server and I'm wondering how best to set it up with Virtualmin.
To take advantage of all the nodes, a load balancer such as HAProxy will distribute the incoming traffic amongst them.
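As a rough illustration of what the load balancer would be doing, here's a minimal Python sketch of round-robin distribution, which is one of the algorithms HAProxy offers (`balance roundrobin`). The node names and the `assign` helper are hypothetical; in practice HAProxy would be configured, not programmed like this.

```python
from itertools import cycle

# Hypothetical backend names; in a real setup these would be listed in HAProxy's config.
NODES = ["node1", "node2", "node3", "node4"]


def assign(requests, nodes):
    """Map each incoming request ID to the node that would serve it, in strict rotation."""
    picker = cycle(nodes)  # wraps back to the first node after the last one
    return {req: next(picker) for req in requests}


if __name__ == "__main__":
    for req, node in assign(range(6), NODES).items():
        print(f"request {req} -> {node}")
```

Because any request can land on any node, every node has to be able to answer it, which is exactly why they all need the same data.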
The thing is, each of the nodes needs to be identical: the same websites, email accounts and so on, so that it doesn't matter which node the load balancer sends a request to. Any of them should be able to satisfy it equally well.
This means configuring the same settings on each node and having exactly the same virtual servers and "/home" directories on all of them. Of course, the data isn't just read but also written: websites get updated and emails are sent and received, all of which changes the contents of those "/home" directories. So if the data is duplicated on each node, it also has to be kept in sync between them somehow.
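If the data were duplicated per node, one way to at least detect drift would be to compare a fingerprint of each node's "/home" tree. This is a minimal Python sketch, assuming a hypothetical `tree_fingerprint` helper run on every node; it only detects differences, it doesn't sync anything.

```python
import hashlib
from pathlib import Path


def tree_fingerprint(root):
    """Return a SHA-256 digest over every file's relative path and contents under root.

    Two nodes whose trees are in sync should produce the same digest. This sketch
    ignores empty directories, ownership and permissions.
    """
    root = Path(root)
    digest = hashlib.sha256()
    # Sort so that directory walk order, which can vary, doesn't affect the result.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        # Hash the contents separately so path/content boundaries can't be confused.
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()
```

Running this on all four nodes and comparing the results would show whether any of them had fallen out of step.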
The obvious solution, then, is shared network storage: a SAN, or let's say a rack of NAS drives. All the virtual servers' "/home" directories could then live on a shared network drive that is auto-mounted at "/home" on each node, so every node sees the same files and the data stays in sync inherently.
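One practical wrinkle with auto-mounting: if the mount fails at boot, "/home" is just an empty local directory, and anything written there silently diverges from the other nodes. A minimal Python sketch of a guard, assuming the share is meant to be mounted at "/home" (the `shared_home_ready` name is hypothetical), could check that the path really is a mount point before services start:

```python
import os

# Assumption: the NAS export is supposed to be mounted here on every node.
SHARED_HOME = "/home"


def shared_home_ready(path=SHARED_HOME):
    """Return True only if `path` is a mount point, i.e. the shared storage is actually mounted.

    This doesn't check *which* filesystem is mounted there, only that something is.
    """
    return os.path.ismount(path)


if __name__ == "__main__":
    print(f"{SHARED_HOME} mounted: {shared_home_ready()}")
```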
That's the theory. The question is, can Virtualmin be made to work this way? On each node, "/home" could be auto-mounted from the NAS at boot, so they'd all literally be looking at the same directory. I guess I'd also need Apache's "sites-available" directory (under "/etc") to be shared. Indeed, maybe the whole "/etc" directory should be auto-mounted the same way, which would keep the configurations identical too. That leads me towards the notion that perhaps the entire file system ought to be on the NAS and the nodes should PXE boot from it (with all changes still going back to the NAS).
The thing is, the traffic comes in and the load balancer splits it across the nodes, but then they all go back to the same shared storage, which becomes a bottleneck again. So would I be throwing away the advantage of load balancing across 4 servers if everything gets serialised again at the SAN? I guess it's a "CPU-bound vs. I/O-bound" question. Processing PHP and the like is CPU-bound, but much of what a web server does is simply spooling data (literally serving it up, hence the name "server"). That is more I/O-bound and so could cancel out the gains from having multiple nodes.
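The worry can be put as a back-of-envelope calculation. This is a simplified Python model, not a benchmark: cluster throughput is capped by whichever runs out first, the nodes' combined capacity or the shared storage link divided by the fraction of served data that has to come from it. All figures are hypothetical, except that a 1 Gbit/s link is roughly 125 MB/s.

```python
def cluster_throughput(nodes, per_node_mb_s, nas_mb_s, nas_fraction):
    """Upper bound on total MB/s served when a fraction of all served data is read from the NAS.

    nas_fraction = 1.0 means every byte comes from the NAS (no local caching);
    a smaller value models local disks absorbing some of the reads.
    """
    compute_limit = nodes * per_node_mb_s
    if nas_fraction == 0:
        return compute_limit
    # The NAS supplies nas_mb_s in total, so it saturates once the cluster serves nas_mb_s / nas_fraction.
    storage_limit = nas_mb_s / nas_fraction
    return min(compute_limit, storage_limit)


if __name__ == "__main__":
    # Hypothetical: 4 nodes at ~100 MB/s each, NAS behind a 1 Gbit/s link (~125 MB/s).
    print(cluster_throughput(4, 100, 125, 1.0))  # 125.0: the NAS is the ceiling
    print(cluster_throughput(4, 100, 125, 0.1))  # 400: the nodes are the ceiling again
```

Under these made-up numbers, serving everything from the NAS caps four nodes at little more than one node's worth, which is why the local-caching idea is attractive.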
Mind you, each node can have its own local hard drives. Could those act as caches for the shared NAS drive? A node would pull data down from the NAS on first access, keep it on its local drive and serve it from there from then on. Only on a "cache miss" would it go all the way to the NAS, preferring the local copy otherwise. If anything changed in that local copy (a write, not just a read), it could be pushed back to the NAS, potentially even lazily, when the node has some idle time.
I have absolutely no idea whether that's even possible, but logically it would make the most sense. If it is possible, would it all still work happily with Virtualmin?
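The idea described above is essentially a read-through, write-back cache. For what it's worth, the read half resembles what Linux's FS-Cache does for NFS mounts (the `fsc` mount option with the `cachefilesd` daemon). The following is only a toy Python sketch of the logic, assuming two ordinary directories stand in for the NAS mount and the local disk; the `WriteBackCache` class is hypothetical.

```python
import shutil
from pathlib import Path


class WriteBackCache:
    """Toy read-through / write-back cache of a shared directory on a local disk.

    `shared` stands in for the NAS mount, `local` for the node's own drive.
    It deliberately ignores other nodes writing to the same files.
    """

    def __init__(self, shared, local):
        self.shared = Path(shared)
        self.local = Path(local)
        self.dirty = set()  # relative paths written locally but not yet pushed to the NAS

    def read(self, rel):
        cached = self.local / rel
        if not cached.exists():
            # Cache miss: copy the file down from shared storage once, then serve locally.
            cached.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(self.shared / rel, cached)
        return cached.read_bytes()

    def write(self, rel, data):
        cached = self.local / rel
        cached.parent.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)
        self.dirty.add(rel)  # defer the NAS write until flush()

    def flush(self):
        """Push every dirty file back to shared storage, e.g. when the node is idle."""
        for rel in sorted(self.dirty):
            target = self.shared / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(self.local / rel, target)
        self.dirty.clear()
```

The catch with doing this on four nodes is consistency: if two nodes cache the same file and one writes to it, the other keeps serving its stale copy, and two unflushed writes to the same file would overwrite each other on flush.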
So many questions, but I've never architected a system like this before. I'm having to learn rapidly as I go along. :D