I have a Cloudmin Pro 10 VM license installed on my master system, host1. Host1 is a CentOS 6.3 KVM host and hosts ~7 KVM VMs, all running Ubuntu 12.04. All of this runs perfectly.
Recently we've added a second host system, host2, again a CentOS 6.3 KVM host, with a view to running another couple of VMs and providing some failover for host1 should it be required.
Both physical machines reside in the same cabinet in our DC, and are on the same subnet - let's say host1: 22.214.171.124 and host2: 126.96.36.199. Both have their gateway set to the DC gateway of 188.8.131.52 with no hardware firewall in between.
On each machine, I have 4 NICs that are bonded together to form a single interface, which is then bridged to allow the VMs to access the network. All of the VMs are online, and all of them can successfully ssh into the hosts without any delay.
Both systems can access the internet fine, and I can ssh into both systems from home without any issues. However, there is a real delay when attempting to ssh from host1 to host2 (or vice versa). This means that any action on host2 that is controlled by host1 either takes forever or fails with a timeout.
In the interest of keeping this post short, I've put my ifcfg files into a pastie: http://pastie.org/8081648
I've tried both adding a firewall rule in each machine for the other, and also disabling the firewall entirely, so that can't be the issue.
I've tried troubleshooting this myself but can't seem to get to the bottom of it. Any help or advice would be appreciated.
Thanks in advance.
--------------------- SOLUTION ADDED 16 JULY ---------------------
For others who may have similar issues...
Believe it or not, this was all related to a mis-configured DNS address. The first DNS IP was actually wrong by a single digit, which obviously caused lookups to fail until it finally moved on to check the 2nd address.
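For anyone checking the same thing, here is a minimal sketch that lists the DNS servers in the order the resolver will try them. It assumes the servers are configured in /etc/resolv.conf (on CentOS they may come from DNS1/DNS2 in the ifcfg files instead), and the function name and sample addresses are made up for illustration.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order they appear.

    Assumption: the text follows the usual resolv.conf layout, with one
    'nameserver <address>' entry per line and comment lines that start
    with '#' or ';'.
    """
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue  # skip blank lines and comments
        parts = stripped.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


# Hypothetical sample content, using documentation-only addresses.
SAMPLE = """\
# generated by hand
search example.com
nameserver 192.0.2.53
nameserver 192.0.2.54
"""
```

On a real host you would pass in the contents of the file, e.g. `nameservers(open("/etc/resolv.conf").read())`, and then check each address digit by digit against what your DC gave you. If the first entry is wrong, every lookup waits for it to time out before the second one is tried.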
So even though I was connecting from one server to the other by IP address only, the SSH server on the receiving end still does a reverse lookup of the source address when the connection comes in, and that lookup is what caused the huge delays.
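If you want to confirm this kind of problem before digging through configs, a rough sketch like the one below times a reverse lookup from the machine that is slow to accept connections. The function name is my own, and the `resolver` parameter only exists so the lookup can be swapped out for testing; by default it uses the system resolver through `socket.gethostbyaddr`.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for a reverse lookup of ip.

    A failed lookup is reported as None rather than raised, because the
    elapsed time is the interesting part here.
    """
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        hostname = None
    elapsed = time.monotonic() - start
    return hostname, elapsed
```

Run on host2, `time_reverse_lookup("<host1's IP>")` should come back almost instantly when DNS is healthy. If it takes several seconds, the resolver is probably waiting on a dead or wrong nameserver, which is the same wait the incoming SSH connection was stuck behind.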