Can't start all VMs at boot, tap interfaces get mixed up

I have 9 KVM instances, all of which are on autoboot.

When I boot the Cloudmin server, the first 3 machines start (the last of them being loredana.lan), then startup stops at the 4th one, mysql.lan.

If I look in the kvm directory, the file mysql.lan-eth0.tap contains the same interface name as loredana.lan-eth0.tap, so mysql.lan doesn't start, and none of the following VMs start either.
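To confirm this kind of clash quickly, something like the following could help. It is only a sketch: it assumes each `<vm>-<iface>.tap` file holds a single interface name on its first line, and `find_duplicate_taps` is a made-up helper name, not part of Cloudmin.

```shell
#!/bin/sh
# Sketch: print every tap interface name that is recorded in more than
# one *.tap file of the given directory.
# Assumption: the first line of each .tap file is the interface name.
find_duplicate_taps() {
    dir="$1"
    for f in "$dir"/*.tap; do
        [ -f "$f" ] || continue
        # First line only, stripped of whitespace, re-terminated with a newline.
        head -n 1 "$f" | tr -d ' \t\r\n'
        echo
    done | grep -v '^$' | sort | uniq -d
}
```

Calling `find_duplicate_taps` with the kvm directory as its argument prints nothing when every VM has its own interface, and one line per shared name otherwise.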

If I then manually reboot mysql.lan from Cloudmin, it takes a free interface and runs normally.

If I reboot all of the remaining servers at once, only one of them starts normally, because the others have taken the same interface name.

Could it be because the VMs take too long to start up?

I see the delay is 10 sec between each VM. Is there any way to configure it? Is there any way to configure the order in which the VMs start up?

Many thanks for your help,

Thomas

Status: 
Active

Comments

Category: Bug report » Support request

Yes, this could be due to multiple VMs starting at the same time at boot and being assigned the same tap interface. Although it is KVM that does that assignment, so I would have expected it to do the right thing.

One work-around may be to edit /etc/init.d/cloudmin-kvm and increase the sleep times between VM startups.
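As a rough illustration of that work-around, the sketch below multiplies the leading `(sleep N ;` delay on each line and prints the result to stdout, leaving the original file untouched. It assumes the file has the `(sleep N ; ... kvm ...) ... &` line format shown later in this thread; `scale_start_delays` is a hypothetical helper name. Cloudmin may well regenerate the file, in which case any manual change would be overwritten.

```shell
#!/bin/sh
# Sketch: scale the per-VM start delay in a cloudmin-kvm style script.
# Only the "(sleep N ;" at the start of each subshell is changed; the later
# "sleep 3" before nc.pl has no opening parenthesis and is left alone.
scale_start_delays() {
    factor="$1"
    script="$2"
    awk -v f="$factor" '{
        if (match($0, /\(sleep [0-9]+ ;/)) {
            # "(sleep " is 7 characters and " ;" is 2, so the digits sit between.
            n = substr($0, RSTART + 7, RLENGTH - 9)
            $0 = substr($0, 1, RSTART - 1) "(sleep " (n * f) " ;" substr($0, RSTART + RLENGTH)
        }
        print
    }' "$script"
}
```

For example, `scale_start_delays 3 /etc/init.d/cloudmin-kvm > /tmp/cloudmin-kvm.new` writes a copy with 30 seconds between VMs instead of 10, which can be reviewed before replacing anything.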

We now have 3 different installations of Cloudmin. All of them mess up the VMs' networking at boot, and the Cloudmin host (the physical machine) regularly crashes at boot.

Of these installations, one is in the office on the local network and 2 are in datacenters; all of them run Ubuntu 16.04 LTS with KVM for virtualization. If you're interested in seeing the config, we can give you root access to one of these servers, because we are a bit desperate now and don't know where to look.

The last error we got was:

    qemu-system-x86_64: -vnc :3,password: Failed to start VNC server: Failed to bind socket: Address already in use

In /etc/init.d/cloudmin-kvm we found that 2 VMs had the same vnc :3, so I changed one of them to vnc :1 (which wasn't in use) and rebooted. The physical machine then crashed (it couldn't boot). I rebooted it from the datacenter control panel, and this time it worked fine, with all autoboot machines up. I rebooted it again, and after 2 minutes of uptime the Cloudmin host crashed again.
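A duplicate display like this can be spotted without reading the whole file. The following is a minimal sketch, assuming the `-vnc :N,password` option format from the error above; `find_duplicate_vnc` is a hypothetical helper name. For reference, QEMU's VNC display :N listens on TCP port 5900+N, which is why two VMs on the same display hit "Address already in use".

```shell
#!/bin/sh
# Sketch: print every VNC display number used by more than one VM
# in the given start script (default: /etc/init.d/cloudmin-kvm).
find_duplicate_vnc() {
    script="${1:-/etc/init.d/cloudmin-kvm}"
    # -e is needed because the pattern itself starts with a dash.
    grep -o -e '-vnc :[0-9][0-9]*' "$script" \
        | sed 's/^-vnc ://' \
        | sort -n \
        | uniq -d
}
```

Running `find_duplicate_vnc` with no argument checks the init script and prints nothing when all displays are unique.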

It's not that I am passionate about rebooting machines, but I just wanna be sure they will be up if I reboot!

This has happened with all our installations. I have the impression that the generation of cloudmin-kvm writes contradictory information.

I am investigating and will paste the different versions of the file, noting which ones crash our system. If you're interested in having a look, you are more than welcome!

Thanks

We have a lead...

We first created 4 empty machines with autoboot. Then, for various reasons, we attached hard drives to only 3 of the machines and removed autoboot from the fourth. My previous test with vnc :1 crashed the machine, so I thought maybe it was used by Cloudmin itself; I changed it to vnc :4, and it worked!

Here is the file corresponding to that moment:

    (sleep 40 ; ionice -n0 /usr/bin/kvm -name ns1\.cloudmin\.dedibox\.fr -m 512 -drive file=/data/kvm/ns1.cloudmin.dedibox.fr.img,media=disk,index=0,if=ide -drive file=/data/kvm/ns1.cloudmin.dedibox.fr.swap,media=disk,cache=off,index=1,if=ide -boot c -net tap,vlan=0,script=/data/kvm/ns1.cloudmin.dedibox.fr-eth0.sh -net nic,vlan=0,macaddr=52:54:00:00:71:94,model=virtio -vnc :4,password -usbdevice tablet -monitor tcp:127.0.0.1:40000,server -smp 1) >/data/kvm/ns1.cloudmin.dedibox.fr.console 2>&1 </dev/null & sleep 3 ; /usr/bin/nc.pl --sleep 3 127.0.0.1 40000 </data/kvm/ns1.cloudmin.dedibox.fr.monitor >>/data/kvm/ns1.cloudmin.dedibox.fr.console 2>&1

    (sleep 0 ; ionice -n0 /usr/bin/kvm -name db\.cloudmin\.dedibox\.fr -m 2048 -drive file=/data/kvm/db.cloudmin.dedibox.fr.img,media=disk,index=0,if=ide -drive media=disk,cache=off,file=/data/kvm/db.cloudmin.dedibox.fr.swap,index=1,if=ide -boot c -net tap,vlan=0,script=/data/kvm/db.cloudmin.dedibox.fr-eth0.sh -net nic,vlan=0,macaddr=52:54:00:00:71:96,model=virtio -vnc :2,password -usbdevice tablet -monitor tcp:127.0.0.1:40001,server -smp 1) >/data/kvm/db.cloudmin.dedibox.fr.console 2>&1 </dev/null & sleep 3 ; /usr/bin/nc.pl --sleep 3 127.0.0.1 40001 </data/kvm/db.cloudmin.dedibox.fr.monitor >>/data/kvm/db.cloudmin.dedibox.fr.console 2>&1

    (sleep 10 ; ionice -n0 /usr/bin/kvm -name bbn\.cloudmin\.dedibox\.fr -m 1024 -drive file=/data/kvm/bbn.cloudmin.dedibox.fr.img,media=disk,index=0,if=ide -drive cache=off,media=disk,file=/data/kvm/bbn.cloudmin.dedibox.fr.swap,index=1,if=ide -boot c -net tap,vlan=0,script=/data/kvm/bbn.cloudmin.dedibox.fr-eth0.sh -net nic,vlan=0,macaddr=52:54:00:00:66:20,model=virtio -vnc :3,password -usbdevice tablet -monitor tcp:127.0.0.1:40002,server -smp 1) >/data/kvm/bbn.cloudmin.dedibox.fr.console 2>&1 </dev/null & sleep 3 ; /usr/bin/nc.pl --sleep 3 127.0.0.1 40002 </data/kvm/bbn.cloudmin.dedibox.fr.monitor >>/data/kvm/bbn.cloudmin.dedibox.fr.console 2>&1

You see how there is a "hole" in the sleep values: 0, 10, 40?
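To make such gaps easier to see, the delays can be pulled out of the file and sorted. This is a sketch that assumes the `(sleep N ; ... -name <vm> ...` line format shown in the listing; `list_start_delays` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print "<delay> <vm name>" for every VM in the start script,
# sorted by delay, so that missing or duplicated slots stand out.
list_start_delays() {
    script="${1:-/etc/init.d/cloudmin-kvm}"
    # Capture the number after "(sleep" and the word after "-name",
    # then drop the backslashes that escape the dots in the VM name.
    sed -n 's/.*(sleep \([0-9][0-9]*\) ;.* -name \([^ ]*\) .*/\1 \2/p' "$script" \
        | tr -d '\\' \
        | sort -n
}
```

On the file above this would list db at 0, bbn at 10 and ns1 at 40, with no VM at 30.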

Then we added the last drive and put the last machine on autoboot and here is the resulting file:

    (sleep 40 ; ionice -n0 /usr/bin/kvm -name ns1\.cloudmin\.dedibox\.fr -m 512 -drive file=/data/kvm/ns1.cloudmin.dedibox.fr.img,media=disk,index=0,if=ide -drive file=/data/kvm/ns1.cloudmin.dedibox.fr.swap,media=disk,cache=off,index=1,if=ide -boot c -net tap,vlan=0,script=/data/kvm/ns1.cloudmin.dedibox.fr-eth0.sh -net nic,vlan=0,macaddr=52:54:00:00:71:94,model=virtio -vnc :4,password -usbdevice tablet -monitor tcp:127.0.0.1:40000,server -smp 1) >/data/kvm/ns1.cloudmin.dedibox.fr.console 2>&1 </dev/null & sleep 3 ; /usr/bin/nc.pl --sleep 3 127.0.0.1 40000 </data/kvm/ns1.cloudmin.dedibox.fr.monitor >>/data/kvm/ns1.cloudmin.dedibox.fr.console 2>&1

    (sleep 0 ; ionice -n0 /usr/bin/kvm -name db\.cloudmin\.dedibox\.fr -m 2048 -drive file=/data/kvm/db.cloudmin.dedibox.fr.img,media=disk,index=0,if=ide -drive media=disk,cache=off,file=/data/kvm/db.cloudmin.dedibox.fr.swap,index=1,if=ide -boot c -net tap,vlan=0,script=/data/kvm/db.cloudmin.dedibox.fr-eth0.sh -net nic,vlan=0,macaddr=52:54:00:00:71:96,model=virtio -vnc :2,password -usbdevice tablet -monitor tcp:127.0.0.1:40001,server -smp 1) >/data/kvm/db.cloudmin.dedibox.fr.console 2>&1 </dev/null & sleep 3 ; /usr/bin/nc.pl --sleep 3 127.0.0.1 40001 </data/kvm/db.cloudmin.dedibox.fr.monitor >>/data/kvm/db.cloudmin.dedibox.fr.console 2>&1

    (sleep 10 ; ionice -n0 /usr/bin/kvm -name bbn\.cloudmin\.dedibox\.fr -m 1024 -drive file=/data/kvm/bbn.cloudmin.dedibox.fr.img,media=disk,index=0,if=ide -drive cache=off,media=disk,file=/data/kvm/bbn.cloudmin.dedibox.fr.swap,index=1,if=ide -boot c -net tap,vlan=0,script=/data/kvm/bbn.cloudmin.dedibox.fr-eth0.sh -net nic,vlan=0,macaddr=52:54:00:00:66:20,model=virtio -vnc :3,password -usbdevice tablet -monitor tcp:127.0.0.1:40002,server -smp 1) >/data/kvm/bbn.cloudmin.dedibox.fr.console 2>&1 </dev/null & sleep 3 ; /usr/bin/nc.pl --sleep 3 127.0.0.1 40002 </data/kvm/bbn.cloudmin.dedibox.fr.monitor >>/data/kvm/bbn.cloudmin.dedibox.fr.console 2>&1

    (sleep 30 ; ionice -n0 /usr/bin/kvm -name lasai\.cloudmin\.dedibox\.fr -m 1024 -drive file=/data/kvm/lasai.cloudmin.dedibox.fr.img,media=disk,index=0,if=ide -drive cache=off,media=disk,file=/data/kvm/lasai.cloudmin.dedibox.fr.swap,index=1,if=ide -boot c -net tap,vlan=0,script=/data/kvm/lasai.cloudmin.dedibox.fr-eth0.sh -net nic,vlan=0,macaddr=52:54:00:00:66:21,model=virtio -net tap,vlan=1,script=/data/kvm/lasai.cloudmin.dedibox.fr-eth1.sh -net nic,vlan=1,macaddr=02:54:00:AC:14:5C,model=virtio -vnc :1,password -usbdevice tablet -monitor tcp:127.0.0.1:40003,server -smp 1) >/data/kvm/lasai.cloudmin.dedibox.fr.console 2>&1 </dev/null & sleep 3 ; /usr/bin/nc.pl --sleep 3 127.0.0.1 40003 </data/kvm/lasai.cloudmin.dedibox.fr.monitor >>/data/kvm/lasai.cloudmin.dedibox.fr.console 2>&1

You can see that the new machine has retaken its spot at 30 (the one it took when we first made it autoboot), along with vnc :1, and now there is no problem. We have no time to go further with testing today. The installation works, but we will dig into it later, as it seems that disabling autoboot causes problems.

Category: Support request » Bug report