KVM snapshots cannot be reverted

Ubuntu 13.10 (64-bit Desktop), Cloudmin 4.04.gpl GPL

When trying to revert a snapshot of a Cloudmin guest disk, I only get errors like this:

Reverting system to snapshot 20140218-winupdate ..
.. failed :
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 42949607424: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 42949664768: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 0: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 4096: Input/output error
  Unable to merge invalidated snapshot LV "win-v1-tantalus.localhost_0_20140218-winupdate_snap"

Any idea what's going on here? I'm not sure whether this is a Cloudmin issue or just a Linux problem. The snapshot was created using Cloudmin, either from the browser or via the command line.
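For reference, creating such a snapshot by hand looks roughly like this (the LV/VG names are taken from the logs below; the exact flags Cloudmin passes are an assumption):

  # Create a 4 GB copy-on-write snapshot of the 40 GB origin LV
  lvcreate --snapshot --size 4G \
    --name win-v1-tantalus.localhost_0_20140218-winupdate_snap \
    /dev/vg0/win-v1-tantalus_localhost_img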

Creating guests using Cloudmin+LVM works without problems.

Some output from LVM commands:

root@lx-d0-midas:~# vgs
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 42949607424: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 42949664768: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 0: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 4096: Input/output error
  VG    #PV #LV #SN Attr   VSize   VFree
  vg0     1   7   1 wz--n- 237,41g  7,94g
  volg0   2  11   0 wz--n- 279,45g 74,59g
root@lx-d0-midas:~# pvs
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 42949607424: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 42949664768: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 0: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 4096: Input/output error
  PV         VG    Fmt  Attr PSize   PFree
  /dev/md1   vg0   lvm2 a--  237,41g  7,94g
  /dev/md127 volg0 lvm2 a--   74,53g 74,53g
  /dev/md2   volg0 lvm2 a--  204,92g 64,00m
root@lx-d0-midas:~# lvs
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 42949607424: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 42949664768: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 0: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 4096: Input/output error
  LV                                                  VG    Attr      LSize   Pool Origin                        Data%  Move Log Copy%  Convert
  home                                                vg0   -wi-ao---  74,50g                                                                 
  kvm2                                                vg0   -wi-ao---  70,00g                                                                 
  root                                                vg0   -wi-ao---  23,28g                                                                 
  swap                                                vg0   -wi-ao---   3,72g                                                                 
  var                                                 vg0   -wi-ao---  13,97g                                                                 
  win-v1-tantalus.localhost_0_20140218-winupdate_snap vg0   swi-I-s--   4,00g      win-v1-tantalus_localhost_img 100.00                       
  win-v1-tantalus_localhost_img                       vg0   owi-a-s--  40,00g                                                                 
  debian3_localhost_img                               volg0 -wi-a----   2,00g                                                                 
  debian60_localhost_img                              volg0 -wi-a----  30,00g                                                                 
...

cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdd2[1] sdb2[0]
      214878799 blocks super 1.2 [2/2] [UU]
     
md127 : active raid1 sdd1[1] sdb1[0]
      78156096 blocks [2/2] [UU]
     
md1 : active raid1 sdc2[1] sda2[0]
      248950592 blocks super 1.2 [2/2] [UU]
     
md0 : active raid1 sdc1[1] sda1[0]
      975296 blocks super 1.2 [2/2] [UU]

Thanks in advance, Falko

Status: 
Active

Comments

It seems I have found the reason for the problem.

When creating snapshots via the Cloudmin GUI, there is a form like this:

New snapshot details
Snapshot ID: ....
Percent of virtual disk to allocate: XX % out of YY GB

I can easily overallocate the snapshot by requesting more than the available free space in the volume group.

After creating an LVM snapshot that claims VG space which doesn't exist, the snapshot becomes unusable and the errors mentioned above occur.

So perhaps, when creating snapshots, the remaining free space in the corresponding volume group should be taken into account, and creating snapshots larger than that free space should not be possible.
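A minimal sketch of such a check with standard LVM commands (the suggested behavior is my assumption, not what Cloudmin currently does):

  # Free space left in vg0, in GB
  vgs --noheadings --units g -o vg_free vg0
  # A snapshot request larger than that value should be refused up front
  # instead of being passed on to lvcreate.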

Thanks, Falko

I'm actually surprised that the snapshot LV creation didn't fail in that case. Are you sure that you actually allocated more space than was left in the VG?

Hello Jamie,

well, you're right, and I'm sorry that I didn't try to reproduce the overallocation first. But I noticed that the "Maximum usage" field never showed anything other than 0 %, which led me to that wrong conclusion.

The logs above (from the date this issue was created) show:

  PV         VG    Fmt  Attr PSize   PFree
  /dev/md1   vg0   lvm2 a--  237,41g  7,94g

root@lx-d0-midas:~# lvs
...
  /dev/vg0/win-v1-tantalus.localhost_0_20140218-winupdate_snap: read failed after 0 of 4096 at 4096: Input/output error
...
  win-v1-tantalus.localhost_0_20140218-winupdate_snap vg0   swi-I-s--   4,00g      win-v1-tantalus_localhost_img 100.00                       

So this means 100 % snapshot usage ... maybe this is the reason for the error?
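The attribute string seems to confirm it: in "swi-I-s--" the capital I in the fifth position marks an invalid snapshot, and Data% = 100.00 means its copy-on-write space is exhausted. A quick way to check (assuming this lvm2 version supports the data_percent report field, which the Data% column above suggests):

  # 5th attr character: 'a' = active, capital 'I' = invalid snapshot
  lvs -o lv_name,lv_attr,data_percent vg0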

Now that I have deleted the 20140218-winupdate_snap and created a new "winupdate" snapshot using the LVM module in Webmin, these sizes are shown:

vgs
  VG    #PV #LV #SN Attr   VSize   VFree
  vg0     1   8   2 wz--n- 237,41g  2,94g


  win-v1-tantalus.localhost_0_dailytests_snap vg0   swi-a-s--   4,00g      win-v1-tantalus_localhost_img  49,27                       
  win-v1-tantalus_localhost_img               vg0   owi-a-s--  40,00g                                                                 
  win-v1-tantalus_winupdates                  vg0   swi-a-s--   5,00g      win-v1-tantalus_localhost_img  39,41

At the moment I have 5 GB + 4 GB of snapshots and 2.94 GB free.

So when I try to reproduce this by creating a snapshot that uses more than the free space, Cloudmin correctly shows:

Creating snapshot of system win-v1-tantalus.localhost .. .. failed : Volume group "vg0" has insufficient free space (752 extents): 1024 required.
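The numbers add up with the default 4 MiB extent size: the requested 4 GB snapshot needs 1024 extents, while the 752 free extents are 752 x 4 MiB ≈ 2.94 GB, exactly the VFree shown above. The extent size and free extent count can be verified with:

  vgs -o vg_name,vg_extent_size,vg_free_count vg0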

In the LVM module of Webmin I see:

win-v1-tantalus.localhost_0_dailytests_snap 40 GB
win-v1-tantalus_localhost_img 40 GB
win-v1-tantalus_winupdates 40 GB

The last snapshot shows a size of 40 GB, 5 GB of physical volumes allocated, and a snapshot use percentage of 39.41 %, which matches the console output. The dailytests snapshot shows a size of 40 GB, 4 GB of physical volumes allocated, and a snapshot use percentage of 49.27 %.
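If I understand the display correctly, the 40 GB shown for the snapshots is the virtual size of the origin they expose, not the space they occupy; the allocated copy-on-write space is the 4 GB / 5 GB from the lvs output. Both can be listed side by side:

  # Virtual (origin) size vs. actually allocated CoW size per snapshot
  lvs -o lv_name,origin,origin_size,lv_size,data_percent vg0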

So I'm trying to reproduce the 100 % usage by installing something like Office inside the VM ...

After a short time, both snapshots now show 100.00 % usage, and the errors are:

  lvs|grep win-v1
  /dev/vg0/win-v1-tantalus_winupdates: read failed after 0 of 4096 at 42949607424: Input/output error
  /dev/vg0/win-v1-tantalus_winupdates: read failed after 0 of 4096 at 42949664768: Input/output error
  /dev/vg0/win-v1-tantalus_winupdates: read failed after 0 of 4096 at 0: Input/output error
  /dev/vg0/win-v1-tantalus_winupdates: read failed after 0 of 4096 at 4096: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_dailytests_snap: read failed after 0 of 4096 at 42949607424: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_dailytests_snap: read failed after 0 of 4096 at 42949664768: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_dailytests_snap: read failed after 0 of 4096 at 0: Input/output error
  /dev/vg0/win-v1-tantalus.localhost_0_dailytests_snap: read failed after 0 of 4096 at 4096: Input/output error
  win-v1-tantalus.localhost_0_dailytests_snap vg0   swi-I-s--   4,00g      win-v1-tantalus_localhost_img 100.00                       
  win-v1-tantalus_localhost_img               vg0   owi-aos--  40,00g                                                                 
  win-v1-tantalus_winupdates                  vg0   swi-I-s--   5,00g      win-v1-tantalus_localhost_img 100.00                       

In Cloudmin -> Disk Snapshots, the "Maximum usage" field shows 0 instead of 100 %. In Webmin -> LVM -> logical volume details, it shows: Current status: Not in use.

IMHO both are wrong...

Maybe a warning about full (and therefore invalidated) snapshots on the "System Information" or "List Managed Systems" pages would make sense?

Best regards, Falko

OK, it sounds like the snapshot got full and could not be restored. Snapshots in this state should already appear in red on the "List Snapshots" page.

The percent size of a snapshot has to be large enough to store all the changes made to the VM's disk between when the snapshot was created and when it is restored. So if the snapshot is only 10 % of the VM size and more than 10 % of the data on disk changes, it will not be restorable.
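If a snapshot is at risk of filling up, it can also be grown before it becomes invalid. A rough sketch with standard LVM tools (the 2 GB figure and the threshold values are examples, not Cloudmin defaults):

  # Grow the snapshot's CoW space before it reaches 100 %
  lvextend -L +2G /dev/vg0/win-v1-tantalus_winupdates

  # Or let dmeventd extend it automatically via /etc/lvm/lvm.conf:
  #   snapshot_autoextend_threshold = 70   # extend once 70 % full
  #   snapshot_autoextend_percent   = 20   # grow by 20 % of current size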