Server consolidating disks

For each test, two parameters were measured: a network ping executed every second, and the effective stun time read from the VMware logs. Heavily promoted features like VVOLs and multi-processor FT made the news during the launch of vSphere 6.0, and later the issues with CBT (Changed Block Tracking) became the most talked-about topic; all of this contributed to this improvement being overlooked.

The first gives you the result as it would be seen by another application or user connecting to the VM over the network, while the second gives you the exact time spent by the VMware engine managing snapshot creation and consolidation. But with the CBT issues now finally solved, if you were still looking for reasons to upgrade to vSphere 6.0, this one alone would, in my opinion, be more than enough to justify upgrading all my ESXi servers to 6.0.

Reply from 192.1: bytes=32 time=1ms TTL=62
Reply from 192.1: bytes=32 time=2ms TTL=62
Reply from 192.1: bytes=32 time=1ms TTL=62
Request timed out.
Reply from 192.1: bytes=32 time=3ms TTL=62
Reply from 192.1: bytes=32 time=22ms TTL=62
Reply from 192.1: bytes=32 time=2ms TTL=62
Reply from 192.1: bytes=32 time=2ms TTL=62

2016-02-05T.783Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 843578 us
2016-02-05T.633Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 1224454 us
2016-02-05T.457Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 1180217 us
2016-02-05T.193Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 1125963 us
2016-02-05T.870Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 1083360 us
2016-02-05T.517Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 1061894 us
2016-02-05T.124Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 1014733 us
2016-02-05T.623Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 967550 us
2016-02-05T.131Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 1004541 us
2016-02-05T.600Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 961571 us
2016-02-05T.980Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 907337 us
2016-02-05T.338Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 890165 us
2016-02-05T.162Z| vcpu-0| I120: Checkpoint_Unstun: vm stopped for 383551 us

You start to see the issue arising already.

There were multiple stun operations, and the sum of them all is 12648914 microseconds, or 12.6 seconds. As a result, multiple pings were lost:

Reply from 192.1: bytes=32 time=1ms TTL=62
Reply from 192.1: bytes=32 time=2ms TTL=62
Request timed out.
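The total stun time quoted above can be checked with a short script. What follows is only a minimal Python sketch, assuming the stun durations appear in the VM log as lines of the form "Checkpoint_Unstun: vm stopped for N us", as in the excerpt in this post. The function name `total_stun_microseconds` is made up for illustration, and the sample log is rebuilt from the thirteen durations listed in the post, without their timestamps.

```python
import re

# Matches the stun duration in lines such as:
#   "... vcpu-0| I120: Checkpoint_Unstun: vm stopped for 843578 us"
STUN_RE = re.compile(r"Checkpoint_Unstun: vm stopped for (\d+) us")


def total_stun_microseconds(log_text: str) -> int:
    """Sum every stun duration (in microseconds) found in the log text."""
    return sum(int(value) for value in STUN_RE.findall(log_text))


# The thirteen durations reported in the log excerpt of this post.
durations = [
    843578, 1224454, 1180217, 1125963, 1083360, 1061894, 1014733,
    967550, 1004541, 961571, 907337, 890165, 383551,
]

# Rebuild a sample log (timestamps omitted; only the message part matters here).
sample_log = "\n".join(
    f"vcpu-0| I120: Checkpoint_Unstun: vm stopped for {d} us" for d in durations
)

total = total_stun_microseconds(sample_log)
print(total, "us =", round(total / 1_000_000, 1), "s")
# prints: 12648914 us = 12.6 s
```

To run this against a real log, read the VM's log file into a string and pass it to the function in place of `sample_log`.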

One should now see snapshot consolidations completing in one pass (with minimal or indeed no helper disks), hopefully with a dramatically shorter stun time and a much smaller chance of consolidation failure.

This sounded too good to be true, so my colleague Tom Sightler, Solutions Architect at Veeam, decided it was time to run some tests to verify this change.

Tom shared with me his findings, so thanks Tom for the reports!


The test VM has 12 virtual disks, 16GB each, all stored in the NFS datastore: 1 disk for the operating system, while the other 11 disks are just attached, but not even formatted or mounted. So many disks were used because the biggest snapshot consolidation problems are observed with customers that have lots of VMDKs, since vSphere consolidates multiple disks of the same VM and their snapshots sequentially.

As a result of this, availability of this VM over the network has also dramatically improved:

Reply from 192.1: bytes=32 time=2ms TTL=62
Reply from 192.1: bytes=32 time=2ms TTL=62
Reply from 192.1: bytes=32 time=1ms TTL=62
Request timed out.

Before vSphere 6.0, the consolidation and commit phases of any VM snapshot always followed the same procedure: an additional helper snapshot was created to “freeze” not just the base virtual disk, but also the snapshot disk. Once the changes stored in the snapshot disk had been merged into the base disk, the helper snapshot was also committed, and at some point the I/O was no longer directed to the snapshot, but back to the original disk.

This process sounds simple, but once you start looking at the details it is not that easy, and in some production environments this behaviour has caused problems.

Snapshot consolidation (or commit) operations in VMware vSphere have always been a problem, especially for large and really active virtual machines. But vSphere 6 has introduced some changes that are probably going to make commits a problem of the past!
