Do you manage Hyper-V backup? Do you build backup solutions for Hyper-V? If so, then this article is a “must-read” for you, because Hyper-V backup was completely reinvented in Windows Server 2016 and later releases.
In this article, we will walk you through the history and journey of Hyper-V backup since the Virtual PC and Virtual Server days, and then dive into the new innovations that started in Windows Server 2016 Hyper-V and later. By the end of this article, you will have a better understanding of how Hyper-V backup has evolved over the years.
In This Article
Virtual PC and Virtual Server
In 2004, Microsoft released the original Microsoft Virtual Server 2005. If you recall, Virtual Server 2005 did not support clustering, checkpoints, or backup at all, and it could run on a 64-bit host but supported 32-bit guests only.
At that time, Microsoft started working on the follow-up release, Virtual Server 2005 R2. Halfway through, the leaders at Microsoft began contemplating building a brand new Windows-based hypervisor virtualization solution. There were intense architectural meetings and long discussions about a code name for the project, and eventually, Hyper-V was born.
In 2005, the DPM team started working with the Virtual Server engineering team on the first implementation of agentless backup for virtual machines based on Virtual Server 2005 R2. At that time, the average system available in the market was a single-processor, dual-core system that could run up to six or seven virtual machines.
The average deployment was 3 to 4 virtual machines. The original backup implementation on Virtual Server 2005 R2 was done while the Hyper-V team was busy getting Hyper-V up and running. Then the Hyper-V technical preview was released, and customers immediately started asking: where is the backup support?
The Hyper-V team took the backup architecture that was built for Virtual Server 2005 R2 and applied the same approach to Hyper-V V1.
Windows Server 2008 R2 and 2012 Hyper-V
In order to understand how backup initially started in Hyper-V, we will first explain the basics of the Volume Shadow Copy Service (VSS).
Volume Shadow Copy Service (VSS) provides the system infrastructure for running VSS applications on Windows-based systems:
1) The VSS requester is any application that uses the VSS API to request the services of the Volume Shadow Copy Service to create and manage shadow copies and shadow copy sets of one or more volumes.
2) The VSS Writers are applications (such as Hyper-V) or services that store persistent information in files on disk and that provide the names and locations of these files to requesters by using the shadow copy interface.
3) The VSS Providers manage running volumes and create shadow copies of them on demand (the storage layer).
What VSS does in the background is the following:
> Coordinates activities of providers, writers, and requesters in the creation and use of shadow copies.
> Furnishes the default system provider.
> Implements low-level driver functionality necessary for any provider to work.
Now, on a physical computer, the backup application acts as the VSS requester: it goes to the VSS system and requests a backup. VSS then talks to all the writers on the system, which are the various server applications installed, including Windows components, and tells them to get ready for backup. Once they are ready, VSS talks to the provider from the storage infrastructure and instructs it to take a snapshot, and finally control returns to the original VSS requester (the backup application). That is the basic workflow in the physical world.
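You can observe the writer side of this machinery on any Windows system with the built-in `vssadmin` tool, which lists every registered VSS writer. On a Hyper-V host, the Hyper-V VSS Writer described below appears in this list (run from an elevated prompt):

```powershell
# List every registered VSS writer on the local system.
# On a Hyper-V host, the output includes an entry for the Hyper-V VSS Writer.
vssadmin list writers
```

Each writer is reported with its state (for example, “Stable”) and last error, which is a handy first check when troubleshooting a failing Hyper-V backup.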
This becomes really tricky once you get virtual machines into the picture because now you have multiple operating systems inside of operating systems.
So in Virtual Server 2005 R2 and in Windows Server 2008 R2 / 2012 Hyper-V, the backup workflow looks as shown in Figure 1 below.
As you can see, we have the Hyper-V Host at the bottom. The Backup App requests a backup of the system and talks to VSS; VSS detects that there is a writer on the system for virtualization, in this case Virtual Server or Hyper-V, and asks Hyper-V to get ready for backup.
Then Hyper-V uses the Integration Components to reach into the guest OS. Inside the guest, we have the VSS for Hyper-V Integration Components (IC), which you can think of as a lightweight backup app. The VSS Hyper-V IC talks to VSS inside the guest and requests a backup of the system; VSS then talks to all the various writers inside the guest and tells them to get ready for backup. When that is done, a guest snapshot is taken, and VSS comes back and confirms it is done.
Next, the VSS Hyper-V Integration Components inside the guest tell the Hyper-V Writer on the host that they are done, and the Hyper-V Writer in turn reports to VSS on the host that it is done as well. VSS on the host then takes either a software snapshot or, if you have a VSS provider for a SAN, a hardware snapshot. At this stage, the physical backup takes place.
What we have at this stage, whether using a software provider or a hardware provider, is a backup set containing a collection of VHDs, each of which has its own in-guest snapshot. The final step is called backup/auto-recovery: the system takes the host snapshot, finds all the VHDs stored in the collection, mounts those VHDs back into the host operating system as disks, and finally uses VSS on the host to roll each of them back to the guest snapshot that was taken. This is what produces a clean, consistent snapshot.
This was the architecture used in Virtual Server 2005 R2, Windows Server 2008 R2, and Windows Server 2012. The first issue with this architecture is that it is not scalable; basically, it worked reasonably well only as long as you had a small number of virtual machines.
The second issue was the mount/revert operation on the VHDs at the end of the backup process. As you scale up the number of virtual machines, the backup operation becomes exponentially longer, because plug-and-play mounting all those disks on the backend and rolling each of them back every single time is a very expensive and heavy operation.
Windows Server 2012 R2 Hyper-V
In Windows Server 2012 R2, Microsoft made a substantial change to the architecture with two primary goals: the first was to set the scene for shielded virtual machines, and the second was to increase the reliability of backup.
In Windows Server 2012 R2 Hyper-V, the backup workflow looks as shown in Figure 2 below.
As you can see, we again have the Hyper-V Host at the bottom. The Backup App requests a backup of the system and talks to VSS; VSS detects that there is a writer on the system for virtualization, in this case Hyper-V, and asks Hyper-V to get ready for backup.
Then Hyper-V uses the Integration Components to reach into the guest OS. Inside the guest, the VSS for Hyper-V Integration Components talk to VSS inside the guest and request a backup of the system, and VSS in turn tells all the various writers inside the guest to get ready for backup.
Now here things get different. If we compare Figure 1 with Figure 2, in Windows Server 2012 R2 Microsoft implemented the Hyper-V VSS Provider, which made the virtual hard disks look as if they supported hardware snapshots.
VSS inside the guest gets the system ready and sends the snapshot request down to the storage stack, where an .avhdx file gets created; this is exactly the moment the guest is snapped. VSS on the host then confirms it is done and takes a host snapshot, which now includes the pair (.vhdx / .avhdx), where the VHDX is the data-consistent point.
In this new architecture, the whole mount/auto-revert operation is removed, which increases the scalability and reliability of the entire backup process.
The first interesting challenge with this architecture is that when VSS on the host calls the Hyper-V Writer at the beginning of the backup operation and asks for the metadata file (the metadata file is basically a blob describing all the bits that you should back up), those files do not exist yet on the other side. The workaround was to generate the VHDX GUIDs at an early stage and send them to the other side to make sure the file names line up.
The big change here is that the action of reverting the virtual hard disk to the data-consistent VSS snapshot now takes place inside the virtual machine instead of in the host operating system, as was done in Hyper-V 2008 R2 and 2012 (mount/auto-recovery). This has many benefits, one of which is that it scales excellently! It does, however, have one (minor) drawback. In order for this method to work, Hyper-V needs to be able to hot-add and remove virtual hard disks to and from the virtual machine, and this is something that is only supported on the SCSI controller (not on the IDE controller).
If you have noticed, when you create a virtual machine in Hyper-V Manager, in System Center Virtual Machine Manager, in Windows Admin Center, or in PowerShell, Hyper-V always creates it with a SCSI controller connected (even if no disks are attached).
However, if you manually remove all SCSI controllers from a virtual machine, Hyper-V backup will fail on that virtual machine. So if you have a PowerShell script that removes all SCSI controllers from Gen 1 VMs, make sure to leave at least one SCSI controller in place.
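A small guard script can catch this condition before backup fails. The sketch below, using the standard Hyper-V PowerShell module on the host, checks each Generation 1 VM and adds a SCSI controller if none is present:

```powershell
# For every Generation 1 VM on this host, make sure at least one SCSI
# controller exists, so host-level backup can hot-add its snapshot disks.
Get-VM | Where-Object { $_.Generation -eq 1 } | ForEach-Object {
    if (-not (Get-VMScsiController -VMName $_.Name)) {
        # No SCSI controller found; add one (the VM keeps booting from IDE).
        Add-VMScsiController -VMName $_.Name
    }
}
```

Generation 2 VMs are not affected, since they only have SCSI controllers to begin with.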
The second challenge arises if you are running a Hyper-V cluster with lots of virtual machines. In Windows Server 2012 R2 there are two particular problems. The first is still an architectural problem of scale: in the VSS architecture, there is no way to back up virtual machines without triggering a snapshot of your underlying storage!
Consider this scenario: take an 8-node Hyper-V cluster running 800 virtual machines. When you trigger a backup of all VMs, Hyper-V generates 800 snapshots on your storage backend, so you can imagine what happens to your storage once you start hammering it like that… The workaround backup vendors implemented was to reduce the number of virtual machines per backup batch.
The second problem is around clustering as well: Cluster Shared Volumes (CSV) add another layer of complexity, requiring coordination at all levels to make sure that the node taking the snapshot is the owner node of that CSV and can orchestrate everything around it. Microsoft therefore released a lot of hotfixes for Windows Server 2012 R2-based failover clusters to make this possible.
Evolving Hyper-V Backup
In Windows Server 2016 Hyper-V and later releases, the architecture changed completely; Microsoft made pretty significant changes to how backup works.
What they actually did was take the middle piece of the backup architecture from Windows Server 2012 R2 and completely decouple it from the rest of the system. Hyper-V gained support so that anyone (backup partners) can call into Hyper-V WMI, ask for a VSS snapshot of a given virtual machine, and have the whole in-guest backup process run independently.
So in Windows Server 2016 Hyper-V and later releases, the backup workflow looks as shown in Figure 3 below.
As you can see, we have the Hyper-V Host at the bottom. The Backup App first calls into Hyper-V WMI to get all the virtual machines that it wants in a given backup set ready for backup, and then calls into VSS/VDS to orchestrate a single hardware snapshot on the storage backend. The goal here is to get to a model where, no matter how many virtual machines you have and no matter what scale point you are running at, only one storage snapshot is needed.
Compare this with the backup workflow in previous releases: there, two snapshots happen, the VM snapshot and the underlying hardware snapshot, and those two operations are tightly coupled, so you cannot do one without doing the other.
However, in Windows Server 2016 the Backup App can take as long as it requires to get the set of virtual machines into a data-consistent state, and then do the hardware snapshot as a separate operation. That is the key architectural change in Hyper-V 2016.
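Backup applications drive this through the Hyper-V WMI backup interfaces, but you can see the same per-VM VSS consistency from PowerShell by using production checkpoints, which invoke VSS inside the guest independently of any storage snapshot. A minimal sketch (the VM name is hypothetical):

```powershell
# Configure the VM to use production checkpoints, which quiesce the guest
# through VSS (application-consistent), rather than standard checkpoints.
Set-VM -Name 'SQL-VM01' -CheckpointType Production

# Take a VSS-consistent, per-VM snapshot with no storage-array involvement.
Checkpoint-VM -Name 'SQL-VM01' -SnapshotName 'Backup-Consistent'
```

This mirrors the decoupling described above: the per-VM consistency step and the hardware snapshot are now two independent operations.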
The second improvement Microsoft worked on is a new technology called Resilient Change Tracking (RCT). RCT has two goals. The first addresses the fact that in all the previous architectures (Figure 1 and Figure 2), the result was a full backup of the virtual machines' virtual hard drives. This means that every time you do a backup (daily, hourly, or whatever), all the data is sent over the network again, and that architecture simply would not scale.
In order to avoid sending all the data over the network, from Windows Server 2008 R2 up to Windows Server 2012 R2 every backup partner implemented its own file system filter to track the changed blocks on the storage. But a third-party file system filter in the host OS kernel is a potential system-crashing bug, and a second issue is that it also affects the storage performance profile.
So in Windows Server 2016, Microsoft built a system where you no longer have to put any file system filter in place; that was the first motivation. The second motivation relates to the new architecture shown in Figure 3 above, where the Backup App snapshots the VMs independently and then takes a hardware snapshot as a separate operation: in all the previous architectures there is an extended period of time during which virtual machines run on .avhdx (differencing) files, and Microsoft wanted to mitigate the performance impact of doing that.
In Windows Server 2016 and later releases, Microsoft provides native change block tracking as part of the platform. With RCT, they leverage the block allocation table that exists in every VHDX file to record which blocks have changed, but they do not write down the data itself, because the Backup App already has a copy of the original data somewhere else; this avoids copying the data twice. The great thing about the RCT infrastructure is that it is tied to the VHDX file, so wherever the file goes, the tracking goes with it, which is very flexible for all VM mobility scenarios.
As a side note, there are two important points to be aware of. If you run virtual machines on VHD files in Windows Server 2016 instead of VHDX (please use VHDX), or if your virtual machine configuration is still at version 5.0, you will not get the benefit of RCT support. The reason is that a version 5.0 virtual machine might be moved to a host running Windows Server 2012 R2, and 2012 R2 does not understand RCT. In either of those situations, you will see a performance impact during backup, because Microsoft will not use RCT and will fall back to differencing disks instead (the old style).
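Both prerequisites can be checked and remediated with the standard Hyper-V cmdlets. A hedged sketch (VM name and paths are hypothetical; the VM must be powered off for both operations, and the version upgrade is one-way):

```powershell
# Find VMs whose configuration version predates Windows Server 2016 (< 6.2),
# which means they cannot use RCT.
Get-VM |
    Where-Object { [version]$_.Version -lt [version]'6.2' } |
    Select-Object Name, Version

# Upgrade a VM's configuration version so RCT can be used
# (irreversible; the VM can no longer move back to a 2012 R2 host).
Update-VMVersion -Name 'LegacyVM01'

# Convert a legacy VHD to VHDX, the other RCT prerequisite.
Convert-VHD -Path 'D:\VMs\disk0.vhd' -DestinationPath 'D:\VMs\disk0.vhdx'
```

After both steps, subsequent backups of the VM can use RCT instead of the old differencing-disk approach.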
Microsoft also supports backup for guest clusters (groups of virtual machines with shared virtual hard disks) using the RCT infrastructure. To make that possible, Microsoft introduced a new file format called VHDS (VHD Set); the VHDS is a very small file that has a bunch of .avhdx files along with it, as shown in Figure 4 below.
With the introduction of the VHD Set file, Microsoft can take advantage of the storage snapshot and then lazily update the virtual machine configurations in order to reference the right thing. The VHDS file is a reference/pointer file and includes checkpoint metadata. No user data is stored in the VHDS file.
You can think of the VHDS as an external shared configuration file between the virtual machines of a guest cluster. In Windows Server 2012 R2, if you had two virtual machines using a shared VHDX file, each VM had its own configuration file, which created a real problem around metadata updates and re-synching all the changes. In Windows Server 2016 and later releases, the two virtual machines still have their own configurations but depend on the shared VHDS file, which is essentially a configuration file that gives us one place to update when there are changes to the underlying storage.
The VHD Set file enables solving the problems associated with coordinated updates to all the VM’s configurations by centralizing the VHD file paths in the single VHD Set file.
The VHD Set file also provides a stable file name to use in the UI or PowerShell. This VHD Set file can be used like any other VHD file; it can be queried, migrated, and mounted. The primary reason for VHDS is to have support for checkpoints on guest clusters.
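Creating and attaching a VHD Set uses the same cmdlets as any other virtual disk. A minimal sketch for a guest-cluster shared disk (paths, names, and size are hypothetical):

```powershell
# Create a VHD Set (.vhds) for a guest-cluster shared disk on cluster storage.
New-VHD -Path 'C:\ClusterStorage\Volume1\Shared\data.vhds' `
        -SizeBytes 100GB -Dynamic

# Attach the same VHD Set to both guest-cluster nodes as a shared drive.
# -SupportPersistentReservations enables the SCSI reservations that
# guest clustering relies on.
Add-VMHardDiskDrive -VMName 'Node1' `
    -Path 'C:\ClusterStorage\Volume1\Shared\data.vhds' `
    -SupportPersistentReservations
Add-VMHardDiskDrive -VMName 'Node2' `
    -Path 'C:\ClusterStorage\Volume1\Shared\data.vhds' `
    -SupportPersistentReservations
```

From there, the guest cluster sees a single shared disk, while the host can checkpoint and back it up through the VHD Set.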
If you manage Hyper-V backup, knowing the new innovations enabled in Windows Server 2016 and later for backup applications will help you make informed decisions on architectures, solutions, and backups.
Many Thanks to Mr. Hyper-V, Ben Armstrong (Group Program Manager of the AKS on-premises team at Microsoft) for this information.
We hope this deep dive article has been informative for you and we would like to thank you for reading!
Make sure to check my recent Windows Server Hyper-V Cookbook for in-depth details about Hyper-V!
Thank you for reading my blog.
If you have any questions or feedback, please leave a comment.