Business continuity is more critical than ever. As threats and disasters multiply in an always-connected world, more critical systems need to stay up and running 24×7, regardless of the circumstances and of where the workload runs. According to various reports, business continuity and disaster recovery (BCDR) ranks among the top five priorities for businesses today.
In this blog post, I will share with you how to enable Azure Site Recovery (ASR) on VMs using Azure Policy to auto-protect and replicate your virtual machines to another region for disaster recovery.
Azure Site Recovery (ASR) provides a single DR solution that works across platforms: Hyper-V on-premises, Azure Stack Hub, Azure Stack HCI, VMware, or even physical servers. It also works across clouds, whether public, private, or service provider, and across different workloads.
Azure Site Recovery is a native Azure service that helps minimize downtime for Azure and hybrid workloads. It supports business continuity with near-continuous replication and point-in-time recovery, offering secure and cost-effective crash-consistent and application-consistent recovery.
Microsoft recently announced the public preview of Azure Policy support for Azure Site Recovery (ASR). This lets you create a policy for your resource group so that, as you add any VM to that resource group, the policy ensures the VM is protected by ASR and replicated to another Azure region.
Azure Policy is a set of rules or a particular workflow that we want all the resources of a particular resource group, subscription, or management group to follow. It can be as simple as assigning a “Tag” to any resource that is created within the resource group.
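To make the idea of a rule more concrete, here is a minimal Python sketch that builds the body of a tag-style policy rule. The `if`/`then` layout and the `append` effect follow the general Azure Policy definition format as I understand it; the tag name `environment`, its value, and the helper name `build_tag_policy_rule` are hypothetical and only serve as an illustration.

```python
import json


def build_tag_policy_rule(tag_name: str, tag_value: str) -> dict:
    """Return a policy rule that appends a tag when it is missing.

    Illustrative only: the tag name and value are made-up examples.
    """
    field = f"tags['{tag_name}']"
    return {
        # Condition: the resource does not have the tag yet.
        "if": {"field": field, "exists": "false"},
        # Action: add the tag with the given value.
        "then": {
            "effect": "append",
            "details": [{"field": field, "value": tag_value}],
        },
    }


rule = build_tag_policy_rule("environment", "production")
print(json.dumps(rule, indent=2))
```

The ASR policy used in this article follows the same if/then idea, but with a “DeployIfNotExists” effect instead of a simple tag.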
In this article, I will show you how to enable Azure Policy to support Azure Site Recovery. This will ensure that disaster recovery of virtual machines is automatically enabled as soon as a VM is created in a specific resource group.
To follow this article, you need to have the following:
- An Azure subscription. If you don’t have an Azure subscription, you can create a free one here.
- An Azure resource group (RG).
- At least one virtual network deployed in the target region with appropriate IP subnets. Please check the following quickstart guide to create a virtual network.
- At least one Azure virtual machine deployed in the source region and in the desired RG. Please check the following quickstart guide to create a Windows virtual machine.
Assuming you have all the prerequisites in place, take the following steps:
Enable ASR with Azure Policy
Open the Azure Portal, click “All services”, search for “Policy”, and then click “Assignments” → “Assign policy”.
The Assign policy blade opens, which lets you author a policy.
The first step is to define the scope where the policy applies. A scope can be a management group, subscription, or resource group. But for Azure Site Recovery, the scope is limited to resource group only. As shown in the figure below, I will set the scope to a resource group that I have created (You select the subscription first, and then the desired resource group).
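Behind the portal picker, a scope is just a resource ID string. The following Python sketch builds a resource group scope and checks that a given scope is at resource group level, which is what this policy needs. The helper names, the all-zero subscription ID, and the resource group name `rg-westeurope-vms` are placeholders I made up for the example.

```python
import re

# A resource group scope has exactly this shape:
# /subscriptions/<subscription-id>/resourceGroups/<name>
RG_SCOPE = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+$", re.IGNORECASE
)


def resource_group_scope(subscription_id: str, resource_group: str) -> str:
    """Build the scope string for a resource group."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def is_resource_group_scope(scope: str) -> bool:
    """True only for resource-group-level scopes."""
    return RG_SCOPE.match(scope) is not None


# Placeholder values, not real identifiers.
scope = resource_group_scope(
    "00000000-0000-0000-0000-000000000000", "rg-westeurope-vms"
)
print(scope)
print(is_resource_group_scope(scope))
```

A subscription scope (`/subscriptions/<subscription-id>`) or a management group scope would fail this check, matching the resource group limitation described above.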
Next, click the ellipsis (…) next to Policy definition. On the Policy definition blade, search for ‘disaster’. The first policy, ‘Audit virtual machines without disaster recovery configured’, is a monitoring or reporting policy that lets you know which VMs in your resource group or subscription do not have disaster recovery configured. As shown in the figure below, I will select ‘Configure disaster recovery on virtual machines by enabling replication’.
You can give it any custom assignment name and description you want. In this example, I will call it ‘ASR Policy for West Europe VMs‘, and give it a description ‘I am creating an Azure Policy for all my West Europe VMs‘. By default, the ‘Policy enforcement‘ is set to Enabled. Click Next to continue.
On the Parameters tab, you need to specify several parameters for the ASR policy as follows:
- Source Region: The source region defines which VMs need to be protected. If I choose ‘West Europe’, then all VMs added to my resource group that reside in the ‘West Europe’ region will be protected, and any VM created in another region will not.
- Target Region: The target region is where your VM will get replicated. I will set it to ‘North Europe’ in this example.
- Target Resource Group: You need to configure a target resource group. The good news is that ASR can create a new RG for you if you want. In that case, the new resource group gets a hyphen and ‘ASR’ appended to its name.
- Vault Resource Group: Next, pick a resource group for the vault that will house all these VMs. As you might be aware, the Recovery Services Vault is a logical construct that houses all the protected items of ASR. This RG is different from the target resource group above.
- Recovery Services Vault: Next, select the Recovery Services Vault that resides in the RG. If you don’t have a Recovery Services Vault, then the policy will create a new one for you.
- Recovery Virtual Network: The next thing is we need to choose a recovery virtual network in the target region. I’ve done that already, but if you don’t have a VNet created, then the policy will create a new one for you in the target region. But I don’t recommend using this option since you need to take control of the address space and define your virtual subnets appropriately.
- Target Availability Zone: Then you set the availability zone in the designated target region to be used by virtual machines during a disaster. In this example, I will set it to ‘2’, so on failover, any VM from my ‘West Europe’ region will be created in availability zone ‘2’ of ‘North Europe’. What if you want to use this policy for a zone-to-zone scenario? You can do that too: choose the same region, ‘West Europe’, as the target region instead of ‘North Europe’. This only works if the Azure region you have chosen supports zone-to-zone capability.
- Effect: The default is ‘DeployIfNotExists’. You can set it to ‘Disabled’ to turn the policy off.
As shown in the figure below, I will use ‘West Europe’ and ‘North Europe’ for my disaster recovery scenario.
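The parameters above can be captured in a small Python sketch, which helps when you want to review the values before typing them into the portal. This is only a model of the form: the class name, field names, and resource names below are my own illustrative choices, not the parameter names of the built-in policy definition. The check that a zone-to-zone setup needs a target zone is likewise my assumption, based on the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AsrPolicyParameters:
    """Hypothetical model of the values entered on the Parameters tab."""
    source_region: str
    target_region: str
    target_resource_group: str
    vault_resource_group: str
    recovery_services_vault: str
    recovery_virtual_network: str
    target_zone: Optional[str] = None
    effect: str = "DeployIfNotExists"

    def is_zone_to_zone(self) -> bool:
        # Same source and target region means a zone-to-zone scenario.
        return self.source_region.lower() == self.target_region.lower()

    def problems(self) -> List[str]:
        """Return a list of obvious issues with the chosen values."""
        issues = []
        if self.effect not in ("DeployIfNotExists", "Disabled"):
            issues.append(f"unknown effect: {self.effect}")
        if self.is_zone_to_zone() and not self.target_zone:
            issues.append("zone-to-zone needs a target availability zone")
        return issues

    def protects(self, vm_region: str) -> bool:
        # Only VMs in the source region are picked up by the policy.
        return (
            self.effect == "DeployIfNotExists"
            and vm_region.lower() == self.source_region.lower()
        )


# Example values; the resource names are placeholders.
params = AsrPolicyParameters(
    source_region="West Europe",
    target_region="North Europe",
    target_resource_group="rg-vms-ASR",
    vault_resource_group="rg-vault",
    recovery_services_vault="rsv-dr",
    recovery_virtual_network="vnet-northeurope",
    target_zone="2",
)
print(params.problems())
print(params.protects("West Europe"), params.protects("East US"))
```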
Click Next to continue to the Remediation tab. By default, as shown in the figure below, the assignment only takes effect for newly created resources. Existing resources can be updated via a remediation task after the policy is assigned. This means that if any VMs are already present in my resource group, I can create a remediation task to enforce the policy on those existing VMs as well.
Next, create a managed identity. It is advisable to create the managed identity in a region other than your source region, because if a disaster strikes your source region, the identity would go down with it. In this example, I pick ‘North Europe’, my target region, as the managed identity location.
The purpose of this managed identity is to create resources and to install the agent inside your VM. What we are doing here is creating a managed identity that has access to your resource group. By doing this, all the creation of resources and the installation of agents in your VMs will be automatically managed by ASR.
Click Next, and you can set up a custom non-compliance message. This message is shown when any of your VMs is not protected by the policy for whatever reason. For example, I can set up the message ‘Non-compliant VM for Disaster Recovery. Please take a look.’
Click Next to continue. Finally, you can validate all the entries that you have entered and then click on ‘Create‘.
Once the assignment is created, it will take around 30 minutes to take effect.
Verify Azure Site Recovery enforcement
In this section, we will verify what the policy looks like once it is enforced.
I already have a couple of VMs that I created in the resource group to which this policy is assigned.
Navigate to the resource group where the VMs are created, and then click on ‘Policies‘ on the left blade under Settings as shown in the figure below.
You can see all the policies that are existing in this resource group including the policy that I created in the previous section ‘ASR Policy for West Europe VMs‘.
If you click on the policy, you will see which VMs are in a Compliant state and which are in a Non-compliant state. This is the monitoring pane for the policy. You can see that all my VMs are in a compliant state: they are in ‘West Europe’ and protected by Azure Site Recovery (ASR), as shown in the figure below.
You can select any compliant VM and then click on ‘View Resource‘. The VM blade will open where you can navigate to the ‘Disaster recovery ‘ blade under Operations on the left-hand side. You can see that the VM is protected and in a healthy state as shown in the figure below.
However, if any VM is in a Non-compliant state, you can click on Details under Compliance reason to find out why. One of the possible reasons is “No related resources match the effect details in the policy definition”, which means that some of the criteria were not met.
At the time of this writing, Azure Policy for Azure Site Recovery does not support Proximity Placement Group (PPG) and Azure availability set for virtual machines. Proximity placement groups allow you to group Azure resources physically closer together in the same region, and an availability set is a logical grouping of VMs that allows Azure to understand how your application is built to provide for redundancy and availability. I hope the policy will be updated in the near future to take into account those deployment options as well.
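Before digging through the compliance details, you can sanity-check a VM against the limitations mentioned in this article. The Python sketch below is a plain local check, not a call to any Azure API: the dictionary keys, VM names, and the function name `protection_blockers` are hypothetical, and the actual evaluation is always done by Azure Policy itself.

```python
from typing import Dict, List, Optional


def protection_blockers(
    vm: Dict[str, Optional[str]], source_region: str
) -> List[str]:
    """List reasons, taken from this article, why a VM may not get protected."""
    blockers = []
    if (vm.get("region") or "").lower() != source_region.lower():
        blockers.append("VM is outside the policy's source region")
    if vm.get("proximity_placement_group"):
        blockers.append("VM uses a proximity placement group")
    if vm.get("availability_set"):
        blockers.append("VM is part of an availability set")
    return blockers


# Made-up inventory for illustration.
vms = [
    {"name": "vm-web-01", "region": "West Europe"},
    {"name": "vm-sql-01", "region": "West Europe", "availability_set": "avset-sql"},
    {"name": "vm-app-01", "region": "East US"},
]

for vm in vms:
    reasons = protection_blockers(vm, "West Europe")
    print(vm["name"], "->", reasons or "no known blockers")
```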
What if you have existing VMs deployed in that resource group before you assign this policy? Remember the assignment only takes effect for the newly created resources. The existing resources can be updated via a remediation task after the policy is assigned.
During an evaluation cycle, existing resources that match a policy definition with a “DeployIfNotExists” effect are only marked as non-compliant; no action is taken on them. For this reason, you need to create a remediation task for the existing VMs, as described in the previous section.
Then you can click on ‘Create Remediation Task‘ from the compliance page of that policy and then click on ‘Remediate‘ which will auto-protect the VM with ASR based on the parameters set for this policy.
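The difference between new VMs, evaluation, and remediation can be modeled in a few lines of Python. This is a simplified simulation of the behavior described above, not how Azure implements it; the class, function names, and VM names are all made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vm:
    name: str
    replication_enabled: bool = False


def create_vm_under_policy(name: str) -> Vm:
    """A VM created after assignment: the policy enables replication."""
    return Vm(name, replication_enabled=True)


def evaluation_cycle(vms: List[Vm]) -> List[str]:
    """Only reports non-compliant VMs; it changes nothing."""
    return [vm.name for vm in vms if not vm.replication_enabled]


def remediation_task(vms: List[Vm]) -> List[str]:
    """Enables replication on non-compliant VMs and returns their names."""
    fixed = []
    for vm in vms:
        if not vm.replication_enabled:
            vm.replication_enabled = True
            fixed.append(vm.name)
    return fixed


existing = Vm("vm-legacy-01")                  # deployed before the assignment
new = create_vm_under_policy("vm-new-01")      # deployed after the assignment
fleet = [existing, new]

non_compliant_before = evaluation_cycle(fleet)
remediated = remediation_task(fleet)
non_compliant_after = evaluation_cycle(fleet)

print("Before remediation:", non_compliant_before)
print("Remediated:", remediated)
print("After remediation:", non_compliant_after)
```

The existing VM stays non-compliant through the evaluation cycle and only gets protected once the remediation task runs.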
That’s it, there you have it!
In this article, I showed you how to enable Azure Site Recovery (ASR) on VMs using Azure Policy to auto-protect and replicate your virtual machines to another region for disaster recovery. This helps ensure that the cloud deployment of your application complies with the policies and procedures of your organization and meets the business Service Level Agreement (SLA).
Azure Site Recovery is a complete disaster recovery as a service solution that is part of a broad service presented through Azure Recovery Services Vaults.
Site Recovery is cloud-native and delivered as a platform as a service (PaaS), and it can also replicate from one Azure region to another (cross-region replication), between availability sets within the same datacenter, and between availability zones within the same region, for a true DR solution in the cloud. Azure Site Recovery is more than capable of replacing your secondary disaster recovery site with a cloud-based solution that is reliable, secure, and cost-competitive.
Combining Azure Site Recovery with Azure Backup can be a major component of any Business Continuity and Disaster Recovery (BCDR) plan while minimizing cost and investments.
Thank you for reading my blog.
If you have any questions or feedback, please leave a comment.