How to reduce DPM 2016 storage consumption by enabling deduplication on Modern Backup Storage?

Updated 29/11/2016: [Script updated to create simple storage pool for DPM.]

Updated 09/02/2017: [DPM 2016 Update Rollup 2 adds support for backing up SQL Server 2016 and SharePoint 2016. The DPMDB can also be hosted on SQL Server 2016.]

Two weeks ago, Microsoft released Windows Server 2016 and System Center 2016 to general availability (GA). In the previous post, I covered how to install DPM 2016 on Windows Server 2016 and SQL Server 2016; please check that article for an overview of what’s new in DPM 2016. In this post, I will describe how to reduce DPM 2016 storage consumption by enabling data deduplication on DPM Modern Backup Storage (MBS).

Introduction to DPM Modern Backup Storage (MBS)

In System Center Data Protection Manager 2016, Microsoft introduced a new backup option called Modern Backup Storage (MBS). MBS helps achieve 50% storage savings using ReFS technology, and 3 times faster backups using ReFS block cloning, which uses allocate-on-write as opposed to the copy-on-write used by volume snapshots in DPM 2012 R2. MBS also enables much more efficient storage utilization through Workload Aware Storage, which lets you configure DPM to store backups on a high- or low-performance volume depending on the workload. DPM 2016 uses Modern Backup Storage by default. You can still use the legacy disk storage technology of DPM 2012/R2, but I advise you to start using Modern Backup Storage.

How DPM MBS Works

DPM leverages Windows Server 2016 ReFS capabilities to provide Modern Backup Storage. When you add a volume, DPM formats it as an ReFS volume and stores the backups on multiple VHDXs, each 1.2 GB in size. Suppose you are backing up a SQL database made up of 10 blocks: DPM places the VHDX into a common chunk store on the ReFS volume. On the next recovery point, DPM creates an ReFS clone that points to the original VHDX and to the common chunk store. When some of the blocks change, DPM transfers the new blocks and writes them into the cloned VHDX using allocate-on-write: ReFS writes the new blocks into the chunk store, and the cloned VHDX points to these new blocks.
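The clone-and-allocate-on-write flow described above can be illustrated with a small Python toy model. The `ChunkStore` and `Vhdx` classes are hypothetical and only mimic the behaviour in the text (a clone copies block pointers, a changed block gets a fresh allocation); they are not how ReFS or DPM are implemented internally.

```python
# Hypothetical toy model of ReFS block cloning with allocate-on-write.
# It only illustrates why a clone costs no extra space until blocks change.


class ChunkStore:
    """A shared pool of physical blocks on the ReFS volume."""

    def __init__(self):
        self.blocks = []  # physical block payloads

    def allocate(self, data):
        self.blocks.append(data)
        return len(self.blocks) - 1  # physical block id


class Vhdx:
    """A VHDX modelled as a list of pointers into the chunk store."""

    def __init__(self, store, pointers):
        self.store = store
        self.pointers = pointers

    @classmethod
    def create(cls, store, data_blocks):
        return cls(store, [store.allocate(d) for d in data_blocks])

    def clone(self):
        # Block clone: copy the metadata (pointers) only, no data is copied.
        return Vhdx(self.store, list(self.pointers))

    def write(self, index, data):
        # Allocate-on-write: new data goes to a fresh block and only this
        # VHDX's pointer is updated; the older recovery point keeps the old block.
        self.pointers[index] = self.store.allocate(data)

    def read(self):
        return [self.store.blocks[p] for p in self.pointers]


store = ChunkStore()
rp1 = Vhdx.create(store, [f"block{i}" for i in range(10)])  # initial backup, 10 blocks
rp2 = rp1.clone()                                          # next recovery point
rp2.write(3, "block3-v2")                                  # two blocks changed
rp2.write(7, "block7-v2")

print(len(store.blocks))             # 12
print(rp1.read()[3], rp2.read()[3])  # block3 block3-v2
```

In this model, two 10-block recovery points consume 12 physical blocks instead of 20, which is the kind of saving MBS relies on.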

Data Deduplication Overview

Data Deduplication, often called Dedup for short, is a feature introduced in Windows Server 2012 and enhanced in Windows Server 2016. It reduces storage costs by eliminating redundant data: when enabled, it frees space on a volume by examining the volume's data for duplicated portions, and it optimizes those redundancies without compromising data fidelity or integrity. Here is a quick overview of the Data Deduplication enhancements in Windows Server 2016:

a. You can now use volumes up to 64 TB, as opposed to 10 TB in Windows Server 2012 R2.
b. Windows Server 2016 runs multiple threads in parallel using multiple I/O queues on a single volume, which increases performance, as opposed to a single-threaded job and I/O queue per volume in Windows Server 2012 R2.
c. Files up to 1 TB in size are now good candidates for dedup.
d. Virtualized backup awareness through the new “Backup” usage type, which configures dedup for DPM storage and is the focus of today’s article.
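To make the dedup idea concrete, here is a minimal Python sketch of chunk-level deduplication. It assumes fixed-size chunks and SHA-256 digests for simplicity; Windows Server Data Deduplication uses variable-size chunking and its own chunk store format, so treat this as an illustration of the principle rather than the actual algorithm.

```python
import hashlib


def dedup(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (store, recipe): store maps digest -> chunk bytes, and recipe is the
    ordered list of digests needed to rebuild the original data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # duplicate chunks are stored only once
        recipe.append(digest)
    return store, recipe


def rehydrate(store, recipe) -> bytes:
    # Fidelity: the original bytes are rebuilt exactly from the recipe.
    return b"".join(store[d] for d in recipe)


def savings(data: bytes, store) -> float:
    stored = sum(len(chunk) for chunk in store.values())
    return 1 - stored / len(data)


data = b"AAAABBBBAAAABBBBAAAACCCC"
store, recipe = dedup(data)
assert rehydrate(store, recipe) == data
print(len(store), f"{savings(data, store):.0%}")  # 3 50%
```

Six 4-byte chunks reduce to three unique ones, so half the space is saved while the data can still be rebuilt byte for byte.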

DPM and Dedup

Using dedup with DPM can result in large savings on the storage back end. The amount of space saved depends on the type of data being backed up (e.g. SQL databases, virtual machines, virtual desktop environments). The largest savings come from backing up VMs, where storage savings range from 50% to 90%.

As discussed earlier, Modern Backup Storage (MBS) relies on ReFS; however, Microsoft does not support data deduplication on ReFS volumes in Windows Server 2016. Dedup is also not supported on Storage Spaces Direct (S2D), because S2D relies on ReFS. The following screenshot shows that “Configure Data Deduplication” is disabled for an ReFS volume.

[Screenshot: “Configure Data Deduplication” is disabled for an ReFS volume]

How to configure data deduplication for DPM Modern Backup Storage

In this example, DPM 2016 runs on Windows Server 2016 in a virtual machine on a Windows Server 2016 Hyper-V host, and stores backup data on VHDXs. Those VHDXs are stored locally on the Hyper-V host, on a separate NTFS parity volume.

Note that you can also store the VHDX files in shared folders on an SMB 3.1.1 Scale-Out File Server (SOFS) with Storage Spaces and data deduplication enabled.

The Cluster Shared Volume (CSV) on the Scale-Out File Server must be formatted with NTFS in order to enable data deduplication.

Step 1: Set up dedup on NTFS volumes

First, make sure you have installed the Data Deduplication feature and rebooted your host.

[Screenshot: installing the Data Deduplication feature]

Since this example uses a local NTFS volume rather than Storage Spaces, we start by formatting the volume with 64 KB allocation units and large NTFS File Record Segments (FRS), which work better with dedup.

Open Windows PowerShell and run the following command:

[Screenshot: PowerShell command formatting the NTFS volume with 64 KB allocation units and large FRS]

In this example, I am using a 9 TB NTFS volume, but you can use larger volumes as needed, up to 64 TB.

Step 2: Enable dedup on NTFS volumes

Next, we need to enable dedup on each NTFS volume that will store the VHDXs. Windows Server 2016 introduced a new usage type called “Backup” that minimizes the number of PowerShell commands needed, so we can enable dedup for a backup target with a single command:

[Screenshot: PowerShell command enabling dedup with the Backup usage type]

Step 3: Prepare DPM Storage

DPM 2016 allocates storage using VHDX files residing on the deduplicated volume, which is 9 TB in this example. We will create 12 dynamic 1 TB VHDX files on the dedup volume and then attach them to DPM. Note that the storage is overprovisioned by 3 TB to take advantage of the savings produced by dedup. As dedup frees more space, new VHDX files can be created on the same volume to consume the saved space. Microsoft tested SC DPM 2016 with up to 80 x 1 TB VHDX files attached to the DPM server. However, this is not a limit: some users have reported testing up to 150 x 1 TB VHDXs attached to a single DPM server. Remember to spread the VHDXs across multiple virtual SCSI controllers of the DPM VM so that the VM performs well.
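The overprovisioning arithmetic above can be written out as a short Python sketch. The function names are hypothetical; they simply compute how much logical capacity exceeds the 9 TB volume, and what dedup savings rate is needed before twelve fully written 1 TB VHDXs would fit.

```python
def overprovisioning_tb(vhdx_count: int, vhdx_size_tb: float, volume_tb: float) -> float:
    """Logical capacity handed to DPM beyond the physical size of the host volume."""
    return vhdx_count * vhdx_size_tb - volume_tb


def breakeven_savings(vhdx_count: int, vhdx_size_tb: float, volume_tb: float) -> float:
    """Dedup savings rate needed for fully written VHDXs to fit on the volume."""
    logical_tb = vhdx_count * vhdx_size_tb
    return max(0.0, 1 - volume_tb / logical_tb)


def physical_usage_tb(logical_tb: float, savings_rate: float) -> float:
    """Physical space consumed by logical_tb of backup data at a given savings rate."""
    return logical_tb * (1 - savings_rate)


print(overprovisioning_tb(12, 1, 9))   # 3
print(breakeven_savings(12, 1, 9))     # 0.25
print(physical_usage_tb(12 * 1, 0.5))  # 6.0
```

With 12 x 1 TB on 9 TB, at least 25% savings are needed once the VHDXs fill up; at 50% savings, the same 12 TB would occupy about 6 TB, leaving room for more VHDXs.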

A quick reminder: a VM supports a maximum of 4 SCSI controllers, and each controller supports 64 virtual hard disks, for a total of 256 VHDXs. It would be nice to test 255 x 1 TB VHDXs as DPM storage, reserving one disk for the OS.

Open Windows PowerShell and run the following commands to create 12 virtual hard disks, shut down the DPM server, add 3 SCSI controllers, and then attach the new virtual hard disks to the DPM server (in this example, 4 VHDX files are attached to each SCSI controller).

[Screenshot: PowerShell commands creating the VHDXs, adding the SCSI controllers and attaching the disks]
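The disk layout planning behind this step can be sketched in Python: spread the twelve VHDXs round-robin across three SCSI controllers while enforcing the 4-controller and 64-disk-per-controller limits. The function `plan_scsi_layout` is hypothetical, and using controllers 1 to 3 assumes a Generation 2 VM whose controller 0 already holds the OS disk.

```python
from collections import Counter

MAX_SCSI_CONTROLLERS = 4       # per VM
MAX_DISKS_PER_CONTROLLER = 64  # virtual hard disks per SCSI controller


def plan_scsi_layout(disk_count, controller_numbers):
    """Assign data VHDXs round-robin to the given SCSI controllers.

    Returns a list of (controller_number, location) pairs, one per disk.
    Assumes the listed controllers hold no other disks (hypothetical layout).
    """
    if not controller_numbers or len(controller_numbers) > MAX_SCSI_CONTROLLERS:
        raise ValueError("a VM supports 1 to 4 SCSI controllers")
    capacity = len(controller_numbers) * MAX_DISKS_PER_CONTROLLER
    if disk_count > capacity:
        raise ValueError(f"{disk_count} disks exceed {capacity} available slots")
    layout = []
    for i in range(disk_count):
        controller = controller_numbers[i % len(controller_numbers)]
        location = i // len(controller_numbers)
        layout.append((controller, location))
    return layout


# 12 x 1 TB VHDXs on three added controllers (assumed numbers 1, 2 and 3).
layout = plan_scsi_layout(12, [1, 2, 3])
print(Counter(c for c, _ in layout))                        # Counter({1: 4, 2: 4, 3: 4})
print(MAX_SCSI_CONTROLLERS * MAX_DISKS_PER_CONTROLLER - 1)  # 255
```

Round-robin placement keeps the controllers evenly loaded, which is the point of spreading the VHDXs across them.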

Step 4: Configure Storage and Enable Modern Backup Storage

The following steps are performed inside the DPM virtual machine: we first create a simple storage space that aggregates all twelve 1 TB disks, and then add the volume to DPM 2016 as Modern Backup Storage.

You may wonder whether a simple storage space is a single point of failure in the guest. Remember that the VHDXs reside on a Scale-Out File Server (SOFS) or on Storage Spaces Direct (S2D), so resiliency is maintained at the storage level rather than inside the guest, where it is not supported by Storage Spaces. A simple storage space is also suggested inside the guest so that you can extend the DPM volume(s) in the future if needed. In this example, I am using parity storage at the host level.

Log in to the DPM server and run the following commands:

Here is the result in Server Manager:

[Screenshot: the storage pool and volume in Server Manager]

Updated 29/11/2016: One important point: in the script above, I created the simple storage space virtual disk with NumberOfColumns set to 1. When a virtual disk is created without specifying NumberOfColumns, Storage Spaces sets the property automatically based on the number of physical disks used to create the virtual disk, and it cannot be changed after the virtual disk has been created. If, for example, NumberOfColumns is 8, the only way to extend the simple fixed virtual disk is to add 8 more disks to the storage pool. For this reason, I set NumberOfColumns to 1, so that disks can be added one at a time in the future as needed.
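The NumberOfColumns constraint can be sketched as a simplified Python model. It assumes every disk already in the pool is full, so a simple space grows only when new disks are added in multiples of NumberOfColumns; real Storage Spaces allocation has more nuance (slab sizes, thin vs. fixed provisioning), so the helper names and the rule are illustrative assumptions.

```python
def can_extend_simple_space(number_of_columns: int, disks_with_free_space: int) -> bool:
    """Simplified rule: new space in a simple virtual disk is striped across
    NumberOfColumns disks, so it can grow only if at least that many pool
    disks still have free capacity."""
    return disks_with_free_space >= number_of_columns


def disks_to_add(number_of_columns: int, wanted_disks: int) -> int:
    """Round a planned expansion up to a multiple of NumberOfColumns,
    assuming every disk already in the pool is full."""
    return -(-wanted_disks // number_of_columns) * number_of_columns


def usable_growth_tb(number_of_columns: int, new_disks: int, disk_size_tb: float) -> float:
    """Capacity a full simple space gains from new, equally sized disks;
    disks beyond the last complete set of NumberOfColumns stay unused."""
    usable_disks = (new_disks // number_of_columns) * number_of_columns
    return usable_disks * disk_size_tb


# NumberOfColumns = 8 (set automatically from eight initial disks)
print(can_extend_simple_space(8, 1), disks_to_add(8, 1), usable_growth_tb(8, 3, 1))  # False 8 0
# NumberOfColumns = 1, as recommended above
print(can_extend_simple_space(1, 1), disks_to_add(1, 1), usable_growth_tb(1, 3, 1))  # True 1 3
```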

Next, open the DPM console, browse to Management | Disk Storage, and click +Add as shown in the following screenshot.

[Screenshot: DPM console, Management | Disk Storage, +Add]

By default, DPM formats the volume with ReFS and adds it to the storage pool so that it can take advantage of the Modern Backup Storage savings. You can also give the volume a friendly name for easy recall later, as shown in the next screenshot.

[Screenshot: adding the volume and giving it a friendly name]

Note that if you are upgrading from DPM 2012 R2 and have protection groups created with that version, you will also see an “Add disks” option for those protection groups, as shown in the screenshot above.

After adding the volume to DPM, the next step is to configure Workload Aware Storage. This feature lets you associate workloads with volumes, so that when you configure protection groups, DPM proactively selects these volumes to store the associated workloads. In this example, I need to back up Hyper-V virtual machines, which is easily done with PowerShell by specifying Hyper-V as the DataSourceType, as shown in the next screenshot.

[Screenshot: PowerShell command associating the volume with the Hyper-V DataSourceType]

You can associate volumes with the FileSystem, Client, Exchange, SharePoint, SQL, VMware, All, SystemProtection, Hyper-V and Other workload types.
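Conceptually, Workload Aware Storage is a tagging scheme, which the following Python toy model illustrates. The volume names and the fallback order (exact workload match, then volumes tagged `All`, then untagged volumes) are hypothetical assumptions for illustration; DPM's real selection logic also weighs factors such as free space and is not reproduced here.

```python
# Hypothetical model of Workload Aware Storage: each DPM volume is tagged with
# the datasource types it should hold, and a protection group prefers volumes
# tagged for its workload.

WORKLOAD_TYPES = {"FileSystem", "Client", "Exchange", "SharePoint", "SQL",
                  "VMware", "All", "SystemProtection", "Hyper-V", "Other"}


def candidate_volumes(volumes, workload):
    """Return volume names tagged for `workload`, falling back to volumes
    tagged 'All' and finally to untagged volumes (assumed fallback order)."""
    if workload not in WORKLOAD_TYPES:
        raise ValueError(f"unknown workload type: {workload}")
    tagged = [name for name, types in volumes.items() if workload in types]
    if tagged:
        return tagged
    wildcard = [name for name, types in volumes.items() if "All" in types]
    if wildcard:
        return wildcard
    return [name for name, types in volumes.items() if not types]


volumes = {
    "DPM-HyperV": {"Hyper-V"},  # the kind of volume configured in this article
    "DPM-SQL": {"SQL"},         # hypothetical second volume
    "DPM-Spare": set(),         # hypothetical untagged volume
}
print(candidate_volumes(volumes, "Hyper-V"))   # ['DPM-HyperV']
print(candidate_volumes(volumes, "Exchange"))  # ['DPM-Spare']
```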

Step 5: Configure Protection Group with MBS

In this step, we create a protection group and start backing up our virtual machines to Modern Backup Storage, benefiting from the storage savings of both technologies (MBS and dedup).

Open the DPM console, browse to Protection, and click New. In the Create New Protection Group wizard, select Servers, then select all the virtual machines you want to protect. In this example, I am backing up a Hyper-V cluster, as shown in the following screenshot.

[Screenshot: selecting the Hyper-V cluster virtual machines to protect]

Then select the protection method you need (short-term to disk, online to Azure, or long-term to tape), give the protection group a name, and click Next.

[Screenshot: selecting the protection method and naming the protection group]

Select the short-term retention goal you need and then specify the recovery point schedule. In this example, it is scheduled every day at 6:00 PM.

[Screenshot: short-term retention goal and recovery point schedule]

The next step is to review the disk storage allocation. Here you can see the total data size, which is 2.5 TB in my case, and the disk storage to be provisioned on DPM, which is 5 TB. You can also review the data size of each VM and the space to be provisioned for it in DPM storage.

Since we associated the volume with the Hyper-V DataSourceType, DPM automatically selects it as the target storage for the virtual machines we are backing up. However, you can change the target storage for a particular data source and back it up to another volume if needed by using the drop-down list, as shown in the following screenshot.

[Screenshot: reviewing the disk storage allocation and target volume]

Click Next and then select the replica creation method. I selected Now, so the initial replica is created automatically over the network.

[Screenshot: selecting the replica creation method]

Click Next and make sure to select Run a consistency check if a replica becomes inconsistent. Click Next again, and then click Create Group, as shown in the next screenshot, to start backing up your workload using Modern Backup Storage.

[Screenshot: Create Group summary page]

Step 6: Optimize DPM backup and dedup scheduling

Be aware that backup (including consistency checks) and deduplication operations are I/O intensive. If they run at the same time, the overhead of switching between them can be costly and result in less data being backed up or deduplicated each day. Microsoft recommends configuring separate deduplication and backup/consistency-check windows.

To do so, split the DPM schedule into non-overlapping backup and dedup windows. Open the DPM Management Shell inside the guest and run the following command:

In this example, DPM is configured to back up virtual machines (workloads) between 4:00 PM and 4:00 AM.

The next step is to configure deduplication to run on the host for the remaining 12 hours of the day, from 4:00 AM to 4:00 PM.

A 12-hour deduplication window starting at 4:00 AM, after the backup window ends, is configured as follows on any individual Hyper-V host or cluster node:

The output will look something like this:

Whenever you modify the DPM backup window, it is vital to modify the deduplication window along with it so that the two don’t overlap.
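Because the two windows are configured in different places (DPM inside the guest, dedup on the host), a quick check that they don't overlap can help whenever either one changes. The following Python sketch is a hypothetical helper, not part of DPM or the Deduplication PowerShell module; it models each daily window as a set of minutes, so windows that wrap past midnight are handled correctly.

```python
from datetime import time

MINUTES_PER_DAY = 24 * 60


def window_minutes(start: time, duration_hours: float) -> set:
    """Minutes of the day covered by a daily window; wraps past midnight."""
    begin = start.hour * 60 + start.minute
    length = int(duration_hours * 60)
    return {(begin + m) % MINUTES_PER_DAY for m in range(length)}


def windows_overlap(start_a: time, hours_a: float, start_b: time, hours_b: float) -> bool:
    return bool(window_minutes(start_a, hours_a) & window_minutes(start_b, hours_b))


backup = (time(16, 0), 12)  # DPM backup window from the text: 4:00 PM to 4:00 AM
dedup = (time(4, 0), 12)    # host dedup window from the text: 4:00 AM to 4:00 PM

print(windows_overlap(*backup, *dedup))  # False
# Together the two windows cover the whole day with no gap.
print(len(window_minutes(*backup) | window_minutes(*dedup)) == MINUTES_PER_DAY)  # True
# Moving only the backup window to 6:00 PM makes the two collide.
print(windows_overlap(time(18, 0), 12, *dedup))  # True
```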

Conclusion

DPM Modern Backup Storage (MBS) and dedup are better together: MBS can save up to 50% of storage, and dedup can also save 50% to 90%.

Here are the storage savings after two weeks of running DPM 2016 with Modern Backup Storage and dedup enabled on an NTFS volume in Windows Server 2016.

I hope this post has been informative for you, and I would like to thank you for reading!

Cheers,
-Ch@rbel

About Charbel Nemnom
Charbel Nemnom is a Microsoft Cloud Consultant and Technical Evangelist, a big fan of the latest IT platform solutions, and an accomplished hands-on technical professional with over 15 years of broad IT infrastructure experience serving on and guiding technical teams to optimize the performance of mission-critical enterprise systems. Excellent communicator adept at identifying business needs and bridging the gap between functional groups and technology to foster targeted and innovative IT project development. Well respected by peers through demonstrating passion for technology and performance improvement. Extensive practical knowledge of complex systems builds, network design and virtualization.

Comments

  1. When creating the Storage Pool and Virtual disk in step 4, is there a reason that you format it as NTFS? Or is it just because it’s simple and DPM is going to re-format it as ReFS anyway?
