How to reduce DPM 2016 storage consumption by enabling deduplication on Modern Backup Storage

Updated 29/11/2016: [Script updated to create simple storage pool for DPM.]

Updated 09/02/2017: [Announcing backups of SQL Server 2016 and SharePoint 2016 with DPM 2016 Update Rollup 2. DPMDB can also be hosted on SQL Server 2016].

Updated 10/07/2017: [Over the last couple of months, I received several comments about DPM and Dedup. I updated this post by answering all those questions. Please see the Frequently asked questions (FAQ) section at the end of this post].

Two weeks ago, Microsoft released Windows Server 2016 and System Center 2016 to general availability (GA). In a previous post, I covered how to install DPM 2016 on Windows Server 2016 and SQL Server 2016. Please check that article first for an overview of what's new in DPM 2016. In this post, I will describe how to reduce DPM 2016 storage consumption by enabling data deduplication on DPM Modern Backup Storage (MBS).

Introduction to DPM Modern Backup Storage (MBS)

In System Center Data Protection Manager 2016, Microsoft introduced a new backup storage option called Modern Backup Storage (MBS). MBS achieves up to 50% storage savings using ReFS technology, and backups up to 3 times faster using ReFS block cloning, which relies on allocate-on-write as opposed to the copy-on-write used by volume snapshots in DPM 2012 R2. Modern Backup Storage also enables much more efficient storage utilization through Workload Aware Storage, which lets you configure DPM to store your backups on high- or low-performance volumes based on the workload. DPM 2016 uses Modern Backup Storage by default. Note that you can still use the legacy disk storage technology from DPM 2012/R2, but I advise you to start using the new Modern Backup Storage.

How DPM MBS Works

DPM leverages Windows Server 2016 ReFS capabilities to provide Modern Backup Storage. When you add a volume, DPM formats it as an ReFS volume and stores the backups in multiple VHDX files, each 1.2 GB in size. Suppose you are backing up a SQL database with 10 blocks: DPM places the VHDX in a common chunk store on the ReFS storage volume. On the next recovery point, DPM creates an ReFS clone pointing to the original VHDX and the common chunk store. When some of the blocks change between backups, DPM transfers only the new blocks and writes them into the cloned VHDX using allocate-on-write; ReFS then writes the new blocks into the chunk store, and the new cloned VHDX points to these new data blocks.

Data Deduplication Overview

Data Deduplication, often called dedup for short, is a feature introduced in Windows Server 2012/R2 and enhanced in Windows Server 2016. Data Deduplication helps reduce the storage cost of redundant data. When enabled, it optimizes free space on a volume by examining the data for duplicated portions, and it removes these redundancies without compromising data fidelity or integrity. A quick overview of the new Data Deduplication enhancements in Windows Server 2016:

a. You can now use volumes up to 64 TB, as opposed to 10 TB in Windows Server 2012 R2.
b. Windows Server 2016 runs multiple threads in parallel using multiple I/O queues on a single volume, resulting in increased performance, as opposed to the single-threaded job and single I/O queue per volume in Windows Server 2012 R2.
c. You can use files up to 1 TB in size; such files are good candidates for dedup.
d. Virtualized backup awareness through the new "Backup" usage type, used to configure dedup on DPM storage, which is our focus in today's article.

DPM and Dedup

Using dedup with DPM can result in large savings on the storage backend. The amount of space saved varies depending on the type of data being backed up (SQL databases, virtual machines, virtual desktop environments, etc.). The largest savings come from backing up VMs, where storage savings typically range from 50% to 90%.

As discussed earlier, Modern Backup Storage (MBS) relies on ReFS technology; however, Microsoft does not support data deduplication on ReFS volumes in Windows Server 2016. Dedup is also not supported on Storage Spaces Direct (S2D), because S2D relies on ReFS technology. The following screenshot shows that "Configure Data Deduplication" is disabled for an ReFS volume.

[Screenshot: DPM2016-Dedup-MDS02]

How to configure data deduplication for DPM Modern Backup Storage

In this example, I have DPM 2016 running on Windows Server 2016 in a virtual machine on a Windows Server 2016 Hyper-V host, storing its backup data in VHDX files. Those VHDX files are stored locally on the Hyper-V host on a separate NTFS parity volume.

Please note that you can also store the VHDX files in shared folders on an SMB 3.1 Scale-Out File Server (SOFS) with Storage Spaces and data deduplication enabled.

The Cluster Shared Volume (CSV) on the Scale-Out File Server must be formatted with NTFS in order to enable data deduplication.

Step 1: Set up dedup on NTFS volumes

First things first, make sure you have installed the Data Deduplication feature and rebooted your host.

[Screenshot: DPM2016-Dedup-MDS03]
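If you prefer PowerShell over Server Manager, a minimal sketch to install the feature would be:

# Install the Data Deduplication role service
Install-WindowsFeature -Name FS-Data-Deduplication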

Since this example uses a local NTFS volume rather than Storage Spaces, we will start by formatting the volume with 64 KB allocation units and large NTFS File Record Segments (FRS) to work better with dedup.

Open Windows PowerShell and run the following command:

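A minimal sketch, assuming the backup volume is mounted as drive D: and using a hypothetical volume label, would look like this:

# Format the volume with NTFS, 64 KB allocation units, and large FRS
Format-Volume -DriveLetter D -FileSystem NTFS -NewFileSystemLabel "DedupBackup" -AllocationUnitSize 65536 -UseLargeFRS -Confirm:$false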

In this example, I am using a 9 TB NTFS volume, but you can use larger volumes as needed, up to 64 TB.

Step 2: Enable dedup on NTFS volumes

In the next step, we need to enable dedup on each NTFS volume that will store the VHDX files. In Windows Server 2016, Microsoft introduced a new usage type called "Backup" that minimizes the number of PowerShell commands needed; we can enable dedup for a backup target with just one command:

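A sketch, again assuming the volume is drive D:, would be:

# Enable Data Deduplication with the new "Backup" usage type
Enable-DedupVolume -Volume "D:" -UsageType Backup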

Step 3: Prepare DPM Storage

DPM 2016 allocates its storage using VHDX files residing on the deduplicated volume, which in this example is 9 TB. We will create 12 dynamic VHDX files of 1 TB each on the dedup volume and then attach them to DPM. Note that this overprovisions the storage by 3 TB to take advantage of the savings produced by dedup. As dedup produces additional savings, new VHDX files can be created on the same volume to consume the saved space. For SC DPM 2016, Microsoft tested the DPM server with up to 80 x 1 TB VHDX files attached. This is not a hard limit; some users have reported testing up to 150 x 1 TB VHDX files attached to a single DPM server. But please remember to spread the VHDX files across multiple virtual SCSI controllers of the DPM VM to keep the VM performant.

A quick reminder on Hyper-V limits: a VM supports a maximum of 4 virtual SCSI controllers, and each controller supports 64 virtual hard disks, for a total of 256 VHDX files. It would be interesting to test 255 x 1 TB VHDX files as DPM storage, reserving one disk for the OS.

Open Windows PowerShell and run the following commands to create the 12 virtual hard disks, shut down the DPM server, add 3 SCSI controllers, and then attach the virtual hard disks to the DPM server (in this example, 4 VHDX files are attached to each SCSI controller).

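Here is a minimal sketch under the same assumptions (the VM name "DPM2016" and the path on the dedup volume are hypothetical):

# Create 12 dynamic 1 TB VHDX files on the dedup volume
$VMName = "DPM2016"
$Path = "D:\DPM-Disks"
New-Item -Path $Path -ItemType Directory -Force | Out-Null
1..12 | ForEach-Object { New-VHD -Path "$Path\DPMDisk$_.vhdx" -SizeBytes 1TB -Dynamic }

# Shut down the DPM VM and add 3 additional SCSI controllers
Stop-VM -Name $VMName
1..3 | ForEach-Object { Add-VMScsiController -VMName $VMName }

# Attach 4 VHDX files to each of the SCSI controllers 1, 2, and 3
$i = 1
foreach ($controller in 1..3) {
    foreach ($lun in 0..3) {
        Add-VMHardDiskDrive -VMName $VMName -ControllerType SCSI -ControllerNumber $controller -ControllerLocation $lun -Path "$Path\DPMDisk$i.vhdx"
        $i++
    }
}
Start-VM -Name $VMName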

Step 4: Configure Storage and Enable Modern Backup Storage

The following steps are performed inside the DPM virtual machine. We will first create a Simple Storage Space that aggregates all 12 x 1 TB disks, and then add the volume to DPM 2016 as Modern Backup Storage.

You may wonder whether a Simple Storage Space is a single point of failure inside the guest. Remember that those VHDX files reside on a Scale-Out File Server (SOFS) or on Storage Spaces Direct (S2D); you maintain resiliency at the storage level, not inside the guest, as guest-level resiliency is not supported by Storage Spaces. A Simple Storage Space is also suggested inside the guest so that you can extend the DPM volume(s) if needed in the future. In this example, I am using parity storage at the host level.

Log in to the DPM server and run the following commands:
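A minimal sketch, using the pool and virtual disk names referenced later in this article ("DPM Storage Pool", "Simple DPM vDisk01") and a hypothetical volume label:

# Pool all 12 attached 1 TB disks into a new storage pool
$PoolName = "DPM Storage Pool"
$vDisk = "Simple DPM vDisk01"
$Disks = Get-PhysicalDisk | Where-Object { $_.CanPool -eq $true }
New-StoragePool -FriendlyName $PoolName -StorageSubSystemFriendlyName "Windows Storage*" -PhysicalDisks $Disks

# Create a simple virtual disk with NumberOfColumns = 1 so it can be
# extended one disk at a time later (see the update note below)
New-VirtualDisk -StoragePoolFriendlyName $PoolName -FriendlyName $vDisk -ResiliencySettingName Simple -NumberOfColumns 1 -UseMaximumSize

# Initialize, partition, and format the disk with NTFS
# (DPM will reformat the volume as ReFS when it is added as Modern Backup Storage)
Get-VirtualDisk -FriendlyName $vDisk | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "HyperV_VMs_MBS" -Confirm:$false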

Here is the result in Server Manager:

[Screenshot: dpm2016-dedup-mds21]

Updated 29/11/2016: One important point to mention here: in the script above, I created the simple Storage Spaces virtual disk with NumberOfColumns equal to 1. When a virtual disk is created without specifying NumberOfColumns, Storage Spaces sets the property automatically according to the number of physical disks used to create the virtual disk, and it cannot be changed after the virtual disk has been created. If, for example, NumberOfColumns is 8, the only way to extend the simple fixed virtual disk is to add 8 x 1 TB VHDXs to the storage pool! For this reason, I specified NumberOfColumns equal to 1, so we can add 1 disk at a time in the future as needed.

Next, open the DPM console, browse to Management | Disk Storage, and then click +Add as shown in the following screenshot.

[Screenshot: DPM2016-Dedup-MDS06]

By default, DPM formats the volume with ReFS and adds it to the storage pool so that it can deliver all the new Modern Backup Storage savings. You can also give the volume a friendly name for easy recall later, as shown in the next screenshot.

[Screenshot: DPM2016-Dedup-MDS07]

Please note that if you are upgrading from DPM 2012 R2 and have protection groups created with that version, you will also see an "Add disks" option to be used for those protection groups, as shown in the above screenshot.

After adding the volume to DPM, the next step is to configure Workload Aware Storage. This feature enables you to associate workloads with volumes, so that when you configure protection groups, DPM proactively selects those volumes to store the associated workloads. In this example, I need to back up Hyper-V virtual machines; this can easily be done with PowerShell, as shown below, by specifying the DatasourceType as HyperV.

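A minimal sketch, assuming the volume friendly name used earlier in this example and a hypothetical DPM server name:

# Associate the MBS volume with the Hyper-V workload
$volume = Get-DPMDiskStorage -DPMServerName "DPM2016" -Volumes | Where-Object { $_.FriendlyName -eq "HyperV_VMs_MBS" }
Update-DPMDiskStorage -Volume $volume -DatasourceType HyperV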

Please note that you can also associate volumes with FileSystem, Client, Exchange, SharePoint, SQL, VMware, SystemProtection, Hyper-V, Other, or All workloads.

Step 5: Configure Protection Group with MBS

In the final step, we will create a protection group and start backing up our virtual machines using Modern Backup Storage, getting the storage savings of both technologies (MBS and dedup).

Open the DPM console, browse to Protection, and then click New. In the Create New Protection Group wizard, select Servers, and then select all the virtual machines you want to protect. In this example, I am backing up a Hyper-V cluster, as shown in the following screenshot.

[Screenshot: DPM2016-Dedup-MDS10]

Then select the protection method you need (short-term to disk, online to Azure, or long-term using tape), and give the protection group a name. Click Next.

[Screenshot: DPM2016-Dedup-MDS11]

Select the short-term retention goal you need and specify the recovery point schedule. In this example, it's scheduled every day at 6:00 PM.

[Screenshot: DPM2016-Dedup-MDS12]

The next step is to review the disk storage allocation. Here you can see the total data size (2.5 TB in this example) and the disk storage to be provisioned on DPM (5 TB), and you can review the data size and the space to be provisioned for each VM.

Since we associated the volume with the Hyper-V DatasourceType, DPM automatically selects that volume as the target storage for the virtual machines we are backing up. However, you can change the target storage for a particular data source and back it up to another volume if needed by selecting it from the drop-down box, as shown in the following screenshot.

[Screenshot: DPM2016-Dedup-MDS13]

Click Next and select the replica creation method. I will select Now, so the replica is created automatically over the network.

[Screenshot: DPM2016-Dedup-MDS14]

Click Next and make sure to select Run a consistency check if a replica becomes inconsistent. Click Next, and then click Create Group, as shown in the next screenshot, to start backing up your workload using Modern Backup Storage.

[Screenshot: DPM2016-Dedup-MDS16]

Step 6: Optimize DPM backup and Dedup scheduling

Please be aware that backup operations, including DPM consistency checks (CC), and deduplication operations are I/O intensive. If they run at the same time, the additional overhead of switching between these operations can be costly and result in less data being backed up or deduplicated on a daily basis. Microsoft recommends configuring separate, non-overlapping deduplication and backup/consistency check windows.

In this case, you can set the DPM backup window and the DPM consistency check (CC) window so that they do not overlap with the dedup window. To do so, open the DPM Management Shell inside the guest and run the following commands:
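A minimal sketch, assuming a single protection group on this DPM server:

# Get a modifiable copy of the protection group
$pg = Get-DPMProtectionGroup -DPMServerName $env:COMPUTERNAME
$mpg = Get-DPMModifiableProtectionGroup $pg[0]

# Restrict backups and consistency checks to 4:00 PM - 4:00 AM (12 hours)
Set-DPMBackupWindow -ProtectionGroup $mpg -StartTime "16:00" -DurationInHours 12
Set-DPMConsistencyCheckWindow -ProtectionGroup $mpg -StartTime "16:00" -DurationInHours 12
Set-DPMProtectionGroup -ProtectionGroup $mpg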

In this example, DPM is configured to back up virtual machines and run consistency checks between 4:00 PM and 4:00 AM.

The final step is to configure deduplication to run on the host for the remaining 12 hours of the day, from 4:00 AM to 4:00 PM.

A 12-hour deduplication window starting at 4:00 AM, after the backup/consistency check window ends, would be configured as follows on each individual Hyper-V host or SOFS/S2D cluster node:
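A minimal sketch, using the built-in "BackgroundOptimization" schedule and hypothetical names for the new schedules:

# Disable the default dynamic background optimization job (see the FAQ below)
Set-DedupSchedule -Name "BackgroundOptimization" -Enabled $false

# Run optimization every day inside the 4:00 AM - 4:00 PM window
New-DedupSchedule -Name "DedupWindowOptimization" -Type Optimization -Start "04:00" -DurationHours 12 -Priority Normal

# Run weekly garbage collection and scrubbing inside the same window
New-DedupSchedule -Name "DedupWindowGC" -Type GarbageCollection -Start "04:00" -DurationHours 12 -Days Saturday -Priority Normal
New-DedupSchedule -Name "DedupWindowScrub" -Type Scrubbing -Start "04:00" -DurationHours 12 -Days Sunday -Priority Normal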

You can verify the resulting schedules by running Get-DedupSchedule on each node.

It's very important to note that whenever the DPM backup window is modified, you must modify the deduplication window along with it so they don't overlap.

Conclusion

DPM Modern Backup Storage (MBS) and dedup are better together. With MBS you can save up to 50% of your storage, and with dedup you can save an additional 50%-90%.

Here are the results of the storage savings after two weeks of running DPM 2016 with Modern Backup Storage and dedup enabled on an NTFS volume in Windows Server 2016.
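To check the savings in your own environment, you can query the dedup statistics on the host (a sketch, again assuming drive D:):

# Show the saved space and savings rate for the dedup volume
Get-DedupStatus -Volume "D:" | Format-List
Get-DedupVolume -Volume "D:" | Format-List SavedSpace, SavingsRate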

I hope this post has been informative for you, and I would like to thank you for reading!

Cheers,
-Ch@rbel

FAQ

Q: Does Dedup on Microsoft Azure Backup Server (MABS) work the same way as described in this post for DPM?
A: Yes, dedup with MABS is fully supported using the same steps described in this post. MABS inherits the same functionality as DPM.

Q: Is it required to store the 1 TB VHDX files on a Scale-out File Server (SOFS) or Storage Spaces Direct (S2D)?
A: No, you can simply enable dedup on the Hyper-V host drive that holds the VHDX files for the MABS/DPM server. The VHDX files can be stored locally on the Hyper-V host on an NTFS volume with dedup enabled. In fact, you can even leverage your existing NAS/SAN with iSCSI or FC and map the volume to the Hyper-V host; but please note that with this option you will lose VM mobility, since the storage is mapped directly to the Hyper-V host. You can also use a Hyper-V cluster with traditional SAN storage and place those VHDX files on a Cluster Shared Volume (CSV) with NTFS and dedup enabled.

Q: Does the volume/drive have to be dedicated to DPM/MABS storage? What if it holds other VM VHDXs?
A: No, it's not required to have a dedicated drive for DPM/MABS storage. You can place other VM VHDX files on it as well, but please note the following two caveats when planning your backup storage: a) Dedup on NTFS in Windows Server 2012 R2 and Windows Server 2016 supports only General Purpose File Server, Virtual Desktop Infrastructure (VDI) workloads, and virtualized backup applications such as DPM/MABS. Dedup does not support general-purpose VM VHDX files.
b) If you share the drive with other use cases, this will impact the backup performance and increase the I/O load on the disk. Plan accordingly.

Q: What is the recommended or optimal size for the VHDXs that get attached to the DPM/MABS VM?
A: Microsoft supports files up to 1 TB for Data Deduplication. Windows Server 2016 helps you scale up, with full support for 64 TB volumes and full use of files up to 1 TB. For this reason, I created several 1 TB VHDX files on a single NTFS volume.

Q: Can I set the 'NumberOfColumns' property when creating a Storage Spaces virtual disk through the GUI?
A: No, you can only set the 'NumberOfColumns' property using Windows PowerShell while creating the virtual disk, as described in this article. Please remember to set 'NumberOfColumns' equal to 1, so you can add one or more VHDX files at a time in the future as needed.

Q: If you have multiple VHDXs attached to the DPM/MABS server, couldn’t you just select a single one, create the Storage Space Virtual Disk, and then be able to expand it with the additional disks post-creation?
A: Yes, you can do that. But remember that a single VHDX can hold a maximum of 1 TB of data, so the Storage Spaces virtual disk would start at 1 TB. You would need to be quick and add the additional disks post-creation before the DPM storage runs out of disk space.

Q: Can you elaborate on the DPM Consistency Check (CC) schedule, dedup garbage collection, and dedup scrubbing optimization? What are the defaults, and what are the best practices around these components?
A: As mentioned in this article, backup jobs, including consistency checks (CC), and deduplication operations are I/O intensive. If they run at the same time, the additional overhead of switching between these operations can be costly and result in less data being backed up or deduplicated on a daily basis. You should therefore set the DPM backup/CC schedule window so that it does not overlap with the dedup window. Please refer to Step 6: Optimize DPM backup and Dedup scheduling for more information.

Q: Is the VM that's running DPM a Generation 1 or Generation 2 Hyper-V VM?
A: The VM running DPM is a Generation 2 Hyper-V VM.

Q: In the article, you used a separate NTFS parity volume to hold the VHDX files that are attached to the DPM server. Can you explain what you mean by the "parity" term? Could it just be a dedicated LUN attached to the Hyper-V host?
A: Parity is a resiliency setting used by Storage Spaces, comparable to RAID-5 or RAID-6. You could have a single Hyper-V host with local Direct Attached Storage (DAS) and create a Storage Spaces "Parity" virtual disk. The parity setting is more storage efficient than mirror but less performant, so a parity volume is recommended for backup and archive workloads. You could also use a dedicated LUN attached to the Hyper-V host with a RAID-5 or RAID-6 volume.

Q: How does the protection group short-term goal's scheduled backup time relate to the DPM backup schedule mentioned in Step 6? In that step, all protection groups' DPM backup windows are set to the same time and duration. Why is that? You also set the consistency check window to the same time, why?
A: I've set the DPM backup window and consistency check (CC) window so that they do not overlap with the dedup window on the Hyper-V host, because if they ran at the same time, the additional I/O overhead would impact the storage, and switching between these operations could be costly and result in less data being backed up or deduplicated. You should schedule these operations with minimal to no overlap with the dedup schedule.

Q: Is there a recommendation or best practice to set the Consistency Check (CC) window to the same time and duration as the DPM backup window? Is there also a best practice for the recommended duration?
A: When dedup is enabled on the Hyper-V host, it's recommended to set the DPM backup window and consistency check (CC) window so they do not overlap with the dedup schedule on the host. In this example, we set backup jobs and CC to run in the same window (4:00 PM until 4:00 AM): CC is scheduled to start at 4:00 PM and run for a maximum duration of 12 hours. This ensures that DPM checks for inconsistent replicas only at the specified time (4:00 PM) every day and then runs a consistency check if it finds one, which frees the storage I/O for dedup to complete during the day.

Q: Why is the disk storage to be provisioned on DPM twice the size of the total disk size on the host?
A: In this example, we provisioned 12 x 1 TB of logical storage for DPM while we have 9 TB of actual disk on the Hyper-V host, so it is NOT twice the total disk size. As mentioned in this article, dedup savings are very high for VHDX files, with approximate savings in the 60-90%+ range, so we can store more data than the actual 9 TB of storage. In this example, I provisioned roughly 9 TB + 35% = 12 TB, which means we can store up to 12 TB of data on top of the 9 TB volume. If the savings percentage resulting from deduplication is high enough, all 12 VHDX files will be able to reach their maximum logical size and still fit in the 9 TB volume (there might even be additional space to allocate more VHDX files for the DPM server to use in the future).

Q: Why is it suggested to disable the Background Optimization deduplication schedule?
A: The Background Optimization job always runs at low priority and pauses data deduplication when the system is busy, to minimize the impact on system performance. It's recommended to disable this dynamic behavior and have dedup run in a specified schedule window instead, which minimizes the impact on backup performance.

And as always, we would love to hear your feedback and results. Please add your comment to this post and let us know about your experience with these configurations and, of course, any questions you may have.

Thank You!


7 Comments

  1. When creating the Storage Pool and Virtual disk in step 4, is there a reason that you format it as NTFS? Or is it just because it’s simple and DPM is going to re-format it as ReFS anyway?

  2. Hi Charbel, I followed the process you provided and everything worked great but I haven’t been able to add a new disk to the Virtual Disk. I was able to add the disk to the Storage Pool and then tried to extend the Virtual Disk but that failed. I wonder if you have the process or a script to add a new physical disk into the Storage Pool and Virtual Disk.

    • Hello Gomez,

      Please note that in the article above, I created the simple Storage Spaces virtual disk with NumberOfColumns equal to "1". When a virtual disk is created without specifying NumberOfColumns, Storage Spaces sets the property automatically according to the number of physical disks used to create the virtual disk, and it cannot be changed after the virtual disk has been created. If, for example, NumberOfColumns is 8, the only way to extend the simple fixed virtual disk is to add 8 more 1 TB VHDXs to the storage pool! For this reason, I specified NumberOfColumns equal to 1, so we can add 1 disk at a time in the future as needed.

      Hope this helps!

      Cheers,
      -Charbel

  3. Hi Charbel, thanks for getting back with me. I did set up everything the way you did and it works better than I expected. The setup includes the number of columns set to 1, but I can't find a way to extend the Storage Pool. Here is what I tried: I created a new 1TB disk and attached it to the iSCSI controller. This created a new Storage Pool called Primordial. Then, I ran this command to add the disk to the Storage Pool: Add-PhysicalDisk -PhysicalDisks (Get-StorageSubSystem -FriendlyName "Windows Storage on xxxx" | Get-PhysicalDisk | ? CanPool -NE $false) -StoragePoolFriendlyName "DPM Storage Pool01". The disk was added to the Storage Pool successfully but things don't look right. The Storage Pool free space went from 0 bytes to 1,023 GB. Also, the new disk's Media Type is Unknown even after running the command (Get-StoragePool "DPM Storage Pool01" | Get-PhysicalDisk | Where MediaType -eq "Unknwon" | Set-PhysicalDisk -MediaType HDD). So, at this point, I ran the Add-PhysicalDisk command and that added the disk to the Virtual Disk, but the capacity never changed. I tried the Extend command but that didn't work either. So, my question is, what is the process to add a new virtual disk? Should it be added to the Storage Pool first? Should it be added to the Virtual Disk first? Should the disk be added to the Virtual Disk, or should the Virtual Disk be extended? I think I understand the process fairly well but, unfortunately, I haven't been able to add a new disk that extends into the Storage Pool and the Virtual Disk.

    • Hello Gomez,

      Here is the script that will extend the simple virtual disk for DPM Storage Pool.

      Before you run the script, make sure you created 1TB disk(s) and attached it to the SCSI controller.

      # Expand Simple Storage Space
      $Pool1 = "DPM Storage Pool"
      $vd1 = "Simple DPM vDisk01"
      $Pooldisks = Get-PhysicalDisk | Where-Object { $_.CanPool -eq $True }
      Add-PhysicalDisk -PhysicalDisks $Pooldisks -StoragePoolFriendlyName $Pool1
      Get-StoragePool $Pool1 | Get-PhysicalDisk | Sort-Object Size | Format-Table FriendlyName, Size, MediaType, HealthStatus, OperationalStatus -AutoSize
      Get-StoragePool $Pool1 | Get-PhysicalDisk | Where-Object MediaType -eq "Unspecified" | Set-PhysicalDisk -MediaType HDD
      Get-StoragePool $Pool1 | Get-PhysicalDisk | Sort-Object MediaType | Format-Table FriendlyName, MediaType, @{l="Size(GB)";e={$_.Size/1GB}} -AutoSize

      # Check the number of columns and the supported sizes before resizing
      Get-VirtualDisk | Format-List NumberOfColumns

      Get-VirtualDiskSupportedSize -StoragePoolFriendlyName $Pool1 | Format-Table @{l="VirtualDiskSizeMin(GB)";e={$_.VirtualDiskSizeMin/1GB}}, @{l="VirtualDiskSizeMax(GB)";e={$_.VirtualDiskSizeMax/1GB}}, @{l="VirtualDiskSizeDivisor(GB)";e={$_.VirtualDiskSizeDivisor/1GB}}

      # Resize the virtual disk to the maximum supported size
      # (Resize-VirtualDisk has no -UseMaximum parameter; pass -Size explicitly)
      $maxSize = (Get-VirtualDiskSupportedSize -StoragePoolFriendlyName $Pool1 -ResiliencySettingName Simple).VirtualDiskSizeMax
      Resize-VirtualDisk -FriendlyName $vd1 -Size $maxSize

      Get-VirtualDisk $vd1 | Get-Disk | Update-Disk

      Get-VirtualDisk $vd1 | Get-Disk | Format-List *

      # Extend the volume to use the new space
      $Volume = Get-Volume -FileSystemLabel HyperV_VMs_MBS
      $Partition = $Volume | Get-Partition
      $Disk = $Partition | Get-Disk
      $size = Get-PartitionSupportedSize -DiskNumber $Disk.Number -PartitionNumber $Partition.PartitionNumber
      Resize-Partition -DiskNumber $Disk.Number -PartitionNumber $Partition.PartitionNumber -Size $size.SizeMax

  4. Charbel, thanks a lot for putting this together. I ran into some issues during the first few runs but got most of those resolved. The only one I couldn't resolve was this command: Resize-VirtualDisk -FriendlyName $vd1 -usemaximum. PowerShell doesn't recognize the -usemaximum parameter. So, since it didn't add or extend the virtual disk, I added the disk with the Add-PhysicalDisk command but was never able to extend it. And even though I can see that the new disk belongs to the virtual disk, the virtual disk still shows the original size. That's the only thing I am working on, and I think once I figure that one out, I should be able to use it in production. Again, thanks a lot for putting time into this.
