I recently completed a two-node Storage Spaces Direct (S2D) hyper-converged deployment using SSDs and HDDs on top of HPE DL380 Gen9 servers.
I used the following hardware configuration:
Two-Node Cluster – Hardware Platform
HPE DL380 Gen9 24SFF + 2SFF Front/Rear.
HPE 2 X Intel® Xeon® CPU E5-2690 v3 @ 2.60GHz (12 cores each).
HPE 256GB RAM DDR4 2133.
HPE H240 12Gb 2-ports Smart HBA.
HPE 2 X 120GB 6G SSD SATA Read Intensive-3 SFF (2.5-inch) – OS.
HPE 4 X 2TB 12G HDD SATA 7.2K rpm SFF (2.5-inch) – Capacity.
HPE 4 X 960GB 6G SSD SATA Read Intensive-3 SFF (2.5-inch) – Cache.
HPE Mellanox ConnectX-4 Lx EN 10/25Gb/s Ethernet adapter, dual-port, RoCE v2.
In this configuration, each SSD serves as a cache device for one of the slower capacity HDDs (a 1:1 cache-to-capacity ratio).
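You can confirm the cache binding from PowerShell once S2D is enabled; the sketch below simply lists the physical disks with their media type and usage (drives claimed as cache report a Usage of Journal):

# List the disks claimed by S2D; SSDs bound as cache show Usage = Journal
Get-PhysicalDisk | Sort-Object MediaType, FriendlyName | Format-Table FriendlyName, MediaType, Usage, Size -AutoSize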
Software Configuration
Host: Windows Server 2016 Datacenter Core Edition with March 2017 update
Single Storage Pool
2 X 2.5 TB two-way (2-copy) mirror volumes
ReFS/CSVFS file system
40 virtual machines (20 VMs per server)
4 virtual processors and 8 GB RAM per VM
VM: Windows Server 2016 Datacenter Core Edition with March 2017 update
Jumbo Frames enabled (see the PowerShell sketch after this list)
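For reference, here is a minimal PowerShell sketch of this layout; the volume names and the adapter name pattern are placeholders for illustration, and the jumbo frame registry value can differ per driver.

# Enable Storage Spaces Direct on the formed cluster; it claims all eligible SSDs/HDDs into a single pool
Enable-ClusterStorageSpacesDirect
# Create two 2.5 TB two-way mirror volumes formatted ReFS on a Cluster Shared Volume
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Volume01" -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror -Size 2.5TB
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Volume02" -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror -Size 2.5TB
# Enable jumbo frames and RDMA on the Mellanox ports (adapter name pattern is a placeholder)
Set-NetAdapterAdvancedProperty -Name "SLOT 1*" -RegistryKeyword "*JumboPacket" -RegistryValue 9014
Enable-NetAdapterRdma -Name "SLOT 1*"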
Workload Configuration
DISKSPD version 2.0.17 workload generator
VM Fleet workload orchestrator (a sample DISKSPD invocation is sketched below)
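To give an idea of what the fleet is actually running, here is roughly the DISKSPD command line inside each VM for these sweeps; the file path, thread count, and queue depth are illustrative values, not the exact parameters VM Fleet passed in my runs:

# 4K random IO against a 10 GB working set, 100% read (-w0); the 70/30 mix below uses -w30
# -Sh disables software caching and hardware write caching, -L collects latency statistics
C:\run\diskspd.exe -b4K -r -w0 -t4 -o8 -d300 -Sh -L -c10G C:\run\testfile.dat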
First Test – Total 376K IOPS – Read/Write Latency @ 8ms
Each VM configured with:
4K IO size
10GB working set
100% read and 0% write
No Storage QoS
RDMA Enabled
Second Test – Total 210K IOPS – Read/Write Latency @ 7ms
Each VM configured with:
4K IO size
10GB working set
70% read and 30% write
No Storage QoS
RDMA Enabled
Here is the RDMA activity and processor utilization for each node during the workload.
The Mellanox ConnectX-4 Lx RDMA card supports dual speeds of 10/25Gb/s. In this example, I was running at 10Gb/s, since I don’t have a switch or cables that support 25Gb/s.
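To double-check that SMB Direct is really using RDMA during the runs, a quick look from each node is enough; the counter path below is the standard RDMA Activity set, but verify the exact counter names in Perfmon on your build:

# Confirm RDMA is enabled and operational on the Mellanox ports
Get-NetAdapterRdma
# Confirm the SMB multichannel connections between the nodes are RDMA capable
Get-SmbMultichannelConnection
# Watch live RDMA throughput through the RDMA Activity performance counters
Get-Counter -Counter '\RDMA Activity(*)\RDMA Inbound Bytes/sec','\RDMA Activity(*)\RDMA Outbound Bytes/sec' -Continuous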
As for the read/write latency: in S2D, writes must complete on both nodes, while most reads can be served entirely from the local node. S2D has read-locality logic, so if the data is available locally, whether on the cache or capacity devices, it is read locally. In this example, I have 20 VMs X 50 GB VHDX per server, which is more than my cache devices can hold, so the read/write latency is high because many IOs land on the slow capacity devices. In other words, the cache drives are merely a technique to speed up slow (snail) disks.
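One way to see how much of the read workload the cache is actually absorbing is the hybrid-disk performance counters on each node; a hedged sketch (I am quoting the counter set name from memory, so confirm it in Perfmon first):

# Compare cache hits against total reads on the S2D hybrid disks
Get-Counter -Counter '\Cluster Storage Hybrid Disks(*)\Cache Hit Reads/sec','\Cluster Storage Hybrid Disks(*)\Disk Reads/sec' -SampleInterval 2 -MaxSamples 10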
For the final tests, I reduced the number of VMs to 10 per server.
Third Test – Total 375K IOPS – Read/Write Latency @ 1.7ms
Each VM configured with:
4K IO size
10GB working set
100% read and 0% write
No Storage QoS
RDMA Enabled
Fourth Test – Total 220K IOPS – Read/Write Latency @ 2.7ms
Each VM configured with:
4K IO size
10GB working set
70% read and 30% write
No Storage QoS
RDMA Enabled
Last but not least, Microsoft released the System Center 2016 Management Pack for Windows Storage Spaces Direct, which gives you complete visibility to monitor your S2D infrastructure.
Let me know what you think.
Cheers,
-Ch@rbel-