I have recently upgraded my Hyper-V host and my network infrastructure to 10Gb.
As soon as I moved the Hyper-V virtual switch to the LBFO team with 2 x 10Gb adapters, I started receiving the following error:
If we look deeper in the event log, we can see the reason is self-explanatory.
So what is Sum-Queues mode and what is Min-Queues mode?
A while ago, I posted a detailed article on how to enable and configure VMQ/dVMQ on Windows Server 2012 R2 with network adapters below 10Gb.
Please make sure to check the article before you proceed with the resolution.
As a quick recap, in Sum-Queues mode the team's queue count is the total number of VMQs across all the physical NICs participating in the team, whereas in Min-Queues mode it is the smallest number of VMQs offered by any single physical NIC in the team.
So why don't we get the same error with 1Gb network adapters? Because VMQ is disabled by default on 1Gb adapters: Microsoft doesn't see any performance benefit to VMQ on 1Gb NICs, and a single core can keep up with ~3.5Gb of throughput without any problem.
If you need to enable VMQ on 1Gb NICs, please refer to this article.
In my scenario, I am using 2 x 10Gb adapters configured with Switch Independent teaming mode and Dynamic distribution mode.
|Teaming mode ↓ / Distribution mode →|Address Hash modes|Hyper-V Port|Dynamic|
|---|---|---|---|
|Switch independent|Min-Queues|Sum-of-Queues|Sum-of-Queues|
|Switch dependent|Min-Queues|Min-Queues|Min-Queues|
If you look at the table above, you can see that I am using Sum-Queues mode.
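If you are not sure which combination your own team uses, the teaming mode and load-balancing algorithm can be read from the team itself. A minimal sketch, assuming the in-box NetLbfo cmdlets are available on the host:

```powershell
# Show the teaming mode and distribution (load-balancing) mode of every LBFO team.
# Switch Independent + Dynamic or Hyper-V Port means Sum-of-Queues.
Get-NetLbfoTeam | Format-List Name, TeamingMode, LoadBalancingAlgorithm, Members
```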
First, we need to check the total number of VMQs for my LBFO team.
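One way to check this is with Get-NetAdapterVmq. This is a sketch that assumes the two team members are named Fiber01 and Fiber02, as they are later in this article; substitute your own adapter names.

```powershell
# List the VMQ state of both team members.
# The default view includes Enabled, BaseVmqProcessor, MaxProcessors
# and NumberOfReceiveQueues for each adapter.
Get-NetAdapterVmq -Name "Fiber01", "Fiber02"
```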
As you can see, VMQ is enabled (True), but on both 10Gb adapters the base processor is set to 0 and the max processors to 16, so the processor sets overlap. Because the LBFO team is set up for Sum-of-Queues, the network adapters in the team have to use non-overlapping processor sets.
In this example, I have one converged virtual switch on a team of 2 x 10Gb NICs, with 63 queues per NIC shared by the vNICs on the host and the vmNICs of the VMs, so the total number of VMQs for the LBFO team is 126.
You may wonder why 63 queues and not 64. The hardware total in my scenario is 128 (64 on NIC1 + 64 on NIC2), but one VMQ per port is reserved by the system, so you see 63 per port and 126 for the LBFO team.
Before we start configuring the VMQ for each NIC adapter, we need to determine if Hyper-threading is enabled in the system by running the following cmdlet:
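A minimal sketch of such a check, assuming the standard Win32_Processor class is queried through Get-CimInstance (Get-WmiObject works the same way on older hosts):

```powershell
# One row per physical CPU socket. If NumberOfLogicalProcessors is twice
# NumberOfCores, Hyper-Threading is enabled.
Get-CimInstance -ClassName Win32_Processor |
    Format-Table DeviceID, NumberOfCores, NumberOfLogicalProcessors
```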
As you can see, NumberOfLogicalProcessors is twice NumberOfCores: my server has two 8-core CPUs with Hyper-Threading enabled, so Task Manager shows 32 logical processors.
Let’s start configuring the virtual machine queue for both adapters.
Since my team is in Sum-Queues mode, the team members' processor sets should not overlap, or should overlap as little as possible. For example, in my scenario I have a 16-core host (32 logical processors) with a team of 2 x 10Gb NICs. I will set NIC1 to use base processor 0 and a maximum of 8 processors (so this NIC uses processors 0, 2, 4, 6, 8, 10, 12, 14 for VMQ); NIC2 will use base processor 16 and 8 processors as well (so this NIC uses processors 16, 18, 20, 22, 24, 26, 28, 30 for VMQ).
As a best practice, avoid setting the base processor to 0, because the first core (logical processor 0) handles default (non-RSS and non-DVMQ) network processing. On Windows Server 2019 or later this is no longer strictly required, because the dynamic algorithm has changed and moves workloads away from a burdened core 0; however, it is still a sensible precaution in case of a driver bug.
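To make the processor numbering concrete, here is a small illustrative sketch. Get-VmqProcessorList is a hypothetical helper written for this article, not a built-in cmdlet, and it assumes Hyper-Threading is enabled so that VMQ uses one even-numbered logical processor per physical core.

```powershell
# Hypothetical helper: list the logical processors a NIC would use for VMQ
# when Hyper-Threading is on (one even-numbered LP per physical core).
function Get-VmqProcessorList {
    param(
        [int]$BaseProcessorNumber,
        [int]$MaxProcessors
    )
    0..($MaxProcessors - 1) | ForEach-Object { $BaseProcessorNumber + 2 * $_ }
}

# NIC1: 0, 2, 4, 6, 8, 10, 12, 14
(Get-VmqProcessorList -BaseProcessorNumber 0 -MaxProcessors 8) -join ', '

# NIC2: 16, 18, 20, 22, 24, 26, 28, 30
(Get-VmqProcessorList -BaseProcessorNumber 16 -MaxProcessors 8) -join ', '
```

The two lists share no processor, which is the property a Sum-of-Queues team needs.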
Let’s open PowerShell and Set-NetAdapterVmq accordingly for each NIC:
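The commands matching the layout described above would look roughly like this, assuming the adapters are named Fiber01 (NIC1) and Fiber02 (NIC2) as in the rest of this article:

```powershell
# NIC1: start at logical processor 0 and use up to 8 processors.
Set-NetAdapterVmq -Name "Fiber01" -BaseProcessorNumber 0 -MaxProcessors 8

# NIC2: start at logical processor 16 and use up to 8 processors,
# so its processor set does not overlap with NIC1.
Set-NetAdapterVmq -Name "Fiber02" -BaseProcessorNumber 16 -MaxProcessors 8
```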
Let’s verify now that VMQ is applied:
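A sketch of the verification, again assuming the adapter names Fiber01 and Fiber02; the selected columns are the properties exposed by Get-NetAdapterVmq:

```powershell
# Confirm the new base processor and processor count on each team member.
Get-NetAdapterVmq -Name "Fiber01", "Fiber02" |
    Format-Table Name, Enabled, BaseProcessorNumber, MaxProcessors, NumberOfReceiveQueues
```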
As you can see now, the baseVmqProcessor for NIC1 is 0 and the baseVmqProcessor for NIC2 is 16.
So what have we done here? The 126 queues are now spread across 16 processors: the 63 queues of the first NIC in my example can land anywhere in the processor 0 to 15 range, and those of the second NIC anywhere from processor 16 to processor 31. Keep in mind that all 16 CPUs will be used, since I have more queues than CPUs. If you had, for example, only 8 queues per NIC, then no more than 8 CPUs would be used, since there are only 8 queues.
But after I set the VMQ, the error did not go away.
As I mentioned at the beginning of this article, I am using one Converged Team for vmNIC (VMs) and for vNICs in the host as well.
If we look at RSS on the host, we can see that the RSS base processor is 0 for NIC1 and 16 for NIC2, the same values as VMQ, so the RSS processor sets overlap with the VMQ ones.
As a side note and best practice, you should separate the vNICs on the host from the vmNICs by placing them on two separate sets of (teamed) physical adapters.
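The RSS settings can be inspected in the same way as VMQ. A minimal sketch, assuming the adapter names Fiber01 and Fiber02:

```powershell
# Show the RSS configuration (including the base processor and max processors)
# of both physical adapters so it can be compared with the VMQ settings.
Get-NetAdapterRss -Name "Fiber01", "Fiber02"
```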
In this case, we will roughly split between RSS and Dynamic VMQ 50/50.
The 16 logical processors on CPU0 (0–15) will be used by RSS. The remaining 16 logical processors of CPU1 (16–31) will be used by DVMQ.
The settings for the two 10Gb NICs will depend again on whether NIC teaming is in Sum-of-Queues mode or in Min-Queues mode. NIC1 (Fiber01) and NIC2 (Fiber02) are in a Switch Independent, Dynamic mode team, so they are in Sum-of-Queues mode. This means the NICs in the team need to use non-overlapping processor sets. The settings for the two 10Gb NICs are therefore as follows:
Set-NetAdapterRss "Fiber01" -BaseProcessorNumber 0 -MaxProcessors 4
Set-NetAdapterRss "Fiber02" -BaseProcessorNumber 8 -MaxProcessors 4
Set-NetAdapterVmq "Fiber01" -BaseProcessorNumber 16 -MaxProcessors 4
Set-NetAdapterVmq "Fiber02" -BaseProcessorNumber 24 -MaxProcessors 4
Note: According to Microsoft, as soon as you bind the Hyper-V virtual switch to the LBFO team, RSS is disabled on the host and VMQ is enabled. In other words, Set-NetAdapterRss does not actually have any effect and Set-NetAdapterVmq takes precedence, so if we look at RSS again, we can see that it aligns with VMQ.
Next, you need to reboot your Virtual Machines in order for the new settings to take effect because each vmNIC will be assigned one queue once the VM is booted.
Last but not least, you can verify this by running Get-NetAdapterVmqQueue, which shows all the queues assigned to the vmNICs of all VMs on that particular Hyper-V host.
Finally, after setting VMQ and RSS correctly on the system, the error disappeared!
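For example, a sketch limited to the two team members used in this article (Fiber01 and Fiber02 are this host's adapter names; omit -Name to list every adapter):

```powershell
# List the allocated VMQ queues per physical adapter. The default view shows
# the queue ID, MAC address, VLAN ID, the processor servicing the queue,
# and the friendly name of the VM that owns it.
Get-NetAdapterVmqQueue -Name "Fiber01", "Fiber02"
```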
Hope this helps.
Enjoy your day!