This post was last updated on February 19, 2021.
The Microsoft guidance is that VMQ should be off on 1Gbps NICs. There is a known problem with the Broadcom 1Gb VMQ implementation that is supposedly fixed in the latest drivers. Please make sure that all your drivers are up-to-date.
Back to basics: What is Virtual Machine Queue (VMQ), why do you need it, and why should you enable it?
Virtual Machine Queue, or dynamic VMQ, is a mechanism for mapping physical queues in a physical NIC to a virtual NIC (vNIC) in the parent partition or a virtual machine NIC (vmNIC) in a guest OS. This mapping makes the handling of network traffic more efficient. The increased efficiency results in less CPU time in the parent partition and reduced latency of network traffic.
VMQ spreads the traffic per vmNIC/vNIC, and each VMQ can use at most one logical CPU in the host. In other words, VMQ distributes traffic amongst multiple guests (VMs) on a single host with a vSwitch, one core per vmNIC/vNIC.
Note: The vNIC is a host-partition virtual NIC of the virtual switch in the Management OS, and the vmNIC is the synthetic NIC inside a virtual machine.
VMQ is auto-enabled by default on Windows Server machines when a vSwitch is created with 10Gig network adapters and above, and it’s useful when hosting many VMs on the same physical host.
The figure below shows the ingress traffic with VMQ enabled for virtual machines.
[VMQ incoming traffic flow for virtual machines – source Microsoft]
When using 1Gig network adapters, VMQ is disabled by default because Microsoft doesn’t see any performance benefit to VMQ on 1Gig NICs: one CPU/core can keep up with 1Gig of network traffic without any problem.
As I mentioned above, with VMQ disabled all network traffic for a vmNIC has to be handled by a single core/CPU. With VMQ enabled and configured, however, the network traffic is distributed across multiple CPUs automatically.
Now, what happens if you have a large number of web server VMs on a host with two eight-core processors or more and a large amount of memory, but you are limited to 1Gig physical NICs?
The answer is…
VMQ and vRSS better together
As I demonstrated in previous blog posts (Post I and Post II), Windows Server 2012 R2 introduced a new feature called Virtual Receive Side Scaling (vRSS). This feature works with VMQ to distribute the CPU workload of receive network traffic across multiple vCPUs inside the VM, which effectively eliminates the CPU core bottleneck that we experienced with a single vmNIC. To take full advantage of this feature, both the host and the guest need to be Windows Server 2012 R2, with VMQ enabled on the physical host and RSS enabled inside the virtual machine. At this point in time, however, Microsoft doesn’t enable vRSS for the host vNICs; it’s only for VMs, so in a Converged Network environment each vNIC in the host Management partition is stuck with one processor. The good news is that although each host vNIC is locked to one processor, the vNICs still get VMQs (assuming you have enough queues), and those queues are distributed across different processors.
The requirements to enable VMQ are the following:
- Windows Server 2012 R2 (dVMQ+vRSS).
- The Physical network adapters must support VMQ.
- Install the latest NIC driver/firmware (very important).
- Enable VMQ for 1Gig NICs in the registry (this step can be skipped if you have 10Gig adapters or faster):
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\VMSMP\Parameters\BelowTenGigVmqEnabled = 1
- Reboot the host if you set the registry value in the previous step.
- Determine the values for Base and Max CPU based on your hardware configuration.
- Assign values for Base and Max CPU.
- Enable RSS inside the Virtual Machines.
- Turn on VMQ under Hyper-V settings for each VM (it is already on by default).
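Several of the steps above can be scripted. The following is a minimal sketch, assuming the registry value is a DWORD, that the guest adapter is named "Ethernet", and that the VM is called "WebVM01"; both names are placeholders and do not come from the lab described here.

```powershell
# --- On the Hyper-V host (elevated session) ---
# Assumption: BelowTenGigVmqEnabled is stored as a DWORD value.
$vmsmp = 'HKLM:\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters'
Set-ItemProperty -Path $vmsmp -Name 'BelowTenGigVmqEnabled' -Value 1 -Type DWord

# Reboot the host afterwards so the setting takes effect.
# Restart-Computer

# --- Inside the guest OS ---
# 'Ethernet' is a placeholder for the synthetic adapter name.
Enable-NetAdapterRss -Name 'Ethernet'

# --- Back on the host ---
# A VmqWeight of 0 means VMQ is off for that vmNIC; 100 is the default.
Get-VMNetworkAdapter -VMName 'WebVM01' |
    Format-Table VMName, Name, VmqWeight -AutoSize
```

The host and guest commands are shown together only for brevity; run each part where its comment says.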
What is the Base CPU? It is the first CPU used for processing the incoming traffic for a particular vmNIC.
What is the Max CPU? It is the maximum number of CPUs that we allow that NIC to use for processing the traffic.
OK, with that explained, let’s configure VMQ step by step:
Our Lab Scenario:
We have 8 physical 1Gig NICs and 2 x 8-core processors (32 logical processors).
First, we need to determine if HyperThreading is enabled by running the following cmdlet:
PS C:\> Get-WmiObject -Class Win32_Processor | ft -Property NumberOfCores, NumberOfLogicalProcessors -AutoSize
As you can see, NumberOfLogicalProcessors is twice NumberOfCores, so we know that HT is enabled in the system.
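The same comparison can be done in script instead of by eye. This is a minimal sketch: `Test-HyperThreading` is a hypothetical helper name, and the live query uses `Get-CimInstance`, guarded so the block still runs where that cmdlet is not available.

```powershell
# Hyper-threading is on when logical processors outnumber physical cores.
# Test-HyperThreading is a hypothetical helper, not a built-in cmdlet.
function Test-HyperThreading {
    param(
        [int]$Cores,
        [int]$LogicalProcessors
    )
    return $LogicalProcessors -gt $Cores
}

# Lab totals from this post: 2 sockets x 8 cores, 32 logical processors.
$labResult = Test-HyperThreading -Cores 16 -LogicalProcessors 32
"Lab host hyper-threading enabled: $labResult"

# On a live Windows host, sum the per-socket values and compare them.
if (Get-Command Get-CimInstance -ErrorAction SilentlyContinue) {
    $procs = @(Get-CimInstance -ClassName Win32_Processor)
    $cores = ($procs | Measure-Object -Property NumberOfCores -Sum).Sum
    $lps   = ($procs | Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum
    $live  = Test-HyperThreading -Cores $cores -LogicalProcessors $lps
    "This host hyper-threading enabled: $live"
}
```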
Next, we need to look at our NIC Teaming and load distribution mode:
PS C:\> Get-NetLbfoTeam | ft -Property TeamNics, TeamingMode, LoadBalancingAlgorithm -AutoSize
Now that we have determined that HyperThreading is enabled and the teaming mode is Switch Independent with Dynamic mode, we can move on to assigning the Base and Max CPUs.
Attention: before moving on to the assignment, there is one important point to consider. If the NIC team is in Switch-Independent teaming mode and the load distribution is set to Hyper-V Port mode or Dynamic mode, then the number of queues reported is the sum of all the queues available from the team members (SUM-Queues mode). Otherwise, the number of queues reported is the smallest number of queues supported by any member of the team (MIN-Queues mode).
What is SUM-Queues mode and what is MIN-Queues mode?
In SUM-Queues mode, the team reports the total number of VMQs across all the physical NICs participating in the team. In MIN-Queues mode, it reports the smallest number of VMQs offered by any one of those NICs.
As an example, consider two physical NICs with 4 VMQs each. If the teaming mode is Switch Independent with Hyper-V Port, the team is in SUM-Queues mode with 8 VMQs. If the teaming mode is Switch Dependent with Hyper-V Port, the team is in MIN-Queues mode with 4 VMQs.
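[Refer to the table below to determine the queue mode for each teaming and load distribution mode, source – Microsoft]:
|Distribution mode→ Teaming mode↓|Address Hash modes|Hyper-V Port|Dynamic|
|Switch independent|Min-Queues|Sum-Queues|Sum-Queues|
|Switch dependent|Min-Queues|Min-Queues|Min-Queues|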
In my scenario, the NIC Team is in Switch Independent with Dynamic Mode so we are in SUM-Queues mode.
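The queue-mode rule can be written down as a small function. This is a sketch of the rule as described in this post, not a Microsoft API: `Get-TeamQueueCount` is a hypothetical name, and the mode strings are the ones `Get-NetLbfoTeam` reports.

```powershell
# Simplified model of the Sum-Queues / Min-Queues rule (hypothetical helper).
function Get-TeamQueueCount {
    param(
        [ValidateSet('SwitchIndependent', 'Static', 'Lacp')]
        [string]$TeamingMode,
        [ValidateSet('TransportPorts', 'IPAddresses', 'MacAddresses', 'HyperVPort', 'Dynamic')]
        [string]$LoadBalancingAlgorithm,
        [int[]]$QueuesPerNic
    )
    $stats = $QueuesPerNic | Measure-Object -Sum -Minimum

    # Sum-Queues applies only to Switch Independent with Hyper-V Port or Dynamic.
    $isSum = ($TeamingMode -eq 'SwitchIndependent') -and
             ($LoadBalancingAlgorithm -in 'HyperVPort', 'Dynamic')

    if ($isSum) {
        [pscustomobject]@{ Mode = 'Sum-Queues'; Queues = [int]$stats.Sum }
    } else {
        [pscustomobject]@{ Mode = 'Min-Queues'; Queues = [int]$stats.Minimum }
    }
}

# The two-NIC example: 4 VMQs per NIC, Hyper-V Port in both cases.
$independent = Get-TeamQueueCount -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort -QueuesPerNic 4, 4
$dependent   = Get-TeamQueueCount -TeamingMode Lacp -LoadBalancingAlgorithm HyperVPort -QueuesPerNic 4, 4
"Switch independent: $($independent.Mode), $($independent.Queues) queues"
"Switch dependent:   $($dependent.Mode), $($dependent.Queues) queues"
```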
If the team is in Sum-Queues mode, the team members’ processors should be non-overlapping, or overlap as little as possible. For example, in a 4-core host (8 logical processors) with a team of 2 x 10Gbps NICs, you could set the first NIC to use a base processor of 2 and to use 4 cores; the second would be set to use base processor 6 and use 2 cores.
If the team is in Min-Queues mode, the processor sets used by the team members must be identical: configure each NIC team member to use the same cores. In other words, the assignment for each physical NIC will be the same.
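For the Min-Queues case, applying an identical assignment to every team member is a short loop. This is a minimal sketch: the team name 'MinQueuesTeam' is a placeholder, and the values 2 and 8 are only illustrative, so pick ones that fit your own host.

```powershell
# Assumption: 'MinQueuesTeam' is a placeholder for a team in Min-Queues mode.
$teamName = 'MinQueuesTeam'

# Give every physical team member the same base processor and processor count.
Get-NetLbfoTeamMember -Team $teamName | ForEach-Object {
    Set-NetAdapterVmq -Name $_.Name -BaseProcessorNumber 2 -MaxProcessors 8
}
```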
Now let’s first check whether VMQ is enabled by running Get-NetAdapterVmq:
As you can see, VMQ is enabled (=True) but not yet configured.
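A minimal form of that check, narrowed to the columns that matter for this walkthrough. The column selection is an assumption about what is useful to see, not a copy of the original output.

```powershell
# List the VMQ state and processor settings of every VMQ-capable adapter.
Get-NetAdapterVmq |
    Format-Table Name, Enabled, BaseProcessorNumber, MaxProcessors, NumberOfReceiveQueues -AutoSize
```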
And here we have two Converged Network Teams with 4 physical NICs each and 16 queues per NIC, so the total number of VMQs per team is 64.
I am using one Converged Team for the vmNICs (VMs) and the second one for the vNICs in the host.
We will set the Base and Max CPUs by running the following cmdlets for the teamed adapters under ConvergedNetTeam01:
PS C:\> Set-NetAdapterVmq -Name "NIC-b0-f0" -BaseProcessorNumber 2 -MaxProcessors 8
PS C:\> Set-NetAdapterVmq -Name "NIC-b0-f1" -BaseProcessorNumber 10 -MaxProcessors 8
PS C:\> Set-NetAdapterVmq -Name "NIC-b0-f2" -BaseProcessorNumber 18 -MaxProcessors 8
PS C:\> Set-NetAdapterVmq -Name "NIC-b0-f3" -BaseProcessorNumber 26 -MaxProcessors 8
As I mentioned above, in Sum-Queues mode you should configure the Base and Max CPU for each physical NIC with as little overlap as possible. In our lab environment, however, we didn’t have as many cores as we had queues, so we had to accept some overlap; otherwise we would be wasting queues.
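To see where the overlap falls, the assignment can be modelled in a few lines. This sketch assumes a simplified model in which, with HyperThreading on, each NIC walks upward from its base processor over even-numbered logical processors only, and anything past the last logical processor is dropped. `Get-VmqProcessorSet` is a hypothetical helper, not a built-in cmdlet.

```powershell
# Simplified model of which logical processors (LPs) a NIC may use.
function Get-VmqProcessorSet {
    param(
        [int]$BaseProcessorNumber,
        [int]$MaxProcessors,
        [int]$LogicalProcessorCount,
        [bool]$HyperThreading = $true
    )
    # With hyper-threading on, only every second LP is considered.
    $step = if ($HyperThreading) { 2 } else { 1 }
    $set = @()
    for ($i = 0; $i -lt $MaxProcessors; $i++) {
        $lp = $BaseProcessorNumber + ($i * $step)
        if ($lp -lt $LogicalProcessorCount) { $set += $lp }  # drop LPs past the end
    }
    return ,$set
}

# Base processors used for the four members of ConvergedNetTeam01.
$nics = [ordered]@{ 'NIC-b0-f0' = 2; 'NIC-b0-f1' = 10; 'NIC-b0-f2' = 18; 'NIC-b0-f3' = 26 }
$sets = [ordered]@{}
foreach ($name in $nics.Keys) {
    $sets[$name] = Get-VmqProcessorSet -BaseProcessorNumber $nics[$name] -MaxProcessors 8 -LogicalProcessorCount 32
    '{0}: {1}' -f $name, ($sets[$name] -join ',')
}

# Logical processors shared by the first two team members.
$shared = $sets['NIC-b0-f0'] | Where-Object { $sets['NIC-b0-f1'] -contains $_ }
"Shared by f0 and f1: $($shared -join ',')"
```

Under this model each adjacent pair of NICs shares a handful of logical processors, and the last NIC runs out of processors before it reaches its maximum of 8.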
Let’s run Get-NetAdapterVmq again and see the changes:
As you can see, the Base and Max processors are now set. Next, we can run Get-NetAdapterVmqQueue, which shows how the queues are assigned to the vmNICs of all VMs on that particular host.
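A minimal sketch of that query, sorted so that queues sharing a processor sit together. The column selection is an assumption about what is useful to look at, and the output depends entirely on the host and its running VMs.

```powershell
# Show each allocated queue, the processor it is bound to, and the owning VM.
Get-NetAdapterVmqQueue |
    Sort-Object Processor |
    Format-Table Name, QueueID, MacAddress, VlanID, Processor, VmFriendlyName -AutoSize
```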
Now let’s see the result before and after VMQ + vRSS are enabled:
[VMQ and vRSS disabled: screenshots in the guest OS and in the host]
[VMQ and vRSS enabled: screenshots in the guest OS and in the host]
Best practices for configuring VMQ
Last but not least, good practices for configuring VMQ:
- When using NIC Teaming, always use Switch Independent with Dynamic Mode when possible.
- Make sure your base processor is never set to zero. For best performance, leave CPU0 alone because it handles special functions that cannot be handled by any other CPU in the system.
- Keep in mind when assigning the Base/Max CPU that if HyperThreading is enabled in the system, only the even-numbered logical processors are real cores (2, 4, 6, 8, etc.). If HT is not enabled, you can use both even and odd numbers (1, 2, 3, 4, 5, etc.).
- In SUM-Queues mode, try to configure the Base and Max CPU for each physical NIC with as little overlap as possible. How far you can go depends on the number of cores in the host.
- Only assign Max Processor values of 1, 2, 4, or 8. It is OK for the Max Processors value to extend past the last core or to exceed the number of VMQs on the physical NIC.
- Don’t set the Base and Max processors on the teamed Multiplexor adapters; leave them at the default.
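A few of these practices are easy to check before running Set-NetAdapterVmq. This is a minimal sketch: `Test-VmqAssignment` is a hypothetical helper for illustration, and it covers only the base-processor and Max Processors rules from the list above.

```powershell
# Returns a list of problems with a proposed assignment (empty = looks fine).
# Hypothetical helper; not part of any Microsoft module.
function Test-VmqAssignment {
    param(
        [int]$BaseProcessorNumber,
        [int]$MaxProcessors,
        [bool]$HyperThreading
    )
    $problems = @()
    if ($BaseProcessorNumber -eq 0) {
        $problems += 'Base processor must not be 0.'
    }
    if ($HyperThreading -and ($BaseProcessorNumber % 2 -ne 0)) {
        $problems += 'With hyper-threading on, use an even base processor.'
    }
    if ($MaxProcessors -notin 1, 2, 4, 8) {
        $problems += 'MaxProcessors should be 1, 2, 4 or 8.'
    }
    return ,$problems
}

$good = Test-VmqAssignment -BaseProcessorNumber 2 -MaxProcessors 8 -HyperThreading $true
$bad  = Test-VmqAssignment -BaseProcessorNumber 0 -MaxProcessors 6 -HyperThreading $true
"Problems with base 2 / max 8: $($good.Count)"
"Problems with base 0 / max 6: $($bad.Count)"
$bad
```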
In conclusion, I would prefer to enable VMQ on 1Gig NICs so I can keep my network traffic spread across as many CPU/cores as possible.
For a deep dive into VMQ and RSS, see the three-part TechNet series VMQ Deep Dive.
Hope this helps.
Until then, enjoy your weekend!