How To Enable & Configure VMQ/dVMQ on Windows Server 2012 R2 with Below Ten Gig Network Adapters #HyperV #Vmq #vRSS

This post was edited/updated on 04/May/2015.

Microsoft's guidance is that VMQ should be turned off on 1 Gbps NICs. There is a known problem with the Broadcom 1 Gbps VMQ implementation that is reportedly fixed in the latest drivers, so please make sure that all your drivers are up to date.

Hello folks,

Back to basics: what is Virtual Machine Queue (VMQ), why do you need it, and why should you enable it?

Virtual Machine Queue (VMQ), or dynamic VMQ (dVMQ), is a mechanism that maps hardware queues on a physical NIC to the virtual NIC (vNIC) in the parent partition or to the virtual machine NIC (vmNIC) in a guest OS. This mapping makes the handling of network traffic more efficient, which results in less CPU time spent in the parent partition and reduced network latency.

VMQ spreads traffic per vmNIC/vNIC, and each VMQ can use at most one logical CPU in the host. In other words, VMQ distributes the receive traffic amongst multiple guests (VMs) on a single host with a vSwitch (one core per vmNIC/vNIC).

Note: vNIC refers to a host partition virtual NIC on the virtual switch in the management OS, and vmNIC refers to the synthetic NIC inside a virtual machine.

VMQ is enabled by default on Windows Server when a vSwitch is created on 10 Gig network adapters or faster, and it is useful when hosting many VMs on the same physical host.

The figure below shows the ingress traffic for virtual machines with VMQ enabled.

[VMQ incoming traffic flow for virtual machines – source Microsoft]

On 1 Gig network adapters, VMQ is disabled by default because Microsoft does not see any performance benefit from VMQ on 1 Gig NICs; a single CPU core can keep up with 1 Gig of network traffic without any problem.

With VMQ disabled, all network traffic for the vmNICs has to be handled by a single core/CPU; with VMQ enabled and configured, the network traffic is distributed across multiple CPUs automatically.

Now, what happens if you have a large number of web server VMs on a host with two or more eight-core processors and a large amount of memory, but you are limited to 1 Gig physical NICs?

The answer is…

VMQ and vRSS: better together!

As I demonstrated in previous blog posts (Post I and Post II), Windows Server 2012 R2 introduced a new feature called Virtual Receive Side Scaling (vRSS). This feature works with VMQ to distribute the CPU workload of receive network traffic across multiple vCPUs inside the VM, which effectively eliminates the single-core bottleneck we experienced with a single vmNIC. To take full advantage of this feature, both the host and the guest need to run Windows Server 2012 R2, with VMQ enabled on the physical host and RSS enabled inside the virtual machine. However, at the time of writing, Microsoft does not enable vRSS for the host vNICs, only for VMs, so in a converged network environment each vNIC in the host management partition is stuck with one processor. The good news is that, even though each host-side vNIC is locked to one processor, the vNICs still get VMQs (assuming you have enough queues), and those queues are distributed across different processors.

The requirements to enable VMQ are the following:

1. Windows Server 2012 R2 (dVMQ+vRSS).
2. The physical network adapters must support VMQ.
3. Install the latest NIC driver/firmware (very important).
4. Enable VMQ for 1 Gig NICs in the registry (you can skip this step if you have 10 Gig adapters or faster):
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\VMSMP\Parameters\BelowTenGigVmqEnabled = 1
[Screenshot VmqBT-01: the BelowTenGigVmqEnabled registry value]

5. Reboot the host if you set the registry key in step 4.
6. Determine the values for Base and Max CPU based on your hardware configuration.
7. Assign values for Base and Max CPU.
8. Enable RSS inside the Virtual Machines.
9. Make sure VMQ is turned on in the Hyper-V settings of each VM (it is on by default).
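Steps 4 and 5 can be scripted. The sketch below is a minimal example, assuming link speeds are given in bits per second (as reported by the Speed property of Get-NetAdapter) and that the registry value is a DWORD; Test-BelowTenGigVmqRequired and Enable-BelowTenGigVmq are hypothetical helper names, not Microsoft cmdlets.

```powershell
# Hypothetical helpers for steps 4 and 5 (not Microsoft cmdlets).
# Assumption: speeds are in bits per second, as reported by Get-NetAdapter's Speed property.

function Test-BelowTenGigVmqRequired {
    param([Parameter(Mandatory)][uint64[]]$LinkSpeedBps)
    $tenGigBps = [uint64]10000000000
    # The registry value is only needed if at least one adapter is slower than 10 Gbps.
    return (@($LinkSpeedBps | Where-Object { $_ -lt $tenGigBps }).Count -gt 0)
}

function Enable-BelowTenGigVmq {
    # Windows/Hyper-V only: writes BelowTenGigVmqEnabled = 1 (assumed DWORD).
    # Reboot the host afterwards (step 5).
    $path = 'HKLM:\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters'
    New-ItemProperty -Path $path -Name 'BelowTenGigVmqEnabled' -Value 1 -PropertyType DWord -Force | Out-Null
}
```

On the host, something like `if (Test-BelowTenGigVmqRequired -LinkSpeedBps (Get-NetAdapter -Physical).Speed) { Enable-BelowTenGigVmq }` followed by a reboot would cover both steps.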

What is the Base CPU? It is the first logical processor that a physical NIC uses for processing incoming traffic on its queues.

What is the Max CPU? It is the maximum number of CPUs on which we allow that NIC to process traffic.
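To see which logical processors (LPs) a Base/Max pair actually covers, here is a small sketch. It assumes, as the best-practice notes later in this post say, that with HyperThreading enabled only even-numbered logical processors are used, and that processors past the last one are simply ignored; Get-VmqProcessorSet is a hypothetical helper, not part of the NetAdapter module.

```powershell
# Hypothetical helper: expand a NIC's Base/Max CPU settings into logical processors.
# Assumption: with HyperThreading on, only even-numbered LPs are used (step of 2).
function Get-VmqProcessorSet {
    param(
        [Parameter(Mandatory)][int]$BaseProcessorNumber,
        [Parameter(Mandatory)][int]$MaxProcessors,
        [Parameter(Mandatory)][int]$LogicalProcessorCount,
        [switch]$HyperThreading
    )
    $step = if ($HyperThreading) { 2 } else { 1 }
    $set = for ($i = 0; $i -lt $MaxProcessors; $i++) {
        $lp = $BaseProcessorNumber + $i * $step
        # A range that extends past the last LP is allowed; those LPs simply don't exist.
        if ($lp -lt $LogicalProcessorCount) { $lp }
    }
    return @($set)
}
```

For example, with 32 logical processors and HT on, a base of 2 with Max 8 covers LPs 2 to 16 (even numbers only), while a base of 26 with Max 8 only reaches 26, 28 and 30.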

OK, with that explained, let's configure VMQ step by step:

Our Lab Scenario:

We have eight physical 1 Gig NICs and two 8-core processors (32 logical processors).

First, we need to determine whether HyperThreading is enabled by running the following cmdlet:

PS C:\> Get-WmiObject -Class Win32_Processor | Format-Table -Property NumberOfCores, NumberOfLogicalProcessors -AutoSize

[Screenshot VmqBT-02: NumberOfCores and NumberOfLogicalProcessors output]

As you can see, NumberOfLogicalProcessors is twice NumberOfCores, so we know that HT is enabled on the system.
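The same check can be wrapped in a function so it also works on multi-socket hosts, where Win32_Processor returns one object per socket. This is a minimal sketch, assuming HT is on whenever the summed logical processor count exceeds the summed core count; Test-HyperThreading is a hypothetical helper name.

```powershell
# Hypothetical helper: detect HyperThreading from Win32_Processor-style objects
# (one object per socket, each with NumberOfCores and NumberOfLogicalProcessors).
function Test-HyperThreading {
    param([Parameter(Mandatory)][object[]]$Processor)
    # Sum across sockets so a two-socket host is evaluated as a whole.
    $cores = ($Processor | Measure-Object -Property NumberOfCores -Sum).Sum
    $logical = ($Processor | Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum
    # More logical processors than cores means each core exposes two LPs.
    return ($logical -gt $cores)
}
```

On the host you could call `Test-HyperThreading -Processor (Get-CimInstance -ClassName Win32_Processor)`; for the lab host (two sockets with 8 cores and 16 logical processors each) this returns True.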

Next, we need to look at our NIC teaming mode and load distribution mode:

PS C:\> Get-NetLbfoTeam | Format-Table -Property TeamNics, TeamingMode, LoadBalancingAlgorithm -AutoSize

[Screenshot VmqBT-03: Get-NetLbfoTeam output]

Having determined that HyperThreading is enabled and that the teaming mode is Switch Independent with Dynamic load distribution, we can move on to assigning the Base and Max CPUs.

Attention: before moving on to the assignment, there is one important point to consider. If the NIC team is in Switch Independent teaming mode and the load distribution is set to Hyper-V Port or Dynamic mode, the number of queues reported is the sum of all the queues available from the team members (SUM-Queues mode); otherwise, the number of queues reported is the smallest number of queues supported by any member of the team (MIN-Queues mode).

What are SUM-Queues mode and MIN-Queues mode?

In SUM-Queues mode, the team reports the total number of VMQs of all the physical NICs participating in the team, whereas in MIN-Queues mode it reports the smallest number of VMQs among those physical NICs.

For example, consider two physical NICs with 4 VMQs each. If the teaming mode is Switch Independent with Hyper-V Port, the team is in SUM-Queues mode and reports 8 VMQs; if the teaming mode is Switch Dependent with Hyper-V Port, the team is in MIN-Queues mode and reports 4 VMQs.

You can refer to the table below to determine the queue mode for each teaming and load distribution mode (source: Microsoft):

Teaming mode \ Distribution mode | Address Hash modes | Hyper-V Port | Dynamic
Switch independent               | Min-Queues         | Sum-Queues   | Sum-Queues
Switch dependent                 | Min-Queues         | Min-Queues   | Min-Queues
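The table can also be encoded as a pair of helpers so that a team's queue count can be computed from Get-NetLbfoTeam output. This is a sketch, assuming the enum names Get-NetLbfoTeam reports (SwitchIndependent, Static and Lacp for TeamingMode; TransportPorts, IPAddresses, MacAddresses, HyperVPort and Dynamic for LoadBalancingAlgorithm); Get-VmqQueueMode and Get-TeamQueueCount are hypothetical names.

```powershell
# Hypothetical helper: map teaming mode + load distribution to the queue mode (table above).
function Get-VmqQueueMode {
    param(
        [Parameter(Mandatory)][string]$TeamingMode,
        [Parameter(Mandatory)][string]$LoadBalancingAlgorithm
    )
    # Only Switch Independent with Hyper-V Port or Dynamic sums the members' queues.
    if ($TeamingMode -eq 'SwitchIndependent' -and $LoadBalancingAlgorithm -in @('HyperVPort', 'Dynamic')) {
        return 'Sum-Queues'
    }
    # Switch dependent (Static, Lacp) and all Address Hash modes use the minimum.
    return 'Min-Queues'
}

# Hypothetical helper: number of VMQs the team reports, given each member's queue count.
function Get-TeamQueueCount {
    param(
        [Parameter(Mandatory)][string]$QueueMode,
        [Parameter(Mandatory)][int[]]$MemberQueueCount
    )
    $stats = $MemberQueueCount | Measure-Object -Sum -Minimum
    if ($QueueMode -eq 'Sum-Queues') { return [int]$stats.Sum }
    return [int]$stats.Minimum
}
```

For the lab teams (Switch Independent, Dynamic, four NICs with 16 queues each), `Get-TeamQueueCount -QueueMode 'Sum-Queues' -MemberQueueCount 16,16,16,16` gives 64.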

In my scenario, the NIC team is in Switch Independent mode with Dynamic load distribution, so we are in SUM-Queues mode.

If the team is in SUM-Queues mode, the team members' processor sets should be non-overlapping, or overlap as little as possible. For example, in a 4-core host (8 logical processors) with a team of 2 x 10 Gbps NICs, you could set the first NIC to use base processor 2 and 4 cores, and the second NIC to use base processor 6 and 2 cores.

If the team is in MIN-Queues mode, the processor sets used by the team members must be identical: configure each NIC team member to use the same cores, so that the Base and Max CPU assignment is the same for every physical NIC.
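The two rules above (staggered processor sets in SUM-Queues mode, identical ones in MIN-Queues mode) can be expressed as a small generator. This is a sketch under stated assumptions: the starting base and the stride between bases are values you choose for your own hardware, and Get-VmqAssignment is a hypothetical helper, not a Microsoft cmdlet.

```powershell
# Hypothetical helper: build per-NIC Base/Max settings for the members of a team.
# Sum-Queues: stagger each member's base by $Stride LPs to limit overlap.
# Min-Queues: every member gets identical settings ($Stride is ignored).
function Get-VmqAssignment {
    param(
        [Parameter(Mandatory)][string[]]$NicName,
        [Parameter(Mandatory)][ValidateSet('Sum-Queues', 'Min-Queues')][string]$QueueMode,
        [Parameter(Mandatory)][int]$BaseProcessorNumber,
        [Parameter(Mandatory)][ValidateScript({ $_ -in 1, 2, 4, 8 })][int]$MaxProcessors,
        [int]$Stride = 0   # only used in Sum-Queues mode
    )
    for ($i = 0; $i -lt $NicName.Count; $i++) {
        $base = if ($QueueMode -eq 'Sum-Queues') { $BaseProcessorNumber + $i * $Stride } else { $BaseProcessorNumber }
        # One object per NIC, shaped like the parameters of Set-NetAdapterVmq.
        [pscustomobject]@{
            Name                = $NicName[$i]
            BaseProcessorNumber = $base
            MaxProcessors       = $MaxProcessors
        }
    }
}
```

The output can be piped to the real cmdlet, for example `Get-VmqAssignment -NicName NIC-b0-f0, NIC-b0-f1, NIC-b0-f2, NIC-b0-f3 -QueueMode Sum-Queues -BaseProcessorNumber 2 -MaxProcessors 8 -Stride 8 | ForEach-Object { Set-NetAdapterVmq -Name $_.Name -BaseProcessorNumber $_.BaseProcessorNumber -MaxProcessors $_.MaxProcessors }`, which produces the same bases (2, 10, 18, 26) as the lab cmdlets. With HT on, Max 8 spans 16 LPs, so a stride of 8 deliberately accepts some overlap.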

Now let's first check whether VMQ is enabled:

PS C:\> Get-NetAdapterVmq

[Screenshot VmqBT-04: Get-NetAdapterVmq output before configuration]

As you can see, VMQ is enabled (True) but not yet configured.

We also have two converged network teams, each with 4 physical NICs that expose 16 queues per NIC, so each team has a total of 64 VMQs.

I am using one converged team for the vmNICs (VMs) and the second one for the vNICs in the host.

We will set the Base and Max CPUs by running the following cmdlets for the teamed adapters under ConvergedNetTeam01:

PS C:\> Set-NetAdapterVmq -Name NIC-b0-f0 -BaseProcessorNumber 2 -MaxProcessors 8
PS C:\> Set-NetAdapterVmq -Name NIC-b0-f1 -BaseProcessorNumber 10 -MaxProcessors 8
PS C:\> Set-NetAdapterVmq -Name NIC-b0-f2 -BaseProcessorNumber 18 -MaxProcessors 8
PS C:\> Set-NetAdapterVmq -Name NIC-b0-f3 -BaseProcessorNumber 26 -MaxProcessors 8

As mentioned above, in SUM-Queues mode you should configure the Base and Max CPU for each physical NIC with as little overlap as possible. In our lab environment, however, we did not have as many cores as we had queues, so we had to accept some overlap; otherwise we would be wasting queues.
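To quantify that overlap, the sketch below expands each NIC's Base/Max pair into logical processors and lists the ones shared between team members. It assumes the same rule as in the best practices below (with HT on, only even-numbered LPs are used) and ignores processors past the last one; Get-VmqOverlap is a hypothetical helper.

```powershell
# Hypothetical helper: report logical processors shared between team members.
# Input objects need Name, BaseProcessorNumber and MaxProcessors properties.
function Get-VmqOverlap {
    param(
        [Parameter(Mandatory)][object[]]$Assignment,
        [Parameter(Mandatory)][int]$LogicalProcessorCount,
        [switch]$HyperThreading
    )
    $step = if ($HyperThreading) { 2 } else { 1 }
    # Expand each NIC's Base/Max pair into the set of LPs it may use.
    $sets = @{}
    foreach ($a in $Assignment) {
        $sets[$a.Name] = @(for ($i = 0; $i -lt $a.MaxProcessors; $i++) {
            $lp = $a.BaseProcessorNumber + $i * $step
            if ($lp -lt $LogicalProcessorCount) { $lp }
        })
    }
    # Compare every pair of NICs once and emit only pairs that share LPs.
    for ($x = 0; $x -lt $Assignment.Count; $x++) {
        for ($y = $x + 1; $y -lt $Assignment.Count; $y++) {
            $n1 = $Assignment[$x].Name; $n2 = $Assignment[$y].Name
            $shared = @($sets[$n1] | Where-Object { $sets[$n2] -contains $_ })
            if ($shared.Count -gt 0) {
                [pscustomobject]@{ Nic1 = $n1; Nic2 = $n2; SharedLPs = ($shared -join ',') }
            }
        }
    }
}
```

For the four lab NICs on 32 logical processors with HT, it reports three overlapping pairs: NIC-b0-f0/NIC-b0-f1 share 10, 12, 14 and 16; NIC-b0-f1/NIC-b0-f2 share 18, 20, 22 and 24; NIC-b0-f2/NIC-b0-f3 share 26, 28 and 30.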

Let’s run Get-NetAdapterVmq again and see the changes:

[Screenshot VmqBT-05: Get-NetAdapterVmq output after configuration]

As you can see, the Base and Max processors are now set. Next, we can run Get-NetAdapterVmqQueue, which shows how the queues are assigned to the vmNICs of all VMs on that particular host.

[Screenshot VmqBT-06: Get-NetAdapterVmqQueue output]

Now let’s see the result before and after VMQ + vRSS are enabled:

VMQ and vRSS disabled

In the Guest OS:

[Screenshot VmqBT-07]

In the Host:

[Screenshot VmqBT-08]

VMQ and vRSS enabled

In the Guest OS:

[Screenshot VmqBT-09]

In the Host:

[Screenshot VmqBT-10]

Last but not least, here are some best practices for configuring VMQ:

1. When using NIC Teaming, always use Switch Independent with Dynamic Mode when possible.
2. Never set the base processor to 0. For best performance, leave CPU0 alone, because it handles special functions that cannot be handled by any other CPU in the system.
3. When HyperThreading is enabled, use only even-numbered logical processors (2, 4, 6, 8, etc.) as the Base CPU, because each physical core appears as two logical processors and the even-numbered one is the first of the pair; if HT is not enabled, you can use both even and odd numbers (1, 2, 3, 4, 5, etc.).
4. In SUM-Queues mode, configure the Base and Max CPU for each physical NIC with as little overlap as possible; how much overlap you can avoid depends on how many cores the host has.
5. Only assign Max Processor values of 1, 2, 4, or 8. It is OK for the processor range to extend past the last core, or for the Max Processor value to exceed the number of VMQs on the physical NIC.
6. Don't set the Base and Max processors on the Multiplexor (teamed) adapters; leave them at their defaults.
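The numeric rules in this list (items 2, 3 and 5) can be checked before calling Set-NetAdapterVmq. Here is a minimal sketch with Test-VmqSetting as a hypothetical helper; it does not check team overlap (item 4) or the Multiplexor adapter rule (item 6).

```powershell
# Hypothetical helper: return warnings for Base/Max values that break items 2, 3 and 5.
function Test-VmqSetting {
    param(
        [Parameter(Mandatory)][int]$BaseProcessorNumber,
        [Parameter(Mandatory)][int]$MaxProcessors,
        [switch]$HyperThreading
    )
    $warnings = @()
    # Item 2: keep the base processor off CPU0.
    if ($BaseProcessorNumber -eq 0) {
        $warnings += 'Base processor is 0; leave CPU0 for the system.'
    }
    # Item 3: with HT on, the base should be an even-numbered logical processor.
    if ($HyperThreading -and ($BaseProcessorNumber % 2) -ne 0) {
        $warnings += 'HyperThreading is on; use an even base processor.'
    }
    # Item 5: only 1, 2, 4 or 8 processors.
    if ($MaxProcessors -notin @(1, 2, 4, 8)) {
        $warnings += 'Max processors should be 1, 2, 4, or 8.'
    }
    return $warnings
}
```

Wrapping the call in `@()` gives a warning count you can test, for example `@(Test-VmqSetting -BaseProcessorNumber 2 -MaxProcessors 8 -HyperThreading).Count` is 0 for the first lab NIC.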

In conclusion, I prefer to enable VMQ on 1 Gig NICs so that my network traffic is spread across as many CPU cores as possible.

For a deep dive into VMQ and RSS, see the three-part TechNet series VMQ Deep Dive.

Hope this helps.

Until then, enjoy your weekend!

Cheers,
/Charbel

About Charbel Nemnom
Charbel Nemnom is a Microsoft Cloud Consultant and Technical Evangelist, a big fan of the latest IT platform solutions, and an accomplished hands-on technical professional with over 15 years of broad IT infrastructure experience serving on and guiding technical teams to optimize the performance of mission-critical enterprise systems. Excellent communicator adept at identifying business needs and bridging the gap between functional groups and technology to foster targeted and innovative IT project development. Well respected by peers through demonstrating passion for technology and performance improvement. Extensive practical knowledge of complex systems builds, network design and virtualization.

4 Comments

  1. Thanks for this post in particular.

    In your fourth PowerShell statement, when you “Set-NetAdapterVmq”, why is BaseProcessorNumber set to 24, not 26? All the others are 8 apart (2,10,18) so the next in the sequence “should” be 26, not 24. I’m using Switch Independent with Dynamic, so in SUM-Queues mode, and aiming for as little overlap as possible, as per point 4 in your list of Best Practices.

    Thanks again!

    • Hello,
      I am glad that you liked the post.
      My system has 32 logical processors, so for the fourth cmdlet I overlapped the last range with the third one (LPs 18 to 25 for the third NIC, 24 to 31 for the fourth). You could also use (Set-NetAdapterVmq -Name NIC-b0-f3 -BaseProcessorNumber 26 -MaxProcessors 8), which covers 26 to 33. In an HT system, always choose an even number.
      PS C:\> Set-NetAdapterVmq -Name NIC-b0-f0 -BaseProcessorNumber 2 -MaxProcessors 8
      PS C:\> Set-NetAdapterVmq -Name NIC-b0-f1 -BaseProcessorNumber 10 -MaxProcessors 8
      PS C:\> Set-NetAdapterVmq -Name NIC-b0-f2 -BaseProcessorNumber 18 -MaxProcessors 8
      PS C:\> Set-NetAdapterVmq -Name NIC-b0-f3 -BaseProcessorNumber 24 -MaxProcessors 8
      As I mentioned, in SUM-Queues mode you should configure the Base and Max CPU for each physical NIC with as little overlap as possible when you don't have enough cores. It is OK for the max processor range to extend past the last core (26 to 33), even though I have only 32 LPs.
      Hope this helps!

      Cheers,
