Happy New Year folks!!! What a great day to start the year 2020…
Starting with Windows Server 2016, Microsoft introduced a new type of storage called Storage Spaces Direct, which is now part of the Azure Stack HCI program. Azure Stack HCI enables building highly available storage systems with locally attached disks, without the need for any external SAS fabric such as shared JBODs or enclosures. This is the first true Software-Defined Storage (SDS) from Microsoft. Software-Defined Storage is a concept that involves storing data without dedicated hardware.
For more information about Azure Stack HCI, please check the following document.
When you enable Storage Spaces Direct, a new cluster role called the Health Service is enabled by default. The Health Service, introduced in Windows Server 2016, improves the day-to-day monitoring and operational experience for clusters running Storage Spaces Direct. The good news is that the Health Service does not require any configuration on your side. A long time ago, I wrote an article on how to monitor Storage Spaces Direct health using PowerShell; you can check it out here.
Since the release of Windows Admin Center, Microsoft has kept investing in the availability, performance, and management of Azure Stack hyperconverged infrastructure.
In this blog post, I will show you how to monitor Azure Stack HCI health with Azure Monitor and Windows Admin Center.
If you are new to Azure Monitor, it maximizes the availability and performance of your applications by delivering a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments. It helps you understand how your applications are performing and proactively identifies issues affecting them and the resources they depend on.
For more information about Azure Monitor, please check the following document.
Azure Monitor is very helpful for monitoring your on-premises hyperconverged cluster. With Azure Monitor integrated with Azure Stack HCI, you can collect events and performance counters for analysis and reporting, take action when a particular condition is detected, and receive notifications via email, SMS (text message), push notification, or voice.
The prerequisites are simple:
- Make sure you are running Windows Admin Center (WAC) Version 1809.5 or later.
- Make sure you have an active Azure subscription. If you don’t have an Azure subscription, you can create a free one here.
- And of course, an Azure Stack HCI cluster up and running (minimum of 2 servers, maximum of 16 servers).
Behind the scenes, Windows Admin Center takes care of the following steps for you:
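The prerequisites above can be expressed as a small pre-flight check. This is only an illustrative sketch: the function names and the idea of passing in the version, node count, and subscription flag by hand are my own assumptions, not something Windows Admin Center exposes.

```python
def parse_version(text):
    """Turn a dotted version string such as '1809.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Values taken from the prerequisites list above.
MIN_WAC_VERSION = parse_version("1809.5")
MIN_NODES, MAX_NODES = 2, 16


def check_prerequisites(wac_version, node_count, has_subscription):
    """Return a list of problems; an empty list means the cluster is ready.

    All three inputs are supplied by the caller (hypothetical helper).
    """
    problems = []
    if parse_version(wac_version) < MIN_WAC_VERSION:
        problems.append("Windows Admin Center must be version 1809.5 or later")
    if not MIN_NODES <= node_count <= MAX_NODES:
        problems.append("The cluster must have between 2 and 16 servers")
    if not has_subscription:
        problems.append("An active Azure subscription is required")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites("1809", 1, False):
        print(problem)
```

Comparing versions as tuples of integers avoids the pitfalls of comparing them as plain strings or floats.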
- Configuring Azure Stack HCI Health Service.
- Creating Azure Log Analytics workspace.
- Installing and configuring the Microsoft Monitoring Agent (MMA) on every node in the cluster.
- Creating and setting up default alerts (CPU utilization, disk capacity, memory utilization, heartbeat, system critical error, and any health service fault on the cluster).
You can configure these steps manually if you want, but Windows Admin Center will make your life much easier.
Monitoring Azure Stack HCI
Launch the Windows Admin Center portal and take the following steps:
- Choose and connect to your Azure Stack HCI cluster, then select Azure Monitor from the left-hand side, and then click Sign in to Azure.
- On the Azure Monitor connection page, click Onboard cluster.
- On the Setup Azure Monitor page, select the desired subscription, create or use an existing resource group, choose the resource group region, then create or use an existing Log Analytics workspace and choose the workspace region (note that the Subscription ID has been intentionally obscured in this figure). When ready, click Set up. Windows Admin Center will install the Microsoft Monitoring Agent and the Microsoft Dependency agent on the target node(s). Please note that this step may take some time to complete.
- Once the configuration is completed, you will see the following page in Windows Admin Center.
- Microsoft also added deep links: if you click any web link in Windows Admin Center, it takes you directly to that specific Log Analytics workspace, where you can do more advanced configuration from the Azure portal.
You can now start configuring default alerts that will apply to all nodes in your Log Analytics workspace.
Setting up alerts
Once you finish onboarding your Azure Stack HCI cluster as described in the steps above, you can start setting up alerts using Windows Admin Center.
Launch the Windows Admin Center portal and take the following steps:
- Choose your Azure Stack HCI cluster, then select Azure Monitor from the left-hand side.
- In the Monitoring and alerts with Azure Monitor page, under Alerts created in Windows Admin Center click Edit alert rule. In this example, I don’t have any alert rule configured yet.
- As mentioned earlier, Windows Admin Center includes preconfigured alerts that notify you when the node(s) in your workspace are not performing well. The six alerts added by the Azure Stack HCI team are the following:
- CPU utilization: 10 min rolling average 85%
- Disk capacity utilization: Over 80% for 10 min
- Memory utilization: Over 95% for 10 min
- Heartbeat: Fewer than 6 beats for 5 min
- System critical error: Any critical alert in the cluster system event log
- Health service alert: Any critical or warning level alert on the health service of the cluster
- In this example, I will select all the alerts with their default condition. Please note that after you enable any alert, you can tune the alert settings in the Azure portal.
- Under the Add an action section, create or use an existing action group, then enter the Email recipient and finally click Save.
- Once the alerts are created, you will receive an email from Microsoft Azure stating that you’ve been added to an Azure Monitor action group. In Windows Admin Center you can also see and edit the alert rule(s) if you want.
- If you click on visit the Azure portal, you will be redirected to the Log Analytics workspace where you can see and modify the alerts. In this example, I don’t have any alert yet.
- If you want to manage any of the alerts, click Manage alert rules (6). Click any of the rules listed and modify its condition and actions if you want.
- At the time of writing this article, you need to configure the Windows Event Logs manually because Windows Admin Center did not configure them appropriately during the onboarding process. In the same Log Analytics workspace, select Advanced settings, select Data, and then select Windows Event Logs. Here, add the Health Service, System, and Application event log channels by typing each name and clicking the plus sign (+), as shown in the figure below. Then check the severities Error and Warning, and finally click Save at the top of the page to save the configuration.
Now Log Analytics will start collecting the Windows events and performance counters that you specified for longer-term analysis and reporting, and take action by sending email, SMS, push, or voice notifications when a particular condition is detected.
Since I don’t have any alert yet, I will restart one node in my Azure Stack HCI cluster a couple of times and see whether I receive an alert.
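To make the numeric default thresholds from the alert list above more concrete, here is a minimal Python sketch of how such conditions could be evaluated over a window of samples. It is not how Azure Monitor implements its rules. The function names are hypothetical, and I am assuming that the CPU rule fires when the 10-minute average exceeds 85% and that "over X% for 10 min" means every sample in the window is above the limit.

```python
from statistics import mean

# Default thresholds as listed in the article.
CPU_AVG_LIMIT = 85.0   # percent, 10 min rolling average
DISK_LIMIT = 80.0      # percent, sustained for 10 min
MEMORY_LIMIT = 95.0    # percent, sustained for 10 min
MIN_HEARTBEATS = 6     # beats expected within 5 min


def cpu_alert(samples):
    """samples: CPU % readings covering the last 10 minutes."""
    return bool(samples) and mean(samples) > CPU_AVG_LIMIT


def sustained_over(samples, limit):
    """True only if every reading in the window is above the limit."""
    return bool(samples) and all(value > limit for value in samples)


def heartbeat_alert(beat_count):
    """beat_count: heartbeats received from a node in the last 5 minutes."""
    return beat_count < MIN_HEARTBEATS


if __name__ == "__main__":
    print(cpu_alert([90, 88, 80, 86]))             # average is 86
    print(sustained_over([81, 82, 79], DISK_LIMIT))
    print(heartbeat_alert(5))
```

The difference between the two styles matters: a rolling average tolerates a short dip below the limit, while a sustained condition resets as soon as one sample falls under it.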
Below is an example of the email that you will receive from Azure Monitor when something goes wrong on your Azure Stack HCI cluster.
For reference, this is what an example alert looks like in Windows Admin Center.
And here is what you will see in the Azure portal.
That’s it, there you have it!
When you enable a monitoring solution in a Log Analytics workspace, all the servers in your Azure Stack HCI cluster will report to that workspace and start collecting data relevant to the Windows Event Logs and Windows Performance Counters, so that the solution can generate insights for all the servers in the workspace.
As shown in this article, Azure Monitor requires the installation of the Microsoft Monitoring Agent (MMA) to collect telemetry data from the Azure Stack HCI servers and push it to the Log Analytics workspace.
The default condition for the six rules is evaluated over a 60-minute period at a frequency of 10 minutes. When the count is greater than 1, you will receive an alert. You can change the default settings or create new rules if you want.
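The period, frequency, and count threshold described above can be illustrated with a short Python sketch. It only mimics the idea of a log alert rule (count matching events in a look-back window each time the rule runs) and is not Azure Monitor code. The names `rule_fires` and `evaluation_times` are hypothetical.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(minutes=60)     # look-back window for each evaluation
FREQUENCY = timedelta(minutes=10)  # how often the rule is evaluated
THRESHOLD = 1                      # alert when the count is greater than this


def rule_fires(event_times, now):
    """Count events inside the last PERIOD and compare with the threshold."""
    window_start = now - PERIOD
    count = sum(1 for t in event_times if window_start < t <= now)
    return count > THRESHOLD


def evaluation_times(start, end):
    """Yield each moment the rule would be evaluated between start and end."""
    current = start
    while current <= end:
        yield current
        current += FREQUENCY


if __name__ == "__main__":
    now = datetime(2020, 1, 1, 12, 0)
    events = [
        now - timedelta(minutes=5),
        now - timedelta(minutes=30),
        now - timedelta(minutes=90),  # outside the 60-minute window
    ]
    print(rule_fires(events, now))  # two events in the window, so True
```

With these defaults a single matching event within the hour does not raise an alert, which helps explain why I had to restart a node a couple of times before receiving one.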
As you can see, monitoring and setting up alerts for Azure Stack HCI using Windows Admin Center and Azure Monitor is very simple.
For more information about Log Analytics in Azure Monitor, please check the following document.
Windows Admin Center is a freely available management tool that makes remotely managing a set of servers, with or without a GUI, very easy, especially for “day-to-day activities”. Download the latest copy of Windows Admin Center from here, deploy it in a failover cluster, and enjoy modern server management for your Azure Stack HCI cluster.
Thank you for reading my blog.
If you have any questions or feedback, please leave a comment.