Updated – 22/04/2019 – Monitor Azure File Sync with Azure Monitor is GA.
Introduction
In July 2018, Microsoft announced the GA release for Azure File Sync. With Azure File Sync, you can centralize your files in Azure and then install a storage sync agent on a Windows Server whether it’s on-premises or in Azure to provide fast local access to your files. Your server and Azure Files are constantly in sync, so you have one centralized location for your files with multi-site access powered by fast local caches and cloud tiering.
Cloud tiering builds up a heat map over time of which files on your disks are being read and written. As the disks fill up, the colder files are moved to the cloud and only stubs (the namespace) are kept locally. When a user opens a tiered file, it is downloaded seamlessly from Azure Files instead of being read straight from the local disk. This is desirable for files that you do not use very often but still want to keep around.
If you want to know more about Azure File Sync, please check my previous step-by-step article on how to get started with Azure File Sync.
Now that you have enabled Azure File Sync and everything is running well, you will want to monitor its health status. In this article, we will show you the monitoring options that are available today for Azure File Sync, which can help you troubleshoot any issue that you might face.
Monitoring Azure File Sync
The following monitoring options are available as of today:
Option #1 – Azure Portal
You can use the Azure Portal to view the Registered Server state and the Server Endpoint Health (sync health).
Registered Server State
- If Registered server state = Online, the server is successfully communicating with the storage sync service.
- If Registered server state = Offline or Appears Offline, verify that the Storage Sync Monitor (AzureStorageSyncMonitor.exe) process is running on the server. If the server is behind a firewall or proxy, refer to the following documentation to configure the firewall and proxy.
Server Endpoint Health
- The server endpoint health in the Azure Portal is based on the sync events that are logged locally on the server in the Telemetry event logs (ID 9102 and 9302) – check Option #2 for more information. If a sync session fails due to a transient error (e.g. error canceled), the sync may still show healthy in the portal as long as the current sync session is making progress (Event ID 9302 is used to determine if files are being applied).
- If the portal shows a sync error due to sync not making progress, then check the following documentation for troubleshooting guidance.
Option #2 – Windows Server Event Logs
You can use the following Telemetry event logs to monitor Azure File Sync locally on the server. In Event Viewer, they are under Applications and Services Logs\Microsoft\File Sync\Agent.
Sync Health
- Event ID 9102 is logged once a sync session is completed. Use this event to determine whether sync sessions complete successfully (HResult = 0) and whether there are per-item sync errors. Check the following documentation for more information: Sync Health & Per-Item Errors. The “Files not syncing” metric is based on the PerItemErrorCount value of Event ID 9102 (the sync session completion event). This value includes both transient and persistent per-item errors.
- The Event ID 9302 is logged every 5 to 10 minutes if there’s an active sync session. This event should be used to determine if the current sync session is making progress (AppliedItemCount > 0). If sync is not making progress, the sync session should eventually fail and an Event ID 9102 will be logged with the error. Check the following documentation for more information about Sync Progress.
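To make the two checks above concrete, here is a minimal Python sketch. It assumes the events have already been exported from the Telemetry log and parsed into dictionaries. The field names HResult, PerItemErrorCount and AppliedItemCount come from the events described above, while the function names and the returned labels are hypothetical and only for illustration.

```python
from typing import Dict


def session_health(event_9102: Dict[str, int]) -> str:
    """Classify a completed sync session from the fields of a 9102 event."""
    # HResult = 0 means the sync session completed successfully.
    if event_9102["HResult"] != 0:
        return "failed"
    # A successful session can still report per-item errors
    # (transient and persistent errors are counted together).
    if event_9102.get("PerItemErrorCount", 0) > 0:
        return "completed_with_item_errors"
    return "healthy"


def is_making_progress(latest_event_9302: Dict[str, int]) -> bool:
    """Return True if the most recent 9302 event shows items being applied."""
    return latest_event_9302.get("AppliedItemCount", 0) > 0
```

For example, session_health({"HResult": 0, "PerItemErrorCount": 3}) returns "completed_with_item_errors", which is the case where the session succeeded but some files are still not syncing.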
Registered Server Health
- The Event ID 9301 is logged every 30 seconds when a server queries the service for any jobs. If GetNextJob completes with status = 0, the server can communicate with the storage sync service. If GetNextJob completes with an error, then check the following documentation for troubleshooting guidance.
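Because Event ID 9301 is expected every 30 seconds, a stale or failing 9301 event is a simple local signal that the server may not be reaching the service. The sketch below is one possible way to express that check in Python. The function name, the idea of tolerating a few missed intervals, and the default of 4 are my assumptions, not something the Azure File Sync agent defines.

```python
from datetime import datetime, timedelta

# From the article: Event ID 9301 is logged every 30 seconds.
EXPECTED_INTERVAL = timedelta(seconds=30)


def server_can_reach_service(
    last_status: int,
    last_logged: datetime,
    now: datetime,
    missed_allowed: int = 4,  # assumption: tolerate a few missed intervals
) -> bool:
    """Return True if the latest 9301 event is recent and GetNextJob status = 0."""
    is_fresh = (now - last_logged) <= EXPECTED_INTERVAL * missed_allowed
    return is_fresh and last_status == 0
```

If this returns False, the next step is the troubleshooting documentation mentioned above rather than anything this sketch can decide for you.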
Cloud Tiering Health
Tiering: To monitor tiering activity and errors on a server, check the following event logs:
- Event ID 9002 provides ghosting statistics for a server endpoint. For example, TotalGhostedFileCount, SpaceReclaimedMB, etc.
- Event ID 9003 provides error distribution for a server endpoint. For example, Total Error Count, ErrorCode, etc. Note, one event is logged per error code.
- Event ID 9016 provides ghosting results for a volume. For example, Free space percent, Number of files ghosted in session, Number of files that failed to ghost, etc.
- Event ID 9029 provides ghosting session information. For example, Number of files attempted in the session, Number of files tiered in the session, Number of files already tiered, etc.
Recall: To monitor recall activity and errors, check the following event logs:
- Event ID 9005 provides recall reliability for a server endpoint. For example, Total unique files accessed, Total unique files with failed access, etc.
- Event ID 9006 provides recall error distribution for a server endpoint. For example, Total Failed Requests, ErrorCode, etc. Note, one event is logged per error code.
- Event ID 9007 provides recall performance for a server endpoint. For example, TotalRecallIOSize, TotalRecallTimeTaken, etc.
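Since Event ID 9003 and Event ID 9006 each log one event per error code, it can help to add them up to see which error dominates. The following Python sketch assumes the events were already parsed into dictionaries. The key names used here ("ErrorCode", "TotalErrorCount") are my shorthand for the fields listed above, and the error code values in the usage example are made-up placeholders.

```python
from collections import Counter
from typing import Dict, Iterable


def error_distribution(events: Iterable[Dict[str, int]], count_field: str) -> Counter:
    """Sum the per-error-code counts across several 9003 or 9006 events."""
    totals: Counter = Counter()
    for event in events:
        # One event is logged per error code, so the same code can appear
        # in several events collected over time.
        totals[event["ErrorCode"]] += event[count_field]
    return totals


# Usage with placeholder values (101 and 202 are not real error codes).
sample_events = [
    {"ErrorCode": 101, "TotalErrorCount": 4},
    {"ErrorCode": 202, "TotalErrorCount": 1},
    {"ErrorCode": 101, "TotalErrorCount": 2},
]
distribution = error_distribution(sample_events, "TotalErrorCount")
top_error_code, top_error_count = distribution.most_common(1)[0]
```

For recall errors (Event ID 9006), you would pass the field that holds the failed request count instead of "TotalErrorCount".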
Since I just enabled Azure File Sync, I don’t have any event ID logged for recall activity yet.
Option #3 – Azure File Sync Performance Counters
You can also use the Azure File Sync built-in performance counters to monitor sync activity locally on the server.
Open Perfmon.msc and add the following performance counters:
AFS Bytes Transferred
- Downloaded Bytes/sec
- Total Bytes/sec
- Uploaded Bytes/sec
AFS Sync Operations
- Downloaded Sync Files/sec
- Total Sync File Operations/sec
- Uploaded Sync Files/sec
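If you prefer to sample these counters from a script instead of Perfmon, one option is the built-in typeperf command on Windows. The Python sketch below only builds the command line and does not run it. It assumes the counters are exposed per instance and that a wildcard instance "(*)" is accepted, which I have not verified for the AFS counter sets, so adjust the paths to match what Perfmon shows on your server.

```python
from typing import Dict, List

# Counter sets and counters as listed in the article.
AFS_COUNTERS: Dict[str, List[str]] = {
    "AFS Bytes Transferred": [
        "Downloaded Bytes/sec",
        "Total Bytes/sec",
        "Uploaded Bytes/sec",
    ],
    "AFS Sync Operations": [
        "Downloaded Sync Files/sec",
        "Total Sync File Operations/sec",
        "Uploaded Sync Files/sec",
    ],
}


def counter_paths(instance: str = "*") -> List[str]:
    """Build counter paths in the form \\Object(instance)\\Counter."""
    return [
        f"\\{counter_set}({instance})\\{counter}"
        for counter_set, counters in AFS_COUNTERS.items()
        for counter in counters
    ]


def typeperf_command(interval_s: int = 5, samples: int = 12) -> List[str]:
    """Return a typeperf argument list: -si is the interval, -sc the sample count."""
    return ["typeperf", *counter_paths(), "-si", str(interval_s), "-sc", str(samples)]
```

On the server itself, the returned list can be passed to subprocess.run to collect the samples.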
Option #4 – Azure Portal – Azure Monitor
Last but not least, you can use Azure Monitor. Azure Monitor support is still a work in progress. At the time of writing this article, you can view the following metrics for Azure File Sync in Azure Monitor when you select the Storage Sync Service.
- The Bytes synced metric shows the total size of data transferred (upload and download).
- The Cloud tiering recall metric shows the size of data recalled.
- The Files not syncing metric shows the count of files that are failing to sync.
- The Files synced metric shows the count of files transferred (upload and download).
- The Server online status metric shows the count of heartbeats received from the server.
- The Sync session result metric shows the sync session result (1=successful sync session; 0=failed sync session).
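A note on reading these metrics when they are charted with the Average aggregation (this comes up in the comments below): because the Sync session result is reported as 1 or 0, its average over a time window is simply the fraction of successful sessions, and an averaged Files not syncing value can be a small fraction such as 0.03. The Python sketch below illustrates the arithmetic only. It assumes the average is a plain mean of the samples in the window, and the sample values are made up.

```python
from typing import Sequence


def average(samples: Sequence[float]) -> float:
    """Plain mean of the samples in a time window (0.0 for an empty window)."""
    return sum(samples) / len(samples) if samples else 0.0


# Four sync sessions, one of which failed (1 = success, 0 = failure).
session_results = [1, 1, 0, 1]
success_rate = average(session_results)  # 0.75

# 100 samples of "Files not syncing", three of which reported one file.
files_not_syncing = [0] * 97 + [1] * 3
avg_not_syncing = average(files_not_syncing)  # roughly 0.03
```

This is why an alert threshold on an averaged metric usually needs to be much lower than the raw file count you have in mind.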
You can expect a lot of improvements and enhancements that will be added to Azure Monitor soon.
Summary
As you can see, we have several options to monitor Azure File Sync’s health and activity status, and the integration with Azure Monitor looks promising. Microsoft has great documentation for troubleshooting, so make sure to check it if you encounter any issue.
Azure File Sync extends on-premises file servers into Azure by providing cloud benefits while maintaining performance and compatibility. Azure File Sync provides:
- Multi-site access – provide write access to the same data across Windows servers and Azure Files.
- Cloud tiering – store only recently accessed data on local servers.
- Backup integration – integrates with the Azure Backup service, so there is no need to back up your data on-premises.
- Fast disaster recovery – restore file metadata immediately and recall data as needed.
I hope you find this guide useful.
Thank you for reading my blog.
If you have any questions or feedback, please leave a comment.
-Charbel Nemnom-
Very good article. I have a question about the “Files not syncing” metric.
When the metric is calculated, which Event ID does it use? Also, does the number include both transient and persistent sync failures?
Or does the value include only persistent sync failures?
I ask because it affects how to set up the alert threshold.
Hello Telekinetic, thanks for the comment!
Please note that the “Files not syncing” metric is based on the Event ID 9102 – Sync session completion event (PerItemErrorCount value).
This value includes both transient and persistent per-item errors.
Hope it helps!
Thank you, Charbel :) Just one more question if I may: I notice that the “Files not syncing” metric is based on AVG, so it is an average over time, right?
If we want to set up an alert so we can act when a file is stuck, what is a typical threshold? In your article example it seems to be 100, but in my case I see values like 0.03, 0.04 and so on. It seems very hard to predict.
Thank you so much for your clarification! It is really helpful
Hello Telekinetic, yes, the “Files not syncing” metric is based on Average, so it is averaged over time.
In this case, a typical threshold is hard to predict. You might need to adjust the threshold based on how aggressively you want to respond.
I suggest starting small in your case and then increasing the Threshold as needed.
Furthermore, you might want to look at why files are failing to sync and then treat the root cause.
Hope it helps!
Thank you very much, Charbel.
Thanks for always producing great content too!