I recently ran into a problem where I could not get Azure Stack Infrastructure Backup to work at all.
Every attempt failed with the following two errors:
- Failed to create the backup. Error message: backup failed with status Failed.
- Infrastructure backup failed because of an unknown error.
As you can see in the screenshot above, the error messages are not very helpful. So I followed the remediation steps:
- I started a new backup through the portal and with PowerShell. Both attempts ended with the same error after approximately 10 minutes.
- I collected the Azure Stack logs.
I reached out to the Azure Stack product group, and they investigated overnight (because of the time zone difference). After many hours of troubleshooting, the logs revealed the problem:
- The ERCS VM was running with high memory usage, and the logs showed a number of OutOfMemory exceptions.
- This issue only affects the ASDK 1807 update, because the default ERCS VM memory on the ASDK is 2 GB.
- The workaround is to increase the ERCS VM memory to 4 GB manually.
- A fix is planned for the next update.
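One way to apply the memory workaround is with the Hyper-V PowerShell cmdlets on the ASDK host. This is a minimal sketch, not necessarily the exact steps I used: the VM name `AzS-ERCS01` is an assumption (check it with `Get-VM` first), and stopping the VM before resizing is just the conservative route.

```powershell
# Run in an elevated PowerShell session on the ASDK host.
# Assumption: the ERCS VM is named AzS-ERCS01; confirm with Get-VM before running.
$vmName = 'AzS-ERCS01'

# Show the current memory configuration of the VM.
Get-VMMemory -VMName $vmName

# Shut the VM down before changing its memory (conservative approach).
Stop-VM -Name $vmName

# Raise the startup memory from the 2 GB default to 4 GB.
Set-VMMemory -VMName $vmName -StartupBytes 4GB

# Bring the VM back online and confirm the new setting.
Start-VM -Name $vmName
Get-VMMemory -VMName $vmName
```

PowerShell understands the `4GB` literal natively, so no byte conversion is needed.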
So I bumped the ERCS VM to 4 GB, and later even to 8 GB, and retried the backup, but I ran into another issue:
- Because I had run so many backups, the backup service ran out of memory and died, which caused all later backup attempts to fail.
- To resolve this issue, we had to reboot the ERCS VM and cancel all previous backup actions. Then we checked that the cluster was in a healthy state.
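The reboot and health check can be sketched roughly as follows. This is an illustration under assumptions rather than a record of what we ran: the VM name `AzS-ERCS01` is assumed, the health check shown is the `Test-AzureStack` validation tool available in the privileged endpoint, and cancelling the earlier backup actions is not covered here.

```powershell
# Run in an elevated PowerShell session on the ASDK host.
# Assumption: the ERCS VM is named AzS-ERCS01; confirm with Get-VM before running.
Restart-VM -Name 'AzS-ERCS01' -Force

# Once the VM is back up, open an interactive session to the privileged endpoint.
# Enter the CloudAdmin credentials when prompted.
$cred = Get-Credential
Enter-PSSession -ComputerName 'AzS-ERCS01' -ConfigurationName PrivilegedEndpoint -Credential $cred

# Inside the privileged endpoint session, run the validation tool
# and review the results before starting a new backup.
Test-AzureStack
```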
We triggered a new Infrastructure Backup, and things finally worked.
The ERCS VM memory issue should no longer occur once the next update is installed.
Many thanks to Tony, Peter, Vijay, Michela and Charles in the Azure Stack team for their help in getting to the bottom of this.
Hope this helps someone out there!
Thank you for reading my blog.
If you have any questions or feedback, please leave a comment.