Azure Data Explorer (ADX) and Azure Blob storage have long been recommended by Microsoft as long-term archival solutions for Microsoft Sentinel. The good news is that Azure Monitor lets you query that external data from your Log Analytics workspace for a full SOC experience.
One question we are frequently asked is whether it is possible to query data sets stored in Azure Blob storage and in ADX interactively from Log Analytics and create analytic rules and hunting queries. The short answer is, YES!
In this comprehensive guide, we will show you how to create and build analytic rules to query external data in Microsoft Sentinel from Azure Blob storage.
Microsoft Sentinel lets you store data for long-term retention using the archive tier. The archive tier is a kind of cold storage from which you can restore data when you need it, letting you keep data for a longer period (up to seven or twelve years) at a much lower cost point while still meeting organizational or regulatory retention requirements.
The archive option is straightforward to set up inside your Log Analytics workspace, and you can set a different retention period for each table. You might have anywhere from 80 to over 100 tables in a typical Sentinel environment; you can go into each of those tables and adjust that setting as needed, for up to seven or twelve years of combined retention, as shown in the figure below.
The cost of the archive tier is quite affordable for most customers that are using it. The downside of this native solution is that even though you can have up to two years of interactive (hot) retention inside of Sentinel, most customers can’t afford more than one year, because that data set gets bigger and bigger month over month. After about the one-year mark, it starts to become prohibitively expensive. So although Sentinel does support two years of hot retention, most customers we work with keep one year or less, somewhere between 90 days and one year.
To access the archive tier, you have to run a restore job or a search job, and those search and restore jobs cost money; they are NOT free. And if they’re done poorly, they can be expensive, because you pay for each search or restore based on the amount of data processed.
Several reasons make Azure Data Explorer (ADX) and Azure Blob Storage a great alternative long-term retention solution: for example, if you want hot-tier retention for more than two years, if you want to run interactive KQL queries on a massive data set beyond the free 90 days, or if you need more than seven or twelve years of archival.
To follow this article, you need to have the following:
1) Azure subscription — If you don’t have an Azure subscription, you can create a free one here.
2) Log Analytics workspace — To create a new workspace, follow the instructions to create a Log Analytics workspace. A Microsoft Sentinel workspace is required to ingest data into Log Analytics. You also must have read and write permissions on this workspace; at a minimum, you need Contributor rights.
3) Microsoft Sentinel — To enable Microsoft Sentinel at no additional cost on an Azure Monitor Log Analytics workspace for the first 31 days, follow the instructions here. Once Microsoft Sentinel is enabled on your workspace, every GB of data ingested into the workspace can be retained at no charge (free) for the first 90 days.
4) Storage account — To create a new storage account, follow the instructions to create a storage account.
5) Export and store data in Azure Blob storage. Various methods can be used to export and store data into Blob storage. The diagram below shows some of these methods:
- You could natively export data from Log Analytics to a storage account. Keep in mind that a Log Analytics workspace supports up to 10 enabled data export rules, and each rule can include multiple tables. However, not all tables are currently supported for data export; see the list of limitations.
- Or, you could deploy a logic app that exports and sends the data directly from Log Analytics to a storage account, see the example on how to export data from a Log Analytics workspace to a storage account by using Logic Apps.
- Or, you could send directly from your data sources into Sentinel and storage account at the same time.
- Another interesting option you could consider is sending certain data sets directly to storage accounts, completely bypassing Sentinel. For example, large data sets such as Microsoft Entra ID (formerly Azure Active Directory) sign-in logs or the back-end logs from the Microsoft Defender XDR portal have already been analyzed and alerted on by that particular security tool, so the back-end data is more of an archival and, occasionally, a hunting data set.
As you can see, we have various ways that you can use to export and store data in a storage account. In this article, we won’t cover how to export data to a storage account but rather focus on how to query the archived data stored in a storage account and create hunting and analytics rules.
Assuming you have all the prerequisites in place, take the following steps:
Storage Account Authentication Methods
There are a couple of ways that you can query external data in Microsoft Sentinel from Blob Storage. We will start with the most secure option and work down to the least secure one, and we strongly recommend using the most secure method.
To query external data, we need to use the “externaldata” operator, which returns a table whose schema is defined in the query itself (more on this below) and whose data is read from an external storage artifact, such as a blob in Azure Blob Storage or a file in Azure Data Lake Storage. Furthermore, the “externaldata” operator supports the Shared Access Signature (SAS) key, Access key, and Microsoft Entra Token authentication methods.
Before we build our KQL query, we need to look at the storage authentication methods first. So, let’s discuss the storage account authentication methods that we have at our disposal to query external data from Log Analytics:
- Access Key: We can use the storage account Access Keys to authenticate; however, authorization with storage account access keys (Shared Key) is not recommended, as it’s less secure. You can use this authentication method when you need to query external data from Sentinel / Log Analytics on an ongoing basis, but again, we don’t recommend it.
- Shared access signature (SAS): A shared access signature (SAS) provides secure delegated access to resources in your storage account. With SAS, we have granular control over how the SOC can access the data. For example, we can define what resources, what permissions, what public IPs, and for how long the SAS is valid.
- Microsoft Entra Token: This is the most secure option and is highly recommended over the previous two, because we can disable storage account key access (preventing Shared Key authorization for our Azure Storage account) and still have granular control over how the SOC team can access the data.
Please note that with all the authentication methods listed above, we cannot query the storage account privately using a Private Endpoint or with a Firewall enabled, because the “externaldata” operator is NOT supported when the public endpoint of the storage artifact is behind a firewall. Currently, the storage account network needs to be publicly accessible.
Related: Azure Storage Essential Training.
Let’s see how to query external data from Log Analytics using Microsoft Entra Token.
Microsoft Entra Access Token
In this section, we will guide you through the steps to create a Microsoft Entra Access Token. It’s important to note that these tokens have a limited lifespan, meaning they can only be used for a certain amount of time. They are beneficial for scenarios where you need to access your data in a storage account for purposes such as compliance, audit, and verification.
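Because these tokens are short-lived, it can help to check when a token expires before reusing it in a saved query. The helper below is a minimal Python sketch (not part of any official tooling) that decodes the token's “exp” claim locally, without calling Microsoft Entra:

```python
import base64
import json
import time

def token_expires_at(jwt_token: str) -> int:
    """Return the 'exp' (Unix time) claim from a JWT access token."""
    payload_b64 = jwt_token.split(".")[1]
    # Pad base64url to a multiple of 4 before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]

def is_expired(jwt_token: str, skew_seconds: int = 300) -> bool:
    """True if the token expires within skew_seconds from now."""
    return token_expires_at(jwt_token) <= time.time() + skew_seconds
```

This only reads the unverified payload, which is fine for checking your own token's lifetime before pasting it into a query.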
Create Microsoft Entra App Registration
Open your Microsoft Entra admin center portal. Under Identity, select App registrations from the left menu.
If you have an existing App registration you can use, select it. Otherwise, create a new one.
Click + New registration to create an App registration. Name it, leave the default settings, and click Register. In this example, we’ve created a new application called “Query External Data in Azure Blob Storage“.
Go to the newly registered application:
- Click on Overview and collect the Application (client) ID value and the Directory (tenant) ID value.
- Click on Certificates & secrets and create a New Client Secret. Take note of the client secret value. (Note that the secret value will not be available once you leave the page).
Assign Storage Blob Data Reader
Next, go to your storage account > Access Control (IAM) and assign to this App the RBAC role required to call any data access operation in Azure Storage. Please note the role assignment could take some time to take effect.
For the scenario presented here (query external data in Microsoft Sentinel from Azure Blob storage), we need to assign to the application the following least-privilege permission: “Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read”. The Storage Blob Data Reader RBAC role is the least privileged built-in role with this permission.
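As a sketch, the same role assignment can also be scripted with the Azure CLI; the subscription, resource group, and storage account names below are placeholders you must replace:

```shell
# Assign Storage Blob Data Reader to the app registration's service principal.
# Replace every <placeholder> with your own values before running.
az role assignment create \
  --assignee "<application-client-id>" \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```
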
Get a New Access Token
To get a new Microsoft Entra Access Token, we need to use Postman. If you don’t have Postman installed, you can download it from this page, and then follow the steps below.
As a side note, we could also use the Azure CLI to generate a Microsoft Entra Token by using the “az account get-access-token” command:

az account get-access-token --scope "api://app-registration-client-id/Scope-Name" --resource "https://storage.azure.com/" --query accessToken -o tsv

You need to add the Microsoft Azure CLI Client ID “04b07795-8ddb-461a-bbee-02f9e1bf7b46” under the “Authorized client applications” to bypass user/admin consent as shown in the figure below. If you didn’t configure the “Authorized client applications” for the app registration, then you must first run “az login” with the scope argument (“az login --scope api://app-registration-client-id/Scope-Name”), and then use the “az account get-access-token” command above.
To use Postman take the following steps:
1) Create a new request, and select the Authorization tab in the request builder window.
2) In the “Type” dropdown, select “OAuth 2.0”
3) On the right side, please fill in the following fields:
- Token Name: A name of your choosing
- Grant Type: Client Credentials
- Access Token URL: “https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token” where <tenant-id> is the Directory (tenant) ID value collected from the newly registered app above.
- Client ID: The Application (client) ID value collected above.
- Client Secret: The Client secret value collected above.
- Scope: For storage, enter “https://storage.azure.com/.default”. This is important: the token must be issued for the resource “https://storage.azure.com/”.
- Client Authentication: set to Send as Basic Auth Header.
4) Last, click on “Get New Access Token” and collect/copy the Access Token as shown in the figure below, as you will need it when you query the data via Log Analytics. The “Access Token” has been intentionally obscured for obvious reasons.
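The same token request that Postman performs can also be scripted. The sketch below, using only the Python standard library, builds the client-credentials request with the fields described above; the tenant ID, client ID, and secret are placeholders you must supply:

```python
import json
import urllib.parse
import urllib.request

# The tenant/app values passed in are placeholders; supply your own.
TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"

def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Build the client-credentials token request (URL and form body)."""
    url = TOKEN_URL.format(tenant=tenant_id)
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://storage.azure.com/.default",  # token must target storage
    }).encode()
    return url, body

def get_access_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """POST the request and return the bearer token from the JSON response."""
    url, body = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as resp:
        return json.load(resp)["access_token"]
```

This is handy when you want to refresh the token on a schedule instead of clicking through Postman each time.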
The other authentication method that you can use is a Shared access signature (SAS).
Shared Access Signature
As mentioned previously, a shared access signature (SAS) provides secure delegated access to blobs in your storage account. With SAS, we have granular control over how the SOC can access our data. For example, we can define what resources, what permissions, what public IPs, and for how long the SAS is valid. SAS tokens can have a longer expiration time than Microsoft Entra access tokens, so we can use this authentication method when querying data over a limited or extended period.
We can now create the secret token that we will need to query the blob contents. To do this, go to the desired container (in this example, we are using “am-commonsecuritylog”) and choose “Shared access tokens” on the left menu as shown in the figure below. Please note that you can also be more granular and generate an access token for a specific blob or file rather than for the entire container.
On the Shared access tokens page, we can specify the level of permissions allowed (Read for this example), the start and expiration time for the SAS, the allowed protocols (HTTPS only in this example), and the signing key. Click “Generate SAS token and URL”.
Note: You can automate all these steps using PowerShell, Azure CLI, or the REST API.
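For example, a read-only container SAS can be generated with the Azure CLI; the account name and the expiry date below are placeholders, only the container name is taken from this walkthrough:

```shell
# Generate a read-only, HTTPS-only SAS for the whole container.
# Replace the account name and expiry with your own values.
az storage container generate-sas \
  --account-name "<storage-account>" \
  --name "am-commonsecuritylog" \
  --permissions r \
  --expiry "2026-12-31T23:59Z" \
  --https-only \
  --auth-mode key \
  -o tsv
```
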
Next, copy the “Blob SAS token” in case you generate a SAS token for the entire container, or copy the “Blob SAS URL” in case you generate a SAS token for a specific blob, as you will need them when you query the data via Log Analytics.
If you generated a SAS token for the entire container, then you need to go to the desired Blob inside the container whether it’s a JSON or CSV file, and then copy the full URL as shown in the figure below, as you will need it when you query the data via Log Analytics.
Now that we’ve generated a Microsoft Entra Token and Shared Access Signature (SAS) Token, let’s develop and test the KQL query to access the contents of the exported JSON file in Blob storage.
Query External Data in Microsoft Sentinel
The next step is to develop and test the query to access the file content that we are exporting into Blob storage as long-term retention. Our sample query is below with the two pieces of data required from the previous sections as part of the query.
As mentioned previously, the “externaldata” operator returns a table whose schema is defined in the query itself. In the example below, we have defined “TimeGenerated” and “Type”; you can adjust these based on your collected data.
Next, we need to add the full Blob storage URL and append the secret token. The syntax differs slightly depending on the authentication method you are using:

- For a SAS token, append the token to the blob URL after a “?” character. Note that the generated token does not include the “?” at the start, which is required.
- For a Microsoft Entra token, append the token to the blob URL after “;token=”. Note that the token does not include the “;token=” prefix, which is required.

Then, specify whether the Blob is in “JSON” or “CSV” format using the “format” property. If your CSV file includes any header(s), you can use the “ignoreFirstRecord” property to skip them.
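Putting the pieces together, the final query might look like the following sketch. The storage account and blob names are placeholders and the token values are elided; use format “multijson” for JSON files exported by Log Analytics, or “csv” (optionally with “ignoreFirstRecord”) for CSV files:

```kusto
// SAS variant: the token is appended to the blob URL after "?".
externaldata (TimeGenerated: datetime, Type: string)
[
    h@'https://<storageaccount>.blob.core.windows.net/am-commonsecuritylog/<blob-name>.json?<SAS-token>'
]
with (format="multijson")
| take 10
// Microsoft Entra variant: same query, but the URL ends with ";token=<access-token>" instead.
```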
Below is a subset of the results of the final query with the SAS token pulling back data from the contents of the JSON file in the Blob storage. The “SAS Token” has been intentionally obscured for obvious reasons.
Similarly, below we can see the same results of the final query with the Microsoft Entra token pulling back data from the contents of the JSON file in the Blob storage. The “Microsoft Entra Token” has been intentionally obscured for obvious reasons.
Create Analytic Rules
We can build on top of the above KQL query and create a new Analytic rule that correlates internal data in Log Analytics with external data in Azure Blob Storage, so you can generate alerts and incidents.
In this example, we will create an Analytic Rule to query the “CommonSecurityLog” table in Log Analytics and the external data in Blob Storage. This query retrieves data from an external source, enriches “CommonSecurityLog” data with geographic information, and then filters the results to include only entries related to China based on the “country_name” field. It’s used for analyzing security logs with a focus on network activities originating from China.
// The externaldata schema and URL are reconstructed placeholders for illustration; point them at your own geo-data blob.
let geoData = externaldata (network: string, country_name: string)
[
    h@'https://<storageaccount>.blob.core.windows.net/<container>/<geo-data-file>.csv?<SAS-token>'
]
with (format="csv", ignoreFirstRecord=true);
CommonSecurityLog
| extend ipv4_is_private = ipv4_is_private(DeviceAddress)
| where ipv4_is_private == false
| evaluate ipv4_lookup (geoData, DeviceAddress, network)
| project Subnet=network, country_name, DeviceAddress
| where country_name == "China"
Please note that if the SAS token or the Microsoft Entra token expires, the Analytic rule will automatically be set to “AUTO DISABLED” by Microsoft Sentinel, as shown in the figure below, with the reason “The alert rule was disabled due to too many consecutive failures.”
By design, Microsoft Sentinel stops executing a rule if a predetermined number of consecutive permanent failures of the same type occur, and takes the following steps:
- Disables the rule.
- Adds the words “AUTO DISABLED” to the beginning of the rule’s name.
- Adds the reason for the failure (and the disabling) to the rule’s description.
You can easily spot any auto-disabled rules by sorting the rule list by name; they will be at or near the top of the list. Instead of regularly checking the Analytics rule list, you can create an automation rule with conditions associated with the “When alert is created” trigger to notify SOC managers of the presence of auto-disabled rules.
Create Hunting Queries
You can also create hunting queries to hunt across the data in the Log Analytics workspace and the exported (archived) data in Azure Blob Storage.
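For example, a hunting query might match destination IPs in “CommonSecurityLog” against an indicator list kept in Blob storage. The container, file name, and schema below are hypothetical; adjust them to your own IoC feed:

```kusto
// Hypothetical IoC feed stored as a CSV with a single "IndicatorIP" column.
let badIPs = externaldata (IndicatorIP: string)
[
    h@'https://<storageaccount>.blob.core.windows.net/<container>/ioc-ips.csv?<SAS-token>'
]
with (format="csv", ignoreFirstRecord=true);
CommonSecurityLog
| where DestinationIP in (badIPs)
| project TimeGenerated, DeviceVendor, SourceIP, DestinationIP
```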
There you have it… Happy Querying External Data in Microsoft Sentinel from Azure Blob Storage!
ADX and Blob Storage Combined
You can combine Azure Data Explorer (ADX) and Azure Blob Storage which allows you to have the best of both worlds. With this option, you send the data to Azure Blob’s cheap storage, but you can still run KQL queries on the data almost as if you had it locally in the ADX cluster.
This approach uses the Log Analytics data export feature to send the data to Azure Blob Storage, but instead of querying the data from Log Analytics with the “externaldata” KQL operator pointing to each blob as shown above, you create an external table in ADX that points to the whole container. This way, you don’t have to reference each individual blob; this is managed transparently for you.
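For illustration, an external table definition might look like the following sketch; the table name, schema columns, and connection string are placeholders, and the dataformat must match how your blobs are written:

```kusto
// Run in your ADX cluster; replace the placeholders with real values.
.create external table ArchivedCommonSecurityLog (TimeGenerated: datetime, DeviceVendor: string, DeviceAddress: string)
kind=storage
dataformat=json
(
    h@'https://<storageaccount>.blob.core.windows.net/am-commonsecuritylog;<account-key-or-SAS>'
)
```

Once created, you query it with external_table("ArchivedCommonSecurityLog") and never reference individual blobs.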
* Performance: Azure Blob Storage offers two performance tiers: Premium and Standard. Although both tiers are an option for long-term storage, Standard is generally chosen due to greater cost savings.
* Usability: Usability is the biggest concern when choosing Azure Blob as your long-term storage option, mainly because using the “externaldata” operator becomes very challenging when you have a large number of blobs to reference. With this approach, we eliminate that burden by using external tables in ADX. The external table definition understands the folder structure in the blob storage account and allows us to query the data contained in many different blobs and folders transparently.
* Cost: In this architecture, ADX plays a very important role, but the good news is that the size of the cluster doesn’t matter because ADX only acts as a proxy. As you can see in the architecture diagram, we use the Log Analytics data export feature, so you will need to factor this into the cost.
In summary, there are several options available to keep your data accessible while reducing costs without using the native Archive tier.
In this article, we discussed the integrated solutions that you can use with Microsoft Sentinel and Azure Blob storage, compared them to the built-in archiving solution in the Log Analytics workspace, and then walked through all the necessary steps to query the archived data in a storage account and create hunting and analytics rules.
By using the “externaldata” operator of the Kusto Query Language, we can reach out and grab data from storage, such as a file stored in a blob. The key was to first export the data into Azure Blob storage, then generate a Microsoft Entra Token or SAS token, and then use the token (prefixed with “?” for SAS or “;token=” for Microsoft Entra) to query the data.
If you want to access the storage account without storage account key access, then you must use the Microsoft Entra Token authentication method. Finally, remember that tokens do expire, so you will need to regenerate them occasionally or set a longer expiration date.
Keep in mind that the “externaldata” operator is intended to retrieve small reference tables of up to 100 MB from an external storage artifact; it is not designed for large data volumes. If you need to retrieve large volumes of external data, we recommend ingesting the external data into Azure Data Explorer (ADX) or into Log Analytics as custom logs. Also remember that the “externaldata” operator is not supported when the public endpoint of the storage account is behind a firewall.
Once you go through these steps a couple of times, and use some of the automated solutions shared in this article, it certainly makes it a lot easier, and less error-prone to query external data from Blob Storage and build these hunting and analytics rules.
Thank you for reading my blog.
If you have any questions or feedback, please leave a comment.