By Garv Sachdeva and Paul Keely
Time to complete this task: 30 minutes, less if you already have an Azure Automation account and BLOB storage.
We were asked to build a disk failure prediction Machine Learning (ML) model for a client that had 13,000 machines running the Log Analytics (LA) agent. We had a number of performance metrics in LA that we wanted to export to BLOB storage so that we could get the data into Databricks and start building an ML model on it. Thankfully, Azure Log Analytics supports a REST API for both ingestion and egestion. We can frame a query with a timespan in the message body, send it as a POST call to the Log Analytics endpoint, and get the results back in the response. Use this link to learn more about Log Analytics REST API support: https://dev.loganalytics.io/documentation/Using-the-API
So, we are just calling an API to export data, and you can accomplish this in so many ways. Why are we choosing PowerShell and not a Logic App or a Jupyter Notebook? After a lot of testing, we found that for very long-running exports of large amounts of data, the PS script gave us the most reliable outcome. The script we provide here is pretty generic, and we usually customize it more. The script is on TechNet: https://gallery.technet.microsoft.com/Exporting-Log-Analytics-d025b5e7
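To make the shape of that POST call concrete, here is a minimal sketch in Python (not the PowerShell script from this blog). It only builds the request and does not send it. The workspace ID, query and token are placeholder values, and the function name is our own invention for illustration.

```python
import json
import urllib.request

# Public Log Analytics query endpoint (v1).
API_ROOT = "https://api.loganalytics.io/v1"


def build_query_request(workspace_id, query, timespan, token):
    """Build (but do not send) a POST request for the Log Analytics query API."""
    url = f"{API_ROOT}/workspaces/{workspace_id}/query"
    # The Kusto query and an ISO 8601 timespan travel together in the JSON body.
    body = json.dumps({"query": query, "timespan": timespan}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values (hypothetical); replace them with your own.
req = build_query_request(
    workspace_id="00000000-0000-0000-0000-000000000000",
    query="Perf | take 10",
    timespan="PT1H",  # ISO 8601 duration: the last hour
    token="<access token>",
)
# urllib.request.urlopen(req) would send it; the JSON response carries a "tables" list.
```

Passing the timespan in the body rather than hard-coding it in the query makes it easy to rerun the same query for different time ranges.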
In the image below you can see 2 simple images.
- Following this blog, you will be able to export any data from LA to BLOB.
- You need 3 prerequisites for this to work
- Azure service principal account with the Log Analytics Reader role at subscription level in Identity and Access Management (IAM).
- Storage account on the standard tier
- Azure Automation (AA) environment with PS version 5*
- The PowerShell script downloaded from TechNet
*We are using AA for this blog because it’s the easiest way to demo the process. If you want to export very large datasets where the script may take hours, then we would use a Hybrid Worker role for this. The Hybrid Worker is beyond the scope of this blog, but if you are doing this task you should be able to work out the Hybrid Worker too. If you can’t, you can ask us for help.
We are going to walk you through the prerequisite setup. If you already have a service principal, AA account and storage blobs, there is no problem using existing artefacts. Just please make sure you read through this section to understand the permissions etc. you may need to grant.
Prerequisite 1 – Service Principal account creation (App registration is the new term)
Navigate to the Azure AD blade in the Azure portal. In AAD, go to App registrations.
At the top, select “New registration”.
Fill in the details as shown and hit Register.
When the principal is created, it will open the screen below, and you need to go to API permissions.
Select “Add a permission”
In the “Request API permissions” page, select the second option, “APIs my organization uses”, and then just type “Log”. Log Analytics API will appear as shown below.
Then choose the second option, “Application permissions”.
There is only one permission you can grant, “Data.Read”; select that as shown in the image below.
Select Add permission at the bottom of that screen.
You now need to grant permission to this new Service Principal.
When you select the grant option, you will be presented with a popup; select Yes.
You should get a change in screen that looks like this.
Later in the process we will ask for the following 4 items. The first two are available to you now, so open Notepad and write them down. Please also get the values for items 3 and 4 now so that you have them ready for the follow-on steps.
- Application (client) ID (AAD > App registrations > SPN > Overview)
- Service Principal Key (AAD > App registrations > SPN > Certificates & secrets)
- Tenant ID (AAD > Properties > Directory ID)
- Subscription ID (Subscriptions > Subscription ID)
When you go to the overview screen of the account you just created, you will see the Application ID; you need to copy that.
Then we need to create a client secret that can be used to authenticate the Azure Automation account as this account.
Go to “Certificates and secrets”
Select “+ New client secret”
Enter the name (not important) and expiration. Copy the value of the secret key after creation, as it will not be displayed again after you close this window. This is item #2 from the list earlier in this document (Service Principal Key).
Now, go to the “Subscription” feature blade and search for the subscription with the LA workspace.
In the overview section go to Access control (IAM) and select Add and then role assignment.
You need to select the Log Analytics Reader role and add the Service Principal you just created.
Prerequisite 2 – Add the BLOB Storage
The purpose of this process is to export data from LA to BLOB storage so that we can then work with it in Databricks, so we need a BLOB storage setup prior to the export. In this step we will build a new storage account and create a new blob store in that account. There is no problem using an existing account, but it MUST be STANDARD tier, not premium.
In the search bar in the top middle of the screen of the Azure Portal, type “Storage” and then select Storage accounts
Click +Add and you will be presented with the “Create storage account” page. Complete the fields as required and ensure “standard” performance tier is selected
The storage account could take approximately 3 minutes to deploy. After the storage account has been created, go to the storage account Settings\Access keys and copy your complete connection string.
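A partially copied connection string is an easy mistake to make at this step. As a rough sanity check, the sketch below (Python, not part of the blog's script) splits a connection string into its parts and reports which expected fields are missing. The example string and function names are made up for illustration.

```python
def parse_connection_string(conn_str):
    """Split an Azure Storage connection string into a dict of its parts."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue
        # Split on the first "=" only: base64 account keys end in "=" padding.
        key, _, value = segment.partition("=")
        parts[key.strip()] = value.strip()
    return parts


def missing_fields(conn_str):
    """Return the names of expected fields that are absent or empty."""
    parts = parse_connection_string(conn_str)
    return [name for name in ("AccountName", "AccountKey") if not parts.get(name)]


# Made-up example value; a real key is much longer.
example = (
    "DefaultEndpointsProtocol=https;AccountName=mystorage;"
    "AccountKey=abc123==;EndpointSuffix=core.windows.net"
)
print(missing_fields(example))
```

An empty list means both the account name and the key made it into the copied string.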
Prerequisite 3 – Set up Azure Automation
For prerequisite 3 we are going to set up Azure Automation. This assumes that you do not have an Automation Account or want to set up a new one. An existing Automation Account can also be used.
In the search bar at top of the Azure Portal, type “Automation” and click Automation Accounts
Click +Add and complete the form requirements. Deploying the account may take 2-3 minutes
When the AA account is created, we need to add two modules: Az.Accounts and Az.Storage. In the AA account, under Shared Resources\Modules gallery search for and select Az.Accounts and import it.
This can take a few minutes to import. Please make sure Az.Accounts has finished importing before you get Az.Storage, as Az.Accounts is a prerequisite module.
Next in the gallery, search for Az.Storage and select it and import it
There are 4 variables we need to configure in the Automation Account. We usually set them up as variables here so that we don’t have to consider them again. You can add them all as encrypted values; it’s up to you. The SPN_Json variable holds 4 values that are best encrypted.
In the Automation Account, under Shared Resources section, select “Variables”
Click +Add a variable and follow the details below for each of the 4 variables
There are several input parameters required by the script to run through the whole flow seamlessly, so let’s talk about each one of them in detail. The script is looking for these names exactly as shown here. If you enter your own names, you are going to have to change them in the script, so it’s just easier to enter them as shown here.
- KustoQueries, this is the actual data you want to export. Go to your LA workspace and run the query you want to export and make sure there is data there before you enter it into the variable. General recommendation around framing queries is to make sure to project only selective features that make sense to be exported from a dataset, and not to export the complete list of columns.
For instance, exporting a week of data from the SecurityEvent table in LA might take hours if you simply export every column.
With the limited set of columns in the query below, the data is exported in no time:
SecurityEvent | where EventID == 4688 | project TimeGenerated , Computer , EventID , ParentProcessName, ProcessId, NewProcessName , NewProcessId
You need to enter the LogAnalyticsWorkspaceId. Navigate to your LA workspace, and in the “Overview” section you can see your workspace ID. Copy this and enter it into the variable as shown below.
Here are the values
- SPN_Json: this variable needs to contain the following 4 values. For obvious reasons we are not providing a screenshot, so just fill in the values and add the variable.
You need to enter the values like you see below.
"ServicePrincipalKey": "Put_Your_Saved_Value_Here",
"TenantId": "Put_Your_Saved_Value_Here",
"SubscriptionId": "Put_Your_Saved_Value_Here"
- Blobstore – this is the connection string to your blob storage account.
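To show how the SPN_Json values are typically used, here is a hedged Python sketch of an Azure AD client-credentials token request for the Log Analytics API. It builds the request without sending it. This is our illustration and not necessarily how the TechNet script authenticates. The application (client) ID is passed in separately because its key name inside SPN_Json is not shown above, and all values are placeholders.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) an OAuth2 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # The token is requested for the Log Analytics API.
        "resource": "https://api.loganalytics.io",
    }).encode("utf-8")
    return urllib.request.Request(url, data=form, method="POST")


# Placeholder values standing in for the content of the SPN_Json variable.
spn = json.loads(
    '{"ServicePrincipalKey": "s3cret", "TenantId": "tenant-123", "SubscriptionId": "sub-456"}'
)
token_req = build_token_request(
    tenant_id=spn["TenantId"],
    client_id="client-id-here",  # hypothetical: your Application (client) ID
    client_secret=spn["ServicePrincipalKey"],
)
# Sending token_req returns JSON whose "access_token" field is the bearer token.
```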
This is what your variables should look like
Next, we add the PS script
Still in the Automation Account, navigate to Process Automation\Runbooks and click +Create a runbook
Enter the details as shown below
In the Overview tab, select Edit and then copy the script into the editor
Save and Publish
Back on the main screen, you can test the runbook by starting it.
When you run the script, it will pop up a parameters screen. We have already created the variables the script needs, so you can ignore this. In the screenshot below we have chosen a time range of one hour (that has already passed😊). We picked an hour because we just want to test everything and make sure it works. If your query returns a lot of data and you leave the time range at the default (7 days), it could take a long time to run.
When the script runs go to the output window and you should see that it is telling you the total number of rows returned, and the total number of files created.
Navigate to your newly created storage account and select Blobs under the Blob service section and click “log-export”
In the BLOB, I can see the events and I can download them to view.
I have 5,000 rows in my Excel file, and it took about 10 seconds for the script to run.
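For readers curious how query results become a file you can open in Excel, here is a small Python sketch that turns one table from a Log Analytics query response into CSV text. The response shape (a "tables" list with "columns" and "rows") is what the query API returns. The sample data is invented, and CSV is our assumption about a convenient output format rather than a statement of what the script writes.

```python
import csv
import io


def table_to_csv(table):
    """Convert one table from a Log Analytics query response to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row comes from the column metadata.
    writer.writerow([col["name"] for col in table["columns"]])
    # Each entry in "rows" is a list of values in column order.
    writer.writerows(table["rows"])
    return buffer.getvalue()


# Invented sample in the shape of a query API response.
sample_response = {
    "tables": [{
        "name": "PrimaryResult",
        "columns": [
            {"name": "Computer", "type": "string"},
            {"name": "EventID", "type": "int"},
        ],
        "rows": [["vm-01", 4688], ["vm-02", 4688]],
    }]
}
csv_text = table_to_csv(sample_response["tables"][0])
print(csv_text)
```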
- The Log Analytics API is throttled at about 61 MB or 500K rows per query, so for larger exports we need to index the records, framing sequential queries on each record’s row number and calling the API in a loop until the data for the required time range has been retrieved.
- We usually heavily customise the script for different requirements, and usually run it from a Hybrid Worker role in Azure. The Hybrid Worker with a spec of 4vCPU and 16GB of RAM does just fine for most workloads.
- If you want to take this further and get stuck, you can contact BIT-C and we can help you as a consulting engagement. email@example.com
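The row-number paging mentioned in the throttling note above can be sketched as follows. This is a Python illustration under assumptions, not the script itself: `run_query` is a hypothetical callable that sends a query and returns its list of rows, and the page size is an arbitrary choice that should stay under the API limits.

```python
def paged_query(base_query, start, page_size):
    """Wrap a query so it returns only rows start+1 .. start+page_size."""
    return (
        f"{base_query} "
        "| sort by TimeGenerated asc "       # sorting gives row_number() a stable order
        "| extend RowNum = row_number() "
        f"| where RowNum > {start} and RowNum <= {start + page_size}"
    )


def export_all(base_query, run_query, page_size=100_000):
    """Call run_query page by page until a short page signals the end."""
    all_rows = []
    start = 0
    while True:
        rows = run_query(paged_query(base_query, start, page_size))
        all_rows.extend(rows)
        if len(rows) < page_size:
            break  # last (possibly empty) page reached
        start += page_size
    return all_rows
```

Note that each page re-runs the full query on the service side, and rows that share the same TimeGenerated value may not come back in the same order on every call. For a production export you would likely add a tie-breaking sort column or page by time window instead.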