How to analyze Microsoft Sentinel Daily Cap Alerts – AADNonInteractiveUserSignInLogs

To avoid unplanned costs for Microsoft Sentinel, it is recommended to set a daily cap and to create an analytics rule that triggers an alert when the daily cap is reached. Microsoft has published general guidance on monitoring costs here

Over the past months I have deployed a number of Microsoft Sentinel instances, and in many cases the root cause for reaching the daily cap was data ingested into the AADNonInteractiveUserSignInLogs table. When analyzing the data, we often found an individual user generating an unusually high number of events. This can happen for various reasons, such as:

  • The user is still logged on to a device, but has changed their password on another device
  • The user has left the company, but is still logged on to some virtual desktops
  • The user account is disabled, but the user is still logged on somewhere
  • The user has left the company and their account is deactivated, but their mobile phone is still trying to pull e-mails
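Scenarios like these usually show up as repeated sign-in failures. As a starting point, the following query surfaces accounts with many failed non-interactive sign-ins; the ResultType values shown (50057 = account disabled, 50055 = password expired) are common examples, not a complete list:

```kql
// Accounts producing many failed non-interactive sign-ins.
// ResultType 50057 = user account disabled, 50055 = password expired.
AADNonInteractiveUserSignInLogs
| where ResultType in ("50057", "50055")
| summarize FailedSignIns = count() by UserPrincipalName, ResultType
| sort by FailedSignIns desc
```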

Okay, let’s start at the beginning.

Data Cap

To avoid bill shock, we set a daily cap.
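To pick a sensible cap value, it helps to first look at what normal daily ingestion looks like. A simple sketch, using the same Usage table we query later on:

```kql
// Billable ingestion per day over the last 31 days, in GB.
Usage
| where TimeGenerated > ago(31d)
| where IsBillable == true
| summarize TotalVolumeGB = sum(Quantity) / 1000 by bin(StartTime, 1d)
| sort by StartTime asc
```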


Analytics Rule

If we want to get alerted, we can set up an analytics rule within Microsoft Sentinel, as shown in the example below.
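The rule query itself can be based on the _LogOperation table, which records an ingestion operation event when data collection stops. A minimal sketch:

```kql
// Fires when Log Analytics stops collecting data because the daily cap was reached.
_LogOperation
| where Category == "Ingestion"
| where Operation == "Data collection stopped"
| where Detail contains "OverQuota"
```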


The Alert

With the analytics rule in place, we get an alert as shown below when the daily data cap is reached.


Analyzing the Data Usage

Now that we have an alert, we have to investigate what caused the high data volume. Log on to the Azure Portal and navigate to the Usage and estimated costs blade of the Microsoft Sentinel Log Analytics workspace. Here we can already identify which solution caused the increase in data ingestion. Select the Open chart in analytics button.


Log Analytics opens with a predefined query that shows the usage. Here we see that LogManagement had an increase in data ingestion. Remove the start date and set the time range to 24 hours.

Usage
| where IsBillable == true
| summarize TotalVolumeGB = sum(Quantity) / 1000 by bin(StartTime, 1d), Solution
| render columnchart


Change the query to display DataType instead of Solution, then re-run the query.

Usage
| where IsBillable == true
| summarize TotalVolumeGB = sum(Quantity) / 1000 by bin(StartTime, 1d), DataType
| render columnchart


Next, remove the | render instruction from the query to see the details.

Usage
| where IsBillable == true
| summarize TotalVolumeGB = sum(Quantity) / 1000 by bin(StartTime, 1d), DataType

Now let’s find the user(s) that cause the high event volume.

AADNonInteractiveUserSignInLogs
| summarize count() by UserPrincipalName
| sort by count_ desc


Next, we drill down into the events for the user that triggered the most events.

AADNonInteractiveUserSignInLogs
| where UserPrincipalName == "john.doe@foocorp.com"
| summarize count() by UserPrincipalName, ClientAppUsed, AppDisplayName


Here we see that we have a lot of Windows Sign In events. Next, let’s drill into the details to identify the device.

AADNonInteractiveUserSignInLogs
| where UserPrincipalName == "john.doe@foocorp.com"
| where AppDisplayName == "Windows Sign In"
| extend DeviceName = tostring(parse_json(DeviceDetail).displayName)
| extend trustType = tostring(parse_json(DeviceDetail).trustType)
| extend deviceId_ = tostring(parse_json(DeviceDetail).deviceId)
| extend operatingSystem = tostring(parse_json(DeviceDetail).operatingSystem)


Next, let’s see how many devices are involved by adding the following KQL line.

| summarize count() by DeviceName
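Putting the pieces together, the complete query for counting events per device looks like this:

```kql
// Events per device for the Windows sign-ins of the noisy user.
AADNonInteractiveUserSignInLogs
| where UserPrincipalName == "john.doe@foocorp.com"
| where AppDisplayName == "Windows Sign In"
| extend DeviceName = tostring(parse_json(DeviceDetail).displayName)
| summarize count() by DeviceName
```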


That’s it for today, I hope you found this useful. I’m currently working on an early detection for when log volume starts to grow unusually, so that IT operations or security teams can take immediate action and prevent the daily cap from being reached.
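One possible approach for such an early detection is KQL’s built-in time-series anomaly detection. A rough sketch, where the 14-day lookback, 1-hour grain, and score threshold are assumptions to tune:

```kql
// Flag hours where billable ingestion deviates from the learned baseline.
Usage
| where IsBillable == true
| make-series TotalVolumeGB = sum(Quantity) / 1000 default = 0 on TimeGenerated from ago(14d) to now() step 1h
| extend (Anomalies, Score, Baseline) = series_decompose_anomalies(TotalVolumeGB, 2)
| render anomalychart with (anomalycolumns = Anomalies)
```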

Bye

Alex

