How to create alert when a value is an outlier

nochimows · ‎10-07-2021

Hi all,

I'm currently trying to use Splunk to create an alert for the following scenario:

I have a search that tells me the number of rows and partitions a data pipeline ingested, so I already extract the following fields:
- Table Name
- Number of partitions
- Number of rows

I also have a dashboard that shows a timechart of the number of partitions and rows across executions over time.

What I need is an alert that triggers when the number of partitions or rows differs by more than a specified percentage between executions. In the example below, executions 1 and 2 differ only slightly, but execution 3 is clearly an outlier and should raise an alert.

Execution 1:
table_name = table_1
num_part = 12
num_rows = 1400

Execution 2:
table_name = table_1
num_part = 10
num_rows = 1000

Execution 3:
table_name = table_1
num_part = 10000
num_rows = 100000000

Any suggestions on how I can do it?

nochimows · ‎10-13-2021

Hi all, thank you for the tips.

I'm trying the following approach: I created a dataset with the global statistics of each table.
Now I'm trying to join the results of my search with that dataset where the column "Table" matches, so that I can create a column "IsOutlier" using an if statement that reads from my dataset.

I wanted to do something like this:

... my search that returns table_name and number_rows | eval isOutlier=if(number_rows < mydataset.lowerBound where mydataset.table = "table_name" OR number_rows > mydataset.upperBound where mydataset.table = table_name, 1, 0)

What's the right way to write such a statement in a Splunk search?
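
To make the intended logic concrete, here is a minimal sketch in Python rather than SPL, purely as an illustration. It attaches each table's bounds to the matching search result first and only then evaluates the condition. In SPL this kind of enrichment is typically done with the `lookup` command (or a join) before the `eval`, since `eval` cannot query another dataset inline. The bound values and the helper name `flag_outliers` are hypothetical.

```python
# Hypothetical per-table bounds, standing in for the "global statistics" dataset.
# The numbers are made up for illustration only.
bounds = {
    "table_1": {"lowerBound": 500, "upperBound": 5000},
}


def flag_outliers(rows, bounds):
    """Return copies of rows with an isOutlier field (1, 0, or None if no bounds exist)."""
    flagged = []
    for row in rows:
        table_bounds = bounds.get(row["table_name"])
        if table_bounds is None:
            # No statistics for this table yet, so no decision can be made.
            is_outlier = None
        else:
            value = row["number_rows"]
            is_outlier = int(
                value < table_bounds["lowerBound"]
                or value > table_bounds["upperBound"]
            )
        flagged.append({**row, "isOutlier": is_outlier})
    return flagged


# Search results shaped like the fields mentioned in the post.
results = [
    {"table_name": "table_1", "number_rows": 1400},
    {"table_name": "table_1", "number_rows": 100000000},
]
flagged = flag_outliers(results, bounds)
```

The point of the sketch is the order of operations: enrich each row with its table's bounds, then compute the flag from fields that now sit on the same row.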

bowesmana · ‎10-10-2021

If your search executions are distinct runs of the search, each with access only to the current run's data, then you will need to do one of the following:

- Change the search so that it calculates 2 (or more) ranges of the result set, i.e. execution 2 calculates the results for both execution 1 and execution 2, so that you have the data to compare changes. OR
- Change the existing search so that, having calculated the values, it looks up the table name in a new lookup file that contains table_name, partitions, rows (assuming you have multiple tables). You can then calculate the variance and save the latest results back to the lookup table for the next iteration.

The second option raises some issues you'd have to deal with, e.g. in your execution 4, when things go back to 'normal', if you have just saved 10000 there will again be a variance, but that can be dealt with.
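
As a rough illustration of the lookup-based option, the sketch below (Python, not SPL) keeps the last accepted values per table in a dictionary that stands in for the lookup file. One possible way to handle the "execution 4" issue, assumed here and not prescribed in this thread, is to skip the save whenever a run is flagged, so an outlier never becomes the new baseline. The 50% threshold, the function names, and the values for execution 4 are all hypothetical.

```python
def pct_change(previous, current):
    """Absolute change of current relative to previous, in percent."""
    if previous == 0:
        return 0.0 if current == 0 else float("inf")
    return abs(current - previous) / abs(previous) * 100.0


def check_execution(baseline, table_name, num_part, num_rows, threshold_pct=50.0):
    """Compare a run with the stored baseline and return True if it should alert.

    baseline plays the role of the lookup file: table_name -> last accepted values.
    """
    current = {"num_part": num_part, "num_rows": num_rows}
    previous = baseline.get(table_name)
    if previous is None:
        # First run for this table: nothing to compare against, just store it.
        baseline[table_name] = current
        return False
    alert = (
        pct_change(previous["num_part"], num_part) > threshold_pct
        or pct_change(previous["num_rows"], num_rows) > threshold_pct
    )
    if not alert:
        # Assumption: only normal runs overwrite the baseline.
        baseline[table_name] = current
    return alert


baseline = {}
alerts = [
    check_execution(baseline, "table_1", 12, 1400),            # execution 1
    check_execution(baseline, "table_1", 10, 1000),            # execution 2
    check_execution(baseline, "table_1", 10000, 100000000),    # execution 3
    check_execution(baseline, "table_1", 11, 1200),            # hypothetical execution 4
]
```

With these inputs only execution 3 is flagged, and the hypothetical execution 4 is compared against execution 2's values rather than the outlier. The trade-off is that a genuine, permanent change in volume would keep alerting until the baseline is reset by hand.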

Stefanie · ‎10-08-2021

It's possible.

Unless you have Splunk MLTK, you might have to do some statistics 🙂

Take a look at this article, which explains how to identify outliers using the IQR (interquartile range).

https://docs.splunk.com/Documentation/Splunk/8.2.2/Search/Findingandremovingoutliers
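
For readers unfamiliar with the IQR approach, here is a small Python sketch of the arithmetic only. It is not taken from the linked article. It assumes the conventional 1.5 × IQR fences and uses a made-up history of row counts: the first two values come from the executions in the question, the rest are hypothetical.

```python
import statistics


def iqr_bounds(values, multiplier=1.5):
    """Return (lower, upper) fences computed from the interquartile range."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def is_outlier(value, lower, upper):
    """A value is an outlier when it falls outside the fences."""
    return value < lower or value > upper


# Row counts from earlier executions (1400 and 1000 are from the question,
# the remaining values are hypothetical).
history = [1400, 1000, 1200, 1100, 1300]
lower, upper = iqr_bounds(history)

normal_run = is_outlier(1000, lower, upper)
suspicious_run = is_outlier(100000000, lower, upper)
```

For this history the fences come out at 800 and 1600, so a run with 1000 rows is accepted and the 100000000-row run is flagged. The bounds are computed from past runs only, which keeps a single extreme execution from widening the fences used to judge it.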
