How to create an alert when a value is an outlier

nochimows
Engager

Hi all,

I'm currently trying to use Splunk to create an alert for the following scenario:

I have a search that tells me the number of rows and partitions a data pipeline ingested, so basically I already extract the following fields:
- Table Name
- Number of partitions
- Number of rows

I also have a dashboard that shows a timechart of the number of partitions and rows across different executions over time.

What I need is an alert that is triggered when the number of partitions or rows differs by more than a specified percentage between executions. In the example below, executions 1 and 2 differ only slightly, but execution 3 is clearly an outlier and should trigger an alert.

Execution 1: 
table_name = table_1
num_part = 12
num_rows = 1400

Execution 2:
table_name = table_1
num_part = 10
num_rows = 1000

Execution 3:
table_name = table_1
num_part = 10000
num_rows = 100000000 

 

Any suggestions on how I can do it?


nochimows
Engager

Hi all, thank you for the tips.

I'm trying the following approach: I created a dataset with the global statistics of each table.
Now I'm trying to join the results of my search with the results of my dataset where the "Table" column is the same, so that I can create an "IsOutlier" column using an if statement that reads from my dataset.

I wanted to do something like this:

... my search that returns table_name and number_rows | eval isOutlier=if(number_rows < mydataset.lowerBound where mydataset.table = "table_name" OR number_rows > mydataset.upperBound where mydataset.table = table_name, 1, 0)

 

What's the right way to write such a statement in a Splunk search?
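
One possible way to express this, as a sketch only: it assumes the dataset of per-table statistics is available as a lookup file, here given the hypothetical name table_stats.csv, with the fields table, lowerBound and upperBound. The lookup command adds the bounds to each result row by matching table against table_name, and eval then compares number_rows against them.

```spl
... my search that returns table_name and number_rows
| lookup table_stats.csv table AS table_name OUTPUT lowerBound upperBound
| eval isOutlier=if(number_rows < lowerBound OR number_rows > upperBound, 1, 0)
```

Rows whose table has no entry in the lookup get no bounds, so the condition should not evaluate to true and isOutlier ends up as 0 for them. Appending | where isOutlier=1 leaves only the rows to alert on.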


bowesmana
Super Champion

If your search executions are distinct runs of the search, where each run only has access to its own data, then you will need to do one of the following:

  • Change the search so that it covers 2 (or more) ranges of the result set, i.e. execution 2 calculates the results for both execution 1 and execution 2, so that you have the data to compare changes. OR
  • Change the existing search so that, having calculated the values, it looks up the table name in a new lookup file that contains table_name, partitions, rows (assuming you have multiple tables). You can then calculate the variance and save the latest results back to the lookup file for the next iteration.
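
The first option can be sketched with streamstats, which lets each result see the values from the previous execution of the same table. The sketch below uses makeresults to recreate the three sample executions from the question so that it can be run anywhere; in a real search, the first five lines would be replaced by a search whose time range covers at least two executions, sorted oldest first (for example with | sort 0 _time). The 50% threshold is an arbitrary assumption.

```spl
| makeresults count=3
| streamstats count AS execution
| eval table_name="table_1"
| eval num_part=case(execution=1, 12, execution=2, 10, execution=3, 10000)
| eval num_rows=case(execution=1, 1400, execution=2, 1000, execution=3, 100000000)
| streamstats current=f window=1 global=f last(num_part) AS prev_part last(num_rows) AS prev_rows BY table_name
| eval part_pct_change=abs(num_part - prev_part) / prev_part * 100
| eval rows_pct_change=abs(num_rows - prev_rows) / prev_rows * 100
| where part_pct_change > 50 OR rows_pct_change > 50
```

Execution 1 has no previous values, so its percentages are null and it is filtered out. Execution 2 changes by about 29% in rows and 17% in partitions and is dropped as well, while execution 3 exceeds the threshold by a wide margin and is kept.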

That would raise some issues you'd have to deal with, e.g. in your execution 4, when things go back to 'normal', if you have just saved 10000 then there will again be a variance, but that can be dealt with.
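
A minimal sketch of the lookup variant, including one way to deal with that issue: when a run is flagged, the previously stored values are written back instead of the outlier values, so a bad run does not become the new baseline. The lookup file name pipeline_baseline.csv and the 50% threshold are assumptions, the field names num_part and num_rows are taken from the question, and the file is assumed to exist already.

```spl
... your search that returns table_name, num_part and num_rows
| lookup pipeline_baseline.csv table_name OUTPUT num_part AS prev_part num_rows AS prev_rows
| eval part_pct_change=abs(num_part - prev_part) / prev_part * 100
| eval rows_pct_change=abs(num_rows - prev_rows) / prev_rows * 100
| eval is_outlier=if(part_pct_change > 50 OR rows_pct_change > 50, 1, 0)
| eval num_part=if(is_outlier=1, prev_part, num_part)
| eval num_rows=if(is_outlier=1, prev_rows, num_rows)
| table table_name num_part num_rows is_outlier part_pct_change rows_pct_change
| outputlookup pipeline_baseline.csv
| where is_outlier=1
```

Note that outputlookup replaces the whole file with the current results, so this sketch also assumes every run returns all tables. The extra columns written to the lookup are harmless because only num_part and num_rows are read back.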

 

Stefanie
Communicator

It's possible.

Unless you have the Splunk Machine Learning Toolkit (MLTK), you might have to do some statistics 🙂

Take a look at this article, which explains how to identify outliers, including an approach based on the interquartile range (IQR).

https://docs.splunk.com/Documentation/Splunk/8.2.2/Search/Findingandremovingoutliers 
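
As a rough sketch of an IQR-based check in the spirit of that article, adapted to the fields from the question: the search is assumed to cover enough past executions per table for the quartiles to be meaningful. The bounds use the common rule of quartile plus or minus 1.5 times the IQR, which is an assumption and may differ from the exact formula the article uses.

```spl
... your search that returns table_name and num_rows for many executions
| eventstats perc25(num_rows) AS p25 perc75(num_rows) AS p75 BY table_name
| eval iqr=p75 - p25
| eval lower_bound=p25 - 1.5 * iqr, upper_bound=p75 + 1.5 * iqr
| eval is_outlier=if(num_rows < lower_bound OR num_rows > upper_bound, 1, 0)
| where is_outlier=1
```

The same pattern can be repeated for num_part, and the multiplier can be widened if normal runs vary a lot.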
