Hi all,
I'm currently trying to use Splunk to create an alert for the following scenario:
I have a search that tells me the number of rows and partitions a data pipeline ingested, so I already extract the following fields:
- Table Name
- Number of partitions
- Number of rows
I also have a dashboard that shows a timechart of the number of partitions and rows across different executions over time.
What I need is an alert that is triggered when the number of partitions or rows differs by more than a specified percentage between executions. In the example below, executions 1 and 2 differ only slightly from each other, but execution 3 is clearly an outlier and should raise an alert.
Execution 1:
table_name = table_1
num_part = 12
num_rows = 1400
Execution 2:
table_name = table_1
num_part = 10
num_rows = 1000
Execution 3:
table_name = table_1
num_part = 10000
num_rows = 100000000
Any suggestions on how I can do it?
Hi all, thank you for the tips.
I'm trying the following approach: I created a dataset with the global statistics of each table.
Now I'm trying to join the results of my search with the results of my dataset where the column "Table" is the same, so that I can create a column "IsOutlier" using an if statement that reads from my dataset.
I wanted to do something like this:
... my search that returns table_name and number_rows | eval isOutlier=if(number_rows < mydataset.lowerBound where mydataset.table = "table_name" OR number_rows > mydataset.upperBound where mydataset.table = table_name, 1, 0)
What's the right way to write such a statement in a Splunk search?
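One common way to express this kind of per-table comparison in SPL is the lookup command followed by a plain eval, rather than a SQL-style "where" inside the if(). The following is only a minimal sketch under some assumptions: the statistics dataset has been saved as a lookup called table_stats (a hypothetical name), and it contains the fields table, lowerBound and upperBound. The field names table_name and number_rows are taken from the statement above.

```spl
... my search that returns table_name and number_rows
| lookup table_stats table AS table_name OUTPUT lowerBound upperBound
| eval isOutlier=if(number_rows < lowerBound OR number_rows > upperBound, 1, 0)
| where isOutlier=1
```

The lookup command matches the lookup's table field against table_name in each result and adds the two bounds as new fields, so the eval can compare them directly. With the final where clause, the alert can simply be set to trigger when the number of results is greater than zero.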
If your search executions are distinct runs of the search where you only have access to the existing run data, then you will need to do either
That would raise some issues you'd have to deal with. For example, in your execution 4, when the numbers go back to 'normal', if you have just saved 10000 as the previous value then there will again be a large variance, but that can be dealt with.
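If the earlier executions are still searchable as events in the alert's time range (an assumption, it may not hold for your setup), streamstats can supply the "previous" values without saving anything. The sketch below compares each execution with the median of up to five preceding executions of the same table rather than with the last one only, which is one way to keep a single outlier from causing a second alert when the numbers return to normal. The window size of 5 and the 50% threshold are placeholders, and the field names come from the example in the question.

```spl
... your search that returns _time, table_name, num_part and num_rows
| sort 0 _time
| streamstats current=f global=f window=5 median(num_rows) AS base_rows median(num_part) AS base_part BY table_name
| eval pct_rows=round(abs(num_rows - base_rows) / base_rows * 100, 1)
| eval pct_part=round(abs(num_part - base_part) / base_part * 100, 1)
| where pct_rows > 50 OR pct_part > 50
```

Here current=f excludes the current execution from its own baseline, and global=f keeps a separate window per table_name. With the sample data, execution 2 differs from execution 1 by about 28.6% in rows and 16.7% in partitions, so it stays below the threshold, while execution 3 is far above it. The first execution of each table has no baseline, so it is never returned.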
It's possible.
Unless you have Splunk MLTK, you might have to do some statistics 🙂
Take a look at this article, which explains how to find outliers and how to use the interquartile range (IQR) to identify them.
https://docs.splunk.com/Documentation/Splunk/8.2.2/Search/Findingandremovingoutliers
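As a rough illustration of the IQR idea, here is a minimal sketch that computes per-table bounds with eventstats and flags rows outside them. It assumes the field names from the question (table_name, num_rows) and uses the conventional 1.5 multiplier, which you may want to tune.

```spl
... your search that returns table_name and num_rows
| eventstats perc25(num_rows) AS p25 perc75(num_rows) AS p75 BY table_name
| eval iqr=p75 - p25
| eval lowerBound=p25 - 1.5 * iqr, upperBound=p75 + 1.5 * iqr
| eval isOutlier=if(num_rows < lowerBound OR num_rows > upperBound, 1, 0)
| where isOutlier=1
```

The percentiles are only meaningful with a reasonable amount of history per table, so the search time range should cover many executions, not just the last two or three. The same calculation could be repeated for num_part.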