Statistical anomalies searches

jwalzerpitt
Influencer

I am running two different searches (over the last 7 days) to find statistical anomalies in HTTP POST counts per 1-minute bucket. Basically, I'm looking for potentially anomalous POSTs that are making it past the WAF. The first search uses ML and is as follows:

index=foo
| bucket _time span=1m
| stats count by _time src
| eventstats avg("count") as avg stdev("count") as stdev by "src"
| eval lowerBound=(avg-stdev*exact(2)), upperBound=(avg+stdev*exact(2))
| eval isOutlier=if('count' < lowerBound OR 'count' > upperBound, 1, 0)
| `splitby("src")`
| fields _time, src, "count", lowerBound, upperBound, isOutlier
| where isOutlier=1

The second search computes Z-scores:

index=foo
| bucket _time span=1m
| stats count by _time src website
| eventstats mean("count") AS mean_count, stdev("count") AS stdev_count
| eval Z_score=round(((count-mean_count)/stdev_count),2)
| where Z_score>1.5 OR Z_score<-1.5
| table _time, src, website, count, mean_count, Z_score
| sort -Z_score

Comparing the results of both searches returns different external IPs. Out of a total of 111 distinct IPs, only 12 IPs overlap between the two searches.

My first question: are the two searches an apples-to-apples comparison? My second question: is one search more valid than the other, or is running both searches and looking for the IPs that overlap a more precise way to evaluate statistical anomalies in HTTP POSTs by external IP?

Thx

skoelpin
SplunkTrust

You have the approach mostly correct, but I see a few issues.

First, you need to determine which explanatory variables should be fed into your target function. If your data follows a cyclic pattern, then _time will most likely be your strongest explanatory variable. Assuming your data is cyclic, you will need to establish a baseline over certain time periods and then calculate your boundaries from that baseline.
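As a rough sketch of that baseline idea, assuming the cycle is daily and that hour of day captures it, the boundaries could be computed per source and per hour instead of over the whole week. The field names `hour_of_day`, `avg_count`, and `stdev_count` are made up for illustration, and the 2-standard-deviation threshold is simply borrowed from the first search in the question:

```
index=foo earliest=-7d@d latest=@d
| bin _time span=1m
| stats count by _time src
| eval hour_of_day=strftime(_time, "%H")
| eventstats avg(count) AS avg_count stdev(count) AS stdev_count by src hour_of_day
| eval lowerBound=avg_count-2*stdev_count, upperBound=avg_count+2*stdev_count
| where count<lowerBound OR count>upperBound
```

Keep in mind that `stats count by _time src` emits no row for a minute in which a source sent nothing, so the averages here only cover the minutes in which each source was active.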

Next, you're going to want to make this scalable, so running sub-searches is out of the question. A better approach would be to feed the data into a summary index so that you have a 1-day baseline in advance. You can then run a populating search every 5-10 minutes that overlays the current values on that baseline. This can then trigger visual and email alerts any time your actual values fall outside of "normal".
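A minimal sketch of the baseline half of that setup, assuming a scheduled search that runs once a day and a summary index that already exists. The index name `summary_http_post_baseline` and the field names are hypothetical, so substitute your own:

```
index=foo earliest=-1d@d latest=@d
| bin _time span=1m
| stats count by _time src
| eval hour_of_day=strftime(_time, "%H")
| stats avg(count) AS avg_count stdev(count) AS stdev_count by src hour_of_day
| collect index=summary_http_post_baseline
```

The short-window search would then compute the same per-minute counts for the last 5-10 minutes and compare them against these stored rows. How you bring the two together depends on your environment, so I have left that part out.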

Anam
Community Manager

@jwalzerpitt

My name is Anam Siddique and I am the Community Content Specialist for Splunk Answers. Please accept the answer if the solution provided by @skoelpin worked for you. We have awesome users who contribute and it would be great if the community can benefit from their answer plus they can get credit/points for their work!

Thanks
