All Apps and Add-ons

Statistical anomalies searches

jwalzerpitt
Motivator

I am running two different searches (Last 7 days) to determine the statistical anomalies for HTTP POSTS within 1 minute. basically looking for any potential anomalous posts that are making past the WAF. The first search uses ML and is as follows:

index=foo
| bucket _time span=1m
| stats count by _time src
| eventstats avg("count") as avg stdev("count") as stdev by "src"
| eval lowerBound=(avg-stdev*exact(2)), upperBound=(avg+stdev*exact(2))
| eval isOutlier=if('count' < lowerBound OR 'count' > upperBound, 1, 0) | splitby("src")
| fields _time, src, "count", lowerBound, upperBound, isOutlier
| where isOutlier=1

The second search is looking for Z scores:

index=foo
| bucket _time span=1m
| stats count by _time src website
| eventstats mean("count") AS mean_count, stdev("count") AS stdev_count
| eval Z_score=round(((count-mean_count)/stdev_count),2)
| where Z_score>1.5 OR Z_score<-1.5
| table _time, src, website, count, mean_count, Z_score
| sort -Z_score

Comparing the results of both searches returns different external IPs. Out of a total of 111 distinct IPs, only 12 IPs overlap between the two searches.

The first question I have is are both searches an apple to apple comparison? The second question I have is, is one search more valid than the other, or is running both searches and looking for IPs that overlap a more concise way to evaluate the statistical anomalies for HTTP posts by external IPs?

Thx

0 Karma

skoelpin
SplunkTrust
SplunkTrust

You have the approach mostly correct but I see a few issues.

First, you need to determine what explanatory variables need to be fed into your target function. If your data follows a cyclic type pattern then most likely _time will be your strongest explanatory variable. Assuming your data is cyclic, you will need to establish a baseline over certain time periods then calculate your boundaries.

Next, you're going to want to make this scalable, so running sub-searches is out of the question. A better approach would be to feed the data into a summary index so you have a 1 day baseline in advance and you can then run a 5-10 minute populating search which will overlay that baseline. This can then trigger visual and email alerts anytime your actual values fall out of "normal"

0 Karma

Anam
Community Manager
Community Manager

@jwalzerpitt

My name is Anam Siddique and I am the Community Content Specialist for Splunk Answers. Please accept the answer if the solution provided by @skoelpin worked for you. We have awesome users who contribute and it would be great if the community can benefit from their answer plus they can get credit/points for their work!

Thanks

Get Updates on the Splunk Community!

Last Chance to Submit Your Paper For BSides Splunk - Deadline is August 12th!

Hello everyone! Don't wait to submit - The deadline is August 12th! We have truly missed the community so ...

Ready, Set, SOAR: How Utility Apps Can Up Level Your Playbooks!

 WATCH NOW Powering your capabilities has never been so easy with ready-made Splunk® SOAR Utility Apps. Parse ...

DevSecOps: Why You Should Care and How To Get Started

 WATCH NOW In this Tech Talk we will talk about what people mean by DevSecOps and deep dive into the different ...