Detecting numerical outliers from time series data for multiple devices

kiril123
Path Finder

I have written the following search to detect numeric outliers (based on the syslog message count per day) for the device "test-device":

index = syslog hostname = "test-device"
| regex message = (?i)".fail."
| timechart span=1d count
| eventstats avg("count") as avg stdev("count") as stdev
| eval upperBound=(avg+stdev*exact(1))
| eval isOutlier=if('count' > upperBound, 1, 0)

Is it possible to extend this search to include multiple devices?

I would like to get something like this:

Device_name, is_outlier_time_1, is_outlier_time_2, is_outlier_time_3
test-device_1, 0, 0, 1
test-device_2, 1, 0, 1
test-device_3, 0, 0, 0

woodcock
Esteemed Legend

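Try this. Note that timechart ... BY hostname returns one column per hostname rather than count and hostname fields, so untable is used to turn the result back into one row per day and hostname before the per-host statistics are computed. limit=0 keeps every hostname instead of folding the less frequent ones into OTHER:

index = syslog
| regex message = (?i)".fail."
| timechart span=1d limit=0 count BY hostname
| untable _time hostname count
| eventstats avg("count") as avg stdev("count") as stdev BY hostname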
| eval upperBound=(avg+stdev*exact(1))
| eval isOutlier=if('count' > upperBound, 1, 0)
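
To get the one-row-per-device layout asked for in the question, the per-device flags can be pivoted with xyseries. The following is only a sketch, assuming the same index and the hostname and message fields from the question. The field name day is introduced here for illustration, and the filter is written in the quoted field="regex" form:

```
index=syslog
| regex message="(?i).fail."
| bin _time span=1d
| stats count BY _time hostname
| eventstats avg(count) AS avg stdev(count) AS stdev BY hostname
| eval upperBound=avg+stdev
| eval isOutlier=if(count > upperBound, 1, 0)
| eval day=strftime(_time, "%Y-%m-%d")
| xyseries hostname day isOutlier
```

Each row is then a hostname and each column a day. With bin and stats, a day on which a device logged no matching events produces no row, so that day does not count toward the device's average and its cell in the final table is empty; appending | fillnull value=0 would show 0 there instead.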

aoliner_splunk
Splunk Employee

Please take a look at the Detect Numeric Outliers assistant, try the "Fields to split by" option, and click on the Show SPL buttons. You'll see that eventstats and streamstats both accept a 'by' clause and that the Toolkit includes a 'splitby' macro that you may find helpful. For example, using one of the built-in datasets, you can get pretty close to what I think you're looking for:

| inputlookup hostperf.csv 
| eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S.%3Q%z") 
| timechart span=10m max(rtmax) as responsetime 
| head 1000 
| eval host=random()%3 
| streamstats window=200 current=true median("responsetime") as median by "host" 
| eval absDev=(abs('responsetime'-median)) 
| streamstats window=200 current=true median(absDev) as medianAbsDev by "host" 
| eval lowerBound=(median-medianAbsDev*exact(20)), upperBound=(median+medianAbsDev*exact(20)) 
| eval isOutlier=if('responsetime' < lowerBound OR 'responsetime' > upperBound, 1, 0) 
| `splitby("host")` 
| fields _time, "responsetime", lowerBound, upperBound, isOutlier, *