Hi Team,
I have a field "duration". It is populated for a lot of APIs.
Can I use Detect Outliers to find high-duration instances in the last 7 days, based on API name?
I am new to this and don't know how to start. Can someone help?
Can I do something like this?
sourcetype="something"
| fields duration, operation
| bin _time span=5m
| stats avg(duration) as avg_duration by operation, _time
| streamstats window=0 avg(avg_duration) as mean_duration stdev(avg_duration) as stdev_duration by operation
| eval normalized_duration = (avg_duration - mean_duration) / stdev_duration
| eventstats avg(normalized_duration) as avg_normalized_duration by operation
| eval normalized_duration = (normalized_duration - avg_normalized_duration) / stdev(avg_normalized_duration)
| fit IsolationForestModel normalized_duration by operation
| eval anomaly = if(predicted_IsolationForestModel = 1, 0, 1)
| table operation, _time, avg_duration, anomaly
As stdev is not supported in Splunk, what can I use instead?
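stdev is supported as an aggregation function in the stats family of commands (stats, eventstats, streamstats), but not as a function in the eval command. Compute the standard deviation with one of those commands first, then reference the resulting field in eval.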
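For example, here is a minimal sketch of that pattern, reusing the field names from the question. eventstats computes the mean and standard deviation per operation, and eval only references the resulting fields. The 7-day time range and the cutoff of 3 standard deviations are assumptions for illustration, not values from the original search, so tune them for your data.

```
sourcetype="something" earliest=-7d@d
| fields _time, duration, operation
| bin _time span=5m
| stats avg(duration) as avg_duration by operation, _time
| eventstats avg(avg_duration) as mean_duration stdev(avg_duration) as stdev_duration by operation
| eval normalized_duration = if(stdev_duration > 0, (avg_duration - mean_duration) / stdev_duration, 0)
| eval anomaly = if(abs(normalized_duration) > 3, 1, 0)
| table operation, _time, avg_duration, normalized_duration, anomaly
```

The if() guard avoids dividing by zero for operations whose 5-minute averages never vary.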
MLTK detects anomalies by looking at deviations from a model of "expected" behaviour, so the first thing you need to do is create a search which returns what you decide is normal behaviour. In your case, this could be average duration per API per hour/day/whatever time period is appropriate for your data. The tricky bit is that you may need to create a model for each API (which isn't ideal). A workaround for this is to normalise the data in some way, for example by converting each API's values to z-scores so that one model can cover all APIs.
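As a rough sketch of the normalisation workaround, assuming MLTK is installed: the search below converts each API's hourly average to a z-score and then fits a single model on the normalised value. It uses MLTK's DensityFunction algorithm, which is a substitute chosen here for illustration rather than the algorithm from the question. The model name duration_outlier_model, the 1-hour span and threshold=0.01 are all assumptions to adjust.

```
sourcetype="something" earliest=-7d@d
| bin _time span=1h
| stats avg(duration) as avg_duration by operation, _time
| eventstats avg(avg_duration) as mean_duration stdev(avg_duration) as stdev_duration by operation
| where stdev_duration > 0
| eval normalized_duration = (avg_duration - mean_duration) / stdev_duration
| fit DensityFunction normalized_duration threshold=0.01 into duration_outlier_model
| rename "IsOutlier(normalized_duration)" as anomaly
| where anomaly = 1
| table operation, _time, avg_duration, normalized_duration, anomaly
```

Because every API is on the same scale after normalisation, one model is enough. DensityFunction should also accept a by clause (for example "by operation"), which is another way to get per-API behaviour from a single fit command, so check the MLTK documentation for your version.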