Splunk Search

eval with mathematical calculations?

tfitzgerald15
Explorer

I'm trying to do something a little wonky here, so please bear with me. The code below is the logical flow of what I'm trying to accomplish; however, I know for a fact it won't work as written. Can you give me some insight?

... | eval threshold=(outlier action=rm param=10 (stdev(count)))

Basically, I'm trying to create a field, "threshold", that holds the standard deviation of my results, with any outliers removed before the standard deviation is calculated. (I don't want that one result of 7,000 skewing my standard deviation upwards when normally it would have been, say, 10.)


jhupka
Path Finder

Is this what you're looking for? Here's a search that does what I think you are asking for, using indexer lag (_indextime - _time) as the example value. If it does what you need, you'll just have to modify it to fit your search/data:

index=_internal | eval indexer_lag=_indextime - _time
| eventstats p25(indexer_lag) as q1, p75(indexer_lag) as q3 | eval iqr=q3-q1 | eval threshold=10*iqr
| where indexer_lag < threshold
| eventstats stdev(indexer_lag) as threshold_stddev

I'll explain each part:

index=_internal | eval indexer_lag=_indextime - _time

^^^ Calculate the indexer lag for each event.

| eventstats p25(indexer_lag) as q1, p75(indexer_lag) as q3 | eval iqr=q3-q1 | eval threshold=10*iqr

^^^ Now use eventstats to get q1 and q3 in-line with our events, then calculate the interquartile range and a threshold of 10*iqr, using the multiplier (param=10) from your original post.

| where indexer_lag < threshold 

^^^ Only keep events whose lag is below our threshold, i.e. remove the outliers at or above 10*iqr.
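
For comparison, the built-in outlier command is documented as measuring its fences from the quartiles rather than from zero: a value counts as an outlier if it falls below q1 - param*iqr or above q3 + param*iqr. A minimal sketch of the same filter written that way, assuming the q1, q3, and iqr fields from the previous step and keeping the multiplier of 10 (the lower_fence and upper_fence field names are made up for illustration):

```
| eval lower_fence=q1 - 10*iqr
| eval upper_fence=q3 + 10*iqr
| where indexer_lag >= lower_fence AND indexer_lag <= upper_fence
```

Unlike a single upper bound, this also drops unusually low values, which may or may not matter for your data.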

| eventstats stdev(indexer_lag) as threshold_stddev

^^^ Finally use eventstats again to calculate the new standard deviation in-line based on our new list of events.
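
If all you need is the trimmed standard deviation, the outlier command that the question's pseudocode borrows its options from can also run as its own pipeline step instead of inside eval. A rough sketch, assuming the numeric field is named count as in the question and that the part before the first pipe is your existing search:

```
... | outlier action=remove param=10 count
| eventstats stdev(count) as threshold
```

Because outlier applies its own fence definition, the result may differ slightly from the where-based filter above. stdev only considers events that still carry a count value, so the outlying results should not contribute to it.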

jhupka
Path Finder

So you don't need the indexer_lag field, per se, but your overall search will be similar. If you're looking at a specific sourcetype across all indexes, then your search may start like this:

index=* sourcetype=tfitzgerald15s_type |

The indexer_lag field is just something I calculated for the example, to have a value to base the threshold on. In your case it might be a calculation you have to do for CPU usage, HTTP response times, or transaction duration.
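
Put together for a sourcetype-based search, the whole pipeline might look like the sketch below. It assumes the value of interest is an hourly event count; the span=1h bucket and the count field are illustrative guesses, and tfitzgerald15s_type is the same placeholder sourcetype as above:

```
index=* sourcetype=tfitzgerald15s_type
| bin _time span=1h
| stats count by _time
| eventstats p25(count) as q1, p75(count) as q3
| eval iqr=q3-q1
| eval threshold=10*iqr
| where count < threshold
| eventstats stdev(count) as threshold_stddev
```

Swap the bin/stats lines for whatever produces your own numeric field; the eventstats, eval, and where steps stay the same.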

Also, if this answers your question don't forget to accept/up-vote the answer 🙂


tfitzgerald15
Explorer

Awesome, thanks! I do just have one question. I'm not pointing to a specific indexer, I'm looking at a specific sourcetype. Would I still need the indexer_lag, and what does that represent? Apologies for the admitted newbie question there.
