Splunk Search

eval with mathematical calculations?

tfitzgerald15
Explorer

I'm trying to do something a little wonky here, so please bear with me. The code below shows the logical flow of what I'm trying to accomplish; however, I know for a fact it won't work as written. Can you give me some insight?

... | eval threshold=(outlier action=rm param=10 (stdev(count)))

Basically, I'm trying to create a field, "threshold", that holds the standard deviation of my results, with any outliers removed before the standard deviation is calculated. (I don't want that one result of 7,000 skewing my standard deviation upwards when normally it would have been, say, 10.)

jhupka
Path Finder

Is this what you're looking for? Here's a search that does what I think you're asking for, using indexer lag (_indextime - _time) as the example value. If it fits, you'll just need to modify it for your own search and data:

index=_internal | eval indexer_lag =_indextime - _time 
| eventstats p25(indexer_lag) as q1, p75(indexer_lag) as q3 | eval iqr=q3-q1 | eval threshold=10*iqr 
| where indexer_lag < threshold 
| eventstats stdev(indexer_lag) as threshold_stddev

I'll explain each part:

index=_internal | eval indexer_lag =_indextime - _time 

^^^ Calculate the indexer lag for each event.

 | eventstats p25(indexer_lag) as q1, p75(indexer_lag) as q3 | eval iqr=q3-q1 | eval threshold=10*iqr 

^^^ Now use eventstats to attach q1 and q3 to every event, then calculate the interquartile range and a threshold of 10*iqr, matching the param=10 in your original post.

| where indexer_lag < threshold 

^^^ Keep only events whose lag is below our threshold, i.e. remove the 10*iqr outliers.

| eventstats stdev(indexer_lag) as threshold_stddev

^^^ Finally use eventstats again to calculate the new standard deviation in-line based on our new list of events.
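If you'd like to try the steps above without touching real data, here is a rough sketch on synthetic events. Everything in it is made up for illustration: makeresults generates 20 rows, "value" sits between 10 and 14 for 19 of them, and row 20 is the 7,000 spike from the original question. raw_stdev is computed before the filter and trimmed_stdev after it, so the two can be compared side by side.

```
| makeresults count=20
| streamstats count as n
| eval value=if(n=20, 7000, 10 + (n % 5))
| eventstats stdev(value) as raw_stdev
| eventstats p25(value) as q1, p75(value) as q3
| eval iqr=q3-q1, cutoff=10*iqr
| where value < cutoff
| eventstats stdev(value) as trimmed_stdev
| table n value cutoff raw_stdev trimmed_stdev
```

The 7,000 row should drop out at the where step, leaving trimmed_stdev far smaller than raw_stdev. One thing to watch: if q1 and q3 come out equal, iqr is 0 and the where clause removes every event.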

jhupka
Path Finder

So you don't need the indexer_lag field, per se, but your overall search will be similar. If you're looking at specific sourcetype over all indexes, then your search may start like this:

index=* sourcetype=tfitzgerald15s_type |

And the indexer_lag is just a new field I am calculating based on what I want to base my threshold on for the example. So in your case it might be a calculation you have to do for CPU usage, or HTTP response times, or transaction duration.
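For the "count" field in the original question, an adapted search might look roughly like the sketch below. The hourly span and the stats step are assumptions on my part, since the post doesn't say how count is produced. I've also measured the cutoff from q3 (q3 + 10*iqr) rather than from zero. That is a deliberate change from the search above, so that data whose normal values are already larger than 10*iqr isn't filtered out entirely.

```
index=* sourcetype=tfitzgerald15s_type
| bin _time span=1h
| stats count by _time
| eventstats p25(count) as q1, p75(count) as q3
| eval iqr=q3-q1, cutoff=q3+10*iqr
| where count <= cutoff
| eventstats stdev(count) as threshold
```

This only trims high outliers. If unusually low counts should be dropped too, a matching lower bound such as q1 - 10*iqr could be added to the where clause.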

Also, if this answers your question don't forget to accept/up-vote the answer 🙂


tfitzgerald15
Explorer

Awesome, thanks! I do just have one question. I'm not pointing to a specific indexer, I'm looking at a specific sourcetype. Would I still need the indexer_lag, and what does that represent? Apologies for the admitted newbie question there.
