Splunk Search

eval with mathematical calculations?

tfitzgerald15
Explorer

I'm trying to do something a little wonky here, so please bear with me. The code below is the logical flow of what I'm trying to accomplish. However I know for a fact it won't work. Can you give me some insight?

... | eval threshold=(outlier action=rm param=10 (stdev(count)))

Basically I'm trying to create a field, "threshold", which calculates the standard deviation of my results, removing any outliers prior to calculating the standard deviation. (Basically, I don't want that one result of 7,000 skewing my standard deviation upwards when normally it would have been, say, 10).

Tags (4)
0 Karma

jhupka
Path Finder

Is this what you're looking for...here's a search that does what I think you are asking for on indexer lag (_indextime-_time). So if this does what you're looking for you'll just need to modify to fit your search/data:

index=_internal | eval indexer_lag =_indextime - _time 
| eventstats p25(indexer_lag) as q1, p75(indexer_lag) as q3 | eval iqr=q3-q1 | eval threshold=10*iqr 
| where indexer_lag < threshold 
| eventstats stdev(indexer_lag) as threshold_stddev

I'll explain each part:

index=_internal | eval indexer_lag =_indextime - _time 

^^^ Calc our index lag for each event

 | eventstats p25(indexer_lag) as q1, p75(indexer_lag) as q3 | eval iqr=q3-q1 | eval threshold=10*iqr 

^^^ Now use eventstats to get our q1, q3 in-line with our events, then calc our interquartile range, and our threshold based on your choosing of 10*iqr from your original post.

| where indexer_lag < threshold 

^^^ Only keep events that have lag less than our threshold. e.g. remove our 10*iqr outliers

| eventstats stdev(indexer_lag) as threshold_stddev

^^^ Finally use eventstats again to calculate the new standard deviation in-line based on our new list of events.

jhupka
Path Finder

So you don't need the indexer_lag field, per se, but your overall search will be similar. If you're looking at specific sourcetype over all indexes, then your search may start like this:

index=* sourcetype=tfitzgerald15s_type |

And the indexer_lag is just a new field I am calculating based on what I want to base my threshold on for the example. So in your case it might be a calculation you have to do for CPU usage, or HTTP response times, or transaction duration.

Also, if this answers your question don't forget to accept/up-vote the answer 🙂

0 Karma

tfitzgerald15
Explorer

Awesome, thanks! I do just have one question. I'm not pointing to a specific indexer, I'm looking at a specific sourcetype. Would I still need the indexer_lag, and what does that represent? Apologies for the admitted newbie question there.

0 Karma
Get Updates on the Splunk Community!

Splunk Observability Cloud | Unified Identity - Now Available for Existing Splunk ...

Raise your hand if you’ve already forgotten your username or password when logging into an account. (We can’t ...

Index This | How many sides does a circle have?

February 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

Registration for Splunk University is Now Open!

Are you ready for an adventure in learning?   Brace yourselves because Splunk University is back, and it's ...