Splunk Search

How to create a search for delay of data in Splunk?

ursfischer
Engager

Hello all

As a Splunk user at an early stage 😀 I currently have the following challenge:
We have many indexes, and we want to analyze, across all of them, how quickly log data becomes available in Splunk. The delay should be measured from the time the log was written (_time) to the time it was indexed (_indextime). We also want to exclude outliers (e.g. we currently have hosts with a wrong time configuration), for example by assuming the delay follows something like a Gaussian/normal distribution.
Here is an example query, which is probably wrong or which you could certainly improve:

| tstats latest(_time) AS logTime latest(_indextime) AS IndexTime WHERE index=bv* BY _time span=1h
| eval delta=IndexTime - logTime
| where delta>0 AND delta<1800
| table _time delta

Is the query approximately correct, so that we can answer the question of what kind of delay we have overall? And how could one use a Gaussian/normal distribution instead of restricting the search manually?


PickleRick
SplunkTrust

On top of @ITWhisperer's suggestion: I would rather not use tstats to produce just one value per hour bin, but instead calculate the average of that delta over one-hour or shorter periods. If you have lots of data, you could use event sampling to run this on only a small subset of events.
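
A minimal sketch of that approach as a plain event search (the index pattern bv* comes from your original query; the 15-minute span and the avg/perc95 statistics are just illustrative choices):

index=bv*
| eval delta = _indextime - _time
| bin _time span=15m
| stats avg(delta) AS avg_delta perc95(delta) AS p95_delta BY index, _time

With a volume like yours, the event sampling drop-down under the search bar can restrict this to e.g. 1 in 1000 events.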


ursfischer
Engager

Well, we do have a lot of data (currently approx. 10 billion events per day, and increasing). tstats is probably not the best approach here, but it is faster than a plain search. I will try sampling and see how I can use it.
Another idea is to set up saved searches for each index, store the results (_time, _indextime, index) in a summary index, and then use that to compute the statistics. But with more than 100 indexes this will take some time, effort and Splunk resources, and I am not sure it will actually make things easier for me.
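
A rough sketch of that summary-index idea as a single scheduled search over all matching indexes, rather than one saved search per index (the summary index name delay_summary and both spans are hypothetical):

index=bv* earliest=-1h@h latest=@h
| eval delta = _indextime - _time
| bin _time span=5m
| stats avg(delta) AS avg_delta max(delta) AS max_delta count BY index, _time
| collect index=delay_summary

Because the statistics are split BY index, one scheduled search could cover all 100+ indexes at once.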


ITWhisperer
SplunkTrust

You could consider using the Machine Learning Toolkit (MLTK), which is a free add-on from Splunkbase.

You can fit models of your data, e.g. Gaussian / normal distributions, and then look for anomalies.
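
A minimal sketch with MLTK's DensityFunction algorithm, reusing the delta field from the earlier queries (the model name delta_model and the threshold value are illustrative assumptions):

index=bv*
| eval delta = _indextime - _time
| fit DensityFunction delta dist=norm threshold=0.01 into delta_model

Once the model is trained, it can be applied to drop the outliers (e.g. the hosts with wrong clocks) before computing the delay statistics, instead of a manual delta<1800 cutoff:

index=bv*
| eval delta = _indextime - _time
| apply delta_model
| where 'IsOutlier(delta)'=0
| bin _time span=1h
| stats avg(delta) AS avg_delta BY _time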
