How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?

ddrillic
Ultra Champion

We run into situations where summary indexes are incomplete because of indexing latency in the cluster.

We usually set the same number of minutes for the Earliest and the Run every parameters...

What can be done? I think the issue is that the latency varies throughout the day and the week.
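
One way to see how the latency varies is to compare each event's index time with its timestamp. The search below is a minimal sketch: `your_index` is a placeholder, and the one-week range and hourly span are arbitrary choices. `_indextime` and `_time` are Splunk's standard internal fields.

```spl
index=your_index earliest=-7d@d latest=@d
| eval lag_sec = _indextime - _time
| bin _time span=1h
| stats avg(lag_sec) AS avg_lag perc95(lag_sec) AS p95_lag max(lag_sec) AS max_lag BY _time
```

The hours with the highest `p95_lag` or `max_lag` give a rough idea of how much delay a summary search would need to tolerate.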

1 Solution

niketn
Legend

@ddrillic, ideally you should summarize the previous time window, chosen according to your data flow, so that you summarize events only after all events for that window have arrived. For example, in the current hour pull data for the last hour, or in the current day pull data for yesterday.

Running every 30 minutes over the last 30 minutes may give you not only gaps but also duplicates, by counting the same events in two consecutive overlapping windows. So please make sure you understand the data flow/frequency and the need for summary indexing before kicking off summaries.

If you want your dashboards to show details from either the Real Time index or the Summary index, you should create a switch to the Summary index based on the time selected. You can find an answer for this kind of switch here: https://answers.splunk.com/answers/578984/running-one-of-two-searches-based-on-time-picker-s.html

Refer to the Splunk Developer video for a detailed explanation of the same: https://www.splunk.com/view/SP-CAAACZW
and also the Splunk Documentation: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Usesummaryindexing#Schedule_the_populat...

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

ddrillic
Ultra Champion

I see ;-) Isn't it as simple as allowing a buffer of time so that all events are safely in the platform? Meaning, add a delay of 15-30 minutes...

-- Run every 30 minutes for last 30 minutes not only may give you gaps but duplicates by accounting same events for two consecutive windows with overlap

Why?

niketn
Legend

@ddrillic, sorry, I overthought that one. There would just be gaps, no duplicates. If your schedule is a cron running every 30 minutes, it might run a bit late (depending on priority and load on the server): say 12:01 instead of 12:00, then 12:33 instead of 12:30, and so on throughout the day. Duplicates should not occur, since a scheduled run will never start early. You can test the scheduled run with mock queries that do not summarize events. If you are using the collect command, you can set testmode=true so that the scheduled search executes and generates stats but does not push data to the summary index. Testing with the collect command also lets you push data to your own test summary index, which you can get rid of after testing.
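
As a rough illustration of the test run described above, a populating search with `collect` in test mode might look like the following. This is only a sketch: `your_index`, `summary_test`, and the `stats` clause are placeholders, not the searches from this thread.

```spl
index=your_index earliest=-1h@h latest=@h
| stats count AS event_count BY host
| collect index=summary_test testmode=true
```

With `testmode=true` the results are shown but nothing is written to `summary_test`. Setting it to `false` (the default) would write the results, so a throwaway index is a reasonable target while experimenting.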

To be on the safer side, say your cron is set to run at the 5th minute of every hour and earliest and latest are set to -1h@h and -0h@h: you will not have gaps, and if your data input is impacted you will have an hour to resolve the issue (I think a similar example is in the video linked above, with the window being -2h@h to -1h@h to be even safer 😉). However, if your requirement is to summarize data every 30 minutes, you can do the same, provided you allow a buffer based on the data ingestion delay for that window.
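
To preview how snapped time modifiers such as -1h@h and @h resolve, one option is to run a small search with `addinfo`, with the time range picker set to those values (Earliest `-1h@h`, Latest `@h`). This is a sketch for inspection only; the output field names `window_start` and `window_end` are made up for the example.

```spl
| makeresults
| addinfo
| eval window_start = strftime(info_min_time, "%Y-%m-%d %H:%M:%S"),
       window_end   = strftime(info_max_time, "%Y-%m-%d %H:%M:%S")
| table window_start window_end
```

Because both boundaries snap to the hour, the window should cover the same full previous hour no matter which minute of the current hour the search runs in.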


ddrillic
Ultra Champion

Gorgeous as usual @niketnilay - please convert to an answer.

niketn
Legend

@ddrillic, thanks for the kind words. For an answer on topics like these I usually wait for Gurus to chime in, correct or approve before I convert to answer 🙂 Hoping that my inputs provided you with what you were looking for. I have converted to answer!


ddrillic
Ultra Champion

Much appreciated @niketnilay !!!

harsmarvania57
Ultra Champion

Hi @ddrillic,

If you are not using the latest data from the summary index, then I'd suggest changing the earliest time to -60m@m and the latest time to -30m@m.
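
Whether a 30-minute buffer is enough depends on the actual latency. A quick check, sketched here with a placeholder index name and an arbitrary one-week range, is to count events that were indexed more than 1800 seconds after their timestamp, since those could arrive after a window ending at -30m@m has already been summarized.

```spl
index=your_index earliest=-7d@d latest=@d
| eval lag_sec = _indextime - _time
| where lag_sec > 1800
| timechart span=1h count AS events_later_than_buffer
```

If this returns non-zero counts at certain times of day, a larger delay (or a later window) may be needed for those periods.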

ddrillic
Ultra Champion

Very interesting - now it's clear to me that setting Latest to now - @m is not practical - much appreciated.
