How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?

ddrillic — Fri, 20 Oct 2017 17:43:23 GMT

We reach situations where summary indexes are incomplete because we have an indexing latency in the cluster.

We usually set the same number of minutes for the Earliest and the Run every parameters...

What can be done? I think the issue is that the latency varies throughout the day and the week.

Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?

harsmarvania57 — Mon, 23 Oct 2017 09:31:54 GMT

Hi @ddrillic,

If you do not need the very latest data in the summary index, I'd suggest changing the earliest time to -60m@m and the latest time to -30m@m.
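As a rough illustration (plain Python, not Splunk syntax), here is how a delayed window such as -60m@m to -30m@m lines up across runs. The 30-minute schedule, the dates, and the helper name `delayed_window` are assumptions made up for this sketch.

```python
from datetime import datetime, timedelta

def delayed_window(run_time, earliest_min=60, latest_min=30):
    """Return (earliest, latest) for a window like -60m@m to -30m@m."""
    # Dropping seconds and microseconds mimics the @m snap.
    snapped = run_time.replace(second=0, microsecond=0)
    return (snapped - timedelta(minutes=earliest_min),
            snapped - timedelta(minutes=latest_min))

# Hypothetical schedule: one run every 30 minutes, starting at 12:00.
runs = [datetime(2017, 10, 23, 12, 0) + timedelta(minutes=30 * i) for i in range(4)]
windows = [delayed_window(r) for r in runs]

for run, (earliest, latest) in zip(runs, windows):
    print(f"run {run:%H:%M} summarizes {earliest:%H:%M} to {latest:%H:%M}")
```

Each window ends where the next one starts, and every event gets at least 30 minutes to arrive before it is summarized.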

Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?

niketn — Mon, 23 Oct 2017 09:58:21 GMT

@ddrillic, ideally you should pick the previous time window as per your data flow to ensure you are summarizing the events only after you have received all the events for that time window. For example, for the current hour pull data for last hour, or for the current day pull data for yesterday etc.

Running every 30 minutes for the last 30 minutes may give you not only gaps but also duplicates, by counting the same events in two consecutive windows that overlap. So please make sure you understand the data flow/frequency and the need for summary indexing before kicking off summaries.

If you want your dashboards to show details from either the real-time index or the summary index, you should create a switch to the summary index based on the time range selected. You can find an answer for this kind of switch: https://answers.splunk.com/answers/578984/running-one-of-two-searches-based-on-time-picker-s.html

Refer to Splunk Developer Video with detailed explanation of the same: https://www.splunk.com/view/SP-CAAACZW
and also the Splunk Documentation: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Usesummaryindexing#Schedule_the_populating_report_to_avoid_data_gaps_and_overlaps

Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?

ddrillic — Mon, 23 Oct 2017 14:25:26 GMT

Very interesting - now it's clear to me that setting Latest to now - @m is not practical - much appreciated.

Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?

ddrillic — Mon, 23 Oct 2017 14:27:53 GMT

I see ;-) Isn't it as simple as giving a buffer of time to allow all events to safely be in the platform? Meaning, give a delay of 15-30 minutes...

-- Run every 30 minutes for last 30 minutes not only may give you gaps but duplicates by accounting same events for two consecutive windows with overlap

Why?

Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?

niketn — Mon, 23 Oct 2017 16:10:45 GMT

@ddrillic, sorry, I overthought that one. There would just be gaps, no duplicates. If your schedule is on cron to run every 30 minutes, it might run a bit late (depending on priority and load on the server), say 12:01 instead of 12:00, then 12:33 instead of 12:30, and so on throughout the day. Duplicates should not come in, since a scheduled run will never be early. You can test the schedule with some mock queries that do not summarize events. If you are using the collect command, you can set testmode=true so that the scheduled search executes and generates stats but does not push data to the summary index. Testing with the collect command will also let you push data to your own summary index, which you can get rid of after testing.
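The 12:01 / 12:33 example can be worked through with a small Python sketch. It assumes, as the reasoning above does, that the "last 30 minutes" window is measured from the moment the search actually starts; the helper name `last_30_minutes` is made up for illustration.

```python
from datetime import datetime, timedelta

def last_30_minutes(actual_start):
    """Return (earliest, latest) for a 'last 30 minutes' window."""
    # ASSUMPTION: the window is relative to the actual start time of the run.
    return actual_start - timedelta(minutes=30), actual_start

first = last_30_minutes(datetime(2017, 10, 23, 12, 1))    # scheduled for 12:00
second = last_30_minutes(datetime(2017, 10, 23, 12, 33))  # scheduled for 12:30

# Time between the end of the first window and the start of the second.
gap = second[0] - first[1]
print(f"first window ends {first[1]:%H:%M}, second starts {second[0]:%H:%M}")
print(f"unsummarized gap: {gap}")
```

Under that assumption, events between 12:01 and 12:03 fall into neither window.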

To be on the safer side, let's say your cron is set to run at the 5th minute of every hour, and earliest and latest are set to pull data from -1h@h to -0h@h. You will not have gaps, and in case your data input is impacted, you will have an hour to resolve the issue (I think a similar example is in the video linked above, with the window being -2h@h to -1h@h to be even safer 😉). However, if your requirement is to summarize data every 30 minutes, you can do the same as long as you allow a buffer based on the delay of data ingestion for that window.
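A short Python sketch (again, not Splunk syntax) of why an hour-snapped window such as -1h@h to -0h@h is not affected by a run that starts a few minutes late. The run times and the helper name `previous_hour` are assumptions for illustration only.

```python
from datetime import datetime, timedelta

def previous_hour(run_time):
    """Return (earliest, latest) for a window like -1h@h to -0h@h."""
    # Dropping minutes, seconds and microseconds mimics the @h snap.
    top_of_hour = run_time.replace(minute=0, second=0, microsecond=0)
    return top_of_hour - timedelta(hours=1), top_of_hour

on_time = previous_hour(datetime(2017, 10, 23, 12, 5))  # starts at minute 5
late = previous_hour(datetime(2017, 10, 23, 12, 9))     # starts 4 minutes late

print(f"on time: {on_time[0]:%H:%M} to {on_time[1]:%H:%M}")
print(f"late:    {late[0]:%H:%M} to {late[1]:%H:%M}")
```

Both runs cover the same full hour because the snap discards the minutes, so a late start does not shift the window.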

Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?

ddrillic — Mon, 23 Oct 2017 16:36:36 GMT

Gorgeous as usual @niketnilay - please convert to an answer.

Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?

niketn — Mon, 23 Oct 2017 17:14:49 GMT

@ddrillic, thanks for the kind words. For an answer on topics like these I usually wait for Gurus to chime in, correct or approve before I convert to answer 🙂 Hoping that my inputs provided you with what you were looking for. I have converted to answer!

Re: How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?

ddrillic — Mon, 23 Oct 2017 17:31:33 GMT

Much appreciated @niketnilay !!!