Restart of indexer node causes data duplication in summary index

keerthana_k
Communicator

Hi

We have a single-indexer setup in which data is indexed continuously. We have around 1000 events in our raw index. A summarization search runs every 5 minutes and collects data into a summary index. Our indexer node was powered off and then powered on again. After this, we noticed that the number of events in the summary index had increased unpredictably, even though the number of events in the raw index remained the same. The following is the configuration for the 5-minute summarization in our savedsearches.conf:

[Do Not Click - Summary Index - Summary 5m]
action.summary_index = 1
action.summary_index._name = summary_index
action.summary_index.marker = marker_name
cron_schedule = 1-59/5 * * * *
dispatch.earliest_time = -11m@m
dispatch.latest_time = -6m@m
displayview = report_builder_display
enableSched = 1
realtime_schedule = 0
request.ui_dispatch_view = report_builder_display
search = <our search query>

When we repeatedly power the indexer off and on, the number of events in the summary index increases unpredictably. We would appreciate input on the cause of this issue and a possible solution.

Thanks

Lowell
Super Champion

A few things to note:

  1. Summary indexes are known to be prone to this kind of duplication.
  2. The syntax cron_schedule = 1-59/5 * * * * looks odd to me. It may be valid cron syntax, but a much simpler version would be cron_schedule = */5 * * * *, unless for some reason you are trying to avoid running at the top of the hour (minute 0).
  3. Double-check that run_on_startup is false (that should be the default), or just add run_on_startup = false to this entry as an extra safeguard.
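
Points 2 and 3 could be applied to the stanza from the question roughly as follows. This is only a sketch, not a tested configuration. Note that */5 fires at minutes 0, 5, 10, ... rather than 1, 6, 11, ..., so change the cron line only if the one-minute offset is not intentional.

```ini
[Do Not Click - Summary Index - Summary 5m]
# Simpler schedule; runs at minutes 0,5,10,... instead of 1,6,11,...
cron_schedule = */5 * * * *
# Explicitly prevent the search from being dispatched when splunkd starts
run_on_startup = false
# All other settings stay as in the original stanza
```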

Beyond that, look at the info_* fields in the summary index for these events to see if you can determine the source of the scheduling discrepancy. (See the docs for the addinfo command for the meaning of each info_* field.)
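
One way to inspect those fields is a search that groups summary events by the time window they cover and flags windows written by more than one run. The stanza below is a hypothetical, unscheduled report sketching that idea. The stanza name and the 24-hour range are made up for illustration, and it assumes the summary events carry info_min_time, info_max_time, and info_search_time as described in the addinfo docs. The search string can also be run ad hoc from the search bar.

```ini
# Hypothetical diagnostic report; the name and time range are assumptions
[Summary 5m - duplicate window check]
# List each summarized window covered by more than one distinct search run
search = index=summary_index | stats count AS events dc(info_search_time) AS runs BY info_min_time info_max_time | where runs > 1
dispatch.earliest_time = -24h
dispatch.latest_time = now
enableSched = 0
```

If windows show up with runs > 1, comparing their info_search_time values against the times the indexer was powered back on may show whether the extra runs were dispatched at startup.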

If you are running 5.x or later, you may also want to consider whether report acceleration would be a better option for you.
