
Summary index cron schedule: does the search populate first, then run on schedule?

sajeeshpn
New Member

Hi,

I am creating a new summary index and have scheduled its populating search to run at 6-hour intervals. In savedsearches.conf I set:

cron_schedule = 0 */6 * * *

With this schedule, I can only expect data to appear in the summary index after 6 hours, right?

But I would like to know whether the summary search gets executed once now and is then scheduled to run every 6 hours, so that there is some data in the summary index immediately.

Thanks,
Sajeesh
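
For context, here is a minimal savedsearches.conf sketch of a summary-populating search scheduled every 6 hours. The stanza name, source search, and summary index name (my_summary_populating_search, index=main sourcetype=my_sourcetype, my_summary_index) are placeholders, not taken from the original post:

# placeholder stanza: search, time window, and index name are examples only
[my_summary_populating_search]
search = index=main sourcetype=my_sourcetype | sistats count by host
# enable the scheduler and run at minute 0 every 6 hours
enableSched = 1
cron_schedule = 0 */6 * * *
# each run summarizes the previous 6 hours (the accepted solution below suggests offsetting this window to allow for indexing lag)
dispatch.earliest_time = -6h@h
dispatch.latest_time = now
# write the results to the summary index
action.summary_index = 1
action.summary_index._name = my_summary_index

The scheduler only dispatches the search at the cron times; saving or enabling the schedule does not trigger an immediate run, so the first summary data appears only after the first scheduled execution unless you also run the populating search manually over the earlier time range.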

1 Solution

gcusello
SplunkTrust

Hi sajeeshpn,
if you want to do this, you have to pay attention to the time period of your searches: summarization is not normal Splunk ingestion, so there is no check for events that have already been summarized, and you risk duplicating events or losing them.
So if your next scheduled run is at 12:00 covering the period from 06:00 to 12:00, then to have data now you have to run the search over a period ending before 06:00. It is also better not to search all the way up to now, but only up to a safe margin before now (e.g. from -370m@m to -10m@m).
In addition, I suggest you verify the continuity of your logs: if there is a large indexing delay (e.g. 1 hour), you risk losing data, and you have to account for this when choosing the safe margin.
To verify the continuity of your logs, check the difference between _time and _indextime.
To be safer, you could take a larger time period (e.g. 12 hours) and add a check on _indextime to your search, excluding all events with an _indextime older than 6 hours.
Bye.
Giuseppe
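
A minimal SPL sketch of the checks described above, using the same placeholder source search (index=main sourcetype=my_sourcetype) and a placeholder host field rather than the poster's actual search. First, measuring the indexing lag (the difference between _indextime and _time):

index=main sourcetype=my_sourcetype earliest=-24h@h latest=now
| eval lag_seconds = _indextime - _time
| stats max(lag_seconds) AS max_lag avg(lag_seconds) AS avg_lag perc95(lag_seconds) AS p95_lag

Second, populating over a larger window (12 hours, ending 10 minutes before now) while keeping only events indexed in the last 6 hours, so that late-arriving events are caught without re-summarizing events the previous run could already have seen:

index=main sourcetype=my_sourcetype earliest=-12h@h latest=-10m@m
| where _indextime >= relative_time(now(), "-6h")
| sistats count by host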


sajeeshpn
New Member

Thank you !
