Hello there,
In our environment we have data model accelerations that are consistently reaching the Max Summarization Search Time, which is at the default of 3600 seconds. We know the issue is related to the resources allocated to the indexing tier, as the accelerations are maxing out CPU. It will be remediated, but not immediately.
What I am interested in finding out is how the limit is implemented. If an acceleration never completes, but instead times out and the next summary starts, is there the potential for some data to never be accelerated?
We also currently have searches using summariesonly=t with a time range of -30m. Our max concurrent auto summarizations is 2, so I know there can be up to a 55-minute gap in tstats data, meaning the searches could miss events. While not best practice, could setting the max summarization search time to 1800 seconds be a potential solution?
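For reference, here is where I believe these knobs live in datamodels.conf (the stanza name is just a placeholder for our data model):

    # datamodels.conf on the search head (stanza name hypothetical)
    [My_DataModel]
    acceleration = true
    # Max Summarization Search Time, in seconds (3600 is the default)
    acceleration.max_time = 3600
    # max concurrent auto summarizations (ours is set to 2)
    acceleration.max_concurrent = 2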
Thanks for your help!
@Trevorator
After the initial creation of data model acceleration summaries, Splunk regularly runs scheduled summarization searches to incorporate new data and remove information that is older than the defined summary range.
If a summarization search exceeds the Max Summarization Search Time limit, it is stopped before completing its assigned interval.
Normally, Splunk does not automatically retry or continue the interrupted summarization for that specific time window, which can result in gaps in your accelerated data if summarization searches repeatedly fail or time out.
These gaps mean that some events will not be included in the .tsidx summary files, causing searches that rely on tstats summariesonly=true to miss those events.
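If you want to check whether you already have such gaps, one rough sketch is to compare bucketed counts with and without summaries (the data model name below is a placeholder):

    | tstats summariesonly=false count AS total from datamodel=My_DataModel by _time span=5m
    | join type=left _time
        [| tstats summariesonly=true count AS summarized from datamodel=My_DataModel by _time span=5m]
    | fillnull value=0 summarized
    | eval gap = total - summarized
    | where gap > 0

Any rows returned are time buckets where events exist in the raw data but not in the summaries. Expect the most recent buckets to show a gap in any case, since summarization always lags slightly behind real time.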
I would say the best approach is to address the resource constraints causing your summarization searches to run too long.
Regards,
Prewin
Splunk Enthusiast | Always happy to help! If this answer helped you, please consider marking it as the solution or giving a kudos/Karma. Thanks!
Hi @Trevorator
What are your acceleration.backfill_time and acceleration.earliest_time set to?
Reducing acceleration.max_time from 3600 seconds to 1800 seconds is unlikely to be a solution and may worsen the problem. If a summarization search requires, for example, 2000 seconds to process its assigned time range due to resource constraints, it would complete with a 3600-second timeout but would fail with an 1800-second timeout. This would lead to more frequent timeouts and potentially larger gaps in your accelerated data.
I think the best option is to determine why the search is taking so long to run. Is the DM restricted to only your required set of indexes?
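To measure how long your acceleration searches are actually taking, you can check the scheduler logs; the savedsearch_name pattern below matches the naming scheme Splunk uses for data model acceleration jobs:

    index=_internal sourcetype=scheduler savedsearch_name="_ACCELERATE_DM_*"
    | stats count avg(run_time) AS avg_runtime max(run_time) AS max_runtime BY savedsearch_name

If max_runtime is regularly sitting near your acceleration.max_time, the jobs are being cut off rather than finishing early.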
🌟 Did this answer help you? If so, please consider marking it as the solution or giving kudos.
Your feedback encourages the volunteers in this community to continue contributing.
Hi @Trevorator ,
you have two solutions:
delay the time frame: e.g., if you have an acceleration delay of 5 minutes, you can shift the time borders of your Correlation Searches, e.g. from -10m@m to -5m@m instead of from -5m@m to now;
otherwise, you can use the option summariesonly=false in your tstats command, so that the command also reads the not yet accelerated data; this solution is obviously less performant than the other (see the sketch below for both options).
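As a sketch of both options in savedsearches.conf (the search and data model names are only examples):

    # Option 1: shifted time borders for the Correlation Search
    [My Correlation Search]
    dispatch.earliest_time = -10m@m
    dispatch.latest_time   = -5m@m
    search = | tstats summariesonly=true count from datamodel=My_DataModel by host

    # Option 2: the same search, also reading not yet accelerated data
    # search = | tstats summariesonly=false count from datamodel=My_DataModel by host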
Ciao.
Giuseppe
Hi @gcusello that does make sense for the correlation searches, but I am still interested in the impact to the data model acceleration itself. Will there be issues in the tsidx files if the acceleration never fully completes? Or will the next summary pick up where it left off once it hits the summarization limit?
If it's the latter, does that mean the most recent data is consistently getting delayed in its acceleration because each acceleration search needs to catch up on the previous debt?
Hi @Trevorator ,
As @Prewin27 pointed out, if your acceleration queries exceed the maximum time limit, you should analyze why this happens; in other words, check your storage performance and whether your system resources are sufficient.
For storage performance, check whether the IOPS value of each storage volume is greater than 800 using an external tool such as Bonnie++, and check how many CPUs you have in your indexers and Search Heads using the Monitoring Console.
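For a quick look at CPU pressure, you can also query the introspection data that Splunk collects on every instance (only a sketch; filter on your indexer hosts):

    index=_introspection sourcetype=splunk_resource_usage component=Hostwide
    | eval cpu_pct = 'data.cpu_system_pct' + 'data.cpu_user_pct'
    | timechart span=5m avg(cpu_pct) BY host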
Ciao.
Giuseppe
@Prewin27
This is what I was worried was the case. You said that "Normally, Splunk does not automatically retry or continue". Does that mean there is a setting we could enable to have Splunk do this, to ensure there is no loss in the .tsidx files in the short term? The goal is to have all data accelerated for Enterprise Security searches. I know the long-term solution is new machines with better IOPS, but it may be some time before they are requisitioned.
@Trevorator
I don't think there is any Splunk setting to enable automatic retry or continuation for .tsidx file operations. The only way to ensure all data is accelerated and .tsidx files are preserved is to maintain a healthy infrastructure and address any resource limitations.
Regards,
Prewin
Splunk Enthusiast | Always happy to help! If this answer helped you, please consider marking it as the solution or giving a kudos/Karma. Thanks!