I have 35 report acceleration summaries from Cisco Security Suite.
Every 10 minutes, 800-1200 of those same auto-summarization searches try to run, but they get skipped.
01-30-2018 11:32:45.536 -0500 INFO SavedSplunker - savedsearch_id="nobody;Splunk_CiscoSecuritySuite;_ACCELERATE_FC554714-9307-4F2E-B815-B364569D785E_Splunk_CiscoSecuritySuite_nobody_470b01eef1db82fd_ACCELERATE_", search_type="report_acceleration", user="nobody", app="Splunk_CiscoSecuritySuite", savedsearch_name="_ACCELERATE_FC554714-9307-4F2E-B815-B364569D785E_Splunk_CiscoSecuritySuite_nobody_470b01eef1db82fd_ACCELERATE_", priority=default, status=skipped, reason="The maximum number of concurrent auto-summarization searches on this cluster has been reached", concurrency_category="summarization_scheduled", concurrency_context="saved-search_cluster-wide", concurrency_limit=30, scheduled_time=1517329800, window_time=0
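For anyone who wants to tally these skips outside of Splunk, here is a minimal Python sketch that pulls the key=value pairs out of one scheduler event. The function name `parse_scheduler_event` and the regex are my own illustration, not anything Splunk ships, and the sample is an abbreviated copy of the event above.

```python
import re

# Matches key="quoted value" or key=bare_value (bare values stop at a comma or whitespace).
KV_RE = re.compile(r'(\w+)=(?:"([^"]*)"|([^,\s]+))')


def parse_scheduler_event(line):
    """Return the key=value pairs of one scheduler log event as a dict of strings."""
    fields = {}
    for key, quoted, bare in KV_RE.findall(line):
        fields[key] = quoted or bare
    return fields


# Abbreviated copy of the event above (the savedsearch_* fields are left out for brevity).
sample = (
    '01-30-2018 11:32:45.536 -0500 INFO SavedSplunker - '
    'search_type="report_acceleration", user="nobody", '
    'app="Splunk_CiscoSecuritySuite", priority=default, status=skipped, '
    'reason="The maximum number of concurrent auto-summarization searches '
    'on this cluster has been reached", '
    'concurrency_category="summarization_scheduled", '
    'concurrency_context="saved-search_cluster-wide", concurrency_limit=30, '
    'scheduled_time=1517329800, window_time=0'
)

event = parse_scheduler_event(sample)
print(event["status"], event["concurrency_limit"])
```

All values come back as strings, so convert `concurrency_limit` with `int()` before comparing it to a computed limit.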
I have these settings:
base_max_searches=6
max_searches_per_cpu=1 #Windows environment...please don't judge...I wanted Linux
num_cpus=96 #6 node SHC with 16 cpu/node
auto_summary_perc=50
So, I think my SHC should be able to run 51 accelerations concurrently.
(base_max_searches + (max_searches_per_cpu * num_cpus)) * (auto_summary_perc / 100)
(6 + (1 * 96)) * .50 = 51
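The same arithmetic as a small Python sketch, in case it helps to plug in other values. This only reproduces the formula as written in this post; the function name is hypothetical, and whether Splunk actually applies the percentage this way is an assumption.

```python
def max_concurrent_summaries(base_max_searches, max_searches_per_cpu,
                             num_cpus, auto_summary_perc):
    """Apply the formula from the post: a percentage of the total search slots."""
    total_slots = base_max_searches + max_searches_per_cpu * num_cpus
    # Integer arithmetic; rounding down is an assumption about how the limit is applied.
    return total_slots * auto_summary_perc // 100


# Settings from the post, with num_cpus summed across the 6-node SHC.
expected_limit = max_concurrent_summaries(6, 1, 96, 50)
print(expected_limit)
```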
Unfortunately Splunk tries to run hundreds of them at once, so:
base_search_max_pipeline
from 1?
@davpx has a point: you are trying to run far more searches than your Splunk environment can probably handle. But you don't have to throw out the app! Just turn off report acceleration. (And do fix the num_cpus setting, as @davpx suggested.)
Go to Settings >> Reports... and look for any searches that have a lightning bolt icon (because they are accelerated). Edit the search settings so the searches are no longer accelerated.
And you might want to look to see if you have accelerated searches in other apps as well...
Turning off acceleration doesn't break anything; it just stops the background processing that speeds up the searches. In your environment, that background processing was a bad trade-off. Letting the searches run a little slower should be much better. (And you could leave acceleration on for any searches that you use very frequently.)
Finally, if you have data model acceleration, you can turn that off as well. But if you are using the Splunk Enterprise Security app (ES), you need to be very careful about turning off any data model acceleration.
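If clicking through every app for lightning bolts gets tedious, a rough Python sketch like the one below can list the accelerated reports from the text of a savedsearches.conf file. It assumes report acceleration appears there as `auto_summarize = 1` (or `true`); the stanza names in the sample are made up, and the parser is deliberately simple (it ignores multi-line values).

```python
def accelerated_reports(conf_text):
    """Return the stanza names whose auto_summarize setting is enabled."""
    enabled = []
    stanza = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1]  # start of a new saved search stanza
            continue
        key, sep, value = line.partition("=")
        if sep and stanza and key.strip() == "auto_summarize":
            if value.strip().lower() in ("1", "true"):
                enabled.append(stanza)
    return enabled


# Hypothetical savedsearches.conf content, for illustration only.
sample_conf = """
[Example Report A]
search = index=main | stats count by host
auto_summarize = 1

[Example Report B]
search = index=main | top sourcetype
auto_summarize = 0
"""

print(accelerated_reports(sample_conf))
```

Running it over each app's local and default savedsearches.conf would give a checklist of reports to review before turning anything off.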
The data model acceleration setting "Maximum Concurrent Summarization Searches" defaults to 3, which means the acceleration searches won't run well on a 16-core system. It should help to bump this down to 1 across the board, then increase it only for the data models that need more horsepower.
The Data Model Audit dashboard (/app/search/datamodel_audit) is handy for tuning the acceleration jobs.
I was considering that and I only have a few accelerations outside of Cisco Security Suite.
I like option 4. Your num_cpus is way too high for a 16-core system. That setting is per-machine, not per-cluster. You're simply running more searches than your cluster can handle at once.
I do realize that, but I was just trying to simplify the overall illustration.
Machine by machine, it looks like this:
(6 + (1 * 16)) * .50 = 11
(6 + (1 * 16)) * .50 = 11
(6 + (1 * 16)) * .50 = 11
(6 + (1 * 16)) * .50 = 11
(6 + (1 * 16)) * .50 = 11
(6 + (1 * 16)) * .50 = 11
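Here is the per-machine version as a Python sketch, next to a second reading that happens to line up with the concurrency_limit=30 in the log. That second reading is an assumption on my part: it treats auto_summary_perc as a percentage of the scheduler's share, with the scheduler itself capped by max_searches_perc (assumed to be 50 here, since it is not among the settings listed above), and it rounds down at each step.

```python
def pct(value, percent):
    """Integer percentage, rounding down (the rounding is an assumption)."""
    return value * percent // 100


NODES = 6
per_node_slots = 6 + 1 * 16  # base_max_searches + max_searches_per_cpu * cpus per node

# Reading 1: the formula from the thread, applied per machine.
per_node_direct = pct(per_node_slots, 50)   # auto_summary_perc = 50
cluster_direct = per_node_direct * NODES

# Reading 2 (assumption): the scheduler share is taken first, then the
# auto-summary share is taken out of that.
ASSUMED_MAX_SEARCHES_PERC = 50              # not stated in the thread
per_node_scheduler = pct(per_node_slots, ASSUMED_MAX_SEARCHES_PERC)
per_node_nested = pct(per_node_scheduler, 50)
cluster_nested = per_node_nested * NODES

print(per_node_direct, cluster_direct)
print(per_node_nested, cluster_nested)
```

If the second reading is right, the cluster-wide ceiling for auto-summarization would be 30 rather than 51 or 66, but treat that as something to check against the limits.conf documentation rather than a confirmed explanation.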
Cisco Security Suite is kind of CPU greedy