Hello
We deployed a new Splunk cluster consisting of a Cluster Manager, 3x SHC members, and 6x Indexers. The cluster has hundreds of vCPUs across the SHC and Indexers, but after installing Enterprise Security 7.x we are seeing hundreds of skipped searches, specifically:
The maximum number of concurrent historical scheduled searches on an instance or cluster reached
The maximum number of concurrent auto-summarization searches reached
Logs indicate the searches seem to be getting skipped on the CM (which only has 12 CPU cores). We followed the documentation to install ES on a distributed cluster:
Install Splunk Enterprise Security in a search head cluster environment | Splunk Docs
(We used the CM, which is also our deployer, to push ES to the SHC via the shcluster/apps folder.)
Note: some summarization searches are running on the SHC members but the majority seem to be running on the CM.
Would appreciate any ideas as this has me stumped!
Did you remove the ES apps from /etc/apps/ on the CM (the deployer in your case) after deploying them through the shcluster/apps folder? It looks like your CM is running all of the ES app searches from /etc/apps/, which is causing the skipped searches on the CM.
Remove ES from the CM’s Splunk instance:
Stop Splunk on the CM.
Remove the ES app directory from $SPLUNK_HOME/etc/apps/ on the CM.
Start Splunk on the CM.
Verify ES is only on SHC members:
Ensure the ES app and its configurations are only present on the SHC members, deployed via the deployer ($SPLUNK_HOME/etc/shcluster/apps/).
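Before removing anything, it may help to list which ES-related app directories are actually sitting in etc/apps on the CM/deployer. The sketch below is read-only and only an illustration: the function name `find_es_apps` is hypothetical, and the directory-name prefixes are assumptions based on how ES packages are commonly named, so compare the result with the contents of your own shcluster/apps folder.

```python
from pathlib import Path

# Assumption: prefixes of app directories typically shipped with ES.
# Adjust this tuple to match what is really in your shcluster/apps folder.
ES_APP_PREFIXES = ("SplunkEnterpriseSecuritySuite", "DA-ESS-", "SA-")


def find_es_apps(apps_dir, prefixes=ES_APP_PREFIXES):
    """Return the sorted names of subdirectories of apps_dir whose name
    starts with one of the given prefixes. Nothing is modified."""
    root = Path(apps_dir)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and entry.name.startswith(prefixes)
    )
```

For example, calling `find_es_apps("/opt/splunk/etc/apps")` on the CM (the path is an assumption, substitute your own $SPLUNK_HOME) should return an empty list once ES has been removed from the CM's own apps folder.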
Regards,
Prewin
Splunk Enthusiast | Always happy to help! If this answer helped you, please consider marking it as the solution or giving a Karma. Thanks!
Thank you for the clear answer. Removed and working fine. Does Splunk ES documentation state this anywhere?
Removing ES from the deployer isn't something you should need to do in a default installation; rather, you must have done something differently, and removing ES happened to be the cure. Normally ES should detect that it's being deployed on a deployer and should _not_ set itself up as a "runnable" instance.
1. @PrewinThomas is on the right track here - your CM/deployer should be used only to deploy the ES app. It shouldn't run it after that.
2. Still, you might soon run into the same problem with skipped or delayed searches if you leave the default schedules, because the searches will bunch up at certain points in time. As a rule of thumb, you should review your scheduled searches (reports, correlation searches, datamodel acceleration) and spread them across the available time slots.
Karma to both answers above. Don't let ES run on the Cluster Manager. It sounds like you have beefy servers, but ES can bring the beefiest of servers to its knees if you are not careful. The Splunk ES Content Pack has close to 6000 correlation / finding searches (I could be underestimating; the number could be higher). If someone goes in and turns on all of those searches while you have your own stuff running, Splunk will absolutely choke and die.
As a best practice, I track each and every search that runs on my ES instances and map the time windows in which those searches run. Any newly activated searches are set to run in the time windows that have the fewest searches running. Just remember the general rule of thumb: each search occupies one CPU core and one gig of RAM while it runs.
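The "least busy window" idea above can be sketched roughly as follows. This is a simplified illustration, not how Splunk schedules anything: it assumes every search runs hourly and is described only by its start minute, and the function names `least_loaded_minute` and `hourly_cron` are hypothetical helpers.

```python
from collections import Counter


def least_loaded_minute(existing_minutes, step=5):
    """Return the start minute (0, step, 2*step, ...) that currently has
    the fewest searches. Ties go to the earliest slot."""
    if step <= 0 or 60 % step != 0:
        raise ValueError("step must be a positive divisor of 60")
    # Bucket each existing start minute into its slot, e.g. 7 -> 5 for step=5.
    load = Counter((m % 60) // step * step for m in existing_minutes)
    return min(range(0, 60, step), key=lambda slot: (load[slot], slot))


def hourly_cron(minute):
    """Build a standard 5-field cron expression for 'every hour at minute'."""
    return f"{minute} * * * *"
```

With existing searches starting at minutes 0, 0, 5, 15 and 30, `least_loaded_minute([0, 0, 5, 15, 30])` picks minute 10, and `hourly_cron(10)` gives a cron string for a new search in that slot. A real inventory would also have to account for search runtime and for schedules that are not hourly.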
Additionally, if you have the resources, you can open the Pandora's box of multithreading and / or allowing more concurrent searches, but do this ONLY as a last resort. Validate first, by running top, checking the Management Console, or using whatever tool you prefer, that you have spare CPU and RAM to allow more concurrent searches.
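To see why a 12-core instance hits the two limits quoted in the original question so quickly, here is a rough calculation of the concurrency ceilings. It assumes the limits.conf settings `max_searches_per_cpu`, `base_max_searches`, `max_searches_perc` and `auto_summary_perc` with the default values as I recall them (1, 6, 50 and 50); the rounding down is also an assumption, and a search head cluster may apply the limits cluster-wide, so check the limits.conf spec for your version before relying on the numbers.

```python
def search_concurrency_limits(
    cpu_cores,
    max_searches_per_cpu=1,   # assumed default, [search] stanza
    base_max_searches=6,      # assumed default, [search] stanza
    max_searches_perc=50,     # assumed default, [scheduler] stanza
    auto_summary_perc=50,     # assumed default, [scheduler] stanza
):
    """Estimate concurrent search ceilings for a single instance."""
    # Total historical searches the instance will run at once.
    total = max_searches_per_cpu * cpu_cores + base_max_searches
    # Share of that total the scheduler may use (rounded down here).
    scheduled = total * max_searches_perc // 100
    # Share of the scheduler's slots available to auto-summarization.
    auto_summary = scheduled * auto_summary_perc // 100
    return {
        "total_historical": total,
        "scheduled": scheduled,
        "auto_summarization": auto_summary,
    }
```

Under these assumptions, `search_concurrency_limits(12)` gives 18 historical searches in total, 9 for the scheduler and 4 for auto-summarization, which is very little headroom for a full ES install. Raising any of these settings only makes sense after confirming the spare CPU and RAM mentioned above.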