I have one search head and three indexers consuming about 50 GB of data a day. All servers are running Splunk 6.3.1. The indexers are typically less than 20% utilized.
We purchased Splunk Enterprise Security and loaded it onto a separate search head. When we connect it to the three indexers, they all go to 100% CPU and, after about 24 hours, fall over. The servers are all Unix, with 4 cores and 16 GB of RAM each. The ES search head has 8 cores and 32 GB. I realize that 16 cores are recommended for it, but it is not this server that is causing the problem.
Is there anything I can turn off so that we can identify what is causing the indexers to fail?
Any help would be much appreciated, as this was not a cheap product to buy and it is currently not switched on.
You should start by looking at data model acceleration: which models do you actually need, and which ones can have acceleration disabled? Have a look at the data model acceleration dashboard.
And keep in mind that when ES starts for the first time against existing data, it needs to build the full history for every accelerated data model. So depending on your data volume, resources may be used intensively for anywhere from a few hours to a few days.
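If you decide to trim acceleration, it comes down to a couple of settings in datamodels.conf on the ES search head. A minimal sketch, using the CIM "Web" and "Network_Traffic" models purely as examples (adjust the stanza names to the models you actually have):

```
# datamodels.conf (in an app's local directory on the ES search head)

[Web]
# Turn acceleration off entirely for models you do not need.
acceleration = false

[Network_Traffic]
# Or keep acceleration but shrink the summary range, so the
# initial backfill only builds one day of history.
acceleration = true
acceleration.earliest_time = -1d
```

You can make the same changes from Settings > Data models in the UI, which is safer if you are not sure which app owns each model.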
One thing to consider is that the core count on your indexers is below spec. When a search is dispatched from the search head, the indexers also spend cores running it, retrieving events, and streaming results back to the search head.
ES also relies heavily on data models. Background acceleration searches run roughly every 5 minutes and hit the indexers to backfill all the data models for the TAs that are enabled. So there are a lot of moving parts here to take into account.
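You can see which of those acceleration jobs are the expensive ones from the scheduler log. A rough sketch (savedsearch_name and run_time are the fields scheduler.log normally emits, but verify against your version):

```
index=_internal sourcetype=scheduler savedsearch_name="_ACCELERATE_*"
| stats count sum(run_time) AS total_runtime_s avg(run_time) AS avg_runtime_s BY savedsearch_name
| sort - total_runtime_s
```

Whatever floats to the top is a good candidate for a smaller summary range or for disabling outright.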
How many data sources / TAs do you have running?
How many correlation searches did you enable?
How many of these are real-time vs. scheduled? (One way to check is sketched after these questions.)
Assets and identities?
Threat feeds?
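If you are not sure how to answer the correlation search questions, you can pull the list over REST from the ES search head. A sketch, relying on the ES naming convention that correlation search names end in "- Rule":

```
| rest /services/saved/searches
| search disabled=0 title="* - Rule"
| eval mode=if(like('dispatch.earliest_time', "rt%"), "real-time", "scheduled")
| table title mode cron_schedule
```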
Splunk typically recommends a two-week PS engagement for ES implementations; a lot of that time goes into tuning your environment for the data sources already onboarded and those expected to be onboarded.
Just from what you are describing, though, I would imagine your indexing tier is the bottleneck. I would start by disabling all of your correlation searches and reducing the acceleration range on your data models to one day. Additionally, you can tune the data models to search only specific indexes, so you can limit which data sources CIM normalization is applied to. That helps processing as well.
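One way to scope the models to specific indexes, assuming your Splunk_SA_CIM version ships the per-model index macros (older CIM releases may require editing the model constraints directly), is macros.conf. The index names below are examples only:

```
# macros.conf (in Splunk_SA_CIM/local on the ES search head)

[cim_Authentication_indexes]
definition = index=wineventlog OR index=linux_secure

[cim_Web_indexes]
definition = index=proxy
```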
If you don't want to disable all the correlation searches, you could disable just the real-time ones.
OR, on your indexers, configure the various "indexed_realtime" settings in limits.conf. This will dramatically reduce the impact of real-time searches.
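A minimal sketch of those settings (indexed_realtime_use_by_default and indexed_realtime_disk_sync_delay are documented limits.conf settings; the delay value here is just an illustrative choice):

```
# limits.conf (deploy to the indexers)

[realtime]
# Run real-time searches against data already written to the index
# instead of tapping the ingestion pipeline; much cheaper on CPU.
indexed_realtime_use_by_default = true
# Seconds to wait for data to sync to disk before it becomes
# searchable in indexed real-time mode.
indexed_realtime_disk_sync_delay = 60
```

The trade-off is a small delay before events show up in real-time results, which is usually acceptable for ES correlation searches.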
Configuring Enterprise Security is a costly, time-consuming process that can easily take several weeks.
It is recommended to get some help from Splunk Professional Services or a local partner in order to configure everything correctly.
With regards to your question, Enterprise Security has a lot of correlation searches and data model accelerations enabled by default. Some of those might be taking longer than expected because your instance is not fully configured (no assets or identities defined, not enough data sources, incomplete CIM normalisation, etc.).
Something you can do is turn everything off and then slowly enable each correlation search or data model acceleration you are going to require, based on your needs. Take a look at the ES Installation and User manuals, go through them in depth, and then define your use cases very clearly.
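If you take the turn-everything-off route, disabling a correlation search is just a saved-search toggle, either from the ES content management UI or in savedsearches.conf. A sketch, using one stock ES rule name purely as an example (put the stanza in the local directory of whichever app owns the search):

```
# savedsearches.conf (local directory of the app that owns the search)

[Access - Brute Force Access Behavior Detected - Rule]
disabled = 1
```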
Hope that helps.