We need advice on setting up search head(s). We have set up a distributed search system with 12 indexers and 2 search heads. We are using one search head as the primary search head for periodic searches (mostly at one-minute intervals) for alerting purposes. Very few ad hoc searches are done on that search head. This search head seems to work well.
The other search head is doing ad hoc queries (we allow users to run real-time searches) and driving some displays (which are mainly driven by real-time searches). This search head is problematic. We get slowdowns on this search head quite often. We usually have to restart the search head to "fix" it.
We use the "Jobs" screen to see what is going on. We will have around 10 real time searches running and a few fairly short ad-hoc searches. CPU utilization on the host does not seem to be an issue. It is usually running at 10% total utilization or less. The indexers never seem to have high utilization (and the other search head seems quick and responsive).
It seems we are running into some bottleneck on the search head and we aren't sure of the best way to troubleshoot it. It also seems that once we run into this bottleneck, there is no recovery other than restarting. Is there a way to shed search load without restarting the entire search head? Any help in troubleshooting performance issues with the search head would be most appreciated.
It seems odd that you are not seeing work on the indexers, as that is where the search is actually performed.
We have 3 somewhat distinct IT units so we're running 1 search head and three indexers (Splunk 4.1.6).
Each unit forwards their data to their own indexer.
In the past, the search head would get completely unusable at times.
In almost every case, it seemed to be related to people attempting to create field extractions through the web interface on our search head.
IIRC, the problem seemed to get triggered when testing certain field extraction expressions.
We were forced to restart when that happened.
I can't say which version we were using when this would hit us, but I can't remember it happening since we upgraded to 4.1.6.
The only other thing we did was to force a nightly restart of the search head via crontab.
I recently created several field extractions but did not hit the condition. However, the log file was very simple (comma-delimited key=val pairs).
Our indexers get one core saturated by a given search. The search head doesn't do anything; it's just loafing along. Are you seeing the same behavior on the indexers, i.e., one core per search being saturated and the remaining cores doing no work?
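If it helps to check this, here is a rough sketch for looking at per-core load rather than the total. It assumes the indexers are Linux hosts exposing the usual `/proc/stat` counters; the function names (`parse_proc_stat`, `per_core_busy`, `sample_cores`) are made up for illustration and have nothing to do with Splunk itself. A total of 10% on an 8-core box can hide one core pinned at close to 100%.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} from /proc/stat text."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines start with "cpu0", "cpu1", ...; skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Counters, in order: user nice system idle iowait irq softirq steal
        ticks = [int(v) for v in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        cores[fields[0]] = (total - idle, total)
    return cores


def per_core_busy(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    usage = {}
    for core, (busy1, total1) in after.items():
        busy0, total0 = before.get(core, (0, 0))
        elapsed = total1 - total0
        usage[core] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return usage


def sample_cores(interval=5.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and print per-core load."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    usage = per_core_busy(first, second)
    for core in sorted(usage, key=lambda name: int(name[3:])):
        print(f"{core}: {usage[core]:.1f}% busy")
```

Calling `sample_cores()` on an indexer while a single search is running should show whether one core is saturated and the rest are idle.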