What I found in my case is that when the search head went down, some "REAL-TIME" searches were being run by other users.
To confirm this, I checked the DMC on my search head, and it showed the same thing: the times when the "SEARCH" processes were taking more RAM and CPU.
So I came to the conclusion that my SH went down because of some expensive searches.
Hope this helps.
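If you want to confirm the same thing on your own instance, a search over the introspection logs can show when search processes spiked. This is just a sketch, assuming the default resource-usage introspection fields (`data.process_type`, `data.pct_cpu`, `data.mem_used`):

```
index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.process_type="search"
| timechart span=5m max(data.pct_cpu) AS search_cpu_pct max(data.mem_used) AS search_mem_used
```

Compare the peaks on this timechart with the time your search head went down.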
We have the same problem, caused by high CPU usage: our indexers have 12 CPUs, but sometimes more than 20 scheduled searches run at the same time, so a queue builds up and after some time a peer is disconnected for timeout (peer status = "Down").
You can check this using the Monitoring Console (Resource Usage: Instance, 90th Percentile CPU Usage by Process Class).
Splunk Support suggested optimizing searches and giving the system more CPUs, rather than using higher timeout values.
We're working on this; I'll keep you informed!
Hi vrmandadi,
we solved the problem by optimizing searches: there was a very heavy search scheduled every ten minutes that overloaded the system!
We also ended up using a higher timeout value anyway.
First, using the Splunk Monitoring Console, see if there are CPU peaks.
Then check whether there are scheduled and/or accelerated searches [Settings -- Searches, Reports and Alerts] and whether any of them are scheduled at the same time as the peaks.
Then see if you can optimize those searches: look for joins, transactions, or accelerations. In other words, there isn't a configuration file to modify; you have to find the critical searches and then optimize them.
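To find the heaviest scheduled searches, you can also query the scheduler logs directly. A sketch, assuming the default scheduler logging in `_internal` (fields `savedsearch_name`, `run_time`, `status`):

```
index=_internal sourcetype=scheduler status=success
| stats count AS runs, sum(run_time) AS total_runtime_s, avg(run_time) AS avg_runtime_s BY savedsearch_name
| sort - total_runtime_s
```

The searches at the top of this list, if they fire near your CPU peaks, are the first candidates for optimization or rescheduling.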
I can share my experience:
in my system I found that there was a peak every ten minutes.
Looking at the scheduled searches, I found a very heavy accelerated search that started every ten minutes!
So I rescheduled it (I transformed it into a scheduled report running once a day at night) and my system started working well again!
I hope this is useful for you.
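For reference, moving a search from every ten minutes to once a night is just a cron change in savedsearches.conf. A sketch; the stanza name is hypothetical:

```
# savedsearches.conf
[my_heavy_search]                      # hypothetical saved search name
enableSched = 1
# was: cron_schedule = */10 * * * *    (every ten minutes)
cron_schedule = 0 2 * * *              # once a day at 02:00
```

Reload or restart after the change so the scheduler picks up the new cron expression.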
The search head you are on is not able to connect with the peer (https://foobar237.xxx.com:8089).
Make sure you set up distributed search properly: http://docs.splunk.com/Documentation/Splunk/latest/DistSearch/Configuredistributedsearch
If you are getting too many of these errors, you can edit
distsearch.conf. Also check
foobar237.xxx.com itself to see what is going wrong there.
Check out these settings in distsearch.conf:

connectionTimeout = * Amount of time, in seconds, to use as a timeout during search peer connection establishment.
sendTimeout = * Amount of time, in seconds, to use as a timeout while trying to write/send data to a search peer.
receiveTimeout = * Amount of time, in seconds, to use as a timeout while trying to read/receive data from a search peer.
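For example, to raise these on the search head you would put something like the following in place. A sketch only; the values are illustrative, not recommendations:

```
# $SPLUNK_HOME/etc/system/local/distsearch.conf
[distributedSearch]
connectionTimeout = 30     # seconds to wait while establishing the peer connection
sendTimeout = 120          # seconds to wait while sending data to a peer
receiveTimeout = 120       # seconds to wait while receiving data from a peer
```

Remember that raising timeouts only hides the symptom; if the peers are overloaded, optimizing the searches (as described above) is the real fix.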