Our search head becomes unresponsive after a few hours of operation. We then have to physically restart the server; restarting the Splunk service on the machine does not alleviate the issue. Has anyone had a similar issue? How were you able to resolve it?
Check the load on the server and the number of threads in Splunk-related processes as well.
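If it helps, here is a minimal sketch of that check for a Linux box (it assumes the process name contains "splunkd" and reads /proc directly; it is not Splunk tooling):

```python
#!/usr/bin/env python3
"""Rough check of system load and per-process thread counts for splunkd."""
import os

# 1-, 5-, and 15-minute load averages (Unix only)
print("load average:", os.getloadavg())

# Count threads for every splunkd process by reading /proc/<pid>/status
for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/comm") as f:
            name = f.read().strip()
        if "splunkd" not in name:
            continue
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("Threads:"):
                    print(f"pid {pid} ({name}): {line.split()[1]} threads")
    except OSError:
        # Process exited between listdir() and open()
        continue
```

Running it every few minutes while the box is still responsive should show whether load or thread count climbs steadily before the hang.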
Check your logs (Splunk/var/log/splunk): are any of them rotating ~1/minute? We have seen this cause the problem.
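One quick way to spot this is to list the files in the log directory with their timestamps; if the rotated copies of one log are all only a minute or two apart, that log is rotating far too fast. A small sketch (the /opt/splunk default path is an assumption, adjust to your install):

```python
#!/usr/bin/env python3
"""List Splunk log files with size and last-modified time to spot fast rotation."""
import os
import time

log_dir = os.path.join(os.environ.get("SPLUNK_HOME", "/opt/splunk"),
                       "var", "log", "splunk")

for name in sorted(os.listdir(log_dir)):
    path = os.path.join(log_dir, name)
    if os.path.isfile(path):
        mtime = time.strftime("%Y-%m-%d %H:%M:%S",
                              time.localtime(os.path.getmtime(path)))
        print(f"{mtime}  {os.path.getsize(path):>10}  {name}")
```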
It also happens when the drive gets within 2GB of remaining space.
Do you have network access to the server when it locks up? When it happens to us, we cannot ping, KVM, or RDC to it (all of the ports have been used up).
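If you suspect the box is running out of ports/sockets, a rough check while it is still reachable is to count open TCP connections by state. A Linux-only sketch that reads the kernel's socket tables (run it locally or from cron, since the network may already be gone during the hang):

```python
#!/usr/bin/env python3
"""Count open TCP sockets by state as a rough check for socket exhaustion."""
from collections import Counter

# TCP state codes as printed in /proc/net/tcp (hex, column 4)
STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "0A": "LISTEN"}

counts = Counter()
for proc_file in ("/proc/net/tcp", "/proc/net/tcp6"):
    try:
        with open(proc_file) as f:
            next(f)  # skip header row
            for line in f:
                state = line.split()[3]
                counts[STATES.get(state, state)] += 1
    except OSError:
        continue

for state, n in counts.most_common():
    print(f"{state}: {n}")
```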
Is this a Windows server or a Unix server?
Have you used Splunk to look at the system and performance logs on the server?
Thank you for the explanation. I checked our Splunk logs and they do not seem to be rotating quickly (about once per day), so I do not believe this is the cause either.
What we have seen, for example, are cowpipeline errors being written to our splunkd.log at a rate of 25MB per minute. This causes the log to rotate roughly once a minute (25MB is the log size limit). Over time, this constant bombardment of the indexer seems to cause Splunk to hang. splunkd.log is indexed, but it does not count against your license volume limit.
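If you want to check whether you are in the same situation, a one-minute sample of splunkd.log growth is enough; anything near 25MB/minute means the log is hitting its size cap and rotating about once a minute. A quick sketch (the path assumes a default /opt/splunk install):

```python
#!/usr/bin/env python3
"""Estimate how fast splunkd.log is growing over a one-minute window."""
import os
import time

log_path = "/opt/splunk/var/log/splunk/splunkd.log"  # assumed install path

start = os.path.getsize(log_path)
time.sleep(60)
end = os.path.getsize(log_path)

# A rotation during the sample makes the delta negative; just re-run in that case.
delta_mb = (end - start) / (1024 * 1024)
print(f"splunkd.log grew {delta_mb:.1f} MB in the last minute")
```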
Can you explain what you mean by the logs "rotating ~1/minute"? Lack of disk space is not causing our issue.