I have an issue with the storage I use (NAS): it drops its connection (or, more precisely, its ability to write), and restarting the Splunk service resolves it. Is there a way to trigger a restart of the service when the indexer queue gets high?
Splunk 6 on Windows Server 2012.
Thanks
I used the SOS app to find the triggered events that were published in the log, and created this search:
index=_internal source=*splunkd.log component=timeinvertedIndex
After finding that search, I used the drop-down on the search results page to convert it to an alert, and configured the alert to send an email and restart the services via a script.
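A minimal sketch of what such a restart script could look like. It assumes Python is available to run the scripted alert and that the legacy scripted-alert convention applies, where the number of returned events arrives as the first argument. The service name "splunkd", the state file, the 30-minute cooldown and the RESTART_DRY_RUN variable are hypothetical placeholders, not Splunk defaults. The cooldown is one way to keep a flapping NAS from turning into a restart loop.

```python
import os
import subprocess
import sys
import time

# HYPOTHETICAL values: check services.msc for the real service name on your host.
SERVICE_NAME = "splunkd"
STATE_FILE = os.path.join(os.environ.get("TEMP", "."), "splunk_restart.last")
COOLDOWN_SECONDS = 30 * 60


def should_restart(event_count_arg, threshold=1):
    """Assumes legacy scripted alerts pass the number of returned events as argv[1]."""
    try:
        return int(event_count_arg) >= threshold
    except (TypeError, ValueError):
        return False


def in_cooldown(state_file, now, cooldown=COOLDOWN_SECONDS):
    """True if a restart was recorded less than `cooldown` seconds ago."""
    try:
        with open(state_file) as fh:
            last = float(fh.read().strip())
    except (IOError, OSError, ValueError):
        return False  # no usable record of a previous restart
    return now - last < cooldown


def build_restart_command(service_name=SERVICE_NAME):
    """cmd.exe line that stops and then starts the given Windows service."""
    return ["cmd", "/c", "net stop {0} && net start {0}".format(service_name)]


def main(argv, state_file=STATE_FILE, now=None, dry_run=True):
    now = time.time() if now is None else now
    count = argv[1] if len(argv) > 1 else None
    if not should_restart(count) or in_cooldown(state_file, now):
        return None
    # Record the restart time before acting, so a repeat alert is suppressed.
    with open(state_file, "w") as fh:
        fh.write(str(now))
    cmd = build_restart_command()
    if dry_run:
        print("Would run: " + " ".join(cmd))
    else:
        # Fire and forget: splunkd may terminate this script while it stops.
        subprocess.Popen(cmd)
    return cmd


if __name__ == "__main__":
    main(sys.argv, dry_run=os.environ.get("RESTART_DRY_RUN", "1") == "1")
```

Because splunkd launches the script, stopping the service may also end the script (and its cmd.exe child) before "net start" runs, so test it with the dry run on before relying on it.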
1) You can write a search to look for certain conditions, thereby triggering an alert.
2) The S.o.S. app can help you isolate the necessary search against the "metrics.log" to show you when that pipeline is full.
3) You can write a script to take some action on your behalf when an alert is triggered.
4) Restarting the indexer may seem to alleviate the issue, but because it causes the forwarders to queue up the data they were going to send, you are likely only delaying the inevitable.
5) Are you sure you really want a scripted action to restart services on your behalf without any user interaction?
6) A typo in a dashboard (if you're editing the XML directly) can cause Splunk to prompt for input when starting up, i.e. an "interactive startup", which nobody will be there to answer during a scripted restart.
7) Perhaps a support case would be a better route to identify and alleviate the bottleneck, rather than simply restarting every time it occurs.
tl;dr: Yes, you can do that, but are you sure it's what you want to do?
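To act on the queue itself (point 2), here is a rough sketch of checking metrics.log lines for a full or blocked index queue. It assumes queue lines are comma-separated key=value pairs carrying group=queue, name=indexqueue, max_size_kb, current_size_kb and, when blocked, blocked=true; the sample line and the 90% threshold are made up for illustration. Inside Splunk, the equivalent search over metrics.log is the more natural alert trigger; this only shows the logic.

```python
import re

# key=value pairs as assumed to appear in metrics.log; a value ends at a comma or space.
PAIR = re.compile(r"(\w+)=([^,\s]+)")


def parse_metrics_line(line):
    """Return the key=value fields of one metrics.log line as a dict of strings."""
    return dict(PAIR.findall(line))


def queue_fill_pct(line, queue="indexqueue"):
    """Percent full for the named queue, or None if the line is about something else."""
    fields = parse_metrics_line(line)
    if fields.get("group") != "queue" or fields.get("name") != queue:
        return None
    max_kb = float(fields.get("max_size_kb", "0"))
    if max_kb <= 0:
        return None
    return 100.0 * float(fields.get("current_size_kb", "0")) / max_kb


def queue_alarm(lines, queue="indexqueue", threshold_pct=90.0):
    """True if any sample for `queue` is marked blocked or is at/above the threshold."""
    for line in lines:
        pct = queue_fill_pct(line, queue)
        if pct is None:
            continue
        if parse_metrics_line(line).get("blocked") == "true" or pct >= threshold_pct:
            return True
    return False


# Made-up sample line in the assumed format.
SAMPLE = (
    "01-20-2014 10:15:02.123 -0500 INFO  Metrics - group=queue, name=indexqueue, "
    "blocked=true, max_size_kb=500, current_size_kb=499, current_size=1021, "
    "largest_size=1021, smallest_size=0"
)
print(queue_fill_pct(SAMPLE))  # → 99.8
```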
I start to get these errors in the splunkd.log file, which, as best I can tell, indicate a TCP/IP connectivity issue:
"Error trying to begin socket accept: An invalid argument was supplied"
This is a Windows Server 2012 box using Isilon as the storage.
Getting Isilon to work well with Windows 2012 was a struggle on its own, so I think this has more to do with 2012 and Isilon than with Splunk. It's not that the indexer is overwhelmed; it's that it can't communicate with the storage. Once I restart Splunk, it processes the indexer queue in a matter of seconds.
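Since the underlying failure is the storage refusing writes, one alternative is to probe the storage directly rather than infer trouble from the queue. This is a sketch, assuming the index volume is reachable as an ordinary directory path; the UNC path below is hypothetical. It creates, writes, fsyncs and deletes a small temporary file, and reports the error if any step fails.

```python
import os
import tempfile


def storage_writable(path):
    """Create, write, fsync and delete a small file under `path`.

    Returns (True, None) on success or (False, error_message) on failure.
    """
    try:
        fd, probe = tempfile.mkstemp(prefix="splunk_probe_", dir=path)
    except (IOError, OSError) as exc:
        return False, str(exc)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(b"probe")
            fh.flush()
            os.fsync(fh.fileno())  # force the write through to the storage
        return True, None
    except (IOError, OSError) as exc:
        return False, str(exc)
    finally:
        try:
            os.remove(probe)
        except OSError:
            pass  # cleanup failure should not mask the probe result


if __name__ == "__main__":
    # HYPOTHETICAL path: point this at the volume holding your indexes.
    ok, err = storage_writable(r"\\isilon\splunk\db")
    print("storage writable" if ok else "storage write failed: " + err)
```

Run from something outside splunkd (for example a Windows scheduled task), a failed probe could drive the same restart without Splunk launching the script that stops it.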